
Conversation

bdpedigo

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Comment on lines 29 to 31
Different combinations of initialization, GMM,
and cluster numbers are used and the clustering
with the best selection criterion (BIC or AIC) is chosen.

Suggest making this match LassoLarsIC a bit more closely. E.g., its docstring says "Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model." You could basically replace "regularization parameter" with "Gaussian mixture parameters".
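For reference, the selection described in the docstring can be sketched with plain ``sklearn.mixture.GaussianMixture`` (a rough sketch, not this PR's API; the grid and data are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
# Two well-separated 1-D clusters
X = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)]).reshape(-1, 1)

# Fit one GaussianMixture per candidate setting and keep the lowest BIC
candidates = [
    GaussianMixture(n_components=k, covariance_type=ct, random_state=0).fit(X)
    for k in (1, 2, 3)
    for ct in ("full", "diag")
]
best = min(candidates, key=lambda gm: gm.bic(X))
```

With data like this, the two-component model should win on BIC: the extra components of a three-component fit barely improve the likelihood but pay the complexity penalty.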

n_init : int, optional (default = 1)
If ``n_init`` is larger than 1, additional
``n_init``-1 runs of :class:`sklearn.mixture.GaussianMixture`
initialized with k-means will be performed

Not necessarily initialized with k-means, right?

initialized with k-means will be performed
for all covariance parameters in ``covariance_type``.

init_params : {'kmeans' (default), 'k-means++', 'random', 'random_from_data'}

Perhaps worth explaining the options; mainly, I don't know what ``random_from_data`` is from this description.


Also, is k-means++ not the default? If not, why not? I think it is in sklearn, if I remember correctly.


Yeah, not sure; apparently ``"kmeans"`` is the default in GaussianMixture.
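A quick check against current scikit-learn confirms this (note it differs from ``KMeans`` itself, whose default ``init`` is ``"k-means++"``):

```python
from sklearn.mixture import GaussianMixture

# "kmeans" is the default initialization for GaussianMixture;
# "k-means++", "random", and "random_from_data" (initial means drawn
# at random from the data points) are the alternatives.
print(GaussianMixture().init_params)  # -> kmeans
```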


Attributes
----------
best_criterion_ : float

LassoLarsIC calls this ``criterion_``.

covariance_type_ : str
Covariance type for the model with the best bic/aic.

best_model_ : :class:`sklearn.mixture.GaussianMixture`

In LassoLarsIC, there is no "sub-object" with the best model; rather, the whole class just operates as if it is that model. Does that make sense? While I can't speak for them, my guess is this is closer to what they'd be expecting.


I added attributes like ``weights_`` and ``means_`` from GaussianMixture into GaussianMixtureIC, but I found that I still need to save the best model (called ``best_estimator_`` in the newest version) in order to call ``predict``. Did I understand you correctly?

best_model_ : :class:`sklearn.mixture.GaussianMixture`
Object with the best bic/aic.

labels_ : array-like, shape (n_samples,)

Not a property of GaussianMixture; recommend not storing it.
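To illustrate the suggested alternative (a sketch with plain ``GaussianMixture``; the data are illustrative): hard assignments come from ``predict`` rather than a stored attribute.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(-5, 1, 50), rng.normal(5, 1, 50)]).reshape(-1, 1)

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
# Unlike KMeans, GaussianMixture stores no labels_ attribute after fit;
# cluster labels are computed on demand
labels = gm.predict(X)
```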

self.criterion = criterion
self.n_jobs = n_jobs

def _check_multi_comp_inputs(self, input, name, default):

I usually make any methods that don't access ``self`` into module-level functions.

name="min_components",
target_type=int,
)
check_scalar(

Could the min value here be ``min_components``?
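That is, ``check_scalar`` accepts a ``min_val`` bound directly, so something like this sketch would enforce ``max_components >= min_components`` in one call (the variable names are illustrative):

```python
from sklearn.utils import check_scalar

min_components, max_components = 1, 5

# Validate that max_components is an int no smaller than the
# previously-validated min_components
check_scalar(
    max_components,
    name="max_components",
    target_type=int,
    min_val=min_components,
)
```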

else:
criterion_value = model.aic(X)

# change the precision of "criterion_value" based on sample size

Could you explain this?

)
best_criter = [result.criterion for result in results]

if sum(best_criter == np.min(best_criter)) == 1:

This all seems fine, but just a suggestion: https://numpy.org/doc/stable/reference/generated/numpy.argmin.html
The docs imply that for ties, ``argmin`` gives the first occurrence. In other words, if the results are sorted in order of complexity, just using ``argmin`` would do what you want (you can even leave a comment to this effect, if you go this route).

Note that I think having the results sorted by complexity is probably desirable anyway?




class _CollectResults:

This is effectively a dictionary; recommend just using one, or a named tuple? I am just anti classes that only store data and don't have any methods, but that is just my style :)
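For instance, a ``NamedTuple`` gives the same attribute access without a bespoke class (the field names here are illustrative, not the PR's):

```python
from typing import NamedTuple

from sklearn.mixture import GaussianMixture


class FitResult(NamedTuple):
    """Plain record for one fitted candidate: the model and its criterion."""

    model: GaussianMixture
    criterion: float


results = [FitResult(model=GaussianMixture(n_components=2), criterion=123.4)]
best = min(results, key=lambda r: r.criterion)
```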

Comment on lines 306 to 323
param_grid = dict(
covariance_type=covariance_type,
n_components=range(self.min_components, self.max_components + 1),
)
param_grid = list(ParameterGrid(param_grid))

seeds = random_state.randint(np.iinfo(np.int32).max, size=len(param_grid))

if parse_version(joblib.__version__) < parse_version("0.12"):
parallel_kwargs = {"backend": "threading"}
else:
parallel_kwargs = {"prefer": "threads"}

results = Parallel(n_jobs=self.n_jobs, verbose=self.verbose, **parallel_kwargs)(
delayed(self._fit_cluster)(X, gm_params, seed)
for gm_params, seed in zip(param_grid, seeds)
)
best_criter = [result.criterion for result in results]

Why not just use GridSearchCV as in their example? https://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_selection.html#sphx-glr-auto-examples-mixture-plot-gmm-selection-py

It would abstract away some of the work you have to do to make parallelism work, for instance.
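The pattern from that example looks roughly like this sketch (the scorer name and data are illustrative; GridSearchCV maximizes scores, hence the negated BIC):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(-5, 1, 100), rng.normal(5, 1, 100)]).reshape(-1, 1)


def gmm_bic_score(estimator, X, y=None):
    # Lower BIC is better, so return its negative for maximization
    return -estimator.bic(X)


param_grid = {
    "n_components": range(1, 4),
    "covariance_type": ["full", "diag"],
}
search = GridSearchCV(
    GaussianMixture(random_state=0), param_grid=param_grid, scoring=gmm_bic_score
)
search.fit(X)  # n_jobs, parallel backends, etc. handled by GridSearchCV
```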

@github-actions

github-actions bot commented Jun 21, 2023

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/mixture/_gaussian_mixture_ic.py:1:1: CPY001 Missing copyright notice at top of file
sklearn/mixture/_gaussian_mixture_ic.py:10:1: TID252 Prefer absolute imports over relative imports
   |
 8 | import numpy as np
 9 |
10 | from ..base import BaseEstimator, ClusterMixin
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TID252
11 | from ..model_selection import GridSearchCV
12 | from ..utils._param_validation import (
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:10:1: TID252 Prefer absolute imports over relative imports
   |
 8 | import numpy as np
 9 |
10 | from ..base import BaseEstimator, ClusterMixin
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TID252
11 | from ..model_selection import GridSearchCV
12 | from ..utils._param_validation import (
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:11:1: TID252 Prefer absolute imports over relative imports
   |
10 | from ..base import BaseEstimator, ClusterMixin
11 | from ..model_selection import GridSearchCV
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TID252
12 | from ..utils._param_validation import (
13 |     Integral,
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:12:1: TID252 Prefer absolute imports over relative imports
   |
10 |   from ..base import BaseEstimator, ClusterMixin
11 |   from ..model_selection import GridSearchCV
12 | / from ..utils._param_validation import (
13 | |     Integral,
14 | |     Interval,
15 | |     InvalidParameterError,
16 | |     StrOptions,
17 | | )
   | |_^ TID252
18 |   from ..utils.validation import check_is_fitted
19 |   from . import GaussianMixture
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:12:1: TID252 Prefer absolute imports over relative imports
   |
10 |   from ..base import BaseEstimator, ClusterMixin
11 |   from ..model_selection import GridSearchCV
12 | / from ..utils._param_validation import (
13 | |     Integral,
14 | |     Interval,
15 | |     InvalidParameterError,
16 | |     StrOptions,
17 | | )
   | |_^ TID252
18 |   from ..utils.validation import check_is_fitted
19 |   from . import GaussianMixture
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:12:1: TID252 Prefer absolute imports over relative imports
   |
10 |   from ..base import BaseEstimator, ClusterMixin
11 |   from ..model_selection import GridSearchCV
12 | / from ..utils._param_validation import (
13 | |     Integral,
14 | |     Interval,
15 | |     InvalidParameterError,
16 | |     StrOptions,
17 | | )
   | |_^ TID252
18 |   from ..utils.validation import check_is_fitted
19 |   from . import GaussianMixture
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:12:1: TID252 Prefer absolute imports over relative imports
   |
10 |   from ..base import BaseEstimator, ClusterMixin
11 |   from ..model_selection import GridSearchCV
12 | / from ..utils._param_validation import (
13 | |     Integral,
14 | |     Interval,
15 | |     InvalidParameterError,
16 | |     StrOptions,
17 | | )
   | |_^ TID252
18 |   from ..utils.validation import check_is_fitted
19 |   from . import GaussianMixture
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:18:1: TID252 Prefer absolute imports over relative imports
   |
16 |     StrOptions,
17 | )
18 | from ..utils.validation import check_is_fitted
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TID252
19 | from . import GaussianMixture
   |
   = help: Replace relative imports with absolute imports

sklearn/mixture/_gaussian_mixture_ic.py:19:1: TID252 Prefer absolute imports over relative imports
   |
17 | )
18 | from ..utils.validation import check_is_fitted
19 | from . import GaussianMixture
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ TID252
   |
   = help: Replace relative imports with absolute imports

Found 10 errors.
No fixes available (9 hidden fixes can be enabled with the `--unsafe-fixes` option).

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/mixture/_gaussian_mixture_ic.py
+++ sklearn/mixture/_gaussian_mixture_ic.py
@@ -4,7 +4,6 @@
 #          Thomas Athey <tathey1@jhmi.edu>
 #          Benjamin Pedigo <bpedigo@jhu.edu>
 
-
 import numpy as np
 
 from ..base import BaseEstimator, ClusterMixin

1 file would be reformatted, 927 files already formatted

Generated for commit: a08d428. Link to the linter CI: here

StefanieSenger and others added 30 commits August 28, 2025 18:24
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
…learn#32019)

Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Virgil Chan <virchan.math@gmail.com>
… solver (scikit-learn#32039)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
…earn#31898)

Co-authored-by: Stefanie Senger <91849487+StefanieSenger@users.noreply.github.com>
…cikit-learn#31951)

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
…ely Skewed Data (scikit-learn#29307)

Co-authored-by: rnmourao <robertonunesmourao@yahoo.com.br>
…2035)

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>
…display (scikit-learn#31564)

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>