@@ -68,33 +68,36 @@ full covariance.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_pdf.py` for an example on plotting the
  density estimation.

- Pros and cons of class :class:`GaussianMixture`
- -----------------------------------------------
+ |details-start|
+ **Pros and cons of class GaussianMixture**
+ |details-split|

- Pros
- ....
+ .. topic:: Pros:

- :Speed: It is the fastest algorithm for learning mixture models
+ :Speed: It is the fastest algorithm for learning mixture models

- :Agnostic: As this algorithm maximizes only the likelihood, it
- will not bias the means towards zero, or bias the cluster sizes to
- have specific structures that might or might not apply.
+ :Agnostic: As this algorithm maximizes only the likelihood, it
+ will not bias the means towards zero, or bias the cluster sizes to
+ have specific structures that might or might not apply.

- Cons
- ....
+ .. topic:: Cons:

- :Singularities: When one has insufficiently many points per
- mixture, estimating the covariance matrices becomes difficult,
- and the algorithm is known to diverge and find solutions with
- infinite likelihood unless one regularizes the covariances artificially.
+ :Singularities: When one has insufficiently many points per
+ mixture, estimating the covariance matrices becomes difficult,
+ and the algorithm is known to diverge and find solutions with
+ infinite likelihood unless one regularizes the covariances artificially.

- :Number of components: This algorithm will always use all the
- components it has access to, needing held-out data
- or information theoretical criteria to decide how many components to use
- in the absence of external cues.
+ :Number of components: This algorithm will always use all the
+ components it has access to, needing held-out data
+ or information theoretical criteria to decide how many components to use
+ in the absence of external cues.

- Selecting the number of components in a classical Gaussian Mixture Model
- ------------------------------------------------------------------------
+ |details-end|
+
+
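As a minimal sketch (assuming the current ``GaussianMixture`` API), the covariance regularization mentioned in the singularities caveat above is exposed through the ``reg_covar`` parameter, which adds a constant to the diagonal of every estimated covariance matrix::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = rng.randn(30, 2)  # deliberately few samples for five components

    # A larger ``reg_covar`` (default 1e-6) keeps EM away from singular,
    # near zero-variance covariance estimates when some components end up
    # with very few points.
    gm = GaussianMixture(n_components=5, reg_covar=1e-3, random_state=0).fit(X)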
+ |details-start|
+ **Selecting the number of components in a classical Gaussian Mixture model**
+ |details-split|

The BIC criterion can be used to select the number of components in a Gaussian
Mixture in an efficient way. In theory, it recovers the true number of
@@ -114,10 +117,13 @@ model.
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_selection.py` for an example
  of model selection performed with classical Gaussian mixture.

+ |details-end|
+
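A minimal sketch of BIC-based selection on synthetic two-cluster data (the data and candidate range are illustrative assumptions) could look like this::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # Synthetic data with two well-separated blobs.
    X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 5])

    # Fit one model per candidate component count and keep the lowest BIC.
    candidates = range(1, 7)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in candidates]
    best_k = candidates[int(np.argmin(bics))]  # expected to be 2 for this data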
.. _expectation_maximization:

- Estimation algorithm Expectation-maximization
- -----------------------------------------------
+ |details-start|
+ **Estimation algorithm expectation-maximization**
+ |details-split|

The main difficulty in learning Gaussian mixture models from unlabeled
data is that one usually doesn't know which points came from
@@ -135,8 +141,11 @@ parameters to maximize the likelihood of the data given those
assignments. Repeating this process is guaranteed to always converge
to a local optimum.
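A minimal sketch of inspecting the EM fit through the public ``converged_``, ``n_iter_`` and ``lower_bound_`` attributes (the synthetic data here is an illustrative assumption)::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(42)
    X = np.vstack([rng.randn(150, 2), rng.randn(150, 2) + 4])

    gm = GaussianMixture(n_components=2, max_iter=200, tol=1e-4,
                         random_state=0).fit(X)
    print(gm.converged_)    # True once EM reached the tolerance
    print(gm.n_iter_)       # number of EM iterations that were run
    print(gm.lower_bound_)  # lower bound on the log-likelihood of the best fit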
- Choice of the Initialization Method
- -----------------------------------
+ |details-end|
+
+ |details-start|
+ **Choice of the Initialization method**
+ |details-split|

There is a choice of four initialization methods (as well as inputting user defined
initial means) to generate the initial centers for the model components:
@@ -172,6 +181,8 @@ random
* See :ref:`sphx_glr_auto_examples_mixture_plot_gmm_init.py` for an example of
  using different initializations in Gaussian Mixture.

+ |details-end|
+
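As a minimal sketch (the ``'k-means++'`` and ``'random_from_data'`` options assume scikit-learn >= 1.1), the four strategies can be compared on the same data via the ``init_params`` parameter::

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 6, rng.randn(100, 2) - 6])

    for init in ("kmeans", "k-means++", "random_from_data", "random"):
        gm = GaussianMixture(n_components=3, init_params=init,
                             random_state=0).fit(X)
        print(init, gm.n_iter_)  # iterations to convergence per initialization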
.. _bgmm:

Variational Bayesian Gaussian Mixture
@@ -183,8 +194,7 @@ similar to the one defined by :class:`GaussianMixture`.
.. _variational_inference:

- Estimation algorithm: variational inference
- ---------------------------------------------
+ **Estimation algorithm: variational inference**

Variational inference is an extension of expectation-maximization that
maximizes a lower bound on model evidence (including
@@ -282,48 +292,47 @@ from the two resulting mixtures.
``weight_concentration_prior_type`` for different values of the parameter
``weight_concentration_prior``.

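A minimal sketch of this behaviour (the dataset and the prior value are illustrative assumptions): with a deliberately large ``n_components`` and a small ``weight_concentration_prior``, most of the fitted mixture weights end up close to zero::

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(200, 2), rng.randn(200, 2) + 5])  # roughly two clusters

    bgm = BayesianGaussianMixture(
        n_components=10,                                    # deliberately too many
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=1e-2,                    # small prior favours few active components
        max_iter=500,
        random_state=0,
    ).fit(X)

    # Most of the ten mixture weights should end up close to zero.
    print(np.round(bgm.weights_, 3))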
+ |details-start|
+ **Pros and cons of variational inference with BayesianGaussianMixture**
+ |details-split|

- Pros and cons of variational inference with :class:`BayesianGaussianMixture`
- ----------------------------------------------------------------------------
-
- Pros
- .....
+ .. topic:: Pros:

- :Automatic selection: when ``weight_concentration_prior`` is small enough and
- ``n_components`` is larger than what is found necessary by the model, the
- Variational Bayesian mixture model has a natural tendency to set some mixture
- weights values close to zero. This makes it possible to let the model choose
- a suitable number of effective components automatically. Only an upper bound
- of this number needs to be provided. Note however that the "ideal" number of
- active components is very application specific and is typically ill-defined
- in a data exploration setting.
+ :Automatic selection: when ``weight_concentration_prior`` is small enough and
+ ``n_components`` is larger than what is found necessary by the model, the
+ Variational Bayesian mixture model has a natural tendency to set some mixture
+ weights values close to zero. This makes it possible to let the model choose
+ a suitable number of effective components automatically. Only an upper bound
+ of this number needs to be provided. Note however that the "ideal" number of
+ active components is very application specific and is typically ill-defined
+ in a data exploration setting.

- :Less sensitivity to the number of parameters: unlike finite models, which will
- almost always use all components as much as they can, and hence will produce
- wildly different solutions for different numbers of components, the
- variational inference with a Dirichlet process prior
- (``weight_concentration_prior_type='dirichlet_process'``) won't change much
- with changes to the parameters, leading to more stability and less tuning.
+ :Less sensitivity to the number of parameters: unlike finite models, which will
+ almost always use all components as much as they can, and hence will produce
+ wildly different solutions for different numbers of components, the
+ variational inference with a Dirichlet process prior
+ (``weight_concentration_prior_type='dirichlet_process'``) won't change much
+ with changes to the parameters, leading to more stability and less tuning.

- :Regularization: due to the incorporation of prior information,
- variational solutions have less pathological special cases than
- expectation-maximization solutions.
+ :Regularization: due to the incorporation of prior information,
+ variational solutions have less pathological special cases than
+ expectation-maximization solutions.


- Cons
- .....
+ .. topic:: Cons:

- :Speed: the extra parametrization necessary for variational inference makes
- inference slower, although not by much.
+ :Speed: the extra parametrization necessary for variational inference makes
+ inference slower, although not by much.

- :Hyperparameters: this algorithm needs an extra hyperparameter
- that might need experimental tuning via cross-validation.
+ :Hyperparameters: this algorithm needs an extra hyperparameter
+ that might need experimental tuning via cross-validation.

- :Bias: there are many implicit biases in the inference algorithms (and also in
- the Dirichlet process if used), and whenever there is a mismatch between
- these biases and the data it might be possible to fit better models using a
- finite mixture.
+ :Bias: there are many implicit biases in the inference algorithms (and also in
+ the Dirichlet process if used), and whenever there is a mismatch between
+ these biases and the data it might be possible to fit better models using a
+ finite mixture.

+ |details-end|


.. _dirichlet_process: