Commit 8881934

Add more info to README.md
1 parent 50f4f75 commit 8881934


README.md

Lines changed: 23 additions & 3 deletions
@@ -17,6 +17,8 @@ A minimal implementation of Gaussian Mixture Models in Jax

## Installation

+`gmmx` can be installed via pip:
+
```bash
pip install gmmx
```
@@ -29,26 +31,44 @@ from gmmx import GaussianMixtureModelJax, EMFitter
# Create a Gaussian Mixture Model with 16 components and 32 features
gmm = GaussianMixtureModelJax.create(n_components=16, n_features=32)

+# Draw samples from the model
n_samples = 10_000
x = gmm.sample(n_samples)

# Fit the model to the data
-em_fitter = EMFitter()
-gmm_fitted = em_fitter.fit(gmm, x)
+em_fitter = EMFitter(tol=1e-3, max_iter=100)
+gmm_fitted = em_fitter.fit(x=x, gmm=gmm)
```

+## Why Gaussian Mixture Models?
+
+What are Gaussian Mixture Models (GMMs) useful for in the age of deep learning? GMMs may have gone out of fashion for classification tasks, but they still have a few properties that make them useful in certain scenarios:
+
+- They are universal approximators, meaning that, given enough components, they can approximate any distribution.
+- Their likelihood can be evaluated in closed form, which makes them useful for generative modeling.
+- They are rather fast to train and evaluate.
+
+One such application is image reconstruction, where GMMs can be used to model the distribution and pixel correlations of local (patch-based) image features. This is useful for tasks like image denoising or inpainting; one method where I have used them is [Jolideco](https://github.com/jolideco/jolideco). Speeding up training on O(10^6) patches was the main motivation for `gmmx`.
+
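To make the closed-form likelihood in the list above concrete, this is the standard mixture density it refers to (a textbook formula, not anything specific to the `gmmx` API): for a model with $K$ components, mixing weights $\pi_k$ (with $\sum_k \pi_k = 1$), means $\mu_k$, and covariances $\Sigma_k$,

```math
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\log p(x_1, \ldots, x_N) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k).
```

Both expressions can be evaluated exactly for any input, which is what makes GMMs convenient generative models; EM-style fitting, as the name `EMFitter` suggests, does not decrease this log-likelihood from one iteration to the next.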
## Benchmarks

Here are some results from the benchmarks in the `benchmarks` folder, comparing against Scikit-Learn. The benchmarks were run on a 2021 MacBook Pro with an M1 Pro chip.

-### Prediction Time
+### Prediction

| Time vs. Number of Components | Time vs. Number of Samples | Time vs. Number of Features |
| ------------------------------------------------------------------------------- | ------------------------------------------------------------------------- | --------------------------------------------------------------------------- |
| ![Time vs. Number of Components](docs/_static/time-vs-n-components-predict.png) | ![Time vs. Number of Samples](docs/_static/time-vs-n-samples-predict.png) | ![Time vs. Number of Features](docs/_static/time-vs-n-features-predict.png) |

+For prediction, the speedup is around 2x when varying the number of components or features. When varying the number of samples, the cross-over point is around O(10^4) samples.
+
### Training Time

| Time vs. Number of Components | Time vs. Number of Samples | Time vs. Number of Features |
| --------------------------------------------------------------------------- | --------------------------------------------------------------------- | ----------------------------------------------------------------------- |
| ![Time vs. Number of Components](docs/_static/time-vs-n-components-fit.png) | ![Time vs. Number of Samples](docs/_static/time-vs-n-samples-fit.png) | ![Time vs. Number of Features](docs/_static/time-vs-n-features-fit.png) |
+
+For training, the speedup is around 10x on the same architecture. However, there is no guarantee that it converges to exactly the same solution as Scikit-Learn; there are tests in the `tests` folder that compare the results of the two implementations.
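To illustrate the kind of comparison the benchmarks and tests perform, here is a minimal timing sketch against Scikit-Learn's `GaussianMixture`. It reuses only the `gmmx` calls shown in the usage example above; the sample size, tolerance, and iteration cap are illustrative choices, and the actual scripts in the `benchmarks` folder may be set up differently:

```python
import time

import numpy as np
from sklearn.mixture import GaussianMixture

from gmmx import EMFitter, GaussianMixtureModelJax

# Synthetic data drawn from a reference model, as in the usage example above
gmm = GaussianMixtureModelJax.create(n_components=16, n_features=32)
x = gmm.sample(100_000)  # illustrative sample size

# Time the gmmx fit; a careful benchmark would also run a warm-up fit first
# so that JAX JIT compilation is not included in the measurement.
t0 = time.perf_counter()
gmm_fitted = EMFitter(tol=1e-3, max_iter=100).fit(x=x, gmm=gmm)
t_gmmx = time.perf_counter() - t0

# Time the Scikit-Learn fit on the same data, converted to a NumPy array
x_np = np.asarray(x)
t0 = time.perf_counter()
gmm_sklearn = GaussianMixture(n_components=16, tol=1e-3, max_iter=100).fit(x_np)
t_sklearn = time.perf_counter() - t0

print(f"gmmx: {t_gmmx:.2f} s, scikit-learn: {t_sklearn:.2f} s")
```

The result comparison done in the `tests` folder (checking that the fitted parameters of the two implementations agree) is not shown here, since it depends on the fitted model's attributes.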
