From a68060463f4cb4b4468c092e58cd9e47acebc14a Mon Sep 17 00:00:00 2001
From: stephantul
Date: Wed, 30 Apr 2025 19:03:02 +0200
Subject: [PATCH 1/2] docs

---
 README.md     |  2 ++
 docs/usage.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/README.md b/README.md
index ff643b89..09b74bee 100644
--- a/README.md
+++ b/README.md
@@ -123,6 +123,8 @@ For advanced usage, please refer to our [usage documentation](https://github.com
 
 ## Updates & Announcements
 
+- **01/05/2025**: We released backend support for `BPE` and `Unigram` tokenizers, along with quantization and dimensionality reduction. New Model2Vec models are now 50% of the size of the original models, and can be quantized to int8 to reach 25% of the original size, without loss of performance.
+
 - **12/02/2024**: We released **Model2Vec training**, allowing you to fine-tune your own classification models on top of Model2Vec models. Find out more in our [training documentation](https://github.com/MinishLab/model2vec/blob/main/model2vec/train/README.md) and [results](results/README.md#training-results).
 
 - **30/01/2024**: We released two new models: [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) and [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M). [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) is our most performant model to date, using a larger vocabulary and higher dimensions. [potion-retrieval-32M](https://huggingface.co/minishlab/potion-retrieval-32M) is a finetune of [potion-base-32M](https://huggingface.co/minishlab/potion-base-32M) that is optimized for retrieval tasks, and is the best performing static retrieval model currently available.
diff --git a/docs/usage.md b/docs/usage.md
index b2b9b214..7d6e7263 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -126,6 +126,54 @@ m2v_model = distill(model_name=model_name, vocabulary=vocabulary, use_subword=Fa
 
 **Important note:** we assume the passed vocabulary is sorted by rank frequency, i.e., we don't care about the actual word frequencies, but we do assume that the most frequent word comes first and the least frequent word comes last. If you're not sure whether this is the case, set `apply_zipf` to `False`. This disables the weighting, but will also make performance a little bit worse.
 
+### Quantization
+
+Models can be quantized to `float16` (the default) or `int8`, either during distillation or when loading from disk.
+
+```python
+from model2vec.distill import distill
+
+# Distill a Sentence Transformer model and quantize it to int8
+m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", quantize_to="int8")
+
+# Save the model. This model is now 25% of the size of a normal model.
+m2v_model.save_pretrained("m2v_model")
+```
+
+You can also quantize during loading.
+
+```python
+from model2vec import StaticModel
+
+model = StaticModel.from_pretrained("minishlab/potion-base-8m", quantize_to="int8")
+```
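+
+To check that the quantization was applied, you can inspect the embedding matrix directly. A quick sanity check (this assumes `model.embedding` exposes the underlying array, as the examples below also do):
+
+```python
+# The embedding matrix of an int8-quantized model should report dtype int8.
+print(model.embedding.dtype)
+# int8
+```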
+
+### Dimensionality reduction
+
+Because almost all Model2Vec models have been distilled using PCA, and because PCA explicitly orders dimensions from most informative to least informative, we can perform dimensionality reduction during loading. This is very similar to how matryoshka embeddings work.
+
+```python
+from model2vec import StaticModel
+
+model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)
+
+print(model.embedding.shape)
+
+```
+
+### Combining quantization and dimensionality reduction
+
+Combining these tricks can lead to extremely small models. For example, we can reduce `potion-base-8m`, which is normally 30MB, to only about 1MB:
+
+```python
+from model2vec import StaticModel
+
+model = StaticModel.from_pretrained("minishlab/potion-base-8m",
+                                    dimensionality=32,
+                                    quantize_to="int8")
+print(model.embedding.nbytes)
+# 944896 bytes ≈ 945 kB
+```
+
+This should be enough to satisfy even the tightest hardware constraints.
 
 ## Training
 

From 99d7ca29ae896a9a716ae69b9d349811ba8667ef Mon Sep 17 00:00:00 2001
From: stephantul
Date: Wed, 30 Apr 2025 19:22:21 +0200
Subject: [PATCH 2/2] add dim

---
 docs/usage.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/usage.md b/docs/usage.md
index 7d6e7263..931987b6 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -158,7 +158,7 @@ from model2vec import StaticModel
 
 model = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)
 
 print(model.embedding.shape)
-
+# (29528, 32)
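 ```
 
+Because PCA puts the most informative dimensions first, reducing the dimensionality at load time should amount to keeping only the leading columns of the full embedding matrix. A minimal sketch of that equivalence (this assumes load-time reduction is a plain truncation, which the PCA argument above implies, rather than a documented guarantee):
+
+```python
+import numpy as np
+
+from model2vec import StaticModel
+
+full = StaticModel.from_pretrained("minishlab/potion-base-8m")
+reduced = StaticModel.from_pretrained("minishlab/potion-base-8m", dimensionality=32)
+
+# If reduction is plain truncation, the reduced matrix equals the
+# first 32 columns of the full matrix, and this prints True.
+print(np.allclose(full.embedding[:, :32], reduced.embedding))
+```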