@@ -67,10 +67,10 @@ with absolute positions in `text-embeddings-inference`.

Examples of supported models:

- | MTEB Rank | Model Type   | Example Model ID |
+ | MTEB Rank | Model Type   | Model ID |
| -----------| -------------| --------------------------------------------------------------------------------------------------|
| 6          | Bert         | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
- | 1O        | XLM-RoBERTa  | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
+ | 10        | XLM-RoBERTa  | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
| N/A        | NomicBert    | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
| N/A        | NomicBert    | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
| N/A        | JinaBERT     | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
@@ -80,7 +80,7 @@ models [here](https://huggingface.co/spaces/mteb/leaderboard).

#### Sequence Classification and Re-Ranking

- `text-embeddings-inference` v0.4.0 added support for CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.
+ `text-embeddings-inference` v0.4.0 added support for Bert, CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.

Example of supported sequence classification models:
@@ -97,7 +97,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
```
And then you can make requests like
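For instance, a minimal request to the `/embed` route might look like the following (a sketch, assuming the container from the command above is running and listening locally on port 8080):

```shell
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```

The response is a JSON array containing one embedding vector per input.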
@@ -242,13 +242,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to

| Architecture                          | Image |
| -------------------------------------| -------------------------------------------------------------------------|
- | CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.6 |
+ | CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0 |
| Volta                                 | NOT SUPPORTED |
- | Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.6 (experimental) |
- | Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.6 |
- | Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.6 |
- | Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.6 |
- | Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.6 (experimental) |
+ | Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental) |
+ | Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.0 |
+ | Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.0 |
+ | Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0 |
+ | Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental) |

**Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
@@ -277,7 +277,7 @@ model=<your private model>
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>

- docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+ docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
```

### Using Re-rankers models
@@ -295,7 +295,7 @@ model=BAAI/bge-reranker-large
revision=refs/pr/4
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
```

And then you can rank the similarity between a query and a list of texts with:
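A request to the `/rerank` route could be sketched as follows (assuming the server is running locally on port 8080; the example passages are purely illustrative):

```shell
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```

The server returns the texts ranked by their similarity score against the query.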
@@ -315,7 +315,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
model=SamLowe/roberta-base-go_emotions
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
```

Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
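Such a request might look like this (a sketch, assuming the server is running locally on port 8080; the input sentence is illustrative):

```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```

The response lists the classification labels with their scores.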
@@ -344,7 +344,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6-grpc --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0-grpc --model-id $model --revision $revision
```

```shell