@@ -53,23 +53,26 @@ such as:
### Supported Models
- You can use any BERT, CamemBERT or XLM-RoBERTa model with absolute positions in `text-embeddings-inference`.
+ You can use any JinaBERT model with Alibi or absolute positions, or any BERT, CamemBERT or XLM-RoBERTa model with
+ absolute positions, in `text-embeddings-inference`.
**Support for other model types will be added in the future.**
Examples of supported models:
- | MTEB Rank | Model Type  | Model ID                                                                        |
- |-----------|-------------|---------------------------------------------------------------------------------|
- | 1         | Bert        | [BAAI/bge-large-en-v1.5](https://hf.co/BAAI/bge-large-en-v1.5)                  |
- | 2         |             | [BAAI/bge-base-en-v1.5](https://hf.co/BAAI/bge-base-en-v1.5)                    |
- | 3         |             | [llmrails/ember-v1](https://hf.co/llmrails/ember-v1)                            |
- | 4         |             | [thenlper/gte-large](https://hf.co/thenlper/gte-large)                          |
- | 5         |             | [thenlper/gte-base](https://hf.co/thenlper/gte-base)                            |
- | 6         |             | [intfloat/e5-large-v2](https://hf.co/intfloat/e5-large-v2)                      |
- | 7         |             | [BAAI/bge-small-en-v1.5](https://hf.co/BAAI/bge-small-en-v1.5)                  |
- | 10        |             | [intfloat/e5-base-v2](https://hf.co/intfloat/e5-base-v2)                        |
- | 11        | XLM-RoBERTa | [intfloat/multilingual-e5-large](https://hf.co/intfloat/multilingual-e5-large)  |
+ | MTEB Rank | Model Type  | Model ID                                                                                |
+ |-----------|-------------|-----------------------------------------------------------------------------------------|
+ | 1         | Bert        | [BAAI/bge-large-en-v1.5](https://hf.co/BAAI/bge-large-en-v1.5)                          |
+ | 2         |             | [BAAI/bge-base-en-v1.5](https://hf.co/BAAI/bge-base-en-v1.5)                            |
+ | 3         |             | [llmrails/ember-v1](https://hf.co/llmrails/ember-v1)                                    |
+ | 4         |             | [thenlper/gte-large](https://hf.co/thenlper/gte-large)                                  |
+ | 5         |             | [thenlper/gte-base](https://hf.co/thenlper/gte-base)                                    |
+ | 6         |             | [intfloat/e5-large-v2](https://hf.co/intfloat/e5-large-v2)                              |
+ | 7         |             | [BAAI/bge-small-en-v1.5](https://hf.co/BAAI/bge-small-en-v1.5)                          |
+ | 10        |             | [intfloat/e5-base-v2](https://hf.co/intfloat/e5-base-v2)                                |
+ | 11        | XLM-RoBERTa | [intfloat/multilingual-e5-large](https://hf.co/intfloat/multilingual-e5-large)          |
+ | N/A       | JinaBERT    | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en)    |
+ | N/A       | JinaBERT    | [jinaai/jina-embeddings-v2-small-en](https://hf.co/jinaai/jina-embeddings-v2-small-en)  |
You can explore the list of best performing text embeddings models [here](https://huggingface.co/spaces/mteb/leaderboard).
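If you want to check whether some other model falls into one of these families, one lightweight way (an illustrative sketch, not a documented step) is to read the `model_type` field of its `config.json` via the Hub's raw-file endpoint:

```shell
# Illustrative check: supported models report a BERT-family architecture
# ("bert", "camembert", "xlm-roberta", ...) in their config.json.
curl -s https://huggingface.co/BAAI/bge-large-en-v1.5/raw/main/config.json \
    | grep '"model_type"'
```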
@@ -81,7 +84,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.2.2 --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model --revision $revision
```
And then you can make requests like
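For instance, a minimal call to the `/embed` route (a sketch assuming the container started above is listening on local port 8080):

```shell
# Expected: a JSON array with one embedding vector per input.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```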
@@ -223,15 +226,15 @@ Options:
Text Embeddings Inference ships with multiple Docker images that you can use to target a specific backend:
- | Architecture                        | Image                                                       |
- |-------------------------------------|-------------------------------------------------------------|
- | CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.2.2     |
- | Volta                               | NOT SUPPORTED                                               |
- | Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.2.2  |
- | Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.2.2         |
- | Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.2.2      |
- | Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.2.2      |
- | Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.2.2  |
+ | Architecture                        | Image                                                                      |
+ |-------------------------------------|----------------------------------------------------------------------------|
+ | CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-0.3.0                    |
+ | Volta                               | NOT SUPPORTED                                                              |
+ | Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-0.3.0 (experimental)  |
+ | Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:0.3.0                        |
+ | Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-0.3.0                     |
+ | Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.3.0                     |
+ | Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-0.3.0 (experimental)  |
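For example, targeting a Turing card such as a T4 only means swapping in the matching tag from the table (a sketch reusing `$model` and `$volume` from the quick-start above):

```shell
# Same launch command as the quick-start, with the Turing-specific
# (experimental) image tag from the table.
docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:turing-0.3.0 \
    --model-id $model
```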
### API documentation
@@ -256,7 +259,7 @@ model=<your private model>
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>
- docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.2.2 --model-id $model
+ docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.3.0 --model-id $model
```
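As an optional pre-flight check (an illustrative sketch using the Hub's public model API, not a documented step), you can verify that the token can actually read the private repository before starting the container:

```shell
# A 200 response with model metadata confirms the READ token has access;
# a 401/404 means the token or the repository id is wrong.
curl -s -H "Authorization: Bearer $token" \
    https://huggingface.co/api/models/$model
```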
### Distributed Tracing