
Commit 41b692d: v1.0.0 (#168)

1 parent: f1e50df

11 files changed (+44 lines, −44 lines)


Cargo.lock

Lines changed: 7 additions & 7 deletions
Generated file; diff not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ members = [
 resolver = "2"
 
 [workspace.package]
-version = "0.6.0"
+version = "1.0.0"
 edition = "2021"
 authors = ["Olivier Dehaene"]
 homepage = "https://github.com/huggingface/text-embeddings-inference"
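Note: because `version` sits under `[workspace.package]`, member crates that declare `version.workspace = true` pick up 1.0.0 automatically. A quick sanity check after the bump, sketched under the assumption of a local checkout of the workspace:

```shell
# List every workspace crate's resolved version; crates inheriting
# version.workspace = true should all report 1.0.0.
cargo metadata --format-version 1 --no-deps \
    | grep -o '"version":"[^"]*"' \
    | sort -u
```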

README.md

Lines changed: 14 additions & 14 deletions
@@ -67,10 +67,10 @@ with absolute positions in `text-embeddings-inference`.
 
 Examples of supported models:
 
-| MTEB Rank | Model Type | Example Model ID |
+| MTEB Rank | Model Type | Model ID |
 |-----------|-------------|--------------------------------------------------------------------------------------------------|
 | 6 | Bert | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
-| 1O | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
+| 10 | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
 | N/A | NomicBert | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
 | N/A | NomicBert | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
 | N/A | JinaBERT | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
@@ -80,7 +80,7 @@ models [here](https://huggingface.co/spaces/mteb/leaderboard).
 
 #### Sequence Classification and Re-Ranking
 
-`text-embeddings-inference` v0.4.0 added support for CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.
+`text-embeddings-inference` v0.4.0 added support for Bert, CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.
 
 Example of supported sequence classification models:
 
@@ -97,7 +97,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
 ```
 
 And then you can make requests like
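Note: the request the README goes on to show is a plain POST to the `/embed` route on the mapped port. A minimal sketch of such a call (payload mirrors the README's documented example):

```shell
# Embed a single input; the server returns a JSON array of embedding vectors.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```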
@@ -242,13 +242,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to
 
 | Architecture | Image |
 |-------------------------------------|-------------------------------------------------------------------------|
-| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-0.6 |
+| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0 |
 | Volta | NOT SUPPORTED |
-| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-0.6 (experimental) |
-| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:0.6 |
-| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-0.6 |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-0.6 |
-| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-0.6 (experimental) |
+| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental) |
+| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:1.0 |
+| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-1.0 |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0 |
+| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental) |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
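Note: per the warning above, the Turing image ships with Flash Attention disabled, and opting in is a matter of passing the environment variable through Docker. A sketch, reusing the `$model` and `$volume` variables from the earlier examples:

```shell
# Enable Flash Attention v1 on the (experimental) Turing image.
docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data \
    --pull always ghcr.io/huggingface/text-embeddings-inference:turing-1.0 \
    --model-id $model
```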
@@ -277,7 +277,7 @@ model=<your private model>
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 token=<your cli READ token>
 
-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
 ```
 
 ### Using Re-rankers models
@@ -295,7 +295,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
 ```
 
 And then you can rank the similarity between a query and a list of texts with:
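Note: the ranking call that follows in the README targets the `rerank` route with a query plus candidate texts. A minimal sketch (payload mirrors the README's example):

```shell
# Rank candidate texts against the query; the response lists indices with scores.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```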
@@ -315,7 +315,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
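Note: for classification models the call goes to the `predict` route instead. A minimal sketch (input string mirrors the README's example):

```shell
# Classify one input; the response maps labels (here, emotions) to scores.
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```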
@@ -344,7 +344,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6-grpc --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0-grpc --model-id $model --revision $revision
 ```
 
 ```shell
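Note: the hunk is truncated at the opening of the shell block that demonstrates the gRPC client call. As a hedged sketch only, the service and method names below are assumed and not confirmed by this diff:

```shell
# Hypothetical grpcurl invocation against the -grpc image started above;
# tei.v1.Embed/Embed is an assumed service/method name.
grpcurl -d '{"inputs": "What is Deep Learning"}' -plaintext 0.0.0.0:8080 tei.v1.Embed/Embed
```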

docs/openapi.json

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
     "license": {
       "name": "HFOIL"
     },
-    "version": "0.6.0"
+    "version": "1.0.0"
   },
   "paths": {
     "/embed": {

docs/source/en/index.md

Lines changed: 3 additions & 3 deletions
@@ -23,12 +23,12 @@ TEI offers multiple features tailored to optimize the deployment process and enh
 
 **Key Features:**
 
-* **Streamlined Deployment:** TEI eliminates the need for a model graph compilation step for a more efficient deployment process.
+* **Streamlined Deployment:** TEI eliminates the need for a model graph compilation step for an easier deployment process.
 * **Efficient Resource Utilization:** Benefit from small Docker images and rapid boot times, allowing for true serverless capabilities.
 * **Dynamic Batching:** TEI incorporates token-based dynamic batching thus optimizing resource utilization during inference.
 * **Optimized Inference:** TEI leverages [Flash Attention](https://github.com/HazyResearch/flash-attention), [Candle](https://github.com/huggingface/candle), and [cuBLASLt](https://docs.nvidia.com/cuda/cublas/#using-the-cublaslt-api) by using optimized transformers code for inference.
-* **Safetensors weight loading:** TEI loads [Safetensors](https://github.com/huggingface/safetensors) weights to enable tensor parallelism.
-* **Production-Ready:** TEI supports distributed tracing through Open Telemetry and Prometheus metrics.
+* **Safetensors weight loading:** TEI loads [Safetensors](https://github.com/huggingface/safetensors) weights for faster boot times.
+* **Production-Ready:** TEI supports distributed tracing through Open Telemetry and exports Prometheus metrics.
 
 **Benchmarks**

docs/source/en/local_cpu.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ You can install `text-embeddings-inference` locally to run it on your own machin
 
 ## Step 1: Install Rust
 
-[Install Rust]((https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
+[Install Rust](https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
 
 ```shell
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

docs/source/en/local_gpu.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ export PATH=$PATH:/usr/local/cuda/bin
 
 ## Step 2: Install Rust
 
-[Install Rust]((https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
+[Install Rust](https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
 
 ```shell
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

docs/source/en/local_metal.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Here are the step-by-step instructions for installation:
 
 ## Step 1: Install Rust
 
-[Install Rust]((https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
+[Install Rust](https://rustup.rs/) on your machine by run the following in your terminal, then following the instructions:
 
 ```shell
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

docs/source/en/private_models.md

Lines changed: 1 addition & 1 deletion
@@ -37,5 +37,5 @@ model=<your private model>
 volume=$PWD/data
 token=<your cli Hugging Face Hub token>
 
-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
 ```

docs/source/en/quick_tour.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
 ```
 
 <Tip>
@@ -69,7 +69,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
 ```
 
 Once you have deployed a model you can use the `rerank` endpoint to rank the similarity between a query and a list
@@ -90,7 +90,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:0.6 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
