Commit 7b85d8c
fix: docs
1 parent: ca99fa3

File tree

4 files changed: +21 −21 lines

README.md

Lines changed: 11 additions & 11 deletions
@@ -97,7 +97,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```
 
 And then you can make requests like
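Editorial note, not part of the commit: the request the README context line refers to goes to the server started by the `docker run` command above. A minimal sketch, assuming the container is listening on port 8080 and using the `/embed` endpoint and `{"inputs": ...}` payload shape from the TEI documentation:

```shell
# Sketch (not part of this diff): embedding request to a locally running
# TEI container; endpoint and payload shape follow the TEI docs.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs": "What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```

The server should respond with one embedding vector per input.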
@@ -245,13 +245,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to
 
 | Architecture                        | Image                                                                    |
 |-------------------------------------|--------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0                    |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.1                    |
 | Volta                               | NOT SUPPORTED                                                            |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental)  |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.0                        |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.0                     |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0                     |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental)  |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.1 (experimental)  |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.1                        |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.1                     |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.1                     |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.1 (experimental)  |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
@@ -280,7 +280,7 @@ model=<your private model>
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 token=<your cli READ token>
 
-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```
 
 ### Using Re-rankers models
@@ -298,7 +298,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```
 
 And then you can rank the similarity between a query and a list of texts with:
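Editorial note, not part of the commit: the ranking request the context line refers to is a sketch like the following, assuming the `/rerank` endpoint and `{"query": ..., "texts": [...]}` payload shape from the TEI documentation:

```shell
# Sketch (not part of this diff): rerank request to a locally running
# TEI container serving a re-ranker model.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```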
@@ -318,7 +318,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
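Editorial note, not part of the commit: a classification request against the `predict` endpoint mentioned in the context line would look roughly like this, assuming the `{"inputs": ...}` payload shape from the TEI documentation:

```shell
# Sketch (not part of this diff): sequence-classification request to a
# running TEI container via the /predict endpoint.
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs": "I like you."}' \
    -H 'Content-Type: application/json'
```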
@@ -347,7 +347,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0-grpc --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1-grpc --model-id $model --revision $revision
 ```
 
 ```shell

docs/source/en/private_models.md

Lines changed: 1 addition & 1 deletion
@@ -37,5 +37,5 @@ model=<your private model>
 volume=$PWD/data
 token=<your cli Hugging Face Hub token>
 
-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```

docs/source/en/quick_tour.md

Lines changed: 3 additions & 3 deletions
@@ -34,7 +34,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```
 
 <Tip>
@@ -69,7 +69,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```
 
 Once you have deployed a model you can use the `rerank` endpoint to rank the similarity between a query and a list
@@ -90,7 +90,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```
 
 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:

docs/source/en/supported_models.md

Lines changed: 6 additions & 6 deletions
@@ -63,13 +63,13 @@ Find the appropriate Docker image for your hardware in the following table:
 
 | Architecture                        | Image                                                                    |
 |-------------------------------------|--------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0                    |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.1                    |
 | Volta                               | NOT SUPPORTED                                                            |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental)  |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.0                        |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.0                     |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0                     |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental)  |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.1 (experimental)  |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.1                        |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.1                     |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.1                     |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.1 (experimental)  |
 
 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
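Editorial note, not part of the commit: per the warning above, opting in to Flash Attention v1 on the Turing image means passing the stated environment variable to the container. A minimal sketch, assuming `$model` and `$volume` are set as in the earlier examples:

```shell
# Sketch (not part of this diff): enable Flash Attention v1 on the Turing
# image via the USE_FLASH_ATTENTION env var documented in the warning above.
docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data \
    --pull always ghcr.io/huggingface/text-embeddings-inference:turing-1.1 \
    --model-id $model
```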
