@@ -33,6 +33,7 @@ length of 512 tokens:
- [Docker Images](#docker-images)
- [API Documentation](#api-documentation)
- [Using a private or gated model](#using-a-private-or-gated-model)
+ - [Air gapped deployment](#air-gapped-deployment)
- [Using Re-rankers models](#using-re-rankers-models)
- [Using Sequence Classification models](#using-sequence-classification-models)
- [Using SPLADE pooling](#using-splade-pooling)
@@ -100,11 +101,10 @@ Below are some examples of the currently supported models:
### Docker

```shell
- model=BAAI/bge-large-en-v1.5
- revision=refs/pr/5
+ model=Alibaba-NLP/gte-base-en-v1.5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model
```

And then you can make requests like
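For context, the request example that follows this hunk in the README is a plain HTTP call to the `/embed` route; a minimal sketch (the exact payload shown in the README may differ):

```shell
# Send one sentence to the /embed route of the running container
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs": "What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```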
@@ -347,6 +347,29 @@ token=<your cli READ token>
docker run --gpus all -e HF_API_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model
```

+ ### Air gapped deployment
+
+ To deploy Text Embeddings Inference in an air-gapped environment, first download the weights and then mount them inside
+ the container using a volume.
+
+ For example:
+
+ ```shell
+ # (Optional) create a `models` directory
+ mkdir models
+ cd models
+
+ # Make sure you have git-lfs installed (https://git-lfs.com)
+ git lfs install
+ git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
+
+ # Set the models directory as the volume path
+ volume=$PWD
+
+ # Mount the models directory inside the container with a volume and set the model ID
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id /data/gte-base-en-v1.5
+ ```
+
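One way to sanity-check the air-gapped container is to send a request once it is up; a minimal sketch against the `/embed` route used elsewhere in this README:

```shell
# Verify the offline container serves the locally mounted model
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs": "offline smoke test"}' \
    -H 'Content-Type: application/json'
```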
### Using Re-rankers models

`text-embeddings-inference` v0.4.0 added support for CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.
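For reference, re-ranker models are queried through the `/rerank` route with a query and a list of candidate texts; a minimal sketch (field names assumed from TEI's public API, not shown in this hunk):

```shell
# Rank candidate texts against a query with a re-ranker model
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```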
@@ -428,11 +451,10 @@ found [here](https://github.com/huggingface/text-embeddings-inference/blob/main/
You can use the gRPC API by adding the `-grpc` tag to any TEI Docker image. For example:

```shell
- model=BAAI/bge-large-en-v1.5
- revision=refs/pr/5
+ model=Alibaba-NLP/gte-base-en-v1.5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

- docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4-grpc --model-id $model --revision $revision
+ docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4-grpc --model-id $model
```

```shell
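# The gRPC request example continues past this hunk boundary; a sketch using
# grpcurl (https://github.com/fullstorydev/grpcurl), assuming the
# tei.v1.Embed/Embed method exposed by the -grpc image
grpcurl -d '{"inputs": "What is Deep Learning"}' -plaintext 0.0.0.0:8080 tei.v1.Embed/Embed
```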
@@ -463,10 +485,9 @@ cargo install --path router -F metal
You can now launch Text Embeddings Inference on CPU with:

```shell
- model=BAAI/bge-large-en-v1.5
- revision=refs/pr/5
+ model=Alibaba-NLP/gte-base-en-v1.5

- text-embeddings-router --model-id $model --revision $revision --port 8080
+ text-embeddings-router --model-id $model --port 8080
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
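The command itself falls outside this hunk; on Debian and Ubuntu it is typically the following (assumed package names, not shown in the diff):

```shell
# Install the OpenSSL development headers and gcc
sudo apt-get install libssl-dev gcc -y
```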
@@ -502,10 +523,9 @@ cargo install --path router -F candle-cuda -F http --no-default-features
You can now launch Text Embeddings Inference on GPU with:

```shell
- model=BAAI/bge-large-en-v1.5
- revision=refs/pr/5
+ model=Alibaba-NLP/gte-base-en-v1.5

- text-embeddings-router --model-id $model --revision $revision --port 8080
+ text-embeddings-router --model-id $model --port 8080
```

## Docker build