
Commit 1e076c7

docs: air-gapped deployments (#326)
Parent: acbbb92

3 files changed: 57 additions, 14 deletions

README.md

Lines changed: 32 additions & 12 deletions
@@ -33,6 +33,7 @@ length of 512 tokens:
 - [Docker Images](#docker-images)
 - [API Documentation](#api-documentation)
 - [Using a private or gated model](#using-a-private-or-gated-model)
+- [Air gapped deployment](#air-gapped-deployment)
 - [Using Re-rankers models](#using-re-rankers-models)
 - [Using Sequence Classification models](#using-sequence-classification-models)
 - [Using SPLADE pooling](#using-splade-pooling)
@@ -100,11 +101,10 @@ Below are some examples of the currently supported models:
 ### Docker
 
 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Alibaba-NLP/gte-base-en-v1.5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model
 ```
 
 And then you can make requests like
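One reason the example keeps `-v $volume:/data`: weights downloaded on the first run persist on the host, so later runs skip the download. A minimal sketch of that effect; the paths and the check below are illustrative assumptions, not TEI's actual internal lookup logic:

```shell
# Illustrative only: shows why the mounted volume avoids repeated downloads.
# TEI decides this internally; the path and the emptiness check are assumptions.
volume="$PWD/data"
mkdir -p "$volume"
if [ -n "$(ls -A "$volume" 2>/dev/null)" ]; then
  echo "cache hit: reusing files under $volume"
else
  echo "cache miss: a first run would download into $volume"
fi
```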
@@ -347,6 +347,29 @@ token=<your cli READ token>
 docker run --gpus all -e HF_API_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id $model
 ```
 
+### Air gapped deployment
+
+To deploy Text Embeddings Inference in an air-gapped environment, first download the weights and then mount them inside
+the container using a volume.
+
+For example:
+
+```shell
+# (Optional) create a `models` directory
+mkdir models
+cd models
+
+# Make sure you have git-lfs installed (https://git-lfs.com)
+git lfs install
+git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
+
+# Set the models directory as the volume path
+volume=$PWD
+
+# Mount the models directory inside the container with a volume and set the model ID
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id /data/gte-base-en-v1.5
+```
+
 ### Using Re-rankers models
 
 `text-embeddings-inference` v0.4.0 added support for CamemBERT, RoBERTa and XLM-RoBERTa Sequence Classification models.
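Before cutting network access in the air-gapped flow above, it can help to sanity-check the cloned directory. A hedged sketch: `check_model_dir` is a hypothetical helper, and its file list is an assumption based on typical safetensors checkpoints, not an official TEI requirement:

```shell
# Hypothetical pre-flight helper for air-gapped setups: verify the cloned
# model directory has the usual files before going offline. The required
# file list is an assumption based on common safetensors checkpoints.
check_model_dir() {
  dir="$1"
  missing=0
  for f in config.json tokenizer.json model.safetensors; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "ok: $dir looks complete"
}

# Example: check_model_dir "$PWD/models/gte-base-en-v1.5"
```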
@@ -428,11 +451,10 @@ found [here](https://github.com/huggingface/text-embeddings-inference/blob/main/
 You can use the gRPC API by adding the `-grpc` tag to any TEI Docker image. For example:
 
 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Alibaba-NLP/gte-base-en-v1.5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4-grpc --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4-grpc --model-id $model
 ```
 
 ```shell
@@ -463,10 +485,9 @@ cargo install --path router -F metal
 You can now launch Text Embeddings Inference on CPU with:
 
 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Alibaba-NLP/gte-base-en-v1.5
 
-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
 ```
 
 **Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
@@ -502,10 +523,9 @@ cargo install --path router -F candle-cuda -F http --no-default-features
 You can now launch Text Embeddings Inference on GPU with:
 
 ```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Alibaba-NLP/gte-base-en-v1.5
 
-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
 ```
 
 ## Docker build

docs/source/en/quick_tour.md

Lines changed: 23 additions & 0 deletions
@@ -121,3 +121,26 @@ curl 127.0.0.1:8080/predict \
     -d '{"inputs":[["I like you."], ["I hate pineapples"]]}' \
     -H 'Content-Type: application/json'
 ```
+
+## Air gapped deployment
+
+To deploy Text Embeddings Inference in an air-gapped environment, first download the weights and then mount them inside
+the container using a volume.
+
+For example:
+
+```shell
+# (Optional) create a `models` directory
+mkdir models
+cd models
+
+# Make sure you have git-lfs installed (https://git-lfs.com)
+git lfs install
+git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
+
+# Set the models directory as the volume path
+volume=$PWD
+
+# Mount the models directory inside the container with a volume and set the model ID
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.4 --model-id /data/gte-base-en-v1.5
+```
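Note that `--model-id /data/gte-base-en-v1.5` in the example is a container-side path: `-v $volume:/data` maps the host `models` directory to `/data` inside the container. A small sketch of that mapping, with an illustrative host location:

```shell
# Illustrative mapping between host and container paths for the air-gapped
# example: the directory mounted with `-v $volume:/data` appears as /data
# inside the container, so the clone is addressed by its container path.
volume="$HOME/models"          # host side (illustrative location)
mount_point="/data"            # container side, fixed by `-v $volume:/data`
model_dir="gte-base-en-v1.5"   # name of the cloned repository
echo "host path:      $volume/$model_dir"
echo "container path: $mount_point/$model_dir"
```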

router/src/main.rs

Lines changed: 2 additions & 2 deletions
@@ -14,10 +14,10 @@ static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
 struct Args {
     /// The name of the model to load.
     /// Can be a MODEL_ID as listed on <https://hf.co/models> like
-    /// `thenlper/gte-base`.
+    /// `Alibaba-NLP/gte-base-en-v1.5`.
     /// Or it can be a local directory containing the necessary files
     /// as saved by `save_pretrained(...)` methods of transformers
-    #[clap(default_value = "thenlper/gte-base", long, env)]
+    #[clap(default_value = "Alibaba-NLP/gte-base-en-v1.5", long, env)]
     #[redact(partial)]
     model_id: String,
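Because the field above is declared with `env` in its clap attribute, the router can also read the model id from the environment rather than the `--model-id` flag. A sketch, assuming clap's conventional upper-cased variable name `MODEL_ID` for the `model_id` field:

```shell
# Sketch: with `#[clap(..., long, env)]`, the model id can come from a flag
# or from an environment variable. The name MODEL_ID (clap's upper-cased
# default for the `model_id` field) is assumed here.
MODEL_ID="Alibaba-NLP/gte-base-en-v1.5"
export MODEL_ID
echo "router would load: $MODEL_ID"
# roughly equivalent to: text-embeddings-router --model-id "$MODEL_ID"
```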
