Default to Qwen3 in README.md and docs/ examples #641

Merged · 4 commits · Jun 16, 2025
12 changes: 6 additions & 6 deletions README.md
@@ -110,7 +110,7 @@ Below are some examples of the currently supported models:
### Docker

```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
@@ -369,13 +369,13 @@ cd models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
-git clone https://huggingface.co/Alibaba-NLP/gte-base-en-v1.5
+git clone https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

# Set the models directory as the volume path
volume=$PWD

# Mount the models directory inside the container with a volume and set the model ID
-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/gte-base-en-v1.5
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id /data/Qwen3-Embedding-0.6B
```

### Using Re-rankers models
@@ -458,7 +458,7 @@ found [here](https://github.com/huggingface/text-embeddings-inference/blob/main/
You can use the gRPC API by adding the `-grpc` tag to any TEI Docker image. For example:

```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7-grpc --model-id $model
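
# Note (sketch): the model swap does not change the gRPC call shape. Assuming
# TEI's tei.v1.Embed/Embed service from the repository's proto definition and
# grpcurl installed locally, a quick check against the -grpc container above:
grpcurl -d '{"inputs": "What is Deep Learning?"}' -plaintext 127.0.0.1:8080 tei.v1.Embed/Embed
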
@@ -494,7 +494,7 @@ cargo install --path router -F metal
You can now launch Text Embeddings Inference on CPU with:

```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B

text-embeddings-router --model-id $model --port 8080
```
@@ -532,7 +532,7 @@ cargo install --path router -F candle-cuda -F http --no-default-features
You can now launch Text Embeddings Inference on GPU with:

```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B

text-embeddings-router --model-id $model --port 8080
```
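
Once any of the README containers above is running, the switch to `Qwen/Qwen3-Embedding-0.6B` can be sanity-checked over TEI's `/embed` route. A minimal sketch, assuming the server is listening on `127.0.0.1:8080` as in the snippets above:

```shell
# Embed a single input; "inputs" also accepts a JSON array of strings
# for batched requests.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```
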
6 changes: 3 additions & 3 deletions docs/source/en/intel_container.md
@@ -35,7 +35,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_cpu_ipe
To deploy your model on an Intel® CPU, use the following command:

```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data tei_cpu_ipex --model-id $model
@@ -58,7 +58,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_xpu_ipe
To deploy your model on an Intel® XPU, use the following command:

```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data --device=/dev/dri -v /dev/dri/by-path:/dev/dri/by-path tei_xpu_ipex --model-id $model --dtype float16
@@ -81,7 +81,7 @@ docker build . -f Dockerfile-intel --build-arg PLATFORM=$platform -t tei_hpu
To deploy your model on an Intel® HPU (Gaudi), use the following command:

```shell
-model='BAAI/bge-large-en-v1.5'
+model='Qwen/Qwen3-Embedding-0.6B'
volume=$PWD/data

docker run -p 8080:80 -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e MAX_WARMUP_SEQUENCE_LENGTH=512 tei_hpu --model-id $model --dtype bfloat16
5 changes: 2 additions & 3 deletions docs/source/en/local_cpu.md
@@ -47,10 +47,9 @@ cargo install --path router -F metal
Once the installation is successfully complete, you can launch Text Embeddings Inference on CPU with the following command:

```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
```

<Tip>
5 changes: 2 additions & 3 deletions docs/source/en/local_gpu.md
@@ -58,8 +58,7 @@ cargo install --path router -F candle-cuda -F http --no-default-features
You can now launch Text Embeddings Inference on GPU with:

```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --dtype float16 --port 8080
```
5 changes: 2 additions & 3 deletions docs/source/en/local_metal.md
@@ -38,10 +38,9 @@ cargo install --path router -F metal
Once the installation is successfully complete, you can launch Text Embeddings Inference with Metal with the following command:

```shell
-model=BAAI/bge-large-en-v1.5
-revision=refs/pr/5
+model=Qwen/Qwen3-Embedding-0.6B

-text-embeddings-router --model-id $model --revision $revision --port 8080
+text-embeddings-router --model-id $model --port 8080
```

Now you are ready to use `text-embeddings-inference` locally on your machine.
4 changes: 2 additions & 2 deletions docs/source/en/quick_tour.md
@@ -28,10 +28,10 @@ Next, install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/

## Deploy

-Next it's time to deploy your model. Let's say you want to use [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5). Here's how you can do this:
+Next it's time to deploy your model. Let's say you want to use [`Qwen/Qwen3-Embedding-0.6B`](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B). Here's how you can do this:

```shell
-model=BAAI/bge-large-en-v1.5
+model=Qwen/Qwen3-Embedding-0.6B
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
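
# Note (sketch): TEI also exposes an OpenAI-compatible embeddings route, so
# OpenAI-style clients should keep working against the container above without
# code changes. This assumes the /v1/embeddings route; the "model" field is
# illustrative, since the server hosts the single model it was launched with.
curl 127.0.0.1:8080/v1/embeddings \
    -X POST \
    -d '{"input":"What is Deep Learning?","model":"Qwen/Qwen3-Embedding-0.6B"}' \
    -H 'Content-Type: application/json'
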
22 changes: 13 additions & 9 deletions docs/source/en/supported_models.md
@@ -21,21 +21,24 @@ We are continually expanding our support for other model types and plan to inclu
## Supported embeddings models

Text Embeddings Inference currently supports Nomic, BERT, CamemBERT, XLM-RoBERTa models with absolute positions, JinaBERT
-model with Alibi positions and Mistral, Alibaba GTE, Qwen2 models with Rope positions, MPNet, and ModernBERT.
+model with Alibi positions and Mistral, Alibaba GTE, Qwen2 models with Rope positions, MPNet, ModernBERT, and Qwen3.

Below are some examples of the currently supported models:

| MTEB Rank | Model Size | Model Type | Model ID |
|-----------|---------------------|-------------|--------------------------------------------------------------------------------------------------|
-| 3         | 7B (Very Expensive) | Qwen2       | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct) |
-| 11        | 1.5B (Expensive)    | Qwen2       | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) |
-| 14        | 7B (Very Expensive) | Mistral     | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R) |
-| 20        | 0.3B                | Bert        | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
-| 31        | 0.5B                | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0) |
-| 37        | 0.3B                | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0) |
-| 49        | 0.5B                | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
+| 2         | 8B (Very Expensive) | Qwen3       | [Qwen/Qwen3-Embedding-8B](https://hf.co/Qwen/Qwen3-Embedding-8B) |
+| 4         | 0.6B                | Qwen3       | [Qwen/Qwen3-Embedding-0.6B](https://hf.co/Qwen/Qwen3-Embedding-0.6B) |
+| 6         | 7B (Very Expensive) | Qwen2       | [Alibaba-NLP/gte-Qwen2-7B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-7B-instruct) |
+| 7         | 0.5B                | XLM-RoBERTa | [intfloat/multilingual-e5-large-instruct](https://hf.co/intfloat/multilingual-e5-large-instruct) |
+| 14        | 1.5B (Expensive)    | Qwen2       | [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://hf.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct) |
+| 17        | 7B (Very Expensive) | Mistral     | [Salesforce/SFR-Embedding-2_R](https://hf.co/Salesforce/SFR-Embedding-2_R) |
+| 34        | 0.5B                | XLM-RoBERTa | [Snowflake/snowflake-arctic-embed-l-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-l-v2.0) |
+| 40        | 0.3B                | Alibaba GTE | [Snowflake/snowflake-arctic-embed-m-v2.0](https://hf.co/Snowflake/snowflake-arctic-embed-m-v2.0) |
+| 51        | 0.3B                | Bert        | [WhereIsAI/UAE-Large-V1](https://hf.co/WhereIsAI/UAE-Large-V1) |
| N/A | 0.4B | Alibaba GTE | [Alibaba-NLP/gte-large-en-v1.5](https://hf.co/Alibaba-NLP/gte-large-en-v1.5) |
-| N/A       | 0.4B                | ModernBERT  | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large) |
+| N/A       | 0.4B                | ModernBERT  | [answerdotai/ModernBERT-large](https://hf.co/answerdotai/ModernBERT-large) |
+| N/A       | 0.3B                | NomicBert   | [nomic-ai/nomic-embed-text-v2-moe](https://hf.co/nomic-ai/nomic-embed-text-v2-moe) |
| N/A | 0.1B | NomicBert | [nomic-ai/nomic-embed-text-v1](https://hf.co/nomic-ai/nomic-embed-text-v1) |
| N/A | 0.1B | NomicBert | [nomic-ai/nomic-embed-text-v1.5](https://hf.co/nomic-ai/nomic-embed-text-v1.5) |
| N/A | 0.1B | JinaBERT | [jinaai/jina-embeddings-v2-base-en](https://hf.co/jinaai/jina-embeddings-v2-base-en) |
@@ -56,6 +59,7 @@ Below are some examples of the currently supported models:
| Re-Ranking | XLM-RoBERTa | [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) |
| Re-Ranking | XLM-RoBERTa | [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) |
| Re-Ranking | GTE | [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) |
+| Re-Ranking | ModernBert | [Alibaba-NLP/gte-reranker-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base) |
| Sentiment Analysis | RoBERTa | [SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions) |
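
The new `gte-reranker-modernbert-base` entry is served through TEI's `/rerank` route rather than `/embed`. A minimal sketch, assuming a container launched with that re-ranker as its `--model-id` and listening on port 8080:

```shell
# Score a query against candidate passages; the response orders texts by relevance.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?","texts":["Deep Learning is not...","Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```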

## Supported hardware