
Commit be40b99

[Docs] Update quick tour (#574)
1 parent 72dac20 commit be40b99

File tree

1 file changed: +61 -18 lines changed


docs/source/en/quick_tour.md

Lines changed: 61 additions & 18 deletions
@@ -1,4 +1,4 @@
-<!--Copyright 2023 The HuggingFace Team. All rights reserved.
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
@@ -16,18 +16,19 @@ rendered properly in your Markdown viewer.
 
 # Quick Tour
 
-## Text Embeddings
+## Set up
 
 The easiest way to get started with TEI is to use one of the official Docker containers
 (see [Supported models and hardware](supported_models) to choose the right container).
 
-After making sure that your hardware is supported, install the
-[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) if you
-plan on utilizing GPUs. NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.
+Hence one needs to install Docker following their [installation instructions](https://docs.docker.com/get-docker/).
 
-Next, install Docker following their [installation instructions](https://docs.docker.com/get-docker/).
+TEI supports inference both on GPU and CPU. If you plan on using a GPU, make sure to check that your hardware is supported by checking [this table](https://github.com/huggingface/text-embeddings-inference?tab=readme-ov-file#docker-images).
+Next, install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher.
 
-Finally, deploy your model. Let's say you want to use `BAAI/bge-large-en-v1.5`. Here's how you can do this:
+## Deploy
+
+Next it's time to deploy your model. Let's say you want to use [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5). Here's how you can do this:
 
 ```shell
 model=BAAI/bge-large-en-v1.5
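
The deploy snippet above is cut off at the hunk boundary; for reference, a complete invocation following the same pattern would look roughly like this sketch (the `docker run` flags and image tag mirror the re-ranker example later in this file and may need adjusting for your hardware):

```bash
# Sketch: deploy BAAI/bge-large-en-v1.5 with the TEI container.
# Flags and image tag mirror the re-ranker example later in this file.
model=BAAI/bge-large-en-v1.5
volume=$PWD/data  # share a volume with the container to avoid re-downloading weights

docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
```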
@@ -42,7 +43,13 @@ We also recommend sharing a volume with the Docker container (`volume=$PWD/data`
 
 </Tip>
 
-Once you have deployed a model, you can use the `embed` endpoint by sending requests:
+## Inference
+
+Inference can be performed in 3 ways: using cURL, or via the `InferenceClient` or `OpenAI` Python SDKs.
+
+#### cURL
+
+To send a POST request to the TEI endpoint using cURL, you can run the following command:
 
 ```bash
 curl 127.0.0.1:8080/embed \
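
The request body of this cURL call falls outside the hunk; a minimal sketch of the full request, assuming the `/embed` route accepts a JSON payload with an `inputs` field:

```bash
# Sketch: embed a single string via the /embed route.
# The payload shape ({"inputs": ...}) is an assumption here.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is deep learning?"}' \
    -H 'Content-Type: application/json'
```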
@@ -51,16 +58,53 @@ curl 127.0.0.1:8080/embed \
 -H 'Content-Type: application/json'
 ```
 
-## Re-rankers
+#### Python
+
+To run inference using Python, you can either use the [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/en/index) Python SDK (recommended) or the `openai` Python SDK.
+
+##### huggingface_hub
+
+You can install it via pip as `pip install --upgrade --quiet huggingface_hub`, and then run:
+
+```python
+from huggingface_hub import InferenceClient
+
+client = InferenceClient()
+
+embedding = client.feature_extraction("What is deep learning?",
+                                      model="http://localhost:8080/embed")
+print(len(embedding[0]))
+```
+
+#### OpenAI
+
+You can install it via pip as `pip install --upgrade openai`, and then run:
+
+```python
+import os
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8080/v1/embeddings")
+
+response = client.embeddings.create(
+    model="tei",
+    input="What is deep learning?"
+)
+
+print(response)
+```
+
+## Re-rankers and sequence classification
+
+TEI also supports re-ranker and classic sequence classification models.
 
-Re-rankers models are Sequence Classification cross-encoders models with a single class that scores the similarity
-between a query and a text.
+### Re-rankers
 
-See [this blogpost](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83) by
+Rerankers, also called cross-encoders, are sequence classification models with a single class that score the similarity between a query and a text. See [this blogpost](https://blog.llamaindex.ai/boosting-rag-picking-the-best-embedding-reranker-models-42d079022e83) by
 the LlamaIndex team to understand how you can use re-rankers models in your RAG pipeline to improve
 downstream performance.
 
-Let's say you want to use `BAAI/bge-reranker-large`:
+Let's say you want to use [`BAAI/bge-reranker-large`](https://huggingface.co/BAAI/bge-reranker-large). First, you can deploy it like so:
 
 ```shell
 model=BAAI/bge-reranker-large
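
Since the `openai` snippet above points `base_url` at the server's `/v1/embeddings` route, that OpenAI-compatible route can also be exercised directly with cURL; a sketch, assuming it accepts the standard OpenAI `model`/`input` payload:

```bash
# Sketch: call the OpenAI-compatible embeddings route directly.
# The payload shape (model/input) is assumed from the OpenAI embeddings API.
curl 127.0.0.1:8080/v1/embeddings \
    -X POST \
    -d '{"model":"tei","input":"What is deep learning?"}' \
    -H 'Content-Type: application/json'
```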
@@ -69,8 +113,7 @@ volume=$PWD/data
 docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id $model
 ```
 
-Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list
-of texts:
+Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list of texts. With `cURL` this can be done like so:
 
 ```bash
 curl 127.0.0.1:8080/rerank \
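
The rerank request body is not shown in this hunk; a minimal sketch, assuming the `/rerank` route takes a `query` plus a list of candidate `texts`:

```bash
# Sketch: score candidate texts against a query via the /rerank route.
# The payload shape ({"query": ..., "texts": [...]}) is an assumption here.
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is deep learning?", "texts":["Deep Learning is a subfield of machine learning.", "Cheese is made from milk."]}' \
    -H 'Content-Type: application/json'
```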
@@ -79,9 +122,9 @@ curl 127.0.0.1:8080/rerank \
 -H 'Content-Type: application/json'
 ```
 
-## Sequence Classification
+### Sequence classification models
 
-You can also use classic Sequence Classification models like `SamLowe/roberta-base-go_emotions`:
+You can also use classic Sequence Classification models like [`SamLowe/roberta-base-go_emotions`](https://huggingface.co/SamLowe/roberta-base-go_emotions):
 
 ```shell
 model=SamLowe/roberta-base-go_emotions
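
Classification requests go to the `/predict` route referenced in the next hunk; a sketch, assuming it accepts the same `inputs`-style payload as `/embed`:

```bash
# Sketch: classify a sentence with the go_emotions model via /predict.
# The payload shape ({"inputs": ...}) is an assumption here.
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you. I love you."}' \
    -H 'Content-Type: application/json'
```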
@@ -101,7 +144,7 @@ curl 127.0.0.1:8080/predict \
 
 ## Batching
 
-You can send multiple inputs in a batch. For example, for embeddings
+You can send multiple inputs in a batch. For example, for embeddings:
 
 ```bash
 curl 127.0.0.1:8080/embed \
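
A sketch of such a batched request, assuming `/embed` also accepts a JSON array under `inputs`:

```bash
# Sketch: embed several inputs in one request by passing a list.
# The batched payload shape is an assumption here.
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":["Today is a nice day", "I like you"]}' \
    -H 'Content-Type: application/json'
```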
