* Production ready (distributed tracing with Open Telemetry, Prometheus metrics)

-
## Get Started

### Supported Models

-You can use any JinaBERT model with Alibi or absolute positions or any BERT, CamemBERT, RoBERTa, or XLM-RoBERTa model with absolute positions in `text-embeddings-inference`.
+#### Text Embeddings
+
+You can use any JinaBERT model with Alibi or absolute positions or any BERT, CamemBERT, RoBERTa, or XLM-RoBERTa model
+with absolute positions in `text-embeddings-inference`.

**Support for other model types will be added in the future.**
| Sentiment Analysis | RoBERTa |[SamLowe/roberta-base-go_emotions](https://huggingface.co/SamLowe/roberta-base-go_emotions)||

### Docker

@@ -95,7 +112,8 @@ curl 127.0.0.1:8080/embed \
    -H 'Content-Type: application/json'
```

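The `curl` call in the context lines above posts JSON to the `/embed` route. A rough Python equivalent is sketched below, using only the standard library. The `inputs` field name and the helper names `build_embed_request` and `embed` are assumptions for illustration: the request body is not shown in this excerpt, so the `/docs` route should be consulted for the actual schema.

```python
import json
import urllib.request


def build_embed_request(text, base_url="http://127.0.0.1:8080"):
    """Build (but do not send) a POST request for the /embed route.

    Assumption: the server accepts a JSON object with an "inputs" field.
    """
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, base_url="http://127.0.0.1:8080", timeout=10.0):
    """Send the request and decode the JSON response.

    Requires a running server at base_url; the response shape is not
    assumed here and is returned as parsed JSON.
    """
    request = build_embed_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect without a running container.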
-**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
+**Note:** To use GPUs, you need to install
+the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
We also recommend using NVIDIA drivers with CUDA version 12.0 or higher.

To see all options to serve your models:
@@ -130,20 +148,18 @@ Options:

      --dtype <DTYPE>
          The dtype to be forced upon the model
-
-          If `dtype` is not set, it defaults to float32 on accelerate, and float16 for all other architectures

          [env: DTYPE=]
          [possible values: float16, float32]

      --pooling <POOLING>
-          Optionally control the pooling method.
-
-          If `pooling` is not set, the pooling configuration will be parsed from the model `1_Pooling/config.json`
-          configuration.
-
+          Optionally control the pooling method for embedding models.
+
+          If `pooling` is not set, the pooling configuration will be parsed from the model `1_Pooling/config.json`
+          configuration.
+
          If `pooling` is set, it will override the model pooling configuration
-
+
          [env: POOLING=]
          [possible values: cls, mean]

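The two `--pooling` values in the hunk above, `cls` and `mean`, can be illustrated with a minimal sketch of how these strategies are commonly defined. This is not the code used by `text-embeddings-inference`. The function names, the list-of-lists representation, and the use of an attention mask to skip padding tokens are assumptions made for illustration.

```python
from typing import List, Sequence


def cls_pooling(token_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Return the embedding of the first token (the [CLS] position)."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    return list(token_embeddings[0])


def mean_pooling(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the embeddings of tokens whose mask entry is non-zero."""
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("embeddings and mask must have the same length")
    # Assumption: padding tokens (mask == 0) are excluded from the average.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three tokens with 2-dimensional embeddings; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mask = [1, 1, 0]
cls_vector = cls_pooling(tokens)          # first token only
mean_vector = mean_pooling(tokens, mask)  # average of the first two tokens
```

With `cls`, the sentence vector depends on a single position. With `mean`, every non-padding token contributes equally.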
@@ -241,7 +257,8 @@ You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` envir
### API documentation

You can consult the OpenAPI documentation of the `text-embeddings-inference` REST API using the `/docs` route.
-The Swagger UI is also available at: [https://huggingface.github.io/text-embeddings-inference](https://huggingface.github.io/text-embeddings-inference).