README.md (53 additions & 14 deletions)
@@ -33,7 +33,8 @@ Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1
 - [API Documentation](#api-documentation)
 - [Using a private or gated model](#using-a-private-or-gated-model)
 - [Distributed Tracing](#distributed-tracing)
-- [Local Install](#local-install)
+- [Local Install](#local-install)
+- [Docker Build](#docker-build)
 
 - No compilation step
 - Dynamic shapes
@@ -89,7 +90,7 @@ curl 127.0.0.1:8080/embed \
 ```
 
 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
-We also recommend using NVIDIA drivers with CUDA version 12 or higher.
+We also recommend using NVIDIA drivers with CUDA version 12.2 or higher.
 
 To see all options to serve your models:
 
@@ -123,9 +124,10 @@ Options:
 
 --dtype <DTYPE>
 The dtype to be forced upon the model
+
+If `dtype` is not set, it defaults to float32 on accelerate, and float16 for all other architectures
 
 [env: DTYPE=]
-[default: float16]
 [possible values: float16, float32]
 
 --pooling <POOLING>
@@ -217,13 +219,14 @@ Options:
 
 Text Embeddings Inference ships with multiple Docker images that you can use to target a specific backend: