```bash
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model --revision $revision
```

<Tip>

Here we pass `revision=refs/pr/5` because the `safetensors` variant of this model is currently in a pull request.
We also recommend sharing a volume with the Docker container (`volume=$PWD/data`) to avoid downloading weights every run.

</Tip>

Once you have deployed a model, you can use the `embed` endpoint by sending requests:

```bash
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"Today is a nice day"}' \
    -H 'Content-Type: application/json'
```
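
As a quick sanity check, the response can be piped through `jq`. This is a sketch under two assumptions: `jq` is installed, and the response is a JSON array holding one embedding (an array of floats) per input:

```bash
# Print the dimension of the first returned embedding.
curl -s 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"Today is a nice day"}' \
    -H 'Content-Type: application/json' | jq '.[0] | length'
```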

Re-ranker models are deployed with the same command; here `$model` and `$revision` select the re-ranker to serve:

```bash
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model --revision $revision
```

Once you have deployed a model, you can use the `rerank` endpoint to rank the similarity between a query and a list of texts:

```bash
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```
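
To pull out the best match, the response can again be piped through `jq`. This sketch assumes the endpoint returns a JSON list of `{index, score}` objects ordered by descending score:

```bash
# Print the index and score of the highest-ranked text.
curl -s 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query": "What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json' | jq '.[0]'
```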

For classic Sequence Classification models, use the `predict` endpoint:

```bash
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```
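
The same trick works for classification output; this sketch assumes the response is a JSON list of `{score, label}` objects:

```bash
# Print the first predicted label and its score.
curl -s 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json' | jq '.[0]'
```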

## Batching

You can send multiple inputs in a batch. For example, for embeddings:

```bash
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":["Today is a nice day", "I like you"]}' \
    -H 'Content-Type: application/json'
```

And for Sequence Classification:

```bash
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":[["I like you."], ["I hate pineapples"]]}' \
    -H 'Content-Type: application/json'
```
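
Batches can also be assembled programmatically. As a sketch, assuming `jq` is installed and a hypothetical `texts.txt` holds one input text per line, the request body can be built on the fly:

```bash
# Build {"inputs": [...]} from the lines of texts.txt and send it to /embed.
# texts.txt is a hypothetical file with one input text per line.
jq -R -s '{inputs: (split("\n") | map(select(. != "")))}' texts.txt |
    curl -s 127.0.0.1:8080/embed \
        -X POST \
        -d @- \
        -H 'Content-Type: application/json'
```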