import TabItem from '@theme/TabItem';

# Infinity

| Property                  | Details                                                                                                     |
| ------------------------- | ----------------------------------------------------------------------------------------------------------- |
| Description               | Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP |
| Provider Route on LiteLLM | `infinity/`                                                                                                 |
| Supported Operations      | `/rerank`, `/embeddings`                                                                                    |
| Link to Provider Doc      | [Infinity ↗](https://github.com/michaelfeil/infinity)                                                       |

## **Usage - LiteLLM Python SDK**

```python
from litellm import rerank, embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
```
## **Usage - LiteLLM Proxy**

Add this to your litellm proxy config.yaml:

```yaml
model_list:
  - model_name: custom-infinity-rerank
    litellm_params:
      model: infinity/rerank
      api_base: https://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```
Start litellm:

```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

## Test request

### Rerank
```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "hello",
    "documents": ["hello", "world"]
  }'
```
#### Supported Cohere Rerank API Params

| Param              | Type        | Description                                     |
| ------------------ | ----------- | ----------------------------------------------- |
| `query`            | `str`       | The query to rerank the documents against       |
| `documents`        | `list[str]` | The documents to rerank                         |
| `top_n`            | `int`       | The number of documents to return               |
| `return_documents` | `bool`      | Whether to return the documents in the response |

### Usage - Return Documents
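With `return_documents=True`, each result in the response carries the document text alongside its relevance score. A self-contained sketch of parsing such a response — the payload below is illustrative (Cohere-compatible shape), not captured from a live server:

```python
import json

# Illustrative rerank response in the Cohere-compatible shape
# (placeholder text and scores, not from a real Infinity deployment)
sample = json.loads("""
{
  "results": [
    {"index": 2, "relevance_score": 0.91, "document": {"text": "Paris is the capital of France."}},
    {"index": 0, "relevance_score": 0.17, "document": {"text": "Berlin is the capital of Germany."}}
  ]
}
""")

# Results arrive sorted by relevance; grab the best-matching document
top = sample["results"][0]
print(top["document"]["text"])
print(top["relevance_score"])
```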
```python
response = rerank(
    ...,
    raw_scores=True,  # 👈 PROVIDER-SPECIFIC PARAM
)
```

</TabItem>

<TabItem value="proxy" label="PROXY">
```shell
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

3. Test it!
```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-rerank",
    "query": "hello",
    "documents": ["hello", "world"],
    "raw_scores": true # 👈 PROVIDER-SPECIFIC PARAM
  }'
```

</TabItem>

</Tabs>
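With `raw_scores=True`, the reranker returns the cross-encoder's raw logits rather than scores squashed into [0, 1]. If you need normalized values client-side, applying a sigmoid is the usual cross-encoder convention — a general convention, not something this doc guarantees, so verify against your model:

```python
import math

def sigmoid(x: float) -> float:
    # Map a raw logit to the (0, 1) range
    return 1.0 / (1.0 + math.exp(-x))

# A hypothetical raw score of 2.0 normalizes to ~0.88
print(round(sigmoid(2.0), 4))
```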
## Embeddings

LiteLLM provides an OpenAI API-compatible `/embeddings` endpoint for embedding calls.

**Setup**

Add this to your litellm proxy config.yaml:

```yaml
model_list:
  - model_name: custom-infinity-embedding
    litellm_params:
      model: infinity/provider/custom-embedding-v1
      api_base: http://localhost:8080
      api_key: os.environ/INFINITY_API_KEY
```
### Test request

```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'
```
#### Supported Embedding API Params

| Param             | Type        | Description                                                 |
| ----------------- | ----------- | ----------------------------------------------------------- |
| `model`           | `str`       | The embedding model to use                                  |
| `input`           | `list[str]` | The text inputs to generate embeddings for                  |
| `encoding_format` | `str`       | The format to return embeddings in (e.g. "float", "base64") |
| `modality`        | `str`       | The type of input (e.g. "text", "image", "audio")           |
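If you request `encoding_format="base64"`, each embedding arrives as a base64 string instead of a JSON float array. OpenAI-compatible servers typically pack it as little-endian float32, so a stdlib-only decode looks like this — the float32 layout is an assumption here, so confirm it against your server:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    raw = base64.b64decode(b64)
    # 4 bytes per little-endian float32
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip a small example vector to show the format
encoded = base64.b64encode(struct.pack("<3f", 0.25, 0.5, 0.75)).decode()
print(decode_embedding(encoded))  # [0.25, 0.5, 0.75]
```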
### Usage - Basic Examples

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import embedding
import os

os.environ["INFINITY_API_BASE"] = "http://localhost:8080"

response = embedding(
    model="infinity/bge-small",
    input=["good morning from litellm"]
)

print(response.data[0]['embedding'])
```

</TabItem>

<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "custom-infinity-embedding",
    "input": ["hello"]
  }'
```

</TabItem>
</Tabs>
### Usage - OpenAI Client

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from openai import OpenAI

client = OpenAI(
    api_key="<LITELLM_MASTER_KEY>",
    base_url="<LITELLM_URL>"
)

response = client.embeddings.create(
    model="bge-small",
    input=["The food was delicious and the waiter..."],
    encoding_format="float"
)

print(response.data[0].embedding)
```

</TabItem>

<TabItem value="proxy" label="PROXY">

```bash
curl http://0.0.0.0:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-small",
    "input": ["The food was delicious and the waiter..."],
    "encoding_format": "float"
  }'
```

</TabItem>
</Tabs>
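Either client returns plain float vectors, which you can compare directly — a common next step is cosine similarity. A self-contained sketch, with placeholder vectors standing in for `response.data[i].embedding`:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors; identical vectors score 1.0
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
print(cosine_similarity(v1, v2))
```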