
Commit 104e4cb

[Feat] Add infinity embedding support (contributor pr) (#10196)
* Feature - infinity support for #8764 (#10009)
  * Added support for infinity embeddings
  * Added test cases
  * Fixed tests and api base
  * Updated docs and tests
  * Removed unused import
  * Updated signature
  * Updated validate params

  Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

* fix InfinityEmbeddingConfig

Co-authored-by: Prathamesh Saraf <pratamesh1867@gmail.com>
1 parent 0c2f705 commit 104e4cb

File tree

12 files changed (+529 −22 lines)


.env.example

Lines changed: 2 additions & 0 deletions
```diff
@@ -20,6 +20,8 @@ REPLICATE_API_TOKEN = ""
 ANTHROPIC_API_KEY = ""
 # Infisical
 INFISICAL_TOKEN = ""
+# INFINITY
+INFINITY_API_KEY = ""
 
 # Development Configs
 LITELLM_MASTER_KEY = "sk-1234"
```
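For local testing, a minimal sketch of wiring these variables up from Python before calling the SDK; values are placeholders, and the embedding docs below set `INFINITY_API_BASE` the same way:

```python
import os

# Placeholder values; point INFINITY_API_BASE at your own Infinity server.
os.environ["INFINITY_API_KEY"] = "sk-placeholder"
os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
```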

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -86,4 +86,5 @@ litellm/proxy/db/migrations/0_init/migration.sql
 litellm/proxy/db/migrations/*
 litellm/proxy/migrations/*config.yaml
 litellm/proxy/migrations/*
+config.yaml
 tests/litellm/litellm_core_utils/llm_cost_calc/log.txt
```

docs/my-website/docs/providers/infinity.md

Lines changed: 135 additions & 19 deletions
````diff
@@ -3,18 +3,17 @@ import TabItem from '@theme/TabItem';
 
 # Infinity
 
-| Property | Details |
-|-------|-------|
-| Description | Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip|
-| Provider Route on LiteLLM | `infinity/` |
-| Supported Operations | `/rerank` |
-| Link to Provider Doc | [Infinity ↗](https://github.com/michaelfeil/infinity) |
-
+| Property                  | Details                                                                                                     |
+| ------------------------- | ----------------------------------------------------------------------------------------------------------- |
+| Description               | Infinity is a high-throughput, low-latency REST API for serving text-embeddings, reranking models and clip |
+| Provider Route on LiteLLM | `infinity/`                                                                                                 |
+| Supported Operations      | `/rerank`, `/embeddings`                                                                                    |
+| Link to Provider Doc      | [Infinity ↗](https://github.com/michaelfeil/infinity)                                                       |
 
 ## **Usage - LiteLLM Python SDK**
 
 ```python
-from litellm import rerank
+from litellm import rerank, embedding
 import os
 
 os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
@@ -39,8 +38,8 @@ model_list:
   - model_name: custom-infinity-rerank
     litellm_params:
       model: infinity/rerank
-      api_key: os.environ/INFINITY_API_KEY
       api_base: https://localhost:8080
+      api_key: os.environ/INFINITY_API_KEY
 ```
 
 Start litellm
@@ -51,7 +50,9 @@ litellm --config /path/to/config.yaml
 # RUNNING on http://0.0.0.0:4000
 ```
 
-Test request
+## Test request:
+
+### Rerank
 
 ```bash
 curl http://0.0.0.0:4000/rerank \
@@ -70,15 +71,14 @@ curl http://0.0.0.0:4000/rerank \
   }'
 ```
 
+#### Supported Cohere Rerank API Params
 
-## Supported Cohere Rerank API Params
-
-| Param | Type | Description |
-|-------|-------|-------|
-| `query` | `str` | The query to rerank the documents against |
-| `documents` | `list[str]` | The documents to rerank |
-| `top_n` | `int` | The number of documents to return |
-| `return_documents` | `bool` | Whether to return the documents in the response |
+| Param              | Type        | Description                                     |
+| ------------------ | ----------- | ----------------------------------------------- |
+| `query`            | `str`       | The query to rerank the documents against       |
+| `documents`        | `list[str]` | The documents to rerank                         |
+| `top_n`            | `int`       | The number of documents to return               |
+| `return_documents` | `bool`      | Whether to return the documents in the response |
 
 ### Usage - Return Documents
 
@@ -138,6 +138,7 @@ response = rerank(
     raw_scores=True, # 👈 PROVIDER-SPECIFIC PARAM
 )
 ```
+
 </TabItem>
 
 <TabItem value="proxy" label="PROXY">
@@ -161,7 +162,7 @@ litellm --config /path/to/config.yaml
 # RUNNING on http://0.0.0.0:4000
 ```
 
-3. Test it! 
+3. Test it!
 
 ```bash
 curl http://0.0.0.0:4000/rerank \
@@ -179,6 +180,121 @@ curl http://0.0.0.0:4000/rerank \
     "raw_scores": True # 👈 PROVIDER-SPECIFIC PARAM
   }'
 ```
+
 </TabItem>
 
 </Tabs>
+
+## Embeddings
+
+LiteLLM provides an OpenAI api compatible `/embeddings` endpoint for embedding calls.
+
+**Setup**
+
+Add this to your litellm proxy config.yaml
+
+```yaml
+model_list:
+  - model_name: custom-infinity-embedding
+    litellm_params:
+      model: infinity/provider/custom-embedding-v1
+      api_base: http://localhost:8080
+      api_key: os.environ/INFINITY_API_KEY
+```
+
+### Test request:
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "custom-infinity-embedding",
+    "input": ["hello"]
+  }'
+```
+
+#### Supported Embedding API Params
+
+| Param             | Type        | Description                                                  |
+| ----------------- | ----------- | ------------------------------------------------------------ |
+| `model`           | `str`       | The embedding model to use                                   |
+| `input`           | `list[str]` | The text inputs to generate embeddings for                   |
+| `encoding_format` | `str`       | The format to return embeddings in (e.g. "float", "base64")  |
+| `modality`        | `str`       | The type of input (e.g. "text", "image", "audio")            |
+
+### Usage - Basic Examples
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import embedding
+import os
+
+os.environ["INFINITY_API_BASE"] = "http://localhost:8080"
+
+response = embedding(
+    model="infinity/bge-small",
+    input=["good morning from litellm"]
+)
+
+print(response.data[0]['embedding'])
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "custom-infinity-embedding",
+    "input": ["hello"]
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+### Usage - OpenAI Client
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<LITELLM_MASTER_KEY>",
+    base_url="<LITELLM_URL>"
+)
+
+response = client.embeddings.create(
+    model="bge-small",
+    input=["The food was delicious and the waiter..."],
+    encoding_format="float"
+)
+
+print(response.data[0].embedding)
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl http://0.0.0.0:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "bge-small",
+    "input": ["The food was delicious and the waiter..."],
+    "encoding_format": "float"
+  }'
+```
+
+</TabItem>
+</Tabs>
````

litellm/__init__.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -415,6 +415,7 @@ def identify(event_details):
 azure_ai_models: List = []
 jina_ai_models: List = []
 voyage_models: List = []
+infinity_models: List = []
 databricks_models: List = []
 cloudflare_models: List = []
 codestral_models: List = []
@@ -556,6 +557,8 @@ def add_known_models():
             azure_ai_models.append(key)
         elif value.get("litellm_provider") == "voyage":
             voyage_models.append(key)
+        elif value.get("litellm_provider") == "infinity":
+            infinity_models.append(key)
         elif value.get("litellm_provider") == "databricks":
             databricks_models.append(key)
         elif value.get("litellm_provider") == "cloudflare":
@@ -644,6 +647,7 @@ def add_known_models():
     + deepseek_models
     + azure_ai_models
     + voyage_models
+    + infinity_models
     + databricks_models
     + cloudflare_models
     + codestral_models
@@ -699,6 +703,7 @@ def add_known_models():
     "mistral": mistral_chat_models,
     "azure_ai": azure_ai_models,
     "voyage": voyage_models,
+    "infinity": infinity_models,
     "databricks": databricks_models,
     "cloudflare": cloudflare_models,
     "codestral": codestral_models,
@@ -946,6 +951,7 @@ def add_known_models():
 from litellm.llms.openai.completion.transformation import OpenAITextCompletionConfig
 from .llms.groq.chat.transformation import GroqChatConfig
 from .llms.voyage.embedding.transformation import VoyageEmbeddingConfig
+from .llms.infinity.embedding.transformation import InfinityEmbeddingConfig
 from .llms.azure_ai.chat.transformation import AzureAIStudioConfig
 from .llms.mistral.mistral_chat_transformation import MistralConfig
 from .llms.openai.responses.transformation import OpenAIResponsesAPIConfig
```
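A quick sketch of what this registration exposes, assuming the provider-to-models dict in the fourth hunk is the `models_by_provider` mapping litellm exports (which models appear depends on your `model_cost` entries):

```python
import litellm

# Models whose model_cost entry has litellm_provider == "infinity"
# are collected into the new bucket by add_known_models().
print(litellm.models_by_provider.get("infinity", []))

# The embedding config re-exported at the package root.
print(litellm.InfinityEmbeddingConfig)
```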

litellm/litellm_core_utils/get_supported_openai_params.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -221,6 +221,8 @@ def get_supported_openai_params(  # noqa: PLR0915
         return litellm.PredibaseConfig().get_supported_openai_params(model=model)
     elif custom_llm_provider == "voyage":
         return litellm.VoyageEmbeddingConfig().get_supported_openai_params(model=model)
+    elif custom_llm_provider == "infinity":
+        return litellm.InfinityEmbeddingConfig().get_supported_openai_params(model=model)
     elif custom_llm_provider == "triton":
         if request_type == "embeddings":
             return litellm.TritonEmbeddingConfig().get_supported_openai_params(
```
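To see the new branch in action, a minimal sketch; the model name is hypothetical and the returned list is whatever `InfinityEmbeddingConfig` reports:

```python
import litellm

# Routes through the new "infinity" branch added above.
params = litellm.get_supported_openai_params(
    model="bge-small",
    custom_llm_provider="infinity",
)
print(params)
```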

litellm/llms/infinity/rerank/common_utils.py renamed to litellm/llms/infinity/common_utils.py

Lines changed: 8 additions & 1 deletion
```diff
@@ -1,10 +1,16 @@
+from typing import Union
 import httpx
 
 from litellm.llms.base_llm.chat.transformation import BaseLLMException
 
 
 class InfinityError(BaseLLMException):
-    def __init__(self, status_code, message):
+    def __init__(
+        self,
+        status_code: int,
+        message: str,
+        headers: Union[dict, httpx.Headers] = {}
+    ):
         self.status_code = status_code
         self.message = message
         self.request = httpx.Request(
@@ -16,4 +22,5 @@ def __init__(self, status_code, message):
             message=message,
             request=self.request,
             response=self.response,
+            headers=headers,
         )  # Call the base class constructor with the parameters it needs
```
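A minimal sketch of the widened constructor, using the renamed import path; the header values are illustrative:

```python
import httpx

from litellm.llms.infinity.common_utils import InfinityError

try:
    # The new optional `headers` argument is forwarded to BaseLLMException.
    raise InfinityError(
        status_code=429,
        message="Infinity server rejected the request",
        headers=httpx.Headers({"retry-after": "2"}),
    )
except InfinityError as e:
    print(e.status_code, e.message)
```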
Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+"""
+Infinity Embedding - uses `llm_http_handler.py` to make httpx requests
+
+Request/Response transformation is handled in `transformation.py`
+"""
```
