
Commit c86e678

[Docs] v1.66.0-stable fixes (#9953)
* add categories for spend tracking improvements
* xai reasoning usage
* docs tag management
* docs tag based routing
* [Beta] Routing based on request metadata
* docs tag based routing
* docs tag routing
* docs enterprise web search
1 parent eb998ee commit c86e678

File tree

9 files changed: +335 −13 lines changed

docs/my-website/docs/providers/vertex.md

Lines changed: 96 additions & 4 deletions
@@ -347,7 +347,7 @@ Return a `list[Recipe]`
 completion(model="vertex_ai/gemini-1.5-flash-preview-0514", messages=messages, response_format={ "type": "json_object" })
 ```
 
-### **Grounding**
+### **Grounding - Web Search**
 
 Add Google Search Result grounding to vertex ai calls.
 
@@ -358,7 +358,7 @@ See the grounding metadata with `response_obj._hidden_params["vertex_ai_grounding_metadata"]`
 <Tabs>
 <TabItem value="sdk" label="SDK">
 
-```python
+```python showLineNumbers
 from litellm import completion
 
 ## SETUP ENVIRONMENT
@@ -377,14 +377,36 @@ print(resp)
 </TabItem>
 <TabItem value="proxy" label="PROXY">
 
-```bash
+<Tabs>
+<TabItem value="openai" label="OpenAI Python SDK">
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
+    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
+)
+
+response = client.chat.completions.create(
+    model="gemini-pro",
+    messages=[{"role": "user", "content": "Who won the world cup?"}],
+    tools=[{"googleSearchRetrieval": {}}],
+)
+
+print(response)
+```
+</TabItem>
+<TabItem value="curl" label="cURL">
+
+```bash showLineNumbers
 curl http://localhost:4000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer sk-1234" \
   -d '{
     "model": "gemini-pro",
     "messages": [
-      {"role": "user", "content": "Hello, Claude!"}
+      {"role": "user", "content": "Who won the world cup?"}
     ],
     "tools": [
       {
@@ -394,12 +416,82 @@ curl http://localhost:4000/v1/chat/completions \
   }'
 
 ```
+</TabItem>
+</Tabs>
 
 </TabItem>
 </Tabs>
 
 You can also use the `enterpriseWebSearch` tool for an [enterprise compliant search](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/web-grounding-enterprise).
 
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python showLineNumbers
+import litellm
+
+## SETUP ENVIRONMENT
+# !gcloud auth application-default login - run this to add vertex credentials to your env
+
+tools = [{"enterpriseWebSearch": {}}]  # 👈 ADD GOOGLE ENTERPRISE SEARCH
+
+resp = litellm.completion(
+    model="vertex_ai/gemini-1.0-pro-001",
+    messages=[{"role": "user", "content": "Who won the world cup?"}],
+    tools=tools,
+)
+
+print(resp)
+```
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+<Tabs>
+<TabItem value="openai" label="OpenAI Python SDK">
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
+    base_url="http://0.0.0.0:4000/v1/"  # point to litellm proxy
+)
+
+response = client.chat.completions.create(
+    model="gemini-pro",
+    messages=[{"role": "user", "content": "Who won the world cup?"}],
+    tools=[{"enterpriseWebSearch": {}}],
+)
+
+print(response)
+```
+</TabItem>
+<TabItem value="curl" label="cURL">
+
+```bash showLineNumbers
+curl http://localhost:4000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-1234" \
+  -d '{
+    "model": "gemini-pro",
+    "messages": [
+      {"role": "user", "content": "Who won the world cup?"}
+    ],
+    "tools": [
+      {
+        "enterpriseWebSearch": {}
+      }
+    ]
+  }'
+```
+</TabItem>
+</Tabs>
+
+</TabItem>
+</Tabs>
+
 #### **Moving from Vertex AI SDK to LiteLLM (GROUNDING)**
 
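Note: the grounding docs above mention reading metadata from `response_obj._hidden_params["vertex_ai_grounding_metadata"]`. A minimal sketch of checking that after an SDK call (model name taken from the docs above; the key may be absent when the model does not ground the answer, so the lookup is guarded):

```python
import litellm

# Grounded request via Google Search retrieval, as shown in the diff above.
resp = litellm.completion(
    model="vertex_ai/gemini-1.5-flash-preview-0514",
    messages=[{"role": "user", "content": "Who won the world cup?"}],
    tools=[{"googleSearchRetrieval": {}}],
)

# _hidden_params is a dict on the LiteLLM response; the grounding key is only
# populated when grounding results are returned.
grounding = resp._hidden_params.get("vertex_ai_grounding_metadata")
if grounding:
    print(grounding)
else:
    print("No grounding metadata returned for this response.")
```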

docs/my-website/docs/providers/xai.md

Lines changed: 78 additions & 0 deletions
@@ -176,3 +176,81 @@ Here's how to call a XAI model with the LiteLLM Proxy Server
 </Tabs>
 
 
+## Reasoning Usage
+
+LiteLLM supports reasoning usage for xAI models.
+
+<Tabs>
+
+<TabItem value="python" label="LiteLLM Python SDK">
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import litellm
+
+response = litellm.completion(
+    model="xai/grok-3-mini-beta",
+    messages=[{"role": "user", "content": "What is 101*3?"}],
+    reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+</TabItem>
+
+<TabItem value="curl" label="LiteLLM Proxy - OpenAI SDK Usage">
+
+```python showLineNumbers title="reasoning with xai/grok-3-mini-beta"
+import openai
+
+client = openai.OpenAI(
+    api_key="sk-1234",  # pass litellm proxy key, if you're using virtual keys
+    base_url="http://0.0.0.0:4000"  # litellm-proxy-base url
+)
+
+response = client.chat.completions.create(
+    model="xai/grok-3-mini-beta",
+    messages=[{"role": "user", "content": "What is 101*3?"}],
+    reasoning_effort="low",
+)
+
+print("Reasoning Content:")
+print(response.choices[0].message.reasoning_content)
+
+print("\nFinal Response:")
+print(response.choices[0].message.content)
+
+print("\nNumber of completion tokens:")
+print(response.usage.completion_tokens)
+
+print("\nNumber of reasoning tokens:")
+print(response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+</TabItem>
+</Tabs>
+
+**Example Response:**
+
+```shell
+Reasoning Content:
+Let me calculate 101 multiplied by 3:
+101 * 3 = 303.
+I can double-check that: 100 * 3 is 300, and 1 * 3 is 3, so 300 + 3 = 303. Yes, that's correct.
+
+Final Response:
+The result of 101 multiplied by 3 is 303.
+
+Number of completion tokens:
+14
+
+Number of reasoning tokens:
+310
+```
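Note: `reasoning_content` and `completion_tokens_details` only appear when the provider returns them. A minimal defensive sketch, assuming the same `xai/grok-3-mini-beta` call as above:

```python
import litellm

response = litellm.completion(
    model="xai/grok-3-mini-beta",
    messages=[{"role": "user", "content": "What is 101*3?"}],
    reasoning_effort="low",
)

# Fall back gracefully instead of raising if the reasoning fields are missing.
message = response.choices[0].message
print("Reasoning:", getattr(message, "reasoning_content", None))
print("Answer:", message.content)

details = getattr(response.usage, "completion_tokens_details", None)
reasoning_tokens = getattr(details, "reasoning_tokens", None) if details else None
print("Completion tokens:", response.usage.completion_tokens)
print("Reasoning tokens:", reasoning_tokens)
```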
Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+import Image from '@theme/IdealImage';
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# [Beta] Routing based on request metadata
+
+Create routing rules based on request metadata.
+
+## Setup
+
+Add the following to your litellm proxy config yaml file.
+
+```yaml showLineNumbers title="litellm proxy config.yaml"
+router_settings:
+  enable_tag_filtering: True # 👈 Key Change
+```
+
+## 1. Create a tag
+
+On the LiteLLM UI, navigate to Experimental > Tag Management > Create Tag.
+
+Create a tag called `private-data` and select only the models that are allowed for requests with this tag. Once created, you will see the tag in the Tag Management page.
+
+<Image img={require('../../img/tag_create.png')} style={{ width: '800px', height: 'auto' }} />
+
+## 2. Test Tag Routing
+
+Now we will test the tag-based routing rules.
+
+### 2.1 Invalid model
+
+This request will fail since we send `tags=private-data` but the model `gpt-4o` is not in the allowed models for the `private-data` tag.
+
+<Image img={require('../../img/tag_invalid.png')} style={{ width: '800px', height: 'auto' }} />
+
+<br />
+
+Here is an example sending the same request using the OpenAI Python SDK.
+
+<Tabs>
+<TabItem value="python" label="OpenAI Python SDK">
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="sk-1234",
+    base_url="http://0.0.0.0:4000/v1/"
+)
+
+response = client.chat.completions.create(
+    model="gpt-4o",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"}
+    ],
+    extra_body={
+        "tags": "private-data"
+    }
+)
+```
+
+</TabItem>
+<TabItem value="curl" label="cURL">
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+  "model": "gpt-4o",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello, how are you?"
+    }
+  ],
+  "tags": "private-data"
+}'
+```
+
+</TabItem>
+</Tabs>
+
+<br />
+
+### 2.2 Valid model
+
+This request will succeed since we send `tags=private-data` and the model `us.anthropic.claude-3-7-sonnet-20250219-v1:0` is in the allowed models for the `private-data` tag.
+
+<Image img={require('../../img/tag_valid.png')} style={{ width: '800px', height: 'auto' }} />
+
+Here is an example sending the same request using the OpenAI Python SDK.
+
+<Tabs>
+<TabItem value="python" label="OpenAI Python SDK">
+
+```python showLineNumbers
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="sk-1234",
+    base_url="http://0.0.0.0:4000/v1/"
+)
+
+response = client.chat.completions.create(
+    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    messages=[
+        {"role": "user", "content": "Hello, how are you?"}
+    ],
+    extra_body={
+        "tags": "private-data"
+    }
+)
+```
+
+</TabItem>
+<TabItem value="curl" label="cURL">
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
+-H 'Content-Type: application/json' \
+-H 'Authorization: Bearer sk-1234' \
+-d '{
+  "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+  "messages": [
+    {
+      "role": "user",
+      "content": "Hello, how are you?"
+    }
+  ],
+  "tags": "private-data"
+}'
+```
+
+</TabItem>
+</Tabs>
+
+## Additional Tag Features
+- [Sending tags in request headers](https://docs.litellm.ai/docs/proxy/tag_routing#calling-via-request-header)
+- [Tag based routing](https://docs.litellm.ai/docs/proxy/tag_routing)
+- [Track spend per tag](cost_tracking#-custom-tags)
+- [Setup Budgets per Virtual Key, Team](users)
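Note: when a tagged request targets a model outside the tag's allow-list (as in 2.1 above), the proxy rejects it. A minimal sketch of handling that rejection client-side, assuming the proxy surfaces it as a standard HTTP API error (the exact status code and message may vary by version):

```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",
    base_url="http://0.0.0.0:4000/v1/"
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",  # not in the private-data tag's allowed models
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        extra_body={"tags": "private-data"},
    )
    print(response.choices[0].message.content)
except openai.APIStatusError as e:
    # The proxy returns an HTTP error when no deployment matches the tag.
    print(f"Tag routing rejected the request ({e.status_code}): {e.message}")
```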

docs/my-website/img/tag_create.png

250 KB

docs/my-website/img/tag_invalid.png

237 KB

docs/my-website/img/tag_valid.png

319 KB
