
Commit 3d5022b

[Feat] Support for all litellm providers on Responses API (works with Codex) - Anthropic, Bedrock API, VertexAI, Ollama (#10132)
* transform request
* basic handler for LiteLLMCompletionTransformationHandler
* complete transform litellm to responses api
* fixes to test
* fix stream=True
* fix streaming iterator
* fixes for transformation
* fixes for anthropic codex support
* fix pass response_api_optional_params
* test anthropic responses api tools
* update responses types
* working codex with litellm
* add session handler
* fixes streaming iterator
* fix handler
* add litellm codex example
* fix code quality
* test fix
* docs litellm codex
* litellm codexdoc
* docs openai codex with litellm
* docs litellm openai codex
* litellm codex
* linting fixes for transforming responses API
* fix import error
* fix responses api test
* add sync iterator support for responses api
1 parent 3e87ec4 commit 3d5022b

File tree

14 files changed: +1282 -53 lines changed
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Using LiteLLM with OpenAI Codex

This guide walks you through connecting OpenAI Codex to LiteLLM. Using LiteLLM with Codex allows teams to:
- Access 100+ LLMs through the Codex interface
- Use powerful models like Gemini through a familiar interface
- Track spend and usage with LiteLLM's built-in analytics
- Control model access with virtual keys

<Image img={require('../../img/litellm_codex.gif')} />

## Quickstart

Make sure to set up LiteLLM with the [LiteLLM Getting Started Guide](../proxy/docker_quick_start.md).

## 1. Install OpenAI Codex

Install the OpenAI Codex CLI globally using npm or yarn:

<Tabs>
<TabItem value="npm" label="npm">

```bash showLineNumbers
npm i -g @openai/codex
```

</TabItem>
<TabItem value="yarn" label="yarn">

```bash showLineNumbers
yarn global add @openai/codex
```

</TabItem>
</Tabs>

## 2. Start LiteLLM Proxy

<Tabs>
<TabItem value="docker" label="Docker">

```bash showLineNumbers
docker run \
    -v $(pwd)/litellm_config.yaml:/app/config.yaml \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-latest \
    --config /app/config.yaml
```

</TabItem>
<TabItem value="pip" label="LiteLLM CLI">

```bash showLineNumbers
litellm --config /path/to/config.yaml
```

</TabItem>
</Tabs>

LiteLLM should now be running on [http://localhost:4000](http://localhost:4000)

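Before moving on, it can help to confirm the proxy is reachable from a client. A minimal sketch, assuming the `openai` Python package is installed and the proxy uses the example key `sk-1234` shown later in this guide:

```python showLineNumbers
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# Listing models should return every model_name configured on the proxy.
for model in client.models.list():
    print(model.id)
```
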
## 3. Configure LiteLLM for Model Routing

Ensure your LiteLLM Proxy is properly configured to route to your desired models. Create a `litellm_config.yaml` file with the following content:

```yaml showLineNumbers
model_list:
  - model_name: o3-mini
    litellm_params:
      model: openai/o3-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3-7-sonnet-latest
    litellm_params:
      model: anthropic/claude-3-7-sonnet-latest
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gemini-2.0-flash
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: os.environ/GEMINI_API_KEY

litellm_settings:
  drop_params: true
```

This configuration enables routing to specific OpenAI, Anthropic, and Gemini models with explicit names.

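Because this release adds Responses API support for non-OpenAI providers, the same proxy can also serve `/v1/responses` requests for these models. A minimal sketch, again assuming the `openai` Python SDK and the example key `sk-1234`:

```python showLineNumbers
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# A Responses API request routed to Anthropic through LiteLLM's completion bridge.
response = client.responses.create(
    model="claude-3-7-sonnet-latest",
    input="Write a one-line docstring for a function that reverses a string.",
)
print(response.output_text)
```
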
## 4. Configure Codex to Use LiteLLM Proxy

Set the required environment variables to point Codex to your LiteLLM Proxy:

```bash
# Point to your LiteLLM Proxy server
export OPENAI_BASE_URL=http://0.0.0.0:4000

# Use your LiteLLM API key (if you've set up authentication)
export OPENAI_API_KEY="sk-1234"
```

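To confirm the variables are picked up the same way Codex will see them, you can run one request with any OpenAI-compatible client. The sketch below assumes the `openai` Python SDK, which also reads `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment:

```python showLineNumbers
from openai import OpenAI

# No arguments: the client reads OPENAI_BASE_URL and OPENAI_API_KEY from the
# environment, so a successful call here means Codex will reach the same proxy.
client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(response.choices[0].message.content)
```
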
## 5. Run Codex with Gemini

With everything configured, you can now run Codex with the Gemini model defined in your config:

```bash showLineNumbers
codex --model gemini-2.0-flash --full-auto
```

<Image img={require('../../img/litellm_codex.gif')} />

The `--full-auto` flag runs Codex in its full-auto approval mode, allowing it to edit files and run commands without prompting for confirmation at each step.

## 6. Advanced Options

### Using Different Models

You can use any model configured in your LiteLLM proxy:

```bash
# Use Claude models
codex --model claude-3-7-sonnet-latest

# Use Google AI Studio Gemini models
codex --model gemini/gemini-2.0-flash
```

## Troubleshooting

- If you encounter connection issues, ensure your LiteLLM Proxy is running and accessible at the specified URL
- Verify your LiteLLM API key is valid if you're using authentication
- Check that your model routing configuration is correct
- For model-specific errors, ensure the model is properly configured in your LiteLLM setup

## Additional Resources

- [LiteLLM Docker Quick Start Guide](../proxy/docker_quick_start.md)
- [OpenAI Codex GitHub Repository](https://github.com/openai/codex)
- [LiteLLM Virtual Keys and Authentication](../proxy/virtual_keys.md)

docs/my-website/img/litellm_codex.gif

12 MB

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -443,6 +443,7 @@ const sidebars = {
         label: "Tutorials",
         items: [
           "tutorials/openweb_ui",
+          "tutorials/openai_codex",
           "tutorials/msft_sso",
           "tutorials/prompt_caching",
           "tutorials/tag_management",

litellm/proxy/proxy_config.yaml

Lines changed: 11 additions & 4 deletions
@@ -1,6 +1,13 @@
 model_list:
-  - model_name: fake-openai-endpoint
+  - model_name: openai/*
     litellm_params:
-      model: openai/fake
-      api_key: fake-key
-      api_base: https://exampleopenaiendpoint-production.up.railway.app/
+      model: openai/*
+  - model_name: anthropic/*
+    litellm_params:
+      model: anthropic/*
+  - model_name: gemini/*
+    litellm_params:
+      model: gemini/*
+litellm_settings:
+  drop_params: true
+
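With these wildcard entries, the proxy forwards any model name under a provider prefix to that provider, using the matching API key from the proxy's environment. A minimal sketch of what that looks like from a client, assuming the `openai` Python SDK and a placeholder proxy key:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# "anthropic/*" in the config means any anthropic/... model name is accepted
# and routed to Anthropic, using ANTHROPIC_API_KEY from the proxy environment.
response = client.chat.completions.create(
    model="anthropic/claude-3-7-sonnet-latest",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)
```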
Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
"""
Handler for transforming responses api requests to litellm.completion requests
"""

from typing import Any, Coroutine, Optional, Union

import litellm
from litellm.responses.litellm_completion_transformation.streaming_iterator import (
    LiteLLMCompletionStreamingIterator,
)
from litellm.responses.litellm_completion_transformation.transformation import (
    LiteLLMCompletionResponsesConfig,
)
from litellm.responses.streaming_iterator import BaseResponsesAPIStreamingIterator
from litellm.types.llms.openai import (
    ResponseInputParam,
    ResponsesAPIOptionalRequestParams,
    ResponsesAPIResponse,
)
from litellm.types.utils import ModelResponse


class LiteLLMCompletionTransformationHandler:

    def response_api_handler(
        self,
        model: str,
        input: Union[str, ResponseInputParam],
        responses_api_request: ResponsesAPIOptionalRequestParams,
        custom_llm_provider: Optional[str] = None,
        _is_async: bool = False,
        stream: Optional[bool] = None,
        **kwargs,
    ) -> Union[
        ResponsesAPIResponse,
        BaseResponsesAPIStreamingIterator,
        Coroutine[
            Any, Any, Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]
        ],
    ]:
        litellm_completion_request: dict = (
            LiteLLMCompletionResponsesConfig.transform_responses_api_request_to_chat_completion_request(
                model=model,
                input=input,
                responses_api_request=responses_api_request,
                custom_llm_provider=custom_llm_provider,
                stream=stream,
                **kwargs,
            )
        )

        if _is_async:
            return self.async_response_api_handler(
                litellm_completion_request=litellm_completion_request,
                request_input=input,
                responses_api_request=responses_api_request,
                **kwargs,
            )

        litellm_completion_response: Union[
            ModelResponse, litellm.CustomStreamWrapper
        ] = litellm.completion(
            **litellm_completion_request,
            **kwargs,
        )

        if isinstance(litellm_completion_response, ModelResponse):
            responses_api_response: ResponsesAPIResponse = (
                LiteLLMCompletionResponsesConfig.transform_chat_completion_response_to_responses_api_response(
                    chat_completion_response=litellm_completion_response,
                    request_input=input,
                    responses_api_request=responses_api_request,
                )
            )

            return responses_api_response

        elif isinstance(litellm_completion_response, litellm.CustomStreamWrapper):
            return LiteLLMCompletionStreamingIterator(
                litellm_custom_stream_wrapper=litellm_completion_response,
                request_input=input,
                responses_api_request=responses_api_request,
            )

    async def async_response_api_handler(
        self,
        litellm_completion_request: dict,
        request_input: Union[str, ResponseInputParam],
        responses_api_request: ResponsesAPIOptionalRequestParams,
        **kwargs,
    ) -> Union[ResponsesAPIResponse, BaseResponsesAPIStreamingIterator]:
        litellm_completion_response: Union[
            ModelResponse, litellm.CustomStreamWrapper
        ] = await litellm.acompletion(
            **litellm_completion_request,
            **kwargs,
        )

        if isinstance(litellm_completion_response, ModelResponse):
            responses_api_response: ResponsesAPIResponse = (
                LiteLLMCompletionResponsesConfig.transform_chat_completion_response_to_responses_api_response(
                    chat_completion_response=litellm_completion_response,
                    request_input=request_input,
                    responses_api_request=responses_api_request,
                )
            )

            return responses_api_response

        elif isinstance(litellm_completion_response, litellm.CustomStreamWrapper):
            return LiteLLMCompletionStreamingIterator(
                litellm_custom_stream_wrapper=litellm_completion_response,
                request_input=request_input,
                responses_api_request=responses_api_request,
            )
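For reference, a rough sketch of exercising this handler directly. The import path and the empty `responses_api_request` dict are assumptions here; in normal use the class is invoked by LiteLLM's Responses API entry point (and the proxy route) rather than called by hand.

```python
# Hypothetical import path for the handler above; adjust to where the file
# actually lives in the repo.
from litellm.responses.litellm_completion_transformation.handler import (
    LiteLLMCompletionTransformationHandler,
)

handler = LiteLLMCompletionTransformationHandler()

# Synchronous, non-streaming path (requires ANTHROPIC_API_KEY in the environment):
# the Responses API request is transformed into a litellm.completion() call and the
# resulting chat completion is mapped back into a ResponsesAPIResponse.
response = handler.response_api_handler(
    model="anthropic/claude-3-7-sonnet-latest",
    input="Summarize what this handler does in one sentence.",
    responses_api_request={},  # optional Responses API params (TypedDict)
    custom_llm_provider="anthropic",
    stream=False,
)
print(response)
```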
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
"""
Responses API has previous_response_id, which is the id of the previous response.

LiteLLM needs to maintain a cache of the previous response input, output, previous_response_id, and model.

This class handles that cache.
"""

from typing import List, Optional, Tuple, Union

from typing_extensions import TypedDict

from litellm.caching import InMemoryCache
from litellm.types.llms.openai import ResponseInputParam, ResponsesAPIResponse

RESPONSES_API_PREVIOUS_RESPONSES_CACHE = InMemoryCache()
MAX_PREV_SESSION_INPUTS = 50


class ResponsesAPISessionElement(TypedDict, total=False):
    input: Union[str, ResponseInputParam]
    output: ResponsesAPIResponse
    response_id: str
    previous_response_id: Optional[str]


class SessionHandler:

    def add_completed_response_to_cache(
        self, response_id: str, session_element: ResponsesAPISessionElement
    ):
        RESPONSES_API_PREVIOUS_RESPONSES_CACHE.set_cache(
            key=response_id, value=session_element
        )

    def get_chain_of_previous_input_output_pairs(
        self, previous_response_id: str
    ) -> List[Tuple[ResponseInputParam, ResponsesAPIResponse]]:
        response_api_inputs: List[Tuple[ResponseInputParam, ResponsesAPIResponse]] = []
        current_previous_response_id = previous_response_id

        count_session_elements = 0
        while current_previous_response_id:
            if count_session_elements > MAX_PREV_SESSION_INPUTS:
                break
            session_element = RESPONSES_API_PREVIOUS_RESPONSES_CACHE.get_cache(
                key=current_previous_response_id
            )
            if session_element:
                response_api_inputs.append(
                    (session_element.get("input"), session_element.get("output"))
                )
                current_previous_response_id = session_element.get(
                    "previous_response_id"
                )
            else:
                break
            count_session_elements += 1
        return response_api_inputs
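A short sketch of how this session cache is meant to be used. The import path is an assumption, and the `output` field is omitted for brevity (it would normally hold the completed `ResponsesAPIResponse`).

```python
# Hypothetical import path for the session handler above; adjust to the actual module.
from litellm.responses.litellm_completion_transformation.session_handler import (
    ResponsesAPISessionElement,
    SessionHandler,
)

session_handler = SessionHandler()

# Cache two turns; "resp_2" points back to "resp_1" via previous_response_id.
session_handler.add_completed_response_to_cache(
    response_id="resp_1",
    session_element=ResponsesAPISessionElement(
        input="What is LiteLLM?",
        response_id="resp_1",
        previous_response_id=None,
    ),
)
session_handler.add_completed_response_to_cache(
    response_id="resp_2",
    session_element=ResponsesAPISessionElement(
        input="And how does Codex use it?",
        response_id="resp_2",
        previous_response_id="resp_1",
    ),
)

# Walking back from "resp_2" returns the cached (input, output) pairs,
# most recent first, up to MAX_PREV_SESSION_INPUTS entries.
pairs = session_handler.get_chain_of_previous_input_output_pairs("resp_2")
print([inp for inp, _ in pairs])  # ['And how does Codex use it?', 'What is LiteLLM?']
```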

0 commit comments
