Per Provider HTTP Request Customisation #779
Replies: 2 comments 1 reply
-
I'm currently using a custom proxy server between Roo Code and an in-house LLM API. Unfortunately it's somewhat unreliable, and I wish Roo Code were flexible enough that I wouldn't need it, but here it is:

in_house_handler.py

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "apscheduler",
#     "backoff",
#     "cryptography",
#     "fastapi-sso",
#     "litellm",
#     "litellm-proxy-extras",
#     "orjson",
#     "python-multipart",
#     "uvicorn",
# ]
# ///
"""This module defines a custom LiteLLM handler for the in-house GPT.

You can either pass this to the ``litellm`` command line
if you have the custom LiteLLM in-house configuration in a YAML file::

    litellm --config=litellm-config.yaml

Or you can just run this as a stand-alone script using ``uv run``::

    uv run in_house_handler.py

This will create the LiteLLM in-house configuration YAML file on the fly,
install all dependencies in a temporary virtual environment,
and invoke the LiteLLM proxy with the correct configuration.
"""

import os
from collections.abc import AsyncIterator, Iterator
from pathlib import Path
from pprint import pformat
from tempfile import NamedTemporaryFile
from textwrap import dedent

import click
import httpx
from litellm import CustomLLM, ModelResponse, run_server
from litellm.types.utils import GenericStreamingChunk


class InvalidResponse(Exception):
    def __init__(self, response_data):
        self._response_data = response_data

    def __str__(self):
        return f"\n{pformat(self._response_data)}"


def convert_message_content(content):
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        return content["text"]
    if isinstance(content, list):
        return "\n".join(convert_message_content(item) for item in content)
    message = f"Unknown message content structure {content}"
    raise ValueError(message)


def convert_messages(messages):
    return [
        {
            "role": message["role"],
            "content": convert_message_content(message["content"]),
        }
        for message in messages
    ]


class InHouseGPT(CustomLLM):
    def __init__(self) -> None:
        super().__init__()
        self.api_key = os.getenv("IN_HOUSE_GPT_API_KEY")
        if not self.api_key:
            msg = "IN_HOUSE_GPT_API_KEY environment variable is required"
            raise ValueError(msg)
        self.api_base = "https://api.in_house.com/int/in-house-gpt"

    def _make_request(
        self,
        messages: list,
        model: str = "claude-35",
    ) -> dict:
        headers = {
            "Authentication-Token": self.api_key,
            "Content-Type": "application/json",
        }
        filtered_messages = [m for m in messages if m["role"] != "system"]
        model_name = model.split("/")[-1] if "/" in model else model
        model_name = model_name.replace("i-", "")
        data = {
            "messages": convert_messages(filtered_messages),
            "stream": False,
            "model": model_name,
        }
        with httpx.Client() as client:
            response = client.post(
                f"{self.api_base}/chat/completions",
                headers=headers,
                json=data,
                timeout=30.0,
            )
            return response.json()

    def completion(
        self,
        model: str,
        messages: list,
        **kwargs: dict,
    ) -> ModelResponse:
        response_data = self._make_request(messages, model=model)
        try:
            return ModelResponse(
                id=response_data["id"],
                choices=response_data["choices"],
                model=model,
                usage=response_data["usage"],
            )
        except KeyError as exc_info:
            raise InvalidResponse(response_data) from exc_info

    def streaming(
        self,
        model: str,
        messages: list,
        **kwargs: dict,
    ) -> Iterator[GenericStreamingChunk]:
        response_data = self._make_request(messages, model=model)
        content = response_data["choices"][0]["message"]["content"]
        yield {
            "finish_reason": None,
            "index": 0,
            "is_finished": False,
            "text": content,
            "tool_use": None,
            "usage": None,
        }
        yield {
            "finish_reason": "stop",
            "index": 1,
            "is_finished": True,
            "text": "",
            "tool_use": None,
            "usage": response_data["usage"],
        }

    async def acompletion(
        self,
        *args: tuple,
        **kwargs: dict,
    ) -> ModelResponse:
        return self.completion(*args, **kwargs)

    async def astreaming(
        self,
        *args: tuple,
        **kwargs: dict,
    ) -> AsyncIterator[GenericStreamingChunk]:
        for chunk in self.streaming(*args, **kwargs):
            yield chunk


in_house_llm = InHouseGPT()


@click.command
@click.pass_context
def main(ctx):
    # NamedTemporaryFile (rather than TemporaryFile) so the file has a usable
    # path to hand to the proxy; delete_on_close requires Python 3.12+.
    with NamedTemporaryFile(
        mode="w", dir=Path(__file__).parent, suffix=".yaml", delete_on_close=False
    ) as config_yaml:
        config_yaml.write(
            dedent(
                """
                model_list:
                  # Adding i- to the model name to avoid conflicts with other models
                  - model_name: GPT-4 (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4
                  - model_name: GPT-4o (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4o
                  - model_name: GPT-4o-mini (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4o-mini
                  - model_name: Llama-3 (IGPT)
                    litellm_params:
                      model: in_house/i-llama-3
                  - model_name: Claude-3.5 (IGPT)
                    litellm_params:
                      model: in_house/i-claude-35

                litellm_settings:
                  provider_list: ["in_house"]
                  custom_llm_providers:
                    - in_house
                  custom_provider_map:
                    - provider: in_house
                      custom_handler: in_house_handler.in_house_llm
                """
            )
        )
        config_yaml.close()
        ctx.invoke(run_server, config=config_yaml.name)


if __name__ == "__main__":
    main()
```
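For completeness: once the proxy is running, Roo Code (or any OpenAI-compatible client) just points at it. Here is a minimal smoke test, assuming the LiteLLM proxy's default port of 4000 and no master key configured; adjust both if your setup differs.

```python
# Minimal smoke test against the LiteLLM proxy started by in_house_handler.py.
# Assumes the default port (4000) and no master key; change both as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="anything")

response = client.chat.completions.create(
    model="Claude-3.5 (IGPT)",  # one of the model_name entries from the YAML above
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```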
-
My company requires using an internal custom LLM proxy to contact cloud LLM APIs. This allows for centralized budgeting and also for scrubbing customer data from requests. That proxy requires custom HTTP headers for identification purposes. I would like a way to provide a set of extra headers for the OpenAI Compatible option that are sent along with all requests. FYI, this was recently implemented in Cline, so we can start using it internally. cline#1136
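For context, this is roughly what the request amounts to on the client side: the OpenAI Python client already supports attaching a fixed set of headers to every request via `default_headers`. The gateway URL and header names below are placeholders.

```python
from openai import OpenAI

# Hypothetical identification headers required by an internal gateway;
# the URL, key, and header names here are placeholders.
client = OpenAI(
    base_url="https://llm-gateway.internal.example.com/v1",
    api_key="sk-internal",
    default_headers={
        "X-Team-Id": "my-team",
        "X-Cost-Center": "12345",
    },
)
```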
-
I'd like to have a toggle section (defaulting to off) per provider for customising the HTTP request.
We already have a custom base URL; I'd like to see that enhanced into a "Customise HTTP Request" section, with the custom base URL as one of the items we can adjust.
Other items would be:
My use case is supporting custom AI gateways / APIs / proxies that don't fit the standard API providers. These are often used by enterprises, but sometimes it's just, for example, running your own proxy that examines a request and makes choices about models or parameter tuning based on context (a rough sketch of that idea is below).
Essentially, this is a generic way to support as many weird edge cases as possible, with safe defaults, in a toggle-able fashion, on a per-provider basis.
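To make that use case concrete, here is a rough sketch (not anything Roo Code ships) of the kind of pass-through proxy I mean: it inspects each chat completion request, picks a model based on context, adds an identification header, and forwards everything upstream. The upstream URL, header names, and routing rule are all placeholders.

```python
# Rough sketch of a request-rewriting proxy; all names and URLs are placeholders.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

UPSTREAM = "https://api.openai.com/v1/chat/completions"  # placeholder upstream

app = FastAPI()


@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> JSONResponse:
    body = await request.json()

    # Example of context-based tuning: route long prompts to a bigger model.
    prompt_chars = sum(len(str(m.get("content", ""))) for m in body.get("messages", []))
    body["model"] = "gpt-4o" if prompt_chars > 4000 else "gpt-4o-mini"

    headers = {
        "Authorization": request.headers.get("Authorization", ""),
        "X-Team-Id": "my-team",  # identification header required by the gateway
    }
    async with httpx.AsyncClient(timeout=60.0) as client:
        upstream = await client.post(UPSTREAM, json=body, headers=headers)
    return JSONResponse(upstream.json(), status_code=upstream.status_code)
```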