Per Provider HTTP Request Customisation #779
Replies: 2 comments 1 reply
-
I'm currently using a custom proxy server between Roo Code and an in-house LLM API. Unfortunately it's somewhat unreliable, and I wish Roo Code were flexible enough that I wouldn't need it, but here it is:

in_house_handler.py

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "apscheduler",
#     "backoff",
#     "cryptography",
#     "fastapi-sso",
#     "litellm",
#     "litellm-proxy-extras",
#     "orjson",
#     "python-multipart",
#     "uvicorn",
# ]
# ///
"""This module defines a custom LiteLLM handler for the in-house GPT.

You can either pass this to the ``litellm`` command line
if you have the custom LiteLLM in-house configuration in a YAML file::

    litellm --config=litellm-config.yaml

Or you can just run this as a stand-alone script using ``uv run``::

    uv run in_house_handler.py

This will create the LiteLLM in-house configuration YAML file on the fly,
install all dependencies in a temporary virtual environment,
and invoke the LiteLLM proxy with the correct configuration.
"""

import os
from collections.abc import AsyncIterator, Iterator
from pathlib import Path
from pprint import pformat
from tempfile import NamedTemporaryFile
from textwrap import dedent

import click
import httpx
from litellm import CustomLLM, ModelResponse, run_server
from litellm.types.utils import GenericStreamingChunk


class InvalidResponse(Exception):
    def __init__(self, response_data):
        self._response_data = response_data

    def __str__(self):
        return f"\n{pformat(self._response_data)}"


def convert_message_content(content):
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        return content["text"]
    if isinstance(content, list):
        return "\n".join(convert_message_content(item) for item in content)
    message = f"Unknown message content structure {content}"
    raise ValueError(message)


def convert_messages(messages):
    return [
        {
            "role": message["role"],
            "content": convert_message_content(message["content"]),
        }
        for message in messages
    ]


class InHouseGPT(CustomLLM):
    def __init__(self) -> None:
        super().__init__()
        self.api_key = os.getenv("IN_HOUSE_GPT_API_KEY")
        if not self.api_key:
            msg = "IN_HOUSE_GPT_API_KEY environment variable is required"
            raise ValueError(msg)
        self.api_base = "https://api.in_house.com/int/in-house-gpt"

    def _make_request(
        self,
        messages: list,
        model: str = "claude-35",
    ) -> dict:
        headers = {
            "Authentication-Token": self.api_key,
            "Content-Type": "application/json",
        }
        filtered_messages = [m for m in messages if m["role"] != "system"]
        model_name = model.split("/")[-1] if "/" in model else model
        model_name = model_name.replace("i-", "")
        data = {
            "messages": convert_messages(filtered_messages),
            "stream": False,
            "model": model_name,
        }
        with httpx.Client() as client:
            response = client.post(
                f"{self.api_base}/chat/completions",
                headers=headers,
                json=data,
                timeout=30.0,
            )
            return response.json()

    def completion(
        self,
        model: str,
        messages: list,
        **kwargs: dict,
    ) -> ModelResponse:
        response_data = self._make_request(messages, model=model)
        try:
            return ModelResponse(
                id=response_data["id"],
                choices=response_data["choices"],
                model=model,
                usage=response_data["usage"],
            )
        except KeyError as exc_info:
            raise InvalidResponse(response_data) from exc_info

    def streaming(
        self,
        model: str,
        messages: list,
        **kwargs: dict,
    ) -> Iterator[GenericStreamingChunk]:
        response_data = self._make_request(messages, model=model)
        content = response_data["choices"][0]["message"]["content"]
        yield {
            "finish_reason": None,
            "index": 0,
            "is_finished": False,
            "text": content,
            "tool_use": None,
            "usage": None,
        }
        yield {
            "finish_reason": "stop",
            "index": 1,
            "is_finished": True,
            "text": "",
            "tool_use": None,
            "usage": response_data["usage"],
        }

    async def acompletion(
        self,
        *args: tuple,
        **kwargs: dict,
    ) -> ModelResponse:
        return self.completion(*args, **kwargs)

    async def astreaming(
        self,
        *args: tuple,
        **kwargs: dict,
    ) -> AsyncIterator[GenericStreamingChunk]:
        for chunk in self.streaming(*args, **kwargs):
            yield chunk


in_house_llm = InHouseGPT()


@click.command
@click.pass_context
def main(ctx):
    # NamedTemporaryFile (rather than TemporaryFile) so the file has a usable
    # path to hand to the proxy; delete_on_close requires Python 3.12+.
    with NamedTemporaryFile(
        mode="w", dir=Path(__file__).parent, suffix=".yaml", delete_on_close=False
    ) as config_yaml:
        config_yaml.write(
            dedent(
                """
                model_list:
                  # Adding i- to the model name to avoid conflicts with other models
                  - model_name: GPT-4 (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4
                  - model_name: GPT-4o (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4o
                  - model_name: GPT-4o-mini (IGPT)
                    litellm_params:
                      model: in_house/i-gpt-4o-mini
                  - model_name: Llama-3 (IGPT)
                    litellm_params:
                      model: in_house/i-llama-3
                  - model_name: Claude-3.5 (IGPT)
                    litellm_params:
                      model: in_house/i-claude-35

                litellm_settings:
                  provider_list: ["in_house"]
                  custom_llm_providers:
                    - in_house
                  custom_provider_map:
                    - provider: in_house
                      custom_handler: in_house_handler.in_house_llm
                """
            )
        )
        config_yaml.close()
        ctx.invoke(run_server, config=config_yaml.name)


if __name__ == "__main__":
    main()
```
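For completeness: once the proxy is running, Roo Code (or any OpenAI-compatible client) just points at it. Here is a minimal smoke test, assuming the LiteLLM proxy's default port of 4000 and no master key configured; adjust both if your setup differs.

```python
# Minimal smoke test against the LiteLLM proxy started by in_house_handler.py.
# Assumes the default port (4000) and no master key; change both as needed.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="anything")

response = client.chat.completions.create(
    model="Claude-3.5 (IGPT)",  # one of the model_name entries from the YAML above
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```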
-
My company requires using an internal custom LLM proxy to contact cloud LLM APIs. This allows for centralized budgeting and also for scrubbing customer data from requests. That proxy requires custom HTTP headers for identification purposes. I would like a way to provide a set of extra headers for the OpenAI Compatible option that are sent along with all requests. FYI, this was recently implemented in Cline, so we can start using it internally. cline#1136
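For context, this is roughly what the request amounts to on the client side: the OpenAI Python client already supports attaching a fixed set of headers to every request via `default_headers`. The gateway URL and header names below are placeholders.

```python
from openai import OpenAI

# Hypothetical identification headers required by an internal gateway;
# the URL, key, and header names here are placeholders.
client = OpenAI(
    base_url="https://llm-gateway.internal.example.com/v1",
    api_key="sk-internal",
    default_headers={
        "X-Team-Id": "my-team",
        "X-Cost-Center": "12345",
    },
)
```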
-
I'd like to have a toggle section (defaulting to off) per provider for customising the HTTP request.
We already have a custom base URL; I'd like to see that enhanced into a "Customise HTTP Request" section, with the custom base URL as one of the items we can adjust.
Other items would be:
My use case is supporting custom AI gateways / APIs / proxies that don't fit the standard API providers. These are often used by enterprises, but sometimes it's just, for example, running your own proxy that examines a request and makes choices about models or parameter tuning based on context (a rough sketch of that idea is below).
Essentially, this is a generic way to support as many weird edge cases as possible, with safe defaults, in a toggle-able fashion, on a per-provider basis.
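To make that use case concrete, here is a rough sketch (not anything Roo Code ships) of the kind of pass-through proxy I mean: it inspects each chat completion request, picks a model based on context, adds an identification header, and forwards everything upstream. The upstream URL, header names, and routing rule are all placeholders.

```python
# Rough sketch of a request-rewriting proxy; all names and URLs are placeholders.
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

UPSTREAM = "https://api.openai.com/v1/chat/completions"  # placeholder upstream

app = FastAPI()


@app.post("/v1/chat/completions")
async def chat_completions(request: Request) -> JSONResponse:
    body = await request.json()

    # Example of context-based tuning: route long prompts to a bigger model.
    prompt_chars = sum(len(str(m.get("content", ""))) for m in body.get("messages", []))
    body["model"] = "gpt-4o" if prompt_chars > 4000 else "gpt-4o-mini"

    headers = {
        "Authorization": request.headers.get("Authorization", ""),
        "X-Team-Id": "my-team",  # identification header required by the gateway
    }
    async with httpx.AsyncClient(timeout=60.0) as client:
        upstream = await client.post(UPSTREAM, json=body, headers=headers)
    return JSONResponse(upstream.json(), status_code=upstream.status_code)
```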