Add support for LLM routing using developer preferences #5362

@adilhafeez

Description

What specific problem does this solve?

Arch Gateway unifies access and routing to any LLM, including dynamic routing driven by user preferences. For example, it can direct a query to the appropriate model according to preferences such as:

- name: code generation
  model: claude/claude-sonnet-4-0
  usage: generating new code snippets

- name: code understanding
  model: openai/gpt-4.1
  usage: understand and explain existing code snippets

A user could ask a question like "write code to generate prime numbers in rust" and the request would be routed to claude-sonnet-4-0, while a question like "help me understand this code ..." would be routed to gpt-4.1. This saves developers from having to manually pick a model for each use case; Arch Gateway does it automatically once the developers have set their preferences.
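To make this concrete, here is a minimal sketch of the request a client would send for the first query. The endpoint URL is a hypothetical placeholder, and whether a `model` field is still required when routing is enabled is gateway-specific; the payload shape is just a standard OpenAI-style chat completion request.

```python
import json

# Hypothetical local Arch Gateway endpoint (placeholder for illustration).
ARCH_GATEWAY_URL = "http://localhost:12000/v1/chat/completions"

def build_chat_request(user_query: str) -> dict:
    """Build a standard OpenAI-style chat completion payload.

    No model is pinned here: with preference-based routing, the gateway
    itself selects claude-sonnet-4-0 or gpt-4.1 per the developer's
    preferences (assumption: the gateway does not require a model field
    when routing is enabled).
    """
    return {
        "messages": [{"role": "user", "content": user_query}],
    }

payload = build_chat_request("write code to generate prime numbers in rust")
print(json.dumps(payload, indent=2))
# The payload would then be POSTed to ARCH_GATEWAY_URL,
# e.g. requests.post(ARCH_GATEWAY_URL, json=payload).
```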

Additional context (optional)

Here is a demo video showing Arch Gateway with preference-based routing in action - https://www.reddit.com/r/LangChain/s/d2GKbYnveZ

More info

In addition to preference-based routing, Arch Gateway supports the following features:

  • 🚦 Routing to Agents: Engineered with purpose-built LLMs for fast (<100ms) agent routing and hand-off scenarios.
  • 🔗 Routing to LLMs: Unifies access and routing to any LLM, including dynamic routing via preference policies.
  • ⛨ Guardrails: Centrally configure guardrails to prevent harmful outcomes and ensure safe user interactions.
  • ⚡ Tools Use: For common agentic scenarios, Arch instantly clarifies and converts prompts to tool/API calls.
  • 🕵 Observability: W3C-compatible request tracing and LLM metrics that plug in instantly with popular tools.
  • 🧱 Built on Envoy: Arch runs alongside app servers as a containerized process, building on Envoy's proven HTTP management and scalability features to handle ingress and egress traffic related to prompts and LLMs.

Roo Code Task Links (Optional)

No response

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear impact and context

Interested in implementing this?

  • Yes, I'd like to help implement this feature

Implementation requirements

  • I understand this needs approval before implementation begins

How should this be solved? (REQUIRED if contributing, optional otherwise)

I have a PR ready in draft mode; here is the link to it in my private fork - adilhafeez#2

How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Simple (no user preferences)

  • Select Arch Gateway as the provider in the Roo Code UI
  • Each query typed in Roo Code should be handled by Arch Gateway

With user preferences:

  • The user enters preferences in the Roo Code UI for the "arch llm gateway" provider
  • For each query entered in Roo Code, Arch Gateway selects the appropriate model and directs the query to it

Technical considerations (REQUIRED if contributing, optional otherwise)

Arch Gateway exposes LLMs over an OpenAI-compatible protocol, but it needs a config for preference-based routing.

How are preferences passed to Arch Gateway?

  • Arch Gateway looks for a metadata key, archgw_preference_config, in the chat_completion_request
  • If present, the routing model is engaged to pick the appropriate model
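The steps above can be sketched as follows. Only the metadata key name (archgw_preference_config) comes from this issue; serializing the preference entries as a YAML string is an assumption about the value format.

```python
import json

# Preference entries serialized as a YAML string mirroring the config
# shown earlier (assumption: the value is YAML text; only the key name
# archgw_preference_config is specified by Arch Gateway).
PREFERENCES_YAML = """\
- name: code generation
  model: claude/claude-sonnet-4-0
  usage: generating new code snippets
- name: code understanding
  model: openai/gpt-4.1
  usage: understand and explain existing code snippets
"""

def attach_preferences(chat_completion_request: dict) -> dict:
    """Attach routing preferences under metadata.archgw_preference_config.

    When the gateway sees this key on a chat completion request, its
    routing model is engaged to pick the appropriate model.
    """
    request = dict(chat_completion_request)
    metadata = dict(request.get("metadata", {}))
    metadata["archgw_preference_config"] = PREFERENCES_YAML
    request["metadata"] = metadata
    return request

request = attach_preferences(
    {"messages": [{"role": "user", "content": "help me understand this code ..."}]}
)
print(json.dumps(request, indent=2))
```

If the key is absent, the request passes through unchanged, so clients that don't opt in to preference-based routing are unaffected.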

Trade-offs and risks (REQUIRED if contributing, optional otherwise)

  • There will be a slight increase in latency from using the routing model to pick a model. With a cloud routing-model endpoint, expect roughly 100ms to 300ms depending on the developer's location. With a local deployment it can be much smaller; on a Mac M2 Max we observed about 70ms of latency overhead.
  • When preference-based routing is enabled, user preferences are attached to the chat completion request metadata, resulting in a very marginal increase in request size.

Metadata

Labels

  • Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
  • enhancement: New feature or request
  • proposal
