Add support for LLM routing using developer preferences #5362

@adilhafeez

Description

What specific problem does this solve?

Arch Gateway unifies access and routing to any LLM, including dynamic routing driven by user preferences. For example, it can direct a query to the appropriate model according to preferences such as:

- name: code generation
  model: claude/claude-sonnet-4-0
  usage: generating new code snippets

- name: code understanding
  model: openai/gpt-4.1
  usage: understand and explain existing code snippets

A user could ask a question like "write code to generate prime numbers in rust" and the request would be routed to claude-sonnet-4-0, while a question like "help me understand this code ..." would be routed to gpt-4.1. This saves developers from having to manually pick a model for each use case; Arch Gateway does it automatically once the developers have set their preferences.
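To make this concrete, here is a minimal sketch of the request a client would send for the first query. The endpoint URL is a hypothetical placeholder, and whether a `model` field is still required when routing is enabled is gateway-specific; the payload shape is just a standard OpenAI-style chat completion request.

```python
import json

# Hypothetical local Arch Gateway endpoint (placeholder for illustration).
ARCH_GATEWAY_URL = "http://localhost:12000/v1/chat/completions"

def build_chat_request(user_query: str) -> dict:
    """Build a standard OpenAI-style chat completion payload.

    No model is pinned here: with preference-based routing, the gateway
    itself selects claude-sonnet-4-0 or gpt-4.1 per the developer's
    preferences (assumption: the gateway does not require a model field
    when routing is enabled).
    """
    return {
        "messages": [{"role": "user", "content": user_query}],
    }

payload = build_chat_request("write code to generate prime numbers in rust")
print(json.dumps(payload, indent=2))
# The payload would then be POSTed to ARCH_GATEWAY_URL,
# e.g. requests.post(ARCH_GATEWAY_URL, json=payload).
```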

Additional context (optional)

Here is a demo video showing Arch Gateway with preference-based routing in action - https://www.reddit.com/r/LangChain/s/d2GKbYnveZ

More info

In addition to preference-based routing, Arch Gateway supports the following features:

  • 🚦 Routing to Agents: Engineered with purpose-built LLMs for fast (<100ms) agent routing and hand-off scenarios.
  • 🔗 Routing to LLMs: Unifies access and routing to any LLM, including dynamic routing via preference policies.
  • ⛨ Guardrails: Centrally configure guardrails to prevent harmful outcomes and ensure safe user interactions.
  • ⚡ Tools Use: For common agentic scenarios, Arch instantly clarifies and converts prompts to tool/API calls.
  • 🕵 Observability: W3C-compatible request tracing and LLM metrics that plug in instantly with popular tools.
  • 🧱 Built on Envoy: Arch runs alongside app servers as a containerized process, building on Envoy's proven HTTP management and scalability features to handle ingress and egress traffic related to prompts and LLMs.

Roo Code Task Links (Optional)

No response

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear impact and context

Interested in implementing this?

  • Yes, I'd like to help implement this feature

Implementation requirements

  • I understand this needs approval before implementation begins

How should this be solved? (REQUIRED if contributing, optional otherwise)

I have a PR ready in draft mode; here is the link to it in my private fork - adilhafeez#2

How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Simple (no user preferences)

  • Select Arch Gateway as the provider in the Roo Code UI
  • Each query typed in Roo Code should be handled by Arch Gateway

With user preferences:

  • The user enters preferences in the Roo Code UI for the "arch llm gateway" provider
  • For each query entered in Roo Code, Arch Gateway selects the appropriate model and directs the query to it

Technical considerations (REQUIRED if contributing, optional otherwise)

Arch Gateway exposes LLMs over an OpenAI-compatible protocol, but it needs a config for preference-based routing.

How are preferences passed to Arch Gateway?

  • Arch Gateway looks for a metadata key, archgw_preference_config, in the chat_completion_request
  • If present, the routing model is engaged to pick the appropriate model
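The steps above can be sketched as follows. Only the metadata key name (archgw_preference_config) comes from this issue; serializing the preference entries as a YAML string is an assumption about the value format.

```python
import json

# Preference entries serialized as a YAML string mirroring the config
# shown earlier (assumption: the value is YAML text; only the key name
# archgw_preference_config is specified by Arch Gateway).
PREFERENCES_YAML = """\
- name: code generation
  model: claude/claude-sonnet-4-0
  usage: generating new code snippets
- name: code understanding
  model: openai/gpt-4.1
  usage: understand and explain existing code snippets
"""

def attach_preferences(chat_completion_request: dict) -> dict:
    """Attach routing preferences under metadata.archgw_preference_config.

    When the gateway sees this key on a chat completion request, its
    routing model is engaged to pick the appropriate model.
    """
    request = dict(chat_completion_request)
    metadata = dict(request.get("metadata", {}))
    metadata["archgw_preference_config"] = PREFERENCES_YAML
    request["metadata"] = metadata
    return request

request = attach_preferences(
    {"messages": [{"role": "user", "content": "help me understand this code ..."}]}
)
print(json.dumps(request, indent=2))
```

If the key is absent, the request passes through unchanged, so clients that don't opt in to preference-based routing are unaffected.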

Trade-offs and risks (REQUIRED if contributing, optional otherwise)

  • There will be a slight increase in latency from using the routing model to pick a model. With a cloud routing-model endpoint, expect roughly 100ms to 300ms depending on the developer's location. With a local deployment it can be much smaller; on a Mac M2 Max we observed about 70ms of latency overhead.
  • When preference-based routing is enabled, user preferences are attached to the chat completion request metadata, resulting in a very marginal increase in request size.

Metadata

Labels

  • Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
  • enhancement: New feature or request
  • proposal
