Inference-Aware Routing Layer #1202

@sdesai345

Description

Provide an inference-aware routing layer that makes dynamic routing decisions based on real-time metrics and model state, taking into account factors such as KV cache, prefix cache, workload distribution, and latency requirements.

Status: In progress
Owner: Ernest Wong
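
To make the request above concrete, here is a minimal sketch of what a scoring-based routing decision could look like. It is not the project's design: `ReplicaState`, `score`, `pick_replica`, the metric fields, and the weights are all hypothetical names and values chosen for illustration. It assumes each replica reports its KV cache utilization, its queue depth, and the prompt prefixes it currently has cached.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReplicaState:
    """Hypothetical snapshot of one model replica's real-time metrics."""
    name: str
    kv_cache_utilization: float          # fraction of KV cache in use, 0.0 to 1.0
    queue_depth: int                     # requests waiting on this replica
    cached_prefixes: tuple[str, ...] = ()  # prompt prefixes assumed to be cached


def prefix_reuse(prompt: str, cached_prefixes: tuple[str, ...]) -> float:
    """Fraction of the prompt covered by the longest matching cached prefix."""
    longest = max(
        (len(p) for p in cached_prefixes if prompt.startswith(p)),
        default=0,
    )
    return longest / max(len(prompt), 1)


def score(replica: ReplicaState, prompt: str,
          w_prefix: float = 1.0, w_kv: float = 0.5, w_queue: float = 0.1) -> float:
    """Higher is better. The weights are illustrative assumptions, not tuned values."""
    return (
        w_prefix * prefix_reuse(prompt, replica.cached_prefixes)
        - w_kv * replica.kv_cache_utilization
        - w_queue * replica.queue_depth
    )


def pick_replica(replicas: list[ReplicaState], prompt: str) -> ReplicaState:
    """Route the request to the replica with the best score."""
    if not replicas:
        raise ValueError("no replicas available")
    return max(replicas, key=lambda r: score(r, prompt))


replicas = [
    ReplicaState("replica-a", kv_cache_utilization=0.2, queue_depth=1),
    ReplicaState("replica-b", kv_cache_utilization=0.6, queue_depth=2,
                 cached_prefixes=("You are a helpful assistant.",)),
]

# A prompt sharing a cached prefix goes to the busier replica that can reuse it.
chosen_shared = pick_replica(replicas, "You are a helpful assistant. Summarize this.")
# A prompt with no cached prefix goes to the less loaded replica.
chosen_fresh = pick_replica(replicas, "Translate this.")
```

A single weighted score keeps the trade-off between cache reuse and load explicit; a real routing layer would likely also need latency targets and a way to keep the metrics fresh, neither of which is modeled here.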
