Provide an inference-aware routing layer that makes dynamic routing decisions based on real-time metrics and model state, taking into account factors such as KV cache, prefix cache, workload distribution, and latency requirements.
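As a rough illustration of what such a routing decision could look like, the sketch below scores each model replica from a snapshot of its state and picks the highest-scoring one. Everything in it is an assumption made for illustration: the `Replica` fields, the `score` and `pick_replica` helpers, the `max_queue` cap, and the 0.5/0.3/0.2 weights are hypothetical and do not come from any existing router. Latency requirements are left out for brevity; queue depth stands in as a crude load signal.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical snapshot of one model server's real-time state."""
    name: str
    kv_cache_utilization: float            # fraction of KV cache in use, 0.0-1.0
    queue_depth: int                       # requests currently waiting
    cached_prefixes: tuple[str, ...] = ()  # prompt prefixes assumed to be cached


def prefix_overlap(prompt: str, replica: Replica) -> int:
    """Length of the longest cached prefix that the prompt starts with."""
    return max(
        (len(p) for p in replica.cached_prefixes if prompt.startswith(p)),
        default=0,
    )


def score(prompt: str, replica: Replica, max_queue: int = 32) -> float:
    """Higher is better. The weights are illustrative, not tuned values."""
    prefix_hit = prefix_overlap(prompt, replica) / max(len(prompt), 1)
    kv_headroom = 1.0 - replica.kv_cache_utilization
    queue_headroom = 1.0 - min(replica.queue_depth, max_queue) / max_queue
    return 0.5 * prefix_hit + 0.3 * kv_headroom + 0.2 * queue_headroom


def pick_replica(prompt: str, replicas: list[Replica]) -> Replica:
    """Return the replica with the highest score for this prompt."""
    if not replicas:
        raise ValueError("no replicas available")
    return max(replicas, key=lambda r: score(prompt, r))


if __name__ == "__main__":
    replicas = [
        Replica("a", 0.9, 10, ("You are a helpful assistant.",)),
        Replica("b", 0.2, 1),
    ]
    chosen = pick_replica("You are a helpful assistant. Summarize this.", replicas)
    print(chosen.name)
```

A weighted sum is only one possible design. It makes the trade-off explicit: a busy replica can still win when it already holds the prompt's prefix in cache, while requests with no cached prefix drift toward the replica with the most headroom.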