Kill old connections when new connections come in with RPM policy #10301
James4Ever0 started this conversation in Ideas
Replies: 1 comment 1 reply
-
Hey @James4Ever0, how would you want this to work on LiteLLM?
1 reply
-
Many local LLM services become unresponsive when multiple requests come in at once. Every request takes a long time to finish, and some never finish at all.
I want to resolve this problem by kicking out old connections when new connections come in.
I have considered other implementations, such as using Python with proxy.py or Nginx with Lua scripts, to proxy these services and manage connections according to the policy described above.
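To make the policy concrete, here is a minimal sketch of what such a proxy layer might do, assuming it is built on asyncio. `EvictingLimiter`, `max_inflight`, `slow_upstream` and `call` are hypothetical names made up for illustration; none of them exist in LiteLLM or proxy.py.

```python
import asyncio
from collections import OrderedDict


class EvictingLimiter:
    """Hypothetical helper: keep at most `max_inflight` requests alive
    and cancel the oldest ones when a new request arrives."""

    def __init__(self, max_inflight: int = 1) -> None:
        self.max_inflight = max_inflight
        self._inflight: "OrderedDict[int, asyncio.Task]" = OrderedDict()
        self._next_id = 0

    async def run(self, coro):
        # Evict the oldest in-flight requests until the new one fits.
        while len(self._inflight) >= self.max_inflight:
            _, oldest = self._inflight.popitem(last=False)
            oldest.cancel()
        req_id = self._next_id
        self._next_id += 1
        task = asyncio.create_task(coro)
        self._inflight[req_id] = task
        try:
            # Raises asyncio.CancelledError if this request gets evicted.
            return await task
        finally:
            self._inflight.pop(req_id, None)


async def slow_upstream(label: str, delay: float) -> str:
    # Stand-in for a slow local LLM call (assumption, not a real backend).
    await asyncio.sleep(delay)
    return label


async def call(limiter: EvictingLimiter, label: str, delay: float) -> str:
    try:
        return await limiter.run(slow_upstream(label, delay))
    except asyncio.CancelledError:
        # A real proxy would close the client connection here instead.
        return f"{label}: evicted"


async def demo():
    limiter = EvictingLimiter(max_inflight=1)
    old = asyncio.create_task(call(limiter, "old", 0.2))
    await asyncio.sleep(0.01)  # let the old request start first
    new = asyncio.create_task(call(limiter, "new", 0.01))
    return await asyncio.gather(old, new)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With `max_inflight=1`, the second request cancels the first one instead of queueing behind it. Cancelling the task only helps if the upstream call actually stops work when its connection is dropped, which depends on the backend.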
In LiteLLM, if this feature is implemented, it should be configurable per upstream model, as a key on each model entry.
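As a rough illustration of the per-model idea, the sketch below reads a limit from each model entry. The entries are shaped like LiteLLM's `model_list` (`model_name`, `litellm_params`), but `max_inflight` is a hypothetical key that LiteLLM does not currently define, and the model names and URL are placeholders.

```python
from typing import Any, Dict, List, Optional

# Shaped like LiteLLM's `model_list`. `max_inflight` is a HYPOTHETICAL key
# used only for this sketch; it is not an existing LiteLLM option.
MODEL_LIST: List[Dict[str, Any]] = [
    {
        "model_name": "local-llama",
        "litellm_params": {
            "model": "openai/llama",                  # placeholder model
            "api_base": "http://localhost:8000/v1",   # placeholder URL
        },
        "max_inflight": 1,  # evict the oldest request beyond this count
    },
    {
        "model_name": "gpt-4o",
        "litellm_params": {"model": "gpt-4o"},
        # no `max_inflight`: the eviction policy is off for this model
    },
]


def inflight_limits(model_list: List[Dict[str, Any]]) -> Dict[str, Optional[int]]:
    """Map each model_name to its eviction limit (None means no eviction)."""
    limits: Dict[str, Optional[int]] = {}
    for entry in model_list:
        limit = entry.get("max_inflight")
        if limit is not None and limit < 1:
            raise ValueError(f"max_inflight must be >= 1 for {entry['model_name']}")
        limits[entry["model_name"]] = limit
    return limits


if __name__ == "__main__":
    print(inflight_limits(MODEL_LIST))
```

Keeping the key on the model entry means hosted models, which handle concurrency fine, are unaffected while a slow local model can opt in.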
Similar implementations:
vllm-project/vllm#4513