Need a native queue
#11136

Why does LiteLLM not have a queue to limit concurrent requests per-model?

Replies: 2 comments
- Made my own: https://github.com/Cyberes/litellm-queue. Still would like native support.
- We do this today (see Line 1074 in 3984962): requests run inside a semaphore to prevent models from being overwhelmed. You can set the max parallel request value (https://docs.litellm.ai/docs/routing#max-parallel-requests-async); if it is not set, the configured RPM limit is used as a sanity check. For a global queue, you can also try the scheduler: https://docs.litellm.ai/docs/scheduler
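For reference, a minimal sketch of capping parallel requests on the Router, following the routing docs linked above; the model name and API key below are placeholders, not values from this thread.

```python
# Minimal sketch of per-deployment and router-level concurrency caps.
# Assumes litellm is installed; the model name and API key are placeholders.
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": "sk-...",            # placeholder
            "max_parallel_requests": 10,    # per-deployment semaphore cap
        },
    }
]

# default_max_parallel_requests applies when a deployment does not set its own cap;
# if neither is set, the deployment's rpm limit is used as a sanity check.
router = Router(
    model_list=model_list,
    default_max_parallel_requests=20,
)
```

For a shared queue with request prioritization on top of this, the scheduler doc linked above covers the priority-based flow.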