Need a native queue
#11136

Why does LiteLLM not have a queue to limit concurrent requests per-model?

Replies: 2 comments
- Made my own: https://github.com/Cyberes/litellm-queue. Still would like native support.
- We do this today (see Line 1074 in 3984962): requests run inside a semaphore to prevent models from being overwhelmed. You can set the max parallel request value (https://docs.litellm.ai/docs/routing#max-parallel-requests-async); if it is not set, the configured RPM limit is used as a sanity check. For a global queue, you can also try the scheduler: https://docs.litellm.ai/docs/scheduler
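For reference, a minimal sketch of capping parallel requests on the Router, following the routing docs linked above; the model name and API key below are placeholders, not values from this thread.

```python
# Minimal sketch of per-deployment and router-level concurrency caps.
# Assumes litellm is installed; the model name and API key are placeholders.
from litellm import Router

model_list = [
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {
            "model": "gpt-3.5-turbo",
            "api_key": "sk-...",            # placeholder
            "max_parallel_requests": 10,    # per-deployment semaphore cap
        },
    }
]

# default_max_parallel_requests applies when a deployment does not set its own cap;
# if neither is set, the deployment's rpm limit is used as a sanity check.
router = Router(
    model_list=model_list,
    default_max_parallel_requests=20,
)
```

For a shared queue with request prioritization on top of this, the scheduler doc linked above covers the priority-based flow.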