
FEAT: Adapt to Rate Limit instead of Failure #37

@MiNeves00

Description

Feature Request

Providers like OpenAI enforce rate limits (for example, a cap on requests per minute).
This feature would let llm studio wait out the limit (or keep retrying) when necessary, so that the call eventually returns a response instead of erroring, even if it takes longer.
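As a minimal sketch of the basic behavior, assuming the OpenAI Python SDK (>= 1.0) and its `RateLimitError`; the `complete_with_backoff` helper and its retry parameters are illustrative, not existing llm studio API:

```python
import random
import time

import openai

client = openai.OpenAI()

def complete_with_backoff(max_retries: int = 6, **create_kwargs):
    """Call chat completions, retrying on rate-limit errors with
    exponential backoff plus jitter instead of failing outright."""
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**create_kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise  # retries exhausted; surface the error
            # Sleep before retrying; jitter keeps parallel callers
            # from retrying in lockstep.
            time.sleep(delay + random.uniform(0, delay))
            delay *= 2

response = complete_with_backoff(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Exponential backoff with jitter is the standard pattern here (it is what the OpenAI docs recommend for rate-limit errors), and it spreads retries out instead of hammering the API.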

Advanced feature:
If it were aware of the user's exact rate limit (which depends on their tier in OpenAI, for example), it could also decide which prompts to send at what time, maximizing use of the rate limit without overstepping it in cases where the prompts run in parallel.
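One way this could work is a shared pacer that every parallel task passes through before calling the provider. `RequestPacer` below is a hypothetical sketch, with `rpm` set from the user's known tier:

```python
import asyncio
import time

class RequestPacer:
    """Admits at most `rpm` requests in any rolling 60-second
    window, delaying callers instead of letting them hit the limit."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self._starts: list[float] = []  # start times of recent requests
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        while True:
            async with self._lock:
                now = time.monotonic()
                # Forget requests that have left the 60 s window.
                self._starts = [t for t in self._starts if now - t < 60]
                if len(self._starts) < self.rpm:
                    self._starts.append(now)
                    return  # under the limit: go ahead
                # Window is full: wait until the oldest entry expires.
                wait = 60 - (now - self._starts[0])
            await asyncio.sleep(wait)

async def send(pacer: RequestPacer, prompt: str) -> None:
    await pacer.acquire()  # blocks while the window is full
    ...                    # issue the actual LLM call here
```

Parallel tasks would share one pacer instance and `await pacer.acquire()` before each call, so bursts get spread across the window rather than rejected.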

Motivation

Gives more robustness to LLM calls. The user does not need to worry about their application breaking when it makes too many requests per minute.

Your contribution

Labels

enhancement (New feature or request), priority:low (Low priority. Could take 1+ month to resolve)
