Dynamic throttling using both time and relative TPS/token usage #2941

@vigith

Description

There are many ways to refill the tokens, and each has different use cases.

Refill strategies we should support:

| Mode | Description | Notes |
| --- | --- | --- |
| ONLY_IF_USED | If the max tokens are not used, grant the same number of tokens as previously allocated. | ±5% threshold; need not be exact |
| SCHEDULED | Release/increase tokens on a schedule even if they are not used. | This has side effects on the callee |
| RELAXED | If there is some traffic, release the max possible tokens. | |
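
To make the three modes concrete, here is a minimal sketch of how the next allocation could be picked at each refill tick. It is only one possible reading of the table, not a proposed implementation. Everything named here is hypothetical: `RefillMode`, `next_allocation`, the `step` ramp-up size, and the `tolerance` parameter (standing in for the ±5% threshold). It also assumes that ONLY_IF_USED ramps up when the previous allocation was used, and that RELAXED keeps the previous allocation when there is no traffic; the issue does not state either.

```python
from enum import Enum


class RefillMode(Enum):
    """Hypothetical names mirroring the modes in the table above."""
    ONLY_IF_USED = "only_if_used"
    SCHEDULED = "scheduled"
    RELAXED = "relaxed"


def next_allocation(mode, prev_alloc, used, max_tokens, step, tolerance=0.05):
    """Return the token allocation for the next refill interval.

    prev_alloc: tokens granted in the previous interval
    used:       tokens actually consumed in the previous interval
    max_tokens: upper bound on the allocation
    step:       ramp-up increment per interval (assumed, not from the issue)
    tolerance:  slack for "fully used"; 0.05 models the 5% threshold
    """
    # Allocation after one ramp-up step, capped at the maximum.
    ramped = min(prev_alloc + step, max_tokens)

    if mode is RefillMode.SCHEDULED:
        # Ramp up on every tick, whether or not tokens were used.
        return ramped
    if mode is RefillMode.ONLY_IF_USED:
        # Ramp up only if usage came within the tolerance of the
        # previous allocation; otherwise grant the same amount again.
        fully_used = used >= prev_alloc * (1 - tolerance)
        return ramped if fully_used else prev_alloc
    if mode is RefillMode.RELAXED:
        # Any traffic unlocks the maximum. Keeping the previous
        # allocation when idle is an assumption of this sketch.
        return max_tokens if used > 0 else prev_alloc
    raise ValueError(f"unknown refill mode: {mode!r}")
```

With `prev_alloc=100`, `max_tokens=1000`, and `step=50`, a caller that used 96 tokens would get 150 under ONLY_IF_USED, while one that used 50 would stay at 100. SCHEDULED would return 150 in both cases, which is the "side effects on the callee" concern: the limit grows even when nothing suggests the callee can absorb it.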

Doc

Labels

enhancement (New feature or request)
