How to implement dynamic request prioritization for streaming ASR in Triton #8136
Unanswered
WoodieDudy asked this question in Q&A
Replies: 0 comments
Hello,
I'm implementing streaming inference for an ASR model with Triton. The goal is to process audio chunks in near real-time: if a user sends 2 seconds of audio, they expect a response within 2 seconds of wall-clock time or faster.
Some users send audio steadily in small chunks, while others may send large segments (e.g., 10 seconds of audio at once). Ideally, a request for a large segment should initially have lower priority than newly arriving short requests, but as time passes and its processing deadline approaches, its priority should increase, potentially overtaking newer short requests.
Triton supports integer-based request priorities, but as far as I understand, there is no way to update the priority of a request once it has been enqueued.
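To make the desired ordering concrete, here is a minimal sketch of the behavior I have in mind, written as a plain Python queue rather than anything Triton provides. It assumes each request's deadline is its arrival time plus the duration of the audio it carries (following the "2 seconds of audio, 2 seconds of wall-clock time" expectation above) and always serves the earliest deadline first. `PendingRequest` and `DeadlineQueue` are hypothetical names used only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingRequest:
    # Ordering uses (deadline, seq); the remaining fields are payload only.
    deadline: float
    seq: int
    request_id: str = field(compare=False)
    audio_seconds: float = field(compare=False)


class DeadlineQueue:
    """Hypothetical earliest-deadline-first queue (not a Triton API)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal deadlines

    def push(self, request_id, audio_seconds, arrival_time):
        # Assumption: the response is due within the audio's own duration.
        deadline = arrival_time + audio_seconds
        item = PendingRequest(deadline, next(self._counter), request_id, audio_seconds)
        heapq.heappush(self._heap, item)

    def pop(self):
        # Returns the pending request whose deadline is closest.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


queue = DeadlineQueue()
queue.push("long", audio_seconds=10.0, arrival_time=0.0)        # deadline 10.0
queue.push("short-early", audio_seconds=2.0, arrival_time=1.0)  # deadline 3.0
queue.push("short-late", audio_seconds=2.0, arrival_time=9.0)   # deadline 11.0

order = [queue.pop().request_id for _ in range(len(queue))]
print(order)
```

In this sketch the 10-second segment yields to the short chunk that arrives at t=1 but is served before the short chunk that arrives at t=9. The deadline of each request is fixed at enqueue time, so the relative priority shifts as newer requests arrive without any stored priority being modified.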
What are the possible ways to implement such dynamic prioritization logic in Triton?
Thanks!