How to implement dynamic request prioritization for streaming ASR in Triton #8136

WoodieDudy · 2025-04-09T09:19:38Z


Hello,
I'm implementing streaming inference for an ASR model with Triton. The goal is to process audio chunks in near real-time: if a user sends 2 seconds of audio, they expect a response within 2 seconds of wall-clock time or faster.

Some users send audio steadily in small chunks, while others may send large segments (e.g., 10 seconds of audio at once). Ideally, a request for a large segment should initially have lower priority than newly arriving short requests, but as time passes and its processing deadline approaches, its priority should increase, potentially overtaking newer short requests.
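To make the desired ordering concrete, here is a minimal sketch of the rule in plain Python. It is not Triton code and uses no Triton API; `PendingChunk` and `DeadlineQueue` are hypothetical names made up for illustration. It assumes each request's deadline is its arrival time plus the duration of the audio it carries, and it orders requests by that absolute deadline (earliest deadline first), so an old large request naturally moves ahead of newer short ones without its stored priority ever being updated.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingChunk:
    # Only `deadline` and `seq` take part in ordering.
    deadline: float                      # absolute time by which a response is expected
    seq: int                             # tie-breaker: earlier arrivals first
    request_id: str = field(compare=False)
    audio_seconds: float = field(compare=False)


class DeadlineQueue:
    """Hypothetical earliest-deadline-first queue (illustration only)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, request_id, audio_seconds, arrival_time):
        # Assumption: N seconds of audio must be answered within N seconds
        # of wall-clock time after arrival.
        deadline = arrival_time + audio_seconds
        item = PendingChunk(deadline, next(self._counter), request_id, audio_seconds)
        heapq.heappush(self._heap, item)

    def pop(self):
        # Returns the pending request with the earliest deadline.
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


q = DeadlineQueue()
q.push("long", audio_seconds=10.0, arrival_time=0.0)        # deadline 10.0
q.push("short_early", audio_seconds=2.0, arrival_time=1.0)  # deadline 3.0
q.push("short_late", audio_seconds=2.0, arrival_time=9.0)   # deadline 11.0
order = [q.pop().request_id for _ in range(len(q))]
print(order)
```

In this example the early short request is served before the 10-second request, while the short request that arrives at t=9 is served after it, which is the behavior I'm after. Where such a queue could live (in a client-side proxy, or inside Triton itself) is exactly what I'm unsure about.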

Triton supports integer-based request priorities, but as far as I understand, there is no way to update the priority of a request once it is enqueued.

What are the possible ways to implement such dynamic prioritization logic in Triton?

Thanks!

Replies: 0 comments

Select a reply

Uh oh!

How to implement dynamic request prioritization for streaming ASR in Triton #8136

Uh oh!

WoodieDudy Apr 9, 2025

Replies: 0 comments

WoodieDudy
Apr 9, 2025