Recommendations to avoid model thrashing? #7937
Unanswered · TimothySeah asked this question in Q&A · 0 replies
I have a cluster of Triton servers. Each server loads a different model depending on the request it receives. However, because these models are large, there is a lot of "model thrashing": we waste time loading and unloading models that cannot all fit in GPU memory at once. Is there a general/canonical solution to this? For example, is there an easy way to route requests that require a specific model to pods that already have that model loaded? Thanks.
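One common pattern for this kind of model-affinity routing (offered here as a sketch, not something Triton provides out of the box) is to put a thin routing layer in front of the pods that consistently hashes the model name to a pod. Requests for the same model then always land on the same pod, so that pod keeps the model resident while other pods never load it. The `ConsistentHashRouter` class and the pod addresses below are hypothetical, purely for illustration:

```python
# A minimal sketch of model-affinity routing via consistent hashing.
# All names here (ConsistentHashRouter, pod addresses) are illustrative
# assumptions, not part of Triton itself.
import bisect
import hashlib


class ConsistentHashRouter:
    """Maps each model name to a stable pod, so repeated requests for
    the same model land on a pod that likely already has it loaded."""

    def __init__(self, pods, replicas=100):
        # Place `replicas` virtual nodes per pod on the hash ring to
        # smooth the distribution of models across pods.
        self._ring = sorted(
            (self._hash(f"{pod}#{i}"), pod)
            for pod in pods
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def pod_for(self, model_name: str) -> str:
        # Walk clockwise to the first virtual node at or after the
        # model's hash position; wrap around at the end of the ring.
        idx = bisect.bisect(self._keys, self._hash(model_name)) % len(self._ring)
        return self._ring[idx][1]


# Example with three hypothetical Triton pods:
router = ConsistentHashRouter(["triton-0:8000", "triton-1:8000", "triton-2:8000"])
print(router.pod_for("resnet50"))    # always the same pod for this model
print(router.pod_for("bert-large"))  # possibly a different pod, but stable
```

A nice property of consistent hashing over a plain `hash(model) % num_pods` scheme is that adding or removing a pod remaps only a small fraction of models, so a scaling event does not trigger a cluster-wide wave of reloads. In Kubernetes this routing logic could live in a small gateway service or be approximated with an ingress/service mesh that supports hash-based load balancing on a request header carrying the model name; either way, this is one possible design, not a canonical Triton solution.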