ModelService creates compute and non-compute resources. Under compute resources, it can create prefill and decode Deployments, each of which can have multiple replicas. The default scheduler tends to spread these pods across nodes, so we need a mechanism to pack them onto a node. Here is a scenario: assume the cluster has two nodes with two GPUs each. If the prefill and decode Deployments, with one replica each, place one pod on each node, each node is left with only one free GPU. The next workload replica that requests two GPUs per pod then cannot start, even though the cluster still has two free GPUs in total.
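For reference, one Kubernetes-level way to get pack-instead-of-spread behavior is a scheduler profile that scores nodes with the `NodeResourcesFit` plugin's `MostAllocated` strategy. The sketch below is only illustrative of that mechanism, not a proposal for how ModelService should expose it; the profile name `bin-packing-scheduler` and the weights are made up for the example.

```yaml
# Illustrative KubeSchedulerConfiguration: score nodes so that pods are
# packed onto the most-allocated node (bin packing) instead of spread.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: bin-packing-scheduler   # hypothetical profile name
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              # Weight GPUs most heavily so GPU pods are packed first.
              - name: nvidia.com/gpu
                weight: 5
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
```

Pods created by the prefill and decode Deployments would opt in by setting `spec.schedulerName: bin-packing-scheduler` in their pod template. An alternative would be required `podAffinity` between prefill and decode pods with `topologyKey: kubernetes.io/hostname`, which forces co-location rather than relying on scoring.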