-
We're currently exploring the use of admissionFairSharing:

```yaml
usageHalfLifeTime: "168h"     # decay over 1 week
usageSamplingInterval: "5m"   # sampled every 5 minutes
resourceWeights:
  cpu: 1.0
  memory: 1.0
  nvidia.com/gpu: 2.0
```

However, we're concerned about possible workload behaviors that might undermine fair sharing without triggering preemption, such as long-running workloads that monopolize resources. Are there best practices to mitigate these patterns without enabling preemption?
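For context, a minimal sketch of where this stanza sits in the Kueue manager configuration, assuming the v1beta1 Configuration API (treat this as illustrative, not a complete config):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
# ... other controller settings elided ...
admissionFairSharing:
  usageHalfLifeTime: "168h"     # decay over 1 week
  usageSamplingInterval: "5m"   # sampled every 5 minutes
  resourceWeights:
    cpu: 1.0
    memory: 1.0
    nvidia.com/gpu: 2.0         # GPUs count double toward usage
```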
-
As a follow-up, are there recommended strategies within Kueue to mitigate these behaviors?
-
Here is a snapshot of my knowledge:
1. There is no such additional penalty AFAIK.
2. I don't think we currently have any additional mechanism other than lowering the sampling interval.
3. I don't think so. Please also note that we are changing the ordering of workloads for preemption in 0.13: #5632
4. Admission Fair Sharing is still a new feature, and I don't think such "best practices" exist at the moment. One may consider a separate CQ for CPU-heavy and GPU-heavy jobs (see the sketch below). However, the best person to address these questions would be @PBundyra, but he is on vacation until July 21st. Maybe in the meantime @mwielgus or @mwysokin could share some extra knowledge here.
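A minimal sketch of the "separate CQ per job profile" idea. The queue names, flavor names, and quotas are placeholders (and the ResourceFlavors are assumed to exist already):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cpu-jobs                # hypothetical name
spec:
  namespaceSelector: {}         # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor      # assumes this ResourceFlavor exists
      resources:
      - name: cpu
        nominalQuota: 100       # placeholder quota
      - name: memory
        nominalQuota: 400Gi     # placeholder quota
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-jobs                # hypothetical name
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: gpu-flavor          # assumes this ResourceFlavor exists
      resources:
      - name: cpu
        nominalQuota: 50        # placeholder quota
      - name: memory
        nominalQuota: 200Gi     # placeholder quota
      - name: nvidia.com/gpu
        nominalQuota: 16        # placeholder quota
```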
-
Hi @kimminw00, thanks for reaching out.
There is no such penalty, but please note that even if a workload runs well past the half-life window, its accounted usage stays bounded by what it actually consumes. E.g. there's a long-running workload that consumes 100 GPUs: even if it runs for a long period of time, the usage in the LocalQueue's status won't surpass 100 GPUs.
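To make the decay concrete, assuming usage samples decay exponentially with the configured half-life (a simplified reading of the semantics, so treat this as illustrative), a sample of age $t$ contributes

$$u(t) = u_0 \cdot 2^{-t/T_{1/2}}, \qquad T_{1/2} = 168\,\mathrm{h},$$

so a sample from one week ago counts at half weight, one from two weeks ago at a quarter, and so on; accumulated usage therefore saturates instead of growing without bound.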
In Kueue v0.13.0 we introduced an entry penalty for every admitted job, so even if a job runs for less than the sampling interval, some usage will be accounted. At the same time, as @mimowo has said, lowering the sampling interval to 1 min shouldn't be much of a problem, as it's just an extra reconcile every minute per LocalQueue, not per job. More about the entry penalty: https://kueue.sigs.k8s.io/docs/concepts/admission_fair_sharing/#entry-penalty
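For reference, the tweak being discussed is just a change to the stanza quoted at the top of the thread, e.g.:

```yaml
admissionFairSharing:
  usageHalfLifeTime: "168h"
  usageSamplingInterval: "1m"   # lowered from 5m, per the discussion above
```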
-
You could also dedicate a separate LQ to long-running jobs and balance the weight of its usage. E.g. you could decrease the importance of the usage for the dedicated LocalQueue by a factor of 10 via its fair-sharing weight (sketch below).
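A minimal sketch of that setup, assuming the LocalQueue-level fairSharing weight described in the AFS docs; the names and the weight value are placeholders, and it's worth double-checking in the docs which direction the weight scales usage:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: long-running            # hypothetical dedicated LQ
  namespace: team-a             # placeholder namespace
spec:
  clusterQueue: shared-queue    # placeholder CQ name
  fairSharing:
    weight: "10"                # placeholder; pick the value that de-emphasizes usage 10x per the AFS docs
```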
-
In scenarios where all workloads have the same priority and use a single shared resource pool (a single ClusterQueue), the existing preemption logic may not function effectively: there is no borrowing, and no priority differences (intentionally, to prevent users from inflating their workloads' priority to gain resource access) to trigger preemption. To address this, how about implementing a new preemption strategy that targets workloads exceeding a predefined resource-usage limit (or a maximum execution time)? This would ensure fair resource distribution and prevent prolonged jobs from monopolizing resources, even when priorities are equal and nominal quota overuse is undefined (illustrative sketch below).
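Purely to illustrate the proposal, a sketch of what such a policy could look like on a ClusterQueue. These fields are hypothetical and do not exist in Kueue's API today:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: shared-queue                # placeholder name
spec:
  preemption:
    # hypothetical strategy proposed above, NOT part of Kueue:
    usageBased:
      maxUsageShare: "0.5"          # preempt workloads from LQs above 50% of the pool
      maxExecutionTime: "24h"       # or workloads running longer than this
```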
-
Thanks for the detailed explanation! I now understand that the recent work on preemption in AFS, particularly the new ordering policy (e.g., #5632), is focused on determining which workloads should be preempted, especially in the context of a single ClusterQueue (CQ) shared by multiple LocalQueues (LQs). However, my original intention was to ask not about "which" workloads should be preempted, but rather under "what conditions" preemption in AFS is triggered. From the Kueue documentation on preemption, it seems that preemption can be triggered in a few scenarios, such as borrowing or priority differences. But in a setup with a single CQ, where borrowing is not possible and all workloads have the same priority, it's unclear when, or if, preemption would actually occur. Could you clarify the specific conditions that trigger preemption in such a configuration?
Also opened issue #6493 to clarify that the algorithm for picking preemption targets relies on relative usage between LQs rather than priorities, which seems to be a source of confusion.