Description
What happened:
While working on the integration test for #4934 / #4935, I noticed a curious issue.
Workloads never get any conditions set, even though both the scheduler and the reconciler are actively processing them. The culprit appears to be a resource version conflict between the scheduler and the reconciler, such that the scheduler never applies this status update during failed admission of head entries:
kueue/pkg/scheduler/scheduler.go
Line 659 in 3279d9c
resulting in errors like:
2025-04-21T17:49:24.03698-04:00 ERROR scheduler scheduler/scheduler.go:685 Could not update Workload status {"schedulingCycle": 5, "error": "Operation cannot be fulfilled on workloads.kueue.x-k8s.io \"admission-check-wl2\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).requeueAndUpdate
/Users/alexeldeib/code/kueue/pkg/scheduler/scheduler.go:685
sigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).schedule
/Users/alexeldeib/code/kueue/pkg/scheduler/scheduler.go:302
sigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff.func1
/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:43
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
/Users/alexeldeib/code/kueue/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
/Users/alexeldeib/code/kueue/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227
sigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff
/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:42
sigs.k8s.io/kueue/pkg/util/wait.UntilWithBackoff
/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:34
In this case there is no workload reconcile, but the workload is requeued as inadmissible and is never re-scheduled/nominated when it hits:
kueue/pkg/queue/cluster_queue.go
Line 406 in 3279d9c
OR the update passes, but the workload reconciler triggers a no-op update from pending to pending; see this code path:
kueue/pkg/controller/core/workload_controller.go
Lines 704 to 705 in 3279d9c
Either case ends up with the workload requeued as inadmissible, after which it may never be requeued again: nothing about a single inadmissible workload will retrigger scheduling on its own, unless other workloads are deleted, the ClusterQueues are updated, etc.
There are two potential fixes, and both seem to be required:
- Trigger a requeue of the inadmissible workload immediately on a resource version conflict (e.g. apierrors.IsConflict) during the requeue status update. This solves the first case: without an immediate requeue, and with no additional update coming from the workload controller, the workload is stuck.
- Trigger a requeue of inadmissible workloads during the pending -> pending reconcile in the workload controller, in addition to the default path (for spurious/uncached events). This handles the case where the status update succeeds and triggers a workload reconcile, but that reconcile does not currently retrigger a scheduling loop.
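The two fixes can be sketched roughly as below. All names and signatures here are illustrative stand-ins, not Kueue's actual code; in particular, isConflict plays the role of apierrors.IsConflict from k8s.io/apimachinery/pkg/api/errors, and the requeue callbacks stand in for pushing the workload back into its ClusterQueue:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict simulates the optimistic-concurrency error the API server
// returns when the reconciler races the scheduler's status update.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// isConflict stands in for apierrors.IsConflict (hypothetical local stub).
func isConflict(err error) bool { return errors.Is(err, errConflict) }

// Fix 1 (scheduler side): if the status update during requeueAndUpdate fails
// with a resource version conflict, requeue the workload immediately so a
// later scheduling cycle can retry writing the condition.
func requeueAndUpdate(updateStatus func() error, requeueImmediately func()) error {
	err := updateStatus()
	if err != nil && isConflict(err) {
		requeueImmediately()
	}
	return err
}

// Fix 2 (workload controller side): on a pending -> pending reconcile,
// requeue the workload instead of treating the transition as a pure no-op,
// so a status update that succeeded but raced still retriggers scheduling.
func reconcilePending(wasPending, isPending bool, queueInadmissible func()) {
	if wasPending && isPending {
		queueInadmissible()
	}
}

func main() {
	requeued := false
	err := requeueAndUpdate(
		func() error { return errConflict },
		func() { requeued = true },
	)
	fmt.Println("fix 1 requeued:", requeued, "err:", err != nil)

	retriggered := false
	reconcilePending(true, true, func() { retriggered = true })
	fmt.Println("fix 2 retriggered:", retriggered)
}
```

This only captures the control flow of the proposal, not where these hooks would actually live in scheduler.go and workload_controller.go.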
What you expected to happen:
Condition status updates should occur on pending workloads.
How to reproduce it (as minimally and precisely as possible):
See #4935: remove the changes mentioned above and run the test added in that PR a few times; it will reproduce both variations.
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- Kueue version (use git describe --tags --dirty --always):
- Cloud provider or hardware configuration:
- OS (e.g: cat /etc/os-release):
- Kernel (e.g. uname -a):
- Install tools:
- Others: