single inadmissible workload from quota never gets pending condition with best effort fifo #5061

Open
@alexeldeib

What happened:

While working on the integration test for #4934 / #4935, I noticed a curious issue.

Workloads never get any conditions set, but are actively being hit by both the scheduler and the reconciler. The culprit seems to be a resource version conflict between scheduler and reconciler such that the scheduler never applies this status update during failed admission of head entries:

if err := workload.ApplyAdmissionStatusPatch(ctx, s.client, patch); err != nil {

resulting in e.g.

  2025-04-21T17:49:24.03698-04:00	ERROR	scheduler	scheduler/scheduler.go:685	Could not update Workload status	{"schedulingCycle": 5, "error": "Operation cannot be fulfilled on workloads.kueue.x-k8s.io \"admission-check-wl2\": the object has been modified; please apply your changes to the latest version and try again"}
  sigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).requeueAndUpdate
  	/Users/alexeldeib/code/kueue/pkg/scheduler/scheduler.go:685
  sigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).schedule
  	/Users/alexeldeib/code/kueue/pkg/scheduler/scheduler.go:302
  sigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff.func1
  	/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:43
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
  	/Users/alexeldeib/code/kueue/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil
  	/Users/alexeldeib/code/kueue/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227
  sigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff
  	/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:42
  sigs.k8s.io/kueue/pkg/util/wait.UntilWithBackoff
  	/Users/alexeldeib/code/kueue/pkg/util/wait/backoff.go:34

In this case there appears to be no workload reconcile, and the workload is requeued as inadmissible and never rescheduled or nominated again once it hits:

return c.requeueIfNotPresent(wInfo, reason == RequeueReasonFailedAfterNomination || reason == RequeueReasonPendingPreemption)

Alternatively, the update succeeds, but the workload reconciler then handles a no-op update from pending to pending. See this code path:

case prevStatus == workload.StatusPending && status == workload.StatusPending:
	err := r.queues.UpdateWorkload(e.ObjectOld, wlCopy)

Either case ends with the workload requeued as inadmissible, and it may then never be requeued. Nothing retriggers scheduling for a single workload once it is inadmissible, unless other workloads are deleted, the CQs are updated, etc.
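To make the stuck state concrete, here is a minimal toy model of it. This is only a sketch and not Kueue's actual queue implementation: `toyQueue`, `pop`, and `queueInadmissible` are hypothetical names, and only the shape of `requeueIfNotPresent(name, immediate)` is taken from the call quoted above.

```go
package main

import "fmt"

// toyQueue is a hypothetical, much-simplified stand-in for a ClusterQueue's
// pending state: an active list the scheduler pops from, and a parking lot
// of inadmissible workloads that the scheduler never inspects.
type toyQueue struct {
	active       []string
	inadmissible map[string]bool
}

func newToyQueue() *toyQueue {
	return &toyQueue{inadmissible: map[string]bool{}}
}

// requeueIfNotPresent mirrors the shape of the call quoted above: with
// immediate == false the workload is parked instead of re-entering the
// active list. It reports whether the workload was added anywhere.
func (q *toyQueue) requeueIfNotPresent(name string, immediate bool) bool {
	if q.inadmissible[name] {
		return false
	}
	for _, n := range q.active {
		if n == name {
			return false
		}
	}
	if immediate {
		q.active = append(q.active, name)
		return true
	}
	q.inadmissible[name] = true
	return true
}

// pop returns the next workload the scheduler would consider, if any.
func (q *toyQueue) pop() (string, bool) {
	if len(q.active) == 0 {
		return "", false
	}
	head := q.active[0]
	q.active = q.active[1:]
	return head, true
}

// queueInadmissible models an external trigger (another workload deleted,
// a CQ updated) that moves every parked workload back to the active list.
func (q *toyQueue) queueInadmissible() int {
	moved := 0
	for name := range q.inadmissible {
		q.active = append(q.active, name)
		delete(q.inadmissible, name)
		moved++
	}
	return moved
}

func main() {
	q := newToyQueue()
	q.requeueIfNotPresent("admission-check-wl2", true)

	// The scheduler pops the head, fails to admit it, and parks it.
	wl, _ := q.pop()
	q.requeueIfNotPresent(wl, false)

	_, ok := q.pop()
	fmt.Println("schedulable after parking:", ok)

	// Only an external trigger makes the workload visible again.
	q.queueInadmissible()
	_, ok = q.pop()
	fmt.Println("schedulable after external trigger:", ok)
}
```

With a single workload and no other cluster activity, nothing ever plays the role of `queueInadmissible` in this model, which matches the behavior described above.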

There are two potential fixes, and both seem to be required:

  • Trigger an immediate requeue of the inadmissible workload on a resource version conflict (e.g. apierrors.IsConflict) during the requeue status update.
    • This solves the first case: without an immediate requeue, and with no further update from the workload controller, the workload stays stuck.
  • Trigger a requeue of inadmissible workloads in the pending -> pending path of the workload controller's reconciler, as well as in the default path (for spurious/uncached events).
    • This handles the case where the status update succeeds and triggers a workload reconcile, but that reconcile does not currently retrigger a scheduling loop.
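The two fixes listed above could look roughly like the following. This is a sketch under stated assumptions, not the change made in #4935: `parkingQueue`, `onPendingToPending`, and `errConflict` are hypothetical, the ordering of patch and requeue is simplified, and in real code the conflict check would be `apierrors.IsConflict(err)` rather than a sentinel error.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for a resource version conflict.
// Against a real API server the check would be apierrors.IsConflict(err).
var errConflict = errors.New("the object has been modified")

// parkingQueue is a toy queue (not Kueue's type): active workloads are
// visible to the scheduler, parked ones are not.
type parkingQueue struct {
	active []string
	parked []string
}

// requeue puts the workload back on the active list when immediate is true,
// and parks it as inadmissible otherwise.
func (q *parkingQueue) requeue(name string, immediate bool) {
	if immediate {
		q.active = append(q.active, name)
		return
	}
	q.parked = append(q.parked, name)
}

// queueInadmissible moves every parked workload back to the active list.
func (q *parkingQueue) queueInadmissible() {
	q.active = append(q.active, q.parked...)
	q.parked = nil
}

// requeueAndUpdate sketches the first fix: if the status patch fails with a
// conflict, requeue immediately so the next scheduling cycle retries the
// workload instead of leaving it parked without a condition.
func requeueAndUpdate(q *parkingQueue, name string, patch func() error) error {
	err := patch()
	q.requeue(name, errors.Is(err, errConflict))
	return err
}

// onPendingToPending sketches the second fix: the reconciler's
// pending -> pending path also moves parked workloads back, so a successful
// status update still leads to another scheduling attempt.
func onPendingToPending(q *parkingQueue) {
	q.queueInadmissible()
}

func main() {
	// Case 1: the status patch hits a conflict.
	q := &parkingQueue{}
	err := requeueAndUpdate(q, "admission-check-wl2", func() error {
		return fmt.Errorf("patch status: %w", errConflict)
	})
	fmt.Println(err, q.active, q.parked)

	// Case 2: the patch succeeds and the reconciler sees pending -> pending.
	q = &parkingQueue{}
	_ = requeueAndUpdate(q, "admission-check-wl2", func() error { return nil })
	onPendingToPending(q)
	fmt.Println(q.active, q.parked)
}
```

In both cases the workload ends up on the active list again, which is the property the two fixes are meant to restore.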

What you expected to happen:

Condition status updates should be applied to pending workloads.

How to reproduce it (as minimally and precisely as possible):

See #4935: remove the changes mentioned above and run the test added in that PR a few times; it will reproduce both variations.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Kueue version (use git describe --tags --dirty --always):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

Labels: kind/bug