Skip to content

scheduler.go uses ApplyAdmissionStatusPatch, fails on missing fields due to fieldManager #4934

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alexeldeib opened this issue Apr 11, 2025 · 0 comments · Fixed by #4935
Closed
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@alexeldeib
Copy link
Contributor

What happened:
We get an error like this during scheduler loop

{"level":"error","ts":"2025-04-09T12:39:42.128555256Z","logger":"scheduler","caller":"scheduler/scheduler.go:603","msg":"Could not update Workload status","schedulingCycle":1456,"error":"Workload.kueue.x-k8s.io \"jobset-xxxxxx" is invalid: [admissionChecks[0].lastTransitionTime: Required value, admissionChecks[0].message: Required value, admissionChecks[0].state: Required value, <nil>: Invalid value: \"null\": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]","stacktrace":"sigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).requeueAndUpdate\n\t/workspace/pkg/scheduler/scheduler.go:603\nsigs.k8s.io/kueue/pkg/scheduler.(*Scheduler).schedule\n\t/workspace/pkg/scheduler/scheduler.go:301\nsigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff.func1\n\t/workspace/pkg/util/wait/backoff.go:43\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/workspace/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227\nsigs.k8s.io/kueue/pkg/util/wait.untilWithBackoff\n\t/workspace/pkg/util/wait/backoff.go:42\nsigs.k8s.io/kueue/pkg/util/wait.UntilWithBackoff\n\t/workspace/pkg/util/wait/backoff.go:34"}

This code manually constructs the patch rather than using the existing helper which correctly would patch admission checks and avoid the error

if e.status == notNominated || e.status == skipped {
patch := workload.BaseSSAWorkload(e.Obj)
workload.AdmissionStatusPatch(e.Obj, patch, true)
reservationIsChanged := workload.UnsetQuotaReservationWithCondition(patch, "Pending", e.inadmissibleMsg, s.clock.Now())
resourceRequestsIsChanged := workload.PropagateResourceRequests(patch, &e.Info)
if reservationIsChanged || resourceRequestsIsChanged {
if err := workload.ApplyAdmissionStatusPatch(ctx, s.client, patch); err != nil {

the error is due to the kueue-admission field manager submitting an impartial patch for fields it is tracked as owner for. curiously this seems to force a failure rather than remove the field owner? I am able to provide a minimal repro of this but it requires custom ACC checks. it's possible it fails the update because there are multiple field owners (the ACC controller as well as kueue)?

note: this is difficult to unit test with a fake client, as kueue currently uses this workaround due to fake client not fully supporting SSA

cl := utiltesting.NewClientBuilder().WithInterceptorFuncs(interceptor.Funcs{
SubResourcePatch: func(ctx context.Context, client client.Client, subResourceName string, obj client.Object, patch client.Patch, opts ...client.SubResourcePatchOption) error {
updates++
return utiltesting.TreatSSAAsStrategicMerge(ctx, client, subResourceName, obj, patch, opts...)
},
}).WithObjects(objs...).WithStatusSubresource(objs...).Build()

the comment on the SSA helper hints at the issue

// TreatSSAAsStrategicMerge - can be used as a SubResourcePatch interceptor function to treat SSA patches as StrategicMergePatchType.
// Note: By doing so the values set in the patch will be updated but the call will have no knowledge of FieldManagement when it
// comes to detecting conflicts between managers or removing fields that are missing from the patch.
func TreatSSAAsStrategicMerge(ctx context.Context, clnt client.Client, subResourceName string, obj client.Object, patch client.Patch, opts ...client.SubResourcePatchOption) error {
return clnt.SubResource(subResourceName).Patch(ctx, obj, wrapSSAPatch(patch), opts...)
}

this is partly expected due to kubernetes-sigs/controller-runtime#2341 // kubernetes/kubernetes#115598 (comment)

however this leads to failed updates due to the field manager behavior, see the original error

What you expected to happen:

No error during patch, seems like ApplyAdmissionStatus already does the correct behavior, probably should just use that?

How to reproduce it (as minimally and precisely as possible):

Create a jobset with ACC checks, ensure it is in a state that does not get nominated for some reason but attempts scheduling, it should hit this code path.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.29.12 and others (n/a)
  • Kueue version (use git describe --tags --dirty --always): v0.10.4
  • Cloud provider or hardware configuration: n/a
  • OS (e.g: cat /etc/os-release): n/a
  • Kernel (e.g. uname -a): n/a
  • Install tools: n/a
  • Others: n/a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant