-
Notifications
You must be signed in to change notification settings - Fork 345
Kueue mutating webhooks drops fields in KubeRay resources #2878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The immediate fix we could do is bump the kuberay dependency in Kueue to a version that includes all new fields, but this doesn't seem like a scalable approach. The defaulting behavior seems a bit unusual to me, maybe it's something specific to controller-runtime webhooks? |
In v0.8 we fixed the job reconcilers to use Patch instead of Update to avoid this problem #2501 But we didn't check the webhooks, so that's probably the problem. |
No, it doesn't look like it's the defaulter, because it's also using Patch https://github.com/kubernetes-sigs/controller-runtime/blob/e6c3d139d2b6c286b1dbba6b6a95919159cfe655/pkg/webhook/admission/defaulter_custom.go#L88-L93 |
Looking at GKE audit logs, the only API updates come from the ray operator. Thus, the only possible culprit is the webhook. Thinking about this further:
So it's a bug in controller-runtime. I'll try to fix it there, but while waiting for a release there, our only chance is to update the APIs in Kueue. |
Fix is up kubernetes-sigs/controller-runtime#2931 |
/assign @alculquicondor |
As a short-term fix should we update kuberay version to v1.2? cc @astefanutti |
We already done it on #2960. |
Ah I missed that, thank you! |
What happened:
When using Kueue with KubeRay, new fields in KubeRay that are not recognized by Kueue are dropped during defaulting. Here's an example using Kind and RayJob:
Create cluster:
Install Kueue:
Install latest release candidate of KubeRay:
Deploy Kueue resources:
Create a RayJob with a reference to the local queue:
Note the RayJob sets a new field introduced in v1.2.0
spec.backoffLimit
. In this example it is set to 2. However, Kueue is defaulting the field to 0:What you expected to happen:
I believe we saw similar behavior while transition KubeRay v0.6.0 to v1.0.0, but I thought it was specific to the v1apha1 -> v1 upgrade when we deleted fields in v1alpha1.
I would not expect Kueue to drop new fields introduced in v1 APIs of KubeRay.
How to reproduce it (as minimally and precisely as possible):
See steps above.
Anything else we need to know?:
Environment:
kubectl version
):git describe --tags --dirty --always
): v0.8.0cat /etc/os-release
):uname -a
):The text was updated successfully, but these errors were encountered: