-
Notifications
You must be signed in to change notification settings - Fork 20
Fix operatorSync: false
breaking on implicit webhook mutations
#299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -116,10 +117,6 @@ func (r *StorageDefaulter) Default(ctx context.Context, obj runtime.Object) erro | |||
storage := obj.(*Storage) | |||
storagelog.Info("default", "name", storage.Name) | |||
|
|||
if !storage.Spec.OperatorSync { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a bug, default webhook should run on every update even if you have operatorSync = false. Otherwise the yaml that gets applied after operatorSync gets switched to false will REMOVE the defaults -- which is definitely not what we want (good example is hosts
section, it is autogenerated for test stands)
I did not yet manage to come up with a scenario where this if
is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise the yaml that gets applied after operatorSync gets switched to false will REMOVE the defaults -- which is definitely not what we want (good example is hosts section, it is autogenerated for test stands)
Can you describe in more detail the step-by-step scenario for reproducing the problem?
We already have e2e tests for operatorSync field with updating https://github.com/ydb-platform/ydb-kubernetes-operator/blob/master/tests/e2e/smoke_test.go#L204
Maybe we should write additional tests for this use-case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just re-read the test carefully and understood what is happening.
In the test, before reapplying the Storage object with operatorSync: false
, instead of modifying just the operatorSync option, we GET the Storage object from k8s cluster, then set operatorSync
and only then apply. Of course, if we get an object from k8s api, it has sorted configuration
string. Even if mutated webhook does not run, there is zero diff in configuration and the test passes.
Conclusion: currently all our e2e tests look like above (fetching a Storage from k8s api before apply instead of modifying the original Storage object that we create in a test).
We PROBABLY can stop doing that in tests, but it's non-trivial. I tried to modify the original object and got the following error:
STEP: setting storage operatorSync to false... - /home/jorres/work/nebius/ydb-kubernetes-operator/tests/e2e/smoke_test.go:215 @ 05/07/25 22:08:02.354
[FAILED] Expected success, but got an error:
<*errors.StatusError | 0xc00070d540>:
Operation cannot be fulfilled on storages.ydb.tech "storage": the object has been modified; please apply your changes to the latest version and try again
{
ErrStatus: {
╎ TypeMeta: {Kind: "", APIVersion: ""},
╎ ListMeta: {
╎ SelfLink: "",
╎ ResourceVersion: "",
╎ Continue: "",
╎ RemainingItemCount: nil,
╎ },
╎ Status: "Failure",
╎ Message: "Operation cannot be fulfilled on storages.ydb.tech \"storage\": the object has been modified; please apply your changes to the latest version and try again",
╎ Reason: "Conflict",
╎ Details: {Name: "storage", Group: "ydb.tech", Kind: "storages", UID: "", Causes: nil, RetryAfterSeconds: 0},
╎ Code: 409,
},
}
In [It] at: /home/jorres/work/nebius/ydb-kubernetes-operator/tests/e2e/smoke_test.go:223 @ 05/07/25 22:08:04.394
maybe with a little more time I will come up with a solution how to avoid this problem and still modify the original object, but it definitely will be another PR.
@kobzonega do you maybe have any ideas right now? Otherwise I'll put it in backlog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use K8S API Patch method instead of Update, that do not rely on resourceVersion field. It would be nice if we add this test, maybe in the separate PR.
Pull request type
Please check the type of change your PR introduces:
What is the current behavior?
operatorSync: false
, validating webhook checks for any other checks besides this field. It fails because in mutating webhook marshalling + unmarshalling ofspec.configuration
occurs, and it is not a noop (formatting changes)What is the new behavior?