Skip to content

Fix operatorSync: false breaking on implicit webhook mutations #299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 7, 2025

Conversation

Jorres
Copy link
Contributor

@Jorres Jorres commented May 6, 2025

Pull request type

Please check the type of change your PR introduces:

  • Bugfix

What is the current behavior?

  • when setting operatorSync: false, validating webhook checks for any other checks besides this field. It fails because in mutating webhook marshalling + unmarshalling of spec.configuration occurs, and it is not a noop (formatting changes)

What is the new behavior?

  • validating webhook compares configuration correctly, without failing on formatting only

@@ -116,10 +117,6 @@ func (r *StorageDefaulter) Default(ctx context.Context, obj runtime.Object) erro
storage := obj.(*Storage)
storagelog.Info("default", "name", storage.Name)

if !storage.Spec.OperatorSync {
Copy link
Contributor Author

@Jorres Jorres May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bug, default webhook should run on every update even if you have operatorSync = false. Otherwise the yaml that gets applied after operatorSync gets switched to false will REMOVE the defaults -- which is definitely not what we want (good example is hosts section, it is autogenerated for test stands)

I did not yet manage to come up with a scenario where this if is necessary

Copy link
Contributor

@kobzonega kobzonega May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise the yaml that gets applied after operatorSync gets switched to false will REMOVE the defaults -- which is definitely not what we want (good example is hosts section, it is autogenerated for test stands)

Can you describe in more detail the step-by-step scenario for reproducing the problem?
We already have e2e tests for operatorSync field with updating https://github.com/ydb-platform/ydb-kubernetes-operator/blob/master/tests/e2e/smoke_test.go#L204
Maybe we should write additional tests for this use-case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just re-read the test carefully and understood what is happening.

In the test, before reapplying the Storage object with operatorSync: false, instead of modifying just the operatorSync option, we GET the Storage object from k8s cluster, then set operatorSync and only then apply. Of course, if we get an object from k8s api, it has sorted configuration string. Even if mutated webhook does not run, there is zero diff in configuration and the test passes.

Conclusion: currently all our e2e tests look like above (fetching a Storage from k8s api before apply instead of modifying the original Storage object that we create in a test).

We PROBABLY can stop doing that in tests, but it's non-trivial. I tried to modify the original object and got the following error:

  STEP: setting storage operatorSync to false... - /home/jorres/work/nebius/ydb-kubernetes-operator/tests/e2e/smoke_test.go:215 @ 05/07/25 22:08:02.354
  [FAILED] Expected success, but got an error:
      <*errors.StatusError | 0xc00070d540>:
      Operation cannot be fulfilled on storages.ydb.tech "storage": the object has been modified; please apply your changes to the latest version and try again
      {
          ErrStatus: {
          ╎   TypeMeta: {Kind: "", APIVersion: ""},
          ╎   ListMeta: {
          ╎       SelfLink: "",
          ╎       ResourceVersion: "",
          ╎       Continue: "",
          ╎       RemainingItemCount: nil,
          ╎   },
          ╎   Status: "Failure",
          ╎   Message: "Operation cannot be fulfilled on storages.ydb.tech \"storage\": the object has been modified; please apply your changes to the latest version and try again",
          ╎   Reason: "Conflict",
          ╎   Details: {Name: "storage", Group: "ydb.tech", Kind: "storages", UID: "", Causes: nil, RetryAfterSeconds: 0},
          ╎   Code: 409,
          },
      }
  In [It] at: /home/jorres/work/nebius/ydb-kubernetes-operator/tests/e2e/smoke_test.go:223 @ 05/07/25 22:08:04.394

maybe with a little more time I will come up with a solution how to avoid this problem and still modify the original object, but it definitely will be another PR.

@kobzonega do you maybe have any ideas right now? Otherwise I'll put it in backlog

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use K8S API Patch method instead of Update, that do not rely on resourceVersion field. It would be nice if we add this test, maybe in the separate PR.

@Jorres Jorres force-pushed the fix-operator-sync branch from 72368e7 to 8b71781 Compare May 7, 2025 12:29
@Jorres Jorres merged commit 53e8441 into master May 7, 2025
3 checks passed
@Jorres Jorres deleted the fix-operator-sync branch May 7, 2025 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants