
fix: operator stability part2 #96

Merged
merged 3 commits into from
Nov 15, 2024

Conversation

csatib02
Member

@csatib02 csatib02 commented Nov 11, 2024

Overview

  • The feature that introduced bridges did not follow the previously established pattern. Bridges are now handled from the tenant's perspective first:
  1. When reconciling a tenant, we check whether any bridges deployed in the system reference it.
  2. If there are, we add them to tenant.Status.ConnectedBridges.
  3. When reconciling the collector, we construct the otel-config by reading that field.
  • If there are only collectors in the system, don't even try to create a config; just log it.
  • Fixed the bridge connection check: we now only check whether the other end of the bridge references a tenant that is deployed in the system.
  • The State of resources was not updated correctly, so running the kubectl get telemetry-all -A command reported false CR states. This is now fixed.
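
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Bridge and Tenant structs, their fields (other than the ConnectedBridges name), and the helper functions connectedBridges and bridgesForConfig are all hypothetical stand-ins for the real CR types and reconcilers. It also assumes a bridge counts as connected only when the tenant at its other end is deployed.

```go
package main

import (
	"fmt"
	"sort"
)

// Bridge is a hypothetical, trimmed-down stand-in for the bridge CR.
type Bridge struct {
	Name         string
	SourceTenant string
	TargetTenant string
}

// Tenant is a hypothetical stand-in for the tenant CR. Only the status
// field named in the PR description (ConnectedBridges) is modeled.
type Tenant struct {
	Name             string
	ConnectedBridges []string
}

// connectedBridges returns the sorted names of bridges that reference the
// given tenant and whose other end points at a deployed tenant.
func connectedBridges(tenant string, bridges []Bridge, deployed map[string]bool) []string {
	var names []string
	for _, b := range bridges {
		var other string
		switch tenant {
		case b.SourceTenant:
			other = b.TargetTenant
		case b.TargetTenant:
			other = b.SourceTenant
		default:
			continue // this bridge does not reference the tenant
		}
		if deployed[other] {
			names = append(names, b.Name)
		}
	}
	sort.Strings(names)
	return names
}

// bridgesForConfig collects the bridge names recorded on the tenants' status.
// It reports false when no tenants exist, so the caller can log and skip
// config generation instead of building an empty config.
func bridgesForConfig(tenants []Tenant) ([]string, bool) {
	if len(tenants) == 0 {
		return nil, false
	}
	seen := map[string]bool{}
	var out []string
	for _, t := range tenants {
		for _, name := range t.ConnectedBridges {
			if !seen[name] {
				seen[name] = true
				out = append(out, name)
			}
		}
	}
	sort.Strings(out)
	return out, true
}

func main() {
	bridges := []Bridge{
		{Name: "a-to-b", SourceTenant: "a", TargetTenant: "b"},
		{Name: "a-to-missing", SourceTenant: "a", TargetTenant: "missing"},
	}
	deployed := map[string]bool{"a": true, "b": true}

	// Tenant reconcile: record the bridges that reference this tenant.
	tenantA := Tenant{Name: "a"}
	tenantA.ConnectedBridges = connectedBridges(tenantA.Name, bridges, deployed)

	// Collector reconcile: read the status field, or skip when only
	// collectors exist.
	if names, ok := bridgesForConfig([]Tenant{tenantA}); ok {
		fmt.Println("bridges for config:", names)
	}
	if _, ok := bridgesForConfig(nil); !ok {
		fmt.Println("no tenants found, skipping config generation")
	}
}
```

Keeping the bridge lookup in the tenant reconcile means the collector reconcile only has to read tenant status, which is the pattern the description says the earlier resources already followed.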

Observations

  • It is now very hard (if not impossible) to make TC deploy an instantly failing otel-collector instance.
  • TC requires a collection of CRs to be deployed and in a ready state at the same time, but due to eventual consistency the operator will always try to create an otel-config input. It would be great to have logic that detects frequent reconcile requests, so we don't log unnecessary errors.

@csatib02 csatib02 force-pushed the fix/operator-stability-part2 branch from 96ddc6f to 9ae5d89 Compare November 11, 2024 18:17
Signed-off-by: Bence Csati <bence.csati@axoflow.com>
@csatib02 csatib02 force-pushed the fix/operator-stability-part2 branch from 9ae5d89 to 14bf7f4 Compare November 11, 2024 18:18
Signed-off-by: Bence Csati <bence.csati@axoflow.com>
Signed-off-by: Bence Csati <bence.csati@axoflow.com>
@csatib02 csatib02 self-assigned this Nov 12, 2024
@csatib02 csatib02 added the bug label Nov 12, 2024
@csatib02 csatib02 changed the title from "Fix/operator stability part2" to "fix: operator stability part2" Nov 12, 2024
@csatib02 csatib02 requested review from pepov and OverOrion November 12, 2024 11:14
@csatib02 csatib02 marked this pull request as ready for review November 12, 2024 11:14
Collaborator

@OverOrion OverOrion left a comment

Overall LGTM, thank you!

@csatib02 csatib02 merged commit 8c6f860 into main Nov 15, 2024
11 checks passed
@csatib02 csatib02 deleted the fix/operator-stability-part2 branch November 15, 2024 14:39
Labels
bug Something isn't working

3 participants