Prometheus Federator fails to remove the cleanup label from existing ProjectHelmChart #200

@apoorvajagtap

Description

Cluster Setup

  • Kubernetes version: v1.31.6
  • Type of Cluster: RKE2
  • Installation option (Helm Chart / Custom Installation):
    • Helm Chart. No custom configurations.

Describe the bug
When PromFed is uninstalled without first removing the ProjectHelmCharts manually, each stale ProjectHelmChart (PHC) is expected to have the "helm.cattle.io/helm-project-operator-cleanup":true label added.
IIUC, the PromFed controller is expected to remove this label whenever PromFed is installed again, so that the PHC is picked up again.

Observations:

  • The new installation of PromFed logs the following message, but fails to remove the label:
time="2025-03-20T21:36:35Z" level=info msg="Removing cleanup label from all registered ProjectHelmCharts..."
  • The label has to be removed manually before the PromFed controller starts watching this PHC again.

Cause:

  • It looks like a check is performed during cleanup to ensure that the namespace of the PHC exists. If the namespace itself doesn’t exist, there’s no need to worry about the PHC, as it will eventually be cleaned up by K8s.
  • The problem seems to be that shouldManage() fetches content from namespaceCache here, while the removal of the cleanup label is triggered while registering the project.
  • At this point the controllers are not initialized, so the informers have presumably not started yet.
  • As a result, it always reports that the namespace does not exist (because namespaceCache is likely empty), so shouldManage() returns false even when the namespace exists and the label is expected to be removed.

Thoughts:

  • Given that the informers haven’t started when shouldManage() is checked, waiting for the cache to sync at that point would lead to a deadlock.
  • Shouldn’t initRemoveCleanupLabels() be called after the controllers have started? Is there any particular reason to initialize the cleanup while registering the PHC?
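
To make the suspected ordering problem concrete, here is a minimal, self-contained sketch. It does not use the real Prometheus Federator or helm-project-operator code: the namespaceCache and projectHelmChart types, the start() method, the removeCleanupLabels() helper, and the "example-project-ns" namespace are all hypothetical stand-ins. It only models a cache that stays empty until its informer has started, which is the behaviour assumed in the Cause section.

```go
package main

import "fmt"

// Label key taken from the issue text.
const cleanupLabel = "helm.cattle.io/helm-project-operator-cleanup"

// namespaceCache is a hypothetical stand-in for an informer-backed cache:
// it knows nothing about the cluster until start() has run.
type namespaceCache struct {
	cluster []string        // namespaces that really exist in the cluster
	known   map[string]bool // namespaces the cache has observed so far
}

func newNamespaceCache(cluster ...string) *namespaceCache {
	return &namespaceCache{cluster: cluster, known: map[string]bool{}}
}

// start simulates the informer's initial list, which fills the cache.
func (c *namespaceCache) start() {
	for _, ns := range c.cluster {
		c.known[ns] = true
	}
}

// projectHelmChart is a simplified, hypothetical view of a PHC.
type projectHelmChart struct {
	Namespace string
	Labels    map[string]string
}

// shouldManage mimics the check described above: a PHC is only managed
// if its namespace is found in the cache.
func shouldManage(c *namespaceCache, phc *projectHelmChart) bool {
	return c.known[phc.Namespace]
}

// removeCleanupLabels drops the cleanup label from every PHC that passes
// shouldManage and returns how many labels were removed.
func removeCleanupLabels(c *namespaceCache, phcs []*projectHelmChart) int {
	removed := 0
	for _, phc := range phcs {
		if !shouldManage(c, phc) {
			continue // namespace "does not exist" as far as the cache knows
		}
		if _, ok := phc.Labels[cleanupLabel]; ok {
			delete(phc.Labels, cleanupLabel)
			removed++
		}
	}
	return removed
}

func main() {
	cache := newNamespaceCache("example-project-ns")
	phcs := []*projectHelmChart{{
		Namespace: "example-project-ns",
		Labels:    map[string]string{cleanupLabel: "true"},
	}}

	// Cleanup during registration: the cache is still empty, nothing is removed.
	fmt.Println("before start:", removeCleanupLabels(cache, phcs))

	// Cleanup after the informer has started: the namespace is found.
	cache.start()
	fmt.Println("after start:", removeCleanupLabels(cache, phcs))
}
```

In this model, the first call removes nothing even though the namespace exists, and the same call succeeds once the cache has been populated. Whether moving the real initRemoveCleanupLabels() call after controller start is safe depends on details of the actual code that this sketch does not capture.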

To Reproduce

  1. Deploy Rancher.
  2. Install Monitoring Chart, followed by Prometheus Federator Chart. (ref)
  3. Create a Project Monitor.
  4. Once the Project Monitor has been configured successfully, uninstall Prometheus Federator (without deleting the Project Monitor manually).
  5. Then reinstall the Prometheus Federator chart.
  6. The ProjectHelmChart gets stuck in the AwaitingOperatorRedeployment state.

Result

Expected Result

  • The cleanup label should be removed from PHC during the redeployment of PromFed.
