Start decoupling helm-controller from resulting PromFed binary #148

@mallardduck

Description

Per title, this issue/RFE is simply for our team to consider the potential for fully removing the embedded helm-controller.

Summary

The primary objective of this effort would be to decouple the helm-controller from the PromFed binary, in turn directly reducing the complexity of delivering a PromFed that is compatible with all target distros (rke2/k3s, cloud, etc.) and varying k8s versions.

The PromFed project and chart would still depend on helm-controller, but instead of building it into the binary they would consume the original upstream image. This de-duplicates redundant efforts across projects and redundant helm-controller systems within a cluster.

Finally, once the version built into the PromFed binary is ready for removal, the PromFed project is simplified further: we can remove the complicated helm-controller CRD management we've needed to fix other bugs on k3s/rke2 systems.

Solution

Remove helm-controller as a godep for PromFed, make it a sibling container instead

This option entails keeping the existing functionality we surface to PromFed users today, with a different implementation at the k8s level. Instead of a true "embedded" controller in the PromFed binary, it would become a new helm-controller pod or deployment. The goal would be a compatible end-user experience, with new opt-in helm values to use the new mechanism.

The end result is still "embedded" - in the experience of using the chart - but not statically at compile time.
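To make the opt-in concrete, here is a hypothetical values.yaml sketch for the sibling deployment. Only `.Values.helmController.deployment.image.tag` is named in this issue; the surrounding keys (`enabled`, `replicas`, `repository`) are illustrative assumptions about how the chart might be structured, not decided names.

```yaml
# Hypothetical values.yaml shape for the "sibling container" helm-controller.
helmController:
  enabled: true                # opt in: deploy helm-controller as its own pod
  deployment:
    replicas: 1
    image:
      repository: rancher/helm-controller   # assumed image name, for illustration
      tag: v0.16.13                         # pin to match the distro's version (see Background)
```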

While this option has some considerable cons (see original issue text), the pros greatly outweigh those risks. We can mitigate most of those risks with good documentation: formal docs, release notes, and helm-chart docs (in values.yaml).

Background

3 Simple Supported Config Setups:

  • A) PromFed is the only possible helm-controller on Cluster:
    • The helm-controller w/ PromFed must be used (will now be a pod).
  • B) PromFed is on Cluster with existing Global helm-controller:
    • Cannot have PromFed's helm-controller enabled.
    • Based on talking with @brandond, mixing global and namespace scoped helm-controller instances is not fully supported.
      • It may work but also may have weirdness - if it needs to be supported Team ORBS can work in k3s upstream for helm-controller to add support properly.
  • C) PromFed is on a Cluster with Namespace scoped controller instances:
    • Users can either: a) deploy a helm-controller for PromFed alongside their existing instances, or b) use the PromFed one, setting values.yaml to match their helm-controller version.

Important Context:

  • helm-controller supports either: a single global controller watching every NS, or many NS scoped instances.
  • helm-controller doesn't support a mixture of both of these modes in the same cluster.
  • Today's PromFed integration for helm-controller leads to conflicts easily even on a single Rancher Minor version, because:
    • The existing PromFed embedded helm-controller is always locked at a single version (at build time).
    • k3s/RKE2 will update helm-controller versions on patch releases.
    • This can lead to CRD version conflicts and other weird issues.
  • The existing PromFed embedded helm-controller both:
    • Installs under a new name, and
    • Installs under a namespace, so becomes namespace scoped.
  • Technically, the ManagedBy annotation that helm-controller supports does allow multiple controller instances to have overlapping namespace scopes without conflict, via alternative controller names.
    • In other words, mixing global and namespaced ones works with specific steps taken - as that's partially how PromFed can co-exist with global ones with proper configs.
    • The lease mechanism will allow both: global and/or many ns-scoped instances to hold a lease.
      • Using both together isn't officially supported, as mentioned above, but technically possible as long as each NS specific instance has a custom ControllerName set to differentiate ManagedBy and ensure global instance won't touch the ones for NS-scoped controllers.
      • The current helm-controller lease mechanism will not allow locks for multiple global instances, or for multiple ns-scoped instances within the same NS. (A second global instance's lock would overlap with the first's, and similarly any additional ns-scoped instance in the same NS overlaps.)
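The ControllerName/ManagedBy mechanics above can be sketched as a HelmChart manifest. The `apiVersion`/`kind` are helm-controller's real CRD; the chart name, namespace, and the exact annotation key are assumptions for illustration only and should be verified against the helm-controller release in use.

```yaml
# Sketch (assumptions noted): a HelmChart owned by a namespace-scoped
# controller with a custom name, so a global instance won't touch it.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: example-chart            # hypothetical chart name
  namespace: promfed-system      # hypothetical namespace
  annotations:
    # Illustrative managed-by annotation key; must match the custom
    # ControllerName set on the namespace-scoped helm-controller instance.
    helmcharts.helm.cattle.io/managed-by: promfed-helm-controller
spec:
  chart: example
  repo: https://charts.example.com   # hypothetical repo
  version: 1.0.0
```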

The 1 Secret Unsupported Config:

This option is specific to k3s/RKE2 clusters only. We will call it B.2 - because it's option B, but we keep the new "external but embedded" helm-controller enabled.

It is technically not officially supported, or considered supported, by the upstream k3s helm-controller project. As such it may not need to be tested, but because it is possible I wanted to document it here. The configuration is rather easy but does take an additional step and extra attention to detail; this is another reason the documented option B is preferred over this route.

How?

  1. Identify the version of k3s/RKE2 in use,
  2. Pull up the release notes for that k8s minor version of k3s/RKE2,
  3. Find the specific version in the table and note the helm-controller column's value,
  4. During PromFed install set the .Values.helmController.deployment.image.tag value to match,
  5. Also follow other "normal testing steps" for this new mode,
  6. Observe the new PromFed deployment/pod for helm-controller that should use the version you set.

Example: Assume we are using RKE2 v1.33.5+rke2r1. We would look at its release notes and find that version uses helm-controller v0.16.13. That is the version we would set for .Values.helmController.deployment.image.tag. In the future, if upgrading the cluster's k8s version changes the distro's built-in controller version, the PromFed helm release should be updated to match.
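The worked example above boils down to a single values override at install time. A minimal sketch (the nesting follows the `.Values.helmController.deployment.image.tag` path named in the steps):

```yaml
# RKE2 v1.33.5+rke2r1 ships helm-controller v0.16.13, so pin:
helmController:
  deployment:
    image:
      tag: v0.16.13
```

Keeping this pinned value in a tracked values file makes step 6 (observing the deployed pod's version) easy to reconcile against what the distro actually ships.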
