
Commit 19d7941

feat: generate a self-signed cert for registry addon (#1127)
**What problem does this PR solve?**:
This PR adds the ability to set a self-signed certificate for the registry addon. A new `Certificate`, deployed by the Helm chart, generates a 10-year CA Secret `registry-addon-root-ca` in the namespace where this controller is running. In a `BeforeClusterCreate` hook, the `ca.crt` data is copied into a cluster-specific Secret (with an OwnerRef set), which is then used by the mirror handler. The CA is also used to sign a new 2-year TLS certificate for every cluster; that certificate is copied to the Secret `registry-system/registry-tls` on the remote cluster and used by the registry Pods. A new certificate is generated and copied on every invocation of the handler, in the `BeforeClusterUpgrade` and `AfterControlPlaneInitialized` lifecycle hooks, so that we can defer automatic renewal for now.

**Which issue(s) this PR fixes**:
Fixes #

**How Has This Been Tested?**:

1. Tested manually by creating a Docker cluster and verifying that I can push an image into the registry, see it synced across the replicas, and then pull it with Containerd:

   ```
   $ kubectl get secret -n registry-system
   NAME                                                TYPE                 DATA   AGE
   cncf-distribution-registry-docker-registry-secret   Opaque               3      5h16m
   registry-tls                                        kubernetes.io/tls    3      5h16m
   sh.helm.release.v1.cncf-distribution-registry.v1    helm.sh/release.v1   1      5h16m

   $ kubectl port-forward --address=127.0.0.1 --namespace registry-system pod/cncf-distribution-registry-docker-registry-0 5000:5000

   $ crane copy nginx:stable 127.0.0.1:5000/library/nginx:dkoshkin --insecure

   $ kubectl run nginx-working --image=docker.io/library/nginx:dkoshkin

   $ kubectl get pods
   NAME                                                              READY   STATUS              RESTARTS   AGE
   cluster-autoscaler-0196db54-35b0-73fd-ad5b-14f998751820-7bzwnm9   0/1     ContainerCreating   0          5h15m
   nginx-working                                                     1/1     Running             0          53m
   ```

2. New integration tests.
3. Existing e2e tests already wait for the STS Pods to be Ready, which won't happen unless the TLS Secret exists and is valid.

**Special notes for your reviewer**:

I've considered other approaches here:

1. Using cert-manager to generate a unique CA per cluster. This makes it more difficult for clients to trust the registries when pushing images to multiple clusters.
2. Using cert-manager to also generate the TLS certificate. It got pretty complicated to trigger a renewal and then wait for cert-manager to reconcile the certificate Secret. cert-manager also wasn't providing much value, since we want better control over when the certs get rotated (118b39e).
3. Using the CAPI-generated CA cert to sign a new cert. I decided against using an externally managed CA for a separate use case.
1 parent ecca3b0 commit 19d7941


22 files changed: +1511 / -126 lines

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,12 +1,13 @@
-replicaCount: 2
+replicaCount: {{ .Replicas }}
 persistence:
   enabled: true
   size: 50Gi
 service:
   type: ClusterIP
   clusterIP: {{ .ServiceIP }}
-  port: 80
+  port: 443
 statefulSet:
   enabled: true
 syncer:
   interval: 2m
+tlsSecretName: {{ .TLSSecretName }}
```
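`templateValues` fills the `{{ .Replicas }}`, `{{ .ServiceIP }}`, and `{{ .TLSSecretName }}` placeholders, presumably with Go's `text/template` (the handler diff shows it executing into a `bytes.Buffer`). The sketch below reproduces that rendering step in isolation. The trimmed template string, the top-level placement of `tlsSecretName`, and the service IP `10.96.0.20` are hypothetical; the `input` field names come from the handler diff.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// input mirrors the template data struct in templateValues.
type input struct {
	ServiceIP     string
	Replicas      int32
	TLSSecretName string
}

// valuesTemplate is a trimmed, assumed copy of the chart values template.
const valuesTemplate = `replicaCount: {{ .Replicas }}
service:
  clusterIP: {{ .ServiceIP }}
  port: 443
tlsSecretName: {{ .TLSSecretName }}
`

// renderValues parses the template and executes it with the given data.
func renderValues(text string, in input) (string, error) {
	tmpl, err := template.New("values").Parse(text)
	if err != nil {
		return "", fmt.Errorf("failed to parse values template: %w", err)
	}
	var b bytes.Buffer
	if err := tmpl.Execute(&b, in); err != nil {
		return "", fmt.Errorf("failed to execute values template: %w", err)
	}
	return b.String(), nil
}

func main() {
	out, err := renderValues(valuesTemplate, input{
		ServiceIP:     "10.96.0.20", // hypothetical Service ClusterIP
		Replicas:      2,
		TLSSecretName: "registry-tls",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```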

charts/cluster-api-runtime-extensions-nutanix/templates/certificates.yaml

Lines changed: 17 additions & 0 deletions

```diff
@@ -32,3 +32,20 @@ spec:
     kind: {{ .Values.certificates.issuer.kind }}
     name: {{ template "chart.issuerName" . }}
   secretName: {{ template "chart.name" . }}-admission-tls
+---
+# CA used to sign certificates for the clusters' registry addons
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: registry-addon-root-ca
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "chart.labels" . | nindent 4 }}
+spec:
+  isCA: true
+  commonName: registry-addon
+  secretName: registry-addon-root-ca
+  issuerRef:
+    kind: {{ .Values.certificates.issuer.kind }}
+    name: {{ template "chart.issuerName" . }}
+  duration: 87600h # 10 years
```

charts/cluster-api-runtime-extensions-nutanix/templates/helm-config.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -25,7 +25,7 @@ data:
     RepositoryURL: '{{ if .Values.helmRepository.enabled }}oci://helm-repository.{{ .Release.Namespace }}.svc/charts{{ else }}https://kubernetes.github.io/autoscaler{{ end }}'
   cncf-distribution-registry: |
     ChartName: docker-registry
-    ChartVersion: 2.3.1
+    ChartVersion: 2.3.2
     RepositoryURL: '{{ if .Values.helmRepository.enabled }}oci://helm-repository.{{ .Release.Namespace }}.svc/charts{{ else }}https://mesosphere.github.io/charts/staging/{{ end }}'
   cosi-controller: |
     ChartName: cosi
```

docs/content/addons/registry.md

Lines changed: 18 additions & 0 deletions

````diff
@@ -31,6 +31,24 @@ spec:
 registry: {}
 ```
 
+## Registry Certificate
+
+1. A root CA Certificate is deployed in the provider's namespace.
+2. cert-manager generates a 10-year self-signed root Certificate
+and creates a Secret `registry-addon-root-ca` in the provider's namespace.
+3. BCC handler copies `ca.crt` from the `registry-addon-root-ca` Secret
+to a new cluster Secret `<cluster-name>-registry-addon-ca`.
+A client pushing to the registry can use either the root CA Secret or the cluster Secret to trust the registry.
+4. The cluster CA Secret contents (`ca.crt`) is written out as files on the Nodes
+and used by Containerd to trust the registry addon.
+5. During the initial cluster creation, the ACPI handler uses the root CA to create a new 2-year server certificate
+for the registry and creates a Secret `registry-tls` on the remote cluster.
+6. During cluster upgrades, the BCU handler renews the server certificate
+and updates the Secret `registry-tls` on the remote cluster with the new certificate.
+It is expected that clusters will be upgraded at least once every 2 years to avoid certificate expiration.
+
+![registry-certificate.png](registry-certificate.png)
+
 [Distribution]: https://github.com/distribution/distribution
 [Cluster API Add-on Provider for Helm]: https://github.com/kubernetes-sigs/cluster-api-addon-provider-helm
 [Regsync]: https://regclient.org/usage/regsync/
````
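Step 3 copies only the public `ca.crt` out of the root CA Secret, so the CA private key never leaves the provider's namespace. The sketch below models that step with plain maps instead of Kubernetes `Secret` objects so it runs without client-go. The helper names `clusterCASecretName` and `clusterCASecretData` are hypothetical; the PR's `utils.EnsureCASecretForCluster` additionally sets an OwnerRef and writes through the controller-runtime client.

```go
package main

import (
	"errors"
	"fmt"
)

const caCrtKey = "ca.crt"

// clusterCASecretName (hypothetical) follows the naming in the docs:
// <cluster-name>-registry-addon-ca.
func clusterCASecretName(clusterName string) string {
	return fmt.Sprintf("%s-registry-addon-ca", clusterName)
}

// clusterCASecretData (hypothetical) copies only the CA certificate out of
// the root CA Secret data, leaving tls.crt and tls.key behind.
func clusterCASecretData(rootCA map[string][]byte) (map[string][]byte, error) {
	caCrt, ok := rootCA[caCrtKey]
	if !ok || len(caCrt) == 0 {
		return nil, errors.New("root CA Secret is missing ca.crt")
	}
	// Copy the bytes so later changes to one Secret cannot affect the other.
	out := make([]byte, len(caCrt))
	copy(out, caCrt)
	return map[string][]byte{caCrtKey: out}, nil
}

func main() {
	root := map[string][]byte{
		"ca.crt":  []byte("<CA certificate PEM>"),
		"tls.crt": []byte("<CA certificate PEM>"),
		"tls.key": []byte("<CA private key PEM>"),
	}
	data, err := clusterCASecretData(root)
	if err != nil {
		panic(err)
	}
	fmt.Println(clusterCASecretName("my-cluster"), len(data)) // my-cluster-registry-addon-ca 1
}
```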

hack/addons/helm-chart-bundler/repos.yaml

Lines changed: 1 addition & 1 deletion

```diff
@@ -35,7 +35,7 @@ repositories:
     repoURL: https://mesosphere.github.io/charts/staging/
     charts:
       docker-registry:
-        - 2.3.1
+        - 2.3.2
   local-path-provisioner:
     repoURL: https://charts.containeroo.ch
     charts:
```

hack/addons/kustomize/cncf-distribution-registry/kustomization.yaml.tmpl

Lines changed: 1 addition & 1 deletion

```diff
@@ -18,7 +18,7 @@ helmCharts:
 - name: docker-registry
   repo: https://mesosphere.github.io/charts/staging/
   releaseName: cncf-distribution-registry
-  version: 2.3.1
+  version: 2.3.2
   valuesFile: helm-values.yaml
   includeCRDs: true
   skipTests: true
```

pkg/handlers/generic/lifecycle/registry/cncfdistribution/handler.go

Lines changed: 93 additions & 3 deletions

```diff
@@ -24,6 +24,11 @@ import (
 const (
 	DefaultHelmReleaseName      = "cncf-distribution-registry"
 	DefaultHelmReleaseNamespace = "registry-system"
+
+	stsName                = "cncf-distribution-registry-docker-registry"
+	stsHeadlessServiceName = "cncf-distribution-registry-docker-registry-headless"
+	stsReplicas            = 2
+	tlsSecretName          = "registry-tls"
 )
 
 type Config struct {
@@ -59,14 +64,64 @@ func New(
 	}
 }
 
+// Setup ensures any pre-requisites for the CNCF Distribution registry addon are met.
+// It is expected to be called before the cluster is created.
+// Specifically, it ensures that the CA secret for the registry is created in the cluster's namespace.
+func (n *CNCFDistribution) Setup(
+	ctx context.Context,
+	_ v1alpha1.RegistryAddon,
+	cluster *clusterv1.Cluster,
+	log logr.Logger,
+) error {
+	log.Info("Setting up CA for CNCF Distribution registry")
+	err := utils.EnsureCASecretForCluster(
+		ctx,
+		n.client,
+		cluster,
+	)
+	if err != nil {
+		return fmt.Errorf("failed to ensure CA secret for CNCF Distribution registry addon: %w", err)
+	}
+	return nil
+}
+
+// Apply applies the CNCF Distribution registry addon to the cluster.
 func (n *CNCFDistribution) Apply(
 	ctx context.Context,
 	_ v1alpha1.RegistryAddon,
 	cluster *clusterv1.Cluster,
 	log logr.Logger,
 ) error {
-	log.Info("Applying CNCF Distribution registry installation")
+	// Copy the TLS secret to the remote cluster.
+	serviceIP, err := utils.ServiceIPForCluster(cluster)
+	if err != nil {
+		return fmt.Errorf("error getting service IP for the CNCF distribution registry: %w", err)
+	}
+	opts := &utils.EnsureCertificateOpts{
+		RemoteSecretKey: ctrlclient.ObjectKey{
+			Name:      tlsSecretName,
+			Namespace: DefaultHelmReleaseNamespace,
+		},
+		Spec: utils.CertificateSpec{
+			CommonName:  stsName,
+			DNSNames:    certificateDNSNames(),
+			IPAddresses: certificateIPAddresses(serviceIP),
+		},
+	}
+	err = utils.EnsureRegistryServerCertificateSecretOnRemoteCluster(
+		ctx,
+		n.client,
+		cluster,
+		opts,
+	)
+	if err != nil {
+		return fmt.Errorf(
+			"failed to copy certificate secret for CNCF Distribution registry addon to remote cluster: %w",
+			err,
+		)
+	}
 
+	log.Info("Applying CNCF Distribution registry installation")
 	helmChartInfo, err := n.helmChartInfoGetter.For(ctx, log, config.CNCFDistributionRegistry)
 	if err != nil {
 		return fmt.Errorf("failed to get CNCF Distribution registry helm chart: %w", err)
@@ -101,11 +156,15 @@ func templateValues(cluster *clusterv1.Cluster, text string) (string, error) {
 	}
 
 	type input struct {
-		ServiceIP string
+		ServiceIP     string
+		Replicas      int32
+		TLSSecretName string
 	}
 
 	templateInput := input{
-		ServiceIP: serviceIP,
+		Replicas:      stsReplicas,
+		ServiceIP:     serviceIP,
+		TLSSecretName: tlsSecretName,
 	}
 
 	var b bytes.Buffer
@@ -119,3 +178,34 @@ func templateValues(cluster *clusterv1.Cluster, text string) (string, error) {
 
 	return b.String(), nil
 }
+
+func certificateDNSNames() []string {
+	names := []string{
+		stsName,
+		fmt.Sprintf("%s.%s", stsName, DefaultHelmReleaseNamespace),
+		fmt.Sprintf("%s.%s.svc", stsName, DefaultHelmReleaseNamespace),
+		fmt.Sprintf("%s.%s.svc.cluster.local", stsName, DefaultHelmReleaseNamespace),
+	}
+	for i := 0; i < stsReplicas; i++ {
+		names = append(names,
+			[]string{
+				fmt.Sprintf("%s-%d", stsName, i),
+				fmt.Sprintf("%s-%d.%s.%s", stsName, i, stsHeadlessServiceName, DefaultHelmReleaseNamespace),
+				fmt.Sprintf("%s-%d.%s.%s.svc", stsName, i, stsHeadlessServiceName, DefaultHelmReleaseNamespace),
+				fmt.Sprintf(
+					"%s-%d.%s.%s.svc.cluster.local",
+					stsName, i, stsHeadlessServiceName, DefaultHelmReleaseNamespace,
+				),
+			}...,
+		)
+	}
+
+	return names
+}
+
+func certificateIPAddresses(serviceIP string) []string {
+	return []string{
+		serviceIP,
+		"127.0.0.1",
+	}
+}
```
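`certificateDNSNames` and `certificateIPAddresses` decide which hostnames a client can use to reach the registry over TLS. The sketch below exercises the same SAN list with `(*x509.Certificate).VerifyHostname` from Go's standard library: the Service IP is the fixed ClusterIP from the values template, `127.0.0.1` covers the `kubectl port-forward` workflow from the testing notes, and the per-Pod headless names let a client address an individual replica. The name-building logic is a local copy for illustration, and the service IP `10.96.0.20` is hypothetical.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"net"
)

// Constants copied from the diff; serviceIP is a hypothetical example value.
const (
	stsName                = "cncf-distribution-registry-docker-registry"
	stsHeadlessServiceName = "cncf-distribution-registry-docker-registry-headless"
	namespace              = "registry-system"
	stsReplicas            = 2
	serviceIP              = "10.96.0.20"
)

// sanCertificate builds an unsigned certificate value carrying the same SANs
// the handler requests, which is enough to exercise VerifyHostname.
func sanCertificate() *x509.Certificate {
	dnsNames := []string{
		stsName,
		fmt.Sprintf("%s.%s", stsName, namespace),
		fmt.Sprintf("%s.%s.svc", stsName, namespace),
		fmt.Sprintf("%s.%s.svc.cluster.local", stsName, namespace),
	}
	for i := 0; i < stsReplicas; i++ {
		pod := fmt.Sprintf("%s-%d", stsName, i)
		dnsNames = append(dnsNames,
			pod,
			fmt.Sprintf("%s.%s.%s", pod, stsHeadlessServiceName, namespace),
			fmt.Sprintf("%s.%s.%s.svc", pod, stsHeadlessServiceName, namespace),
			fmt.Sprintf("%s.%s.%s.svc.cluster.local", pod, stsHeadlessServiceName, namespace),
		)
	}
	return &x509.Certificate{
		DNSNames:    dnsNames,
		IPAddresses: []net.IP{net.ParseIP(serviceIP), net.ParseIP("127.0.0.1")},
	}
}

func main() {
	cert := sanCertificate()
	for _, host := range []string{
		serviceIP,   // Service ClusterIP
		"127.0.0.1", // kubectl port-forward
		"cncf-distribution-registry-docker-registry-1.cncf-distribution-registry-docker-registry-headless.registry-system.svc",
		"cncf-distribution-registry-docker-registry-2", // there is no third replica
	} {
		fmt.Printf("%s: %v\n", host, cert.VerifyHostname(host) == nil)
	}
}
```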
Lines changed: 29 additions & 0 deletions

```diff
@@ -0,0 +1,29 @@
+// Copyright 2025 Nutanix. All rights reserved.
+// SPDX-License-Identifier: Apache-2.0
+
+package cncfdistribution
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func Test_certificateDNSNames(t *testing.T) {
+	//nolint:lll // Keep long lines for readability.
+	expected := []string{
+		"cncf-distribution-registry-docker-registry",
+		"cncf-distribution-registry-docker-registry.registry-system",
+		"cncf-distribution-registry-docker-registry.registry-system.svc",
+		"cncf-distribution-registry-docker-registry.registry-system.svc.cluster.local",
+		"cncf-distribution-registry-docker-registry-0",
+		"cncf-distribution-registry-docker-registry-0.cncf-distribution-registry-docker-registry-headless.registry-system",
+		"cncf-distribution-registry-docker-registry-0.cncf-distribution-registry-docker-registry-headless.registry-system.svc",
+		"cncf-distribution-registry-docker-registry-0.cncf-distribution-registry-docker-registry-headless.registry-system.svc.cluster.local",
+		"cncf-distribution-registry-docker-registry-1",
+		"cncf-distribution-registry-docker-registry-1.cncf-distribution-registry-docker-registry-headless.registry-system",
+		"cncf-distribution-registry-docker-registry-1.cncf-distribution-registry-docker-registry-headless.registry-system.svc",
+		"cncf-distribution-registry-docker-registry-1.cncf-distribution-registry-docker-registry-headless.registry-system.svc.cluster.local",
+	}
+	assert.Equal(t, expected, certificateDNSNames())
+}
```

pkg/handlers/generic/lifecycle/registry/handler.go

Lines changed: 103 additions & 0 deletions

```diff
@@ -20,6 +20,12 @@ import (
 )
 
 type RegistryProvider interface {
+	Setup(
+		ctx context.Context,
+		registryVar v1alpha1.RegistryAddon,
+		cluster *clusterv1.Cluster,
+		log logr.Logger,
+	) error
 	Apply(
 		ctx context.Context,
 		registryVar v1alpha1.RegistryAddon,
@@ -37,6 +43,7 @@ type RegistryHandler struct {
 
 var (
 	_ commonhandlers.Named                   = &RegistryHandler{}
+	_ lifecycle.BeforeClusterCreate          = &RegistryHandler{}
 	_ lifecycle.AfterControlPlaneInitialized = &RegistryHandler{}
 	_ lifecycle.BeforeClusterUpgrade         = &RegistryHandler{}
 )
@@ -57,6 +64,17 @@ func (r *RegistryHandler) Name() string {
 	return "RegistryHandler"
 }
 
+func (r *RegistryHandler) BeforeClusterCreate(
+	ctx context.Context,
+	req *runtimehooksv1.BeforeClusterCreateRequest,
+	resp *runtimehooksv1.BeforeClusterCreateResponse,
+) {
+	commonResponse := &runtimehooksv1.CommonResponse{}
+	r.setup(ctx, &req.Cluster, commonResponse)
+	resp.Status = commonResponse.GetStatus()
+	resp.Message = commonResponse.GetMessage()
+}
+
 func (r *RegistryHandler) AfterControlPlaneInitialized(
 	ctx context.Context,
 	req *runtimehooksv1.AfterControlPlaneInitializedRequest,
@@ -79,6 +97,91 @@ func (r *RegistryHandler) BeforeClusterUpgrade(
 	resp.Message = commonResponse.GetMessage()
 }
 
+func (r *RegistryHandler) setup(
+	ctx context.Context,
+	cluster *clusterv1.Cluster,
+	resp *runtimehooksv1.CommonResponse,
+) {
+	clusterKey := ctrlclient.ObjectKeyFromObject(cluster)
+
+	log := ctrl.LoggerFrom(ctx).WithValues(
+		"cluster",
+		clusterKey,
+	)
+
+	varMap := variables.ClusterVariablesToVariablesMap(cluster.Spec.Topology.Variables)
+	registryVar, err := variables.Get[v1alpha1.RegistryAddon](
+		varMap,
+		r.variableName,
+		r.variablePath...)
+	if err != nil {
+		if variables.IsNotFoundError(err) {
+			log.V(5).
+				Info(
+					"Skipping RegistryAddon, field is not specified",
+					"error",
+					err,
+				)
+			return
+		}
+		log.Error(
+			err,
+			"failed to read RegistryAddon provider from cluster definition",
+		)
+		resp.SetStatus(runtimehooksv1.ResponseStatusFailure)
+		resp.SetMessage(
+			fmt.Sprintf("failed to read RegistryAddon provider from cluster definition: %v",
+				err,
+			),
+		)
+		return
+	}
+
+	handler, ok := r.ProviderHandler[registryVar.Provider]
+	if !ok {
+		err = fmt.Errorf("unknown RegistryAddon Provider")
+		log.Error(err, "provider", registryVar.Provider)
+		resp.SetStatus(runtimehooksv1.ResponseStatusFailure)
+		resp.SetMessage(
+			fmt.Sprintf("%s %s", err, registryVar.Provider),
+		)
+		return
+	}
+
+	log.Info(fmt.Sprintf("Setting up RegistryAddon provider prerequisites %s", registryVar.Provider))
+	err = handler.Setup(
+		ctx,
+		registryVar,
+		cluster,
+		log,
+	)
+	if err != nil {
+		log.Error(
+			err,
+			fmt.Sprintf(
+				"failed to set up RegistryAddon provider prerequisites %s",
+				registryVar.Provider,
+			),
+		)
+		resp.SetStatus(runtimehooksv1.ResponseStatusFailure)
+		resp.SetMessage(
+			fmt.Sprintf(
+				"failed to set up RegistryAddon provider prerequisites: %v",
+				err,
+			),
+		)
+		return
+	}
+
+	resp.SetStatus(runtimehooksv1.ResponseStatusSuccess)
+	resp.SetMessage(
+		fmt.Sprintf(
+			"Set up RegistryAddon provider prerequisites %s",
+			registryVar.Provider,
+		),
+	)
+}
+
 func (r *RegistryHandler) apply(
 	ctx context.Context,
 	cluster *clusterv1.Cluster,
```
