Metrics Server Operator

A Kubernetes operator for deploying and managing metrics-server using a declarative CRD-based approach. This operator replaces the need for manual Helm deployments or Ansible playbooks by providing a native Kubernetes resource to manage metrics-server installations.

Description

The Metrics Server Operator provides a Kubernetes-native way to deploy and manage metrics-server instances. It handles the complexity of deploying all required resources including:

Deployment - The metrics-server pods with proper security context and resource limits
Service - ClusterIP service for metrics-server communication
ServiceAccount - Dedicated service account with minimal required permissions
RBAC - ClusterRole and ClusterRoleBinding with least-privilege access
APIService - Registration of the metrics.k8s.io/v1beta1 API
PodDisruptionBudget - Optional PDB for high availability deployments

Key Features

🔒 Security-first - Follows Pod Security Standards with locked-down RBAC
🏗️ Production-ready - Comprehensive health checks, monitoring, and observability
🧪 Well-tested - Extensive unit tests and e2e test coverage
📊 Configurable - Flexible configuration options for different environments
🔄 GitOps-friendly - Declarative configuration that works with GitOps workflows
🏢 Enterprise-ready - Multi-architecture support (amd64/arm64) and air-gapped environments

Quick Start

Prerequisites

Kubernetes cluster v1.24+ (for Pod Security Standards support)
kubectl configured to access your cluster
Cluster admin permissions for initial setup

Installation

Install the operator using the latest release:

kubectl apply -f https://github.com/vexxhost/metrics-server-operator/releases/latest/download/install.yaml

Deploy metrics-server

Create a MetricsServer resource:

apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: default-metrics-server
spec:
  # Use defaults - deploys to kube-system namespace
  image: "registry.k8s.io/metrics-server/metrics-server:v0.7.2"
  replicas: 1
  kubeletInsecureTLS: true  # Required for most clusters

Apply the configuration:

kubectl apply -f metricsserver.yaml

Verify the deployment:

# Check the MetricsServer status
kubectl get metricsserver default-metrics-server

# Verify metrics-server is working
kubectl top nodes
kubectl top pods -A

Configuration Examples

Minimal Configuration

apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: simple
spec: {}  # Uses all defaults

Production Configuration

apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: production
spec:
  image: "registry.k8s.io/metrics-server/metrics-server:v0.7.2"
  replicas: 2
  kubeletInsecureTLS: false  # Use secure TLS if supported
  
  resources:
    requests:
      cpu: 10m      # Very conservative for production
      memory: 32Mi  # Minimal memory footprint
    limits:
      cpu: 100m     # Prevents resource spikes
      memory: 128Mi # Conservative memory limit
  
  args:
    - --metric-resolution=30s
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --v=2
  
  # Pod placement constraints
  nodeSelector:
    kubernetes.io/os: linux
  
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: metrics-server
            topologyKey: kubernetes.io/hostname
  
  # High availability features
  podDisruptionBudget: true
  
  # Monitoring integration
  serviceMonitor: true  # Creates ServiceMonitor for Prometheus
  
  # Custom labels and annotations
  serviceLabels:
    monitoring: "enabled"
  
  serviceAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10250"
    prometheus.io/scheme: "https"
  
  podLabels:
    environment: "production"
  
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"

⚠️ Important: Singleton Constraint

Only one MetricsServer instance is allowed per cluster. This constraint exists because metrics-server registers the cluster-wide v1beta1.metrics.k8s.io APIService, which can only have one backend service.

Enforcement

The operator enforces this constraint at two levels:

1. API Level (Validating Webhook):

🚫 Immediate rejection: Additional MetricsServer creation requests are rejected with an error
⚡ Fast feedback: Users get immediate feedback without creating invalid resources
🛡️ Fail-safe: Webhook validates before any resource is stored in etcd

2. Controller Level (Reconciliation):

✅ First instance: Deploys successfully and remains healthy
❌ Additional instances: Marked as Degraded with reason SingletonViolation (backup enforcement)
🔄 After deletion: New instances can be created once the existing one is removed

Example Violation

# First instance - works fine
kubectl apply -f - <<EOF
apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: primary
spec: {}
EOF

# Second instance - will be rejected immediately by webhook
kubectl apply -f - <<EOF
apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: secondary  # ❌ This will fail at API level
spec: {}
EOF

# Output will show immediate rejection:
# error validating data: ValidationError(MetricsServer): singleton constraint violation: 
# MetricsServer instance 'primary' already exists. Only one MetricsServer is allowed per cluster...

Migration Strategy

To replace an existing MetricsServer:

Delete the old instance: kubectl delete metricsserver old-name
Wait for deletion to complete: kubectl wait --for=delete metricsserver old-name --timeout=60s
Create the new instance: The webhook will now allow the new instance to be created

Configuration Reference

MetricsServerSpec

Field	Type	Default	Description
`image`	string	`registry.k8s.io/metrics-server/metrics-server:v0.7.2`	Container image for metrics-server
`replicas`	int32	`1`	Number of replicas for the deployment
`kubeletInsecureTLS`	bool	`true`	Skip TLS verification when connecting to kubelets
`priorityClassName`	string	`system-cluster-critical`	Priority class for metrics-server pods
`hostNetwork`	bool	`false`	Enable host networking for pods
`resources`	ResourceRequirements	See defaults	Resource limits and requests
`args`	[]string	See defaults	Additional command-line arguments
`nodeSelector`	map[string]string	`{}`	Node selection constraints
`tolerations`	[]Toleration	`[]`	Pod tolerations
`affinity`	Affinity	`nil`	Pod affinity constraints
`podDisruptionBudget`	bool	`false`	Create a PodDisruptionBudget
`serviceMonitor`	bool	`false`	Create a ServiceMonitor for Prometheus
`serviceLabels`	map[string]string	`{}`	Additional labels for the service
`serviceAnnotations`	map[string]string	`{}`	Additional annotations for the service
`podLabels`	map[string]string	`{}`	Additional labels for pods
`podAnnotations`	map[string]string	`{}`	Additional annotations for pods
`serviceAccountAnnotations`	map[string]string	`{}`	Additional annotations for the service account

Default Arguments

The operator sets these default arguments for metrics-server:

--cert-dir=/tmp
--secure-port=10250
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
--kubelet-insecure-tls (if kubeletInsecureTLS: true)

Additional arguments can be provided via the args field.

Default Resource Limits

The operator uses conservative, production-ready resource defaults to prevent runaway resource consumption:

Metrics-Server Defaults:

CPU Request: 10m (minimal baseline)
Memory Request: 32Mi (small memory footprint)
CPU Limit: 100m (prevents CPU spikes)
Memory Limit: 128Mi (conservative memory ceiling)

Operator Defaults:

CPU Request: 10m (minimal baseline)
Memory Request: 32Mi (small memory footprint)
CPU Limit: 100m (prevents CPU spikes)
Memory Limit: 128Mi (conservative memory ceiling)
Ephemeral Storage: 128Mi request, 256Mi limit

These defaults are suitable for most production clusters and can be overridden via the resources field if higher limits are needed for large clusters.

Development

Prerequisites

Go 1.23+
Docker 17.03+
kubectl v1.11.3+
Access to a Kubernetes v1.24+ cluster
operator-sdk v1.34+

Local Development

# Clone the repository
git clone https://github.com/vexxhost/metrics-server-operator.git
cd metrics-server-operator

# Install dependencies
go mod download

# Generate manifests and code
make manifests generate

# Run tests
make test

# Install CRDs
make install

# Run the operator locally (connects to your current kubectl context)
make run

Building and Deployment

# Build and push the container image
make docker-build docker-push IMG=your-registry/metrics-server-operator:tag

# Deploy to cluster
make deploy IMG=your-registry/metrics-server-operator:tag

# Create a sample MetricsServer
kubectl apply -f config/samples/core_v1alpha1_metricsserver.yaml

Testing with Kind

Create a local Kubernetes cluster for testing:

# Install Kind
go install sigs.k8s.io/kind@latest

# Create cluster
kind create cluster --name metrics-server-test

# Load your image (if testing local builds)
kind load docker-image your-registry/metrics-server-operator:tag --name metrics-server-test

# Deploy and test
make install
make deploy IMG=your-registry/metrics-server-operator:tag
kubectl apply -f config/samples/core_v1alpha1_metricsserver.yaml

# Verify metrics are working
kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=metrics-server -n kube-system --timeout=300s
kubectl top nodes

# Cleanup
kind delete cluster --name metrics-server-test

Running Tests

# Unit tests
make test

# E2E tests (requires a running cluster)
make test-e2e

# Test with coverage
make test COVERAGE=true

Troubleshooting

Common Issues

1. APIService not available

# Check APIService status
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml

# Check metrics-server pod logs
kubectl logs -l app.kubernetes.io/component=metrics-server -n kube-system

2. Kubelet connection issues

If you see TLS errors, ensure kubeletInsecureTLS: true is set:

spec:
  kubeletInsecureTLS: true

3. Resource not found errors

Ensure the CRDs are installed:

kubectl get crd metricsservers.observability.vexxhost.dev

4. Singleton constraint violation

With the validating webhook enabled, creation of additional MetricsServer instances is blocked at the API level:

# If you see this error during kubectl apply:
# error validating data: ValidationError(MetricsServer): singleton constraint violation

# List all MetricsServer instances
kubectl get metricsserver

# Delete the existing instance to create a new one
kubectl delete metricsserver existing-instance-name
kubectl wait --for=delete metricsserver existing-instance-name --timeout=60s

# Now you can create the new instance
kubectl apply -f your-new-metricsserver.yaml

If your MetricsServer shows Degraded=True with Reason=SingletonViolation (backup enforcement):

# Check which instance is conflicting
kubectl get metricsserver YOUR-INSTANCE -o yaml | grep -A 5 "conditions:"

# Delete the unwanted instance
kubectl delete metricsserver unwanted-instance-name

Only one MetricsServer is allowed per cluster due to APIService constraints.

5. Webhook issues

If the validating webhook is not working (e.g., you can create multiple MetricsServer instances):

# Check if the webhook is registered
kubectl get validatingwebhookconfigurations

# Check webhook service and endpoints
kubectl get service metrics-server-operator-webhook-service -n metrics-server-operator-system
kubectl get endpoints metrics-server-operator-webhook-service -n metrics-server-operator-system

# Check operator pod is ready and webhook server is running
kubectl get pods -n metrics-server-operator-system
kubectl logs deployment/metrics-server-operator-controller-manager -n metrics-server-operator-system | grep webhook

6. RBAC permission errors

Check that the operator has proper cluster permissions:

kubectl auth can-i '*' '*' --as=system:serviceaccount:metrics-server-operator-system:metrics-server-operator-controller-manager

Debug Commands

# Check operator logs
kubectl logs -f deployment/metrics-server-operator-controller-manager -n metrics-server-operator-system

# Check MetricsServer status
kubectl get metricsserver -o yaml

# Check all created resources
kubectl get all -n kube-system -l app.kubernetes.io/managed-by=metrics-server-operator

# Test metrics endpoint directly
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"

Migration from Helm/Ansible

If you're currently using Helm or Ansible to deploy metrics-server, here's how to migrate:

From Helm

Uninstall existing Helm deployment:

helm uninstall metrics-server -n kube-system

Install the operator:

kubectl apply -f https://github.com/vexxhost/metrics-server-operator/releases/latest/download/install.yaml

Create MetricsServer resource with equivalent configuration to your Helm values.

From Ansible (atmosphere.common)

This operator is designed as a direct replacement for the metrics_server role from atmosphere.common. The default configuration provides the same behavior:

apiVersion: observability.vexxhost.dev/v1alpha1
kind: MetricsServer
metadata:
  name: default
spec:
  image: "registry.k8s.io/metrics-server/metrics-server:v0.7.2"
  kubeletInsecureTLS: true

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

Fork the repository
Create a feature branch
Make your changes
Add tests for new functionality
Ensure all tests pass
Submit a pull request

Code of Conduct

This project adheres to the Contributor Covenant Code of Conduct.

Security

For security issues, please see our Security Policy.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Acknowledgments

Kubernetes SIG Instrumentation for metrics-server
Operator SDK for the operator framework
Kubebuilder for the controller-runtime framework

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.devcontainer		.devcontainer
.direnv		.direnv
.github/workflows		.github/workflows
api/v1alpha1		api/v1alpha1
cmd		cmd
config		config
dist		dist
hack		hack
internal		internal
test		test
.dockerignore		.dockerignore
.envrc		.envrc
.gitignore		.gitignore
.golangci.yml		.golangci.yml
Dockerfile		Dockerfile
Makefile		Makefile
PROJECT		PROJECT
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix
go.mod		go.mod
go.sum		go.sum

vexxhost/metrics-server-operator

Folders and files

Latest commit

History

Repository files navigation

Metrics Server Operator

Description

Key Features

Quick Start

Prerequisites

Installation

Deploy metrics-server

Configuration Examples

Minimal Configuration

Production Configuration

⚠️ Important: Singleton Constraint

Enforcement

Example Violation

Migration Strategy

Configuration Reference

MetricsServerSpec

Default Arguments

Default Resource Limits

Development

Prerequisites

Local Development

Building and Deployment

Testing with Kind

Running Tests

Troubleshooting

Common Issues

Debug Commands

Migration from Helm/Ansible

From Helm

From Ansible (atmosphere.common)

Contributing

Development Workflow

Code of Conduct

Security

License

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages