Open
Labels: enhancement (New feature or request)
Description
Required prerequisites
- I have searched the Issue Tracker and confirmed that this hasn't already been reported. (comment there if it has.)
- I have tried the latest version of nvitop in a new isolated virtual environment.
Motivation
Running nvitop-exporter on Kubernetes currently requires users to hand-write their own DaemonSet or Deployment, Service, and RBAC manifests. This adds friction for teams adopting nvitop-exporter in production, especially those managing multiple clusters, for several reasons:
- Consistency: Every team must reinvent their own manifests, increasing the risk of misconfiguration.
- Deployability: Lack of an official, standardized setup makes automation (e.g., GitOps with ArgoCD/Flux) harder.
- Scalability: For clusters with multiple GPU nodes, users need a DaemonSet setup so the exporter runs on every node automatically, which can be tricky to get right without official guidance.
- Adoption barrier: Many users expect Helm charts or ready-to-apply manifests as the default installation method for Kubernetes-native software.
Having an official deployment option lowers the barrier to entry, encourages wider adoption, and ensures the exporter is deployed in a secure, scalable, and repeatable way.
Solution
Provide one (or both) of the following:
- Helm Chart:
  - A simple Helm chart with:
    - DaemonSet (for node-level GPU monitoring)
    - Service / ServiceMonitor (for Prometheus scraping)
    - Configurable tolerations / node selectors (to target GPU nodes only)
    - RBAC manifests for least-privilege access (if needed)
    - values.yaml for easy configuration (port, scrape interval, resource requests/limits, etc.)
- Kubernetes Manifests:
  - A set of static YAML manifests (DaemonSet, Service, RBAC) published under deploy/kubernetes/ in the repo for those who prefer Kustomize or manual installation.
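To make the proposal concrete, here is a rough sketch of what the static DaemonSet and Service could look like. It is only an illustration, not an official manifest. The namespace, labels, image reference, node-selector label, and resource figures are all placeholders or assumptions on my part. The `--bind-address`/`--port` flags and port 5050 reflect my understanding of the exporter's CLI and should be checked against `nvitop-exporter --help`. How the container gets access to the NVIDIA driver (runtime class, environment variables, device plugin) depends on the cluster setup.

```yaml
# Illustrative sketch only. Every name, label, and value below is an assumption.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvitop-exporter
  namespace: monitoring                      # assumed namespace
  labels:
    app.kubernetes.io/name: nvitop-exporter
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nvitop-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nvitop-exporter
    spec:
      hostPID: true                          # assumption: needed to see host GPU processes
      nodeSelector:
        nvidia.com/gpu.present: "true"       # assumed GPU node label; adjust per cluster
      tolerations:
        - key: nvidia.com/gpu                # assumed taint key on GPU nodes
          operator: Exists
          effect: NoSchedule
      containers:
        - name: nvitop-exporter
          image: registry.example.com/nvitop-exporter:latest   # placeholder image
          # Assumes the image entrypoint is the nvitop-exporter CLI.
          args: ["--bind-address", "0.0.0.0", "--port", "5050"]
          env:
            # NVIDIA Container Toolkit variables; only effective with that runtime.
            - name: NVIDIA_VISIBLE_DEVICES
              value: all
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: utility
          ports:
            - name: metrics
              containerPort: 5050
          resources:                         # placeholder figures
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nvitop-exporter
  namespace: monitoring
  labels:
    app.kubernetes.io/name: nvitop-exporter
spec:
  clusterIP: None                            # headless, so Prometheus can reach each pod
  selector:
    app.kubernetes.io/name: nvitop-exporter
  ports:
    - name: metrics
      port: 5050
      targetPort: metrics
```

In a Helm chart, the image, port, node selector, tolerations, and resources would become values, and a ServiceMonitor selecting the `metrics` port could be added behind a toggle for clusters running the Prometheus Operator.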
I’d be happy to contribute a first version of this Helm chart (or manifests) as a PR, following best practices (namespace-scoped, Prometheus compatibility, minimal privileges).
Alternatives
No response
Additional context
No response