Skip to content

Conversation

@zeroalphat
Copy link
Contributor

No description provided.

@zeroalphat zeroalphat self-assigned this Oct 22, 2025
@zeroalphat zeroalphat added the enhancement New feature or request label Oct 22, 2025
@zeroalphat zeroalphat requested a review from Copilot October 22, 2025 08:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds Prometheus metrics support to both the imageprefetch and nodeimageset controllers, enabling monitoring of image prefetching operations across the cluster.

Key changes include:

  • New metrics package exposing gauges for image availability, pull success/failure counts, image metadata, size, and prefetch duration
  • Enhanced image pull tracking to capture timing and size information
  • Integration of metrics collection in both controller reconciliation loops

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/metrics/metrics.go Defines Prometheus gauge vectors for tracking image prefetch operations and a registration function
internal/imgmanager/image_puller.go Adds timing and size tracking to ImagePullStatus and exposes getter methods for metrics
internal/imgmanager/fake_containerd.go Updates mock to return image size from PullImage
internal/imgmanager/containerd.go Modifies PullImage to return image size alongside error
internal/controller/nodeimageset_controller.go Integrates metrics recording during reconciliation and cleanup on deletion
internal/controller/imageprefetch_controller.go Adds metrics setters/removers and calls them during reconciliation and finalization
go.mod Moves prometheus/client_golang from indirect to direct dependency
config/manager/nodeimageset-controller.yaml Adds metrics server configuration flags
config/manager/imageprefetch-controller.yaml Adds metrics server configuration flags
cmd/nodeimageset-controller/main.go Registers metrics with controller-runtime metrics registry
cmd/imageprefetch-controller/main.go Registers metrics with controller-runtime metrics registry

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@zeroalphat zeroalphat force-pushed the add-metrics branch 2 times, most recently from ce401bc to e0628e1 Compare October 23, 2025 06:36
Signed-off-by: zeroalphat <taichi-takemura@cybozu.co.jp>
Comment on lines +72 to +75
// Delete all metrics for this NodeImageSet
metrics.ImageInfoVec.DeletePartialMatch(map[string]string{"name": nodeImageSet.Name})
metrics.ImageSizeBytesVec.DeletePartialMatch(map[string]string{"name": nodeImageSet.Name})
metrics.ImagePrefetchDurationSecondsVec.DeletePartialMatch(map[string]string{"name": nodeImageSet.Name})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting metrics should be done before removing the finalizer.
If an error occurs during the update, metrics may leak depending on the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
I fixed the code to ensure that deleting metrics is done before removing the finalizer.

Comment on lines 12 to 28
ReadyVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_ready",
Help: "1 if the ImagePrefetch resource is ready, 0 otherwise",
}, []string{"namespace", "name"})

ImagePulledNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_successful_nodes",
Help: "Number of nodes where images have been successfully prefetched for thins Imageprefetch",
}, []string{"namespace", "name"})

ImagePullFailedNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_failed_nodes",
Help: "Number of nodes where image prefetching has failed for this ImagePrefetch",
}, []string{"namespace", "name"})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to make the label's key and value like kube-state-metrics.
The key is the resource kind, and the label is the resource name.

So then, label_replace becomes unnecessary when combined with other metrics in MetricsQL.

Suggested change
ReadyVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_ready",
Help: "1 if the ImagePrefetch resource is ready, 0 otherwise",
}, []string{"namespace", "name"})
ImagePulledNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_successful_nodes",
Help: "Number of nodes where images have been successfully prefetched for thins Imageprefetch",
}, []string{"namespace", "name"})
ImagePullFailedNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_failed_nodes",
Help: "Number of nodes where image prefetching has failed for this ImagePrefetch",
}, []string{"namespace", "name"})
ReadyVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_ready",
Help: "1 if the ImagePrefetch resource is ready, 0 otherwise",
}, []string{"namespace", "imageprefetch"})
ImagePulledNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_successful_nodes",
Help: "Number of nodes where images have been successfully prefetched for thins Imageprefetch",
}, []string{"namespace", "imageprefetch"})
ImagePullFailedNodesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "imageprefetch_image_pull_failed_nodes",
Help: "Number of nodes where image prefetching has failed for this ImagePrefetch",
}, []string{"namespace", "imageprefetch"})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it to change the metrics labels to be like kube-state-metrics.

Comment on lines 30 to 46
ImageInfoVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_info",
Help: "Information about NodeImageSet image",
}, []string{"name", "image_name", "registry_policy", "node_name"})

ImageSizeBytesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_size_bytes",
Help: "Size of images in NodeImageSets in bytes",
}, []string{"name", "image_name", "node_name"})

ImagePrefetchDurationSecondsVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_prefetch_duration_seconds",
Help: "Duration taken to prefetch images in NodeImageSets in seconds",
}, []string{"name", "image_name", "node_name"})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.
It is better to use similar label keys to kube-state-metrics.

Suggested change
ImageInfoVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_info",
Help: "Information about NodeImageSet image",
}, []string{"name", "image_name", "registry_policy", "node_name"})
ImageSizeBytesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_size_bytes",
Help: "Size of images in NodeImageSets in bytes",
}, []string{"name", "image_name", "node_name"})
ImagePrefetchDurationSecondsVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_prefetch_duration_seconds",
Help: "Duration taken to prefetch images in NodeImageSets in seconds",
}, []string{"name", "image_name", "node_name"})
ImageInfoVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_info",
Help: "Information about NodeImageSet image",
}, []string{"nodeimageset", "image", "registry_policy", "node"})
ImageSizeBytesVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_size_bytes",
Help: "Size of images in NodeImageSets in bytes",
}, []string{"nodeimageset", "image", "node"})
ImagePrefetchDurationSecondsVec = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Namespace: metricsNamespace,
Name: "nodeimageset_image_prefetch_duration_seconds",
Help: "Duration taken to prefetch images in NodeImageSets in seconds",
}, []string{"nodeimageset", "image", "node"})

Signed-off-by: zeroalphat <taichi-takemura@cybozu.co.jp>
@zeroalphat zeroalphat requested a review from masa213f October 27, 2025 00:59
Copy link

@masa213f masa213f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zeroalphat zeroalphat merged commit f5feb52 into main Oct 27, 2025
5 checks passed
@zeroalphat zeroalphat deleted the add-metrics branch October 27, 2025 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants