Hi team,
We are using ModelMesh, and we are able to scrape all the standard metrics. However, we are not seeing GPU/CPU utilization metrics per inference service. We do get metrics like nv_inference_request_success, but there does not seem to be an equivalent for GPU/CPU utilization at the inference service level.
Is there a way to track GPU utilization per inference service, or is there a recommended approach to achieve this?
Replies: 1 comment

I hope that by 'inference service level' you meant the server/cluster that is running tritonserver. It seems that Triton Server does provide the metrics you mentioned. Have you seen them?
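For reference, a minimal sketch of how one might check this, assuming Triton's default metrics port (8002) and using `triton-host` as a placeholder host name: it pulls the Prometheus text endpoint and prints the `nv_gpu_utilization` samples. Note that this gauge is labeled per GPU (`gpu_uuid`), so it reflects device-level utilization rather than a per-model or per-InferenceService breakdown.

```python
# Minimal sketch: pull Triton's Prometheus text endpoint and print the GPU
# utilization samples. "triton-host" is a placeholder; 8002 is Triton's
# default metrics port.
import urllib.request

METRICS_URL = "http://triton-host:8002/metrics"  # placeholder host (assumption)


def gpu_utilization_samples(url: str = METRICS_URL) -> list[str]:
    """Return the raw Prometheus lines for the nv_gpu_utilization gauge."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    # nv_gpu_utilization is reported per GPU (gpu_uuid label), i.e. device-level
    # utilization, not a per-model / per-InferenceService value.
    return [
        line
        for line in text.splitlines()
        if line.startswith("nv_gpu_utilization")
    ]


if __name__ == "__main__":
    for sample in gpu_utilization_samples():
        print(sample)
```

As far as I know, if you need a per-model signal you would have to correlate this device-level gauge with the per-model inference metrics (e.g. nv_inference_request_success, which carries a model label), since Triton does not expose utilization broken down by model out of the box.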