Hi team,
We are using ModelMesh, and we are able to scrape all the standard metrics. However, we are not seeing GPU/CPU utilization metrics per inference service. We do get metrics like nv_inference_request_success, but there does not seem to be an equivalent for GPU/CPU utilization at the inference service level.
Is there a way to track GPU utilization per inference service, or is there a recommended approach to achieve this?
Replies: 1 comment

I hope that by 'inference service level' you meant the server/cluster that is running tritonserver. It seems that Triton Server does provide the metrics you mentioned. Have you seen them?
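For reference, a minimal sketch of how one might check this, assuming Triton's default metrics port (8002) and using `triton-host` as a placeholder host name: it pulls the Prometheus text endpoint and prints the `nv_gpu_utilization` samples. Note that this gauge is labeled per GPU (`gpu_uuid`), so it reflects device-level utilization rather than a per-model or per-InferenceService breakdown.

```python
# Minimal sketch: pull Triton's Prometheus text endpoint and print the GPU
# utilization samples. "triton-host" is a placeholder; 8002 is Triton's
# default metrics port.
import urllib.request

METRICS_URL = "http://triton-host:8002/metrics"  # placeholder host (assumption)


def gpu_utilization_samples(url: str = METRICS_URL) -> list[str]:
    """Return the raw Prometheus lines for the nv_gpu_utilization gauge."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    # nv_gpu_utilization is reported per GPU (gpu_uuid label), i.e. device-level
    # utilization, not a per-model / per-InferenceService value.
    return [
        line
        for line in text.splitlines()
        if line.startswith("nv_gpu_utilization")
    ]


if __name__ == "__main__":
    for sample in gpu_utilization_samples():
        print(sample)
```

As far as I know, if you need a per-model signal you would have to correlate this device-level gauge with the per-model inference metrics (e.g. nv_inference_request_success, which carries a model label), since Triton does not expose utilization broken down by model out of the box.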