Commit e8e80bc

RUN-17720 add table again
1 parent 0a7b9fc commit e8e80bc

1 file changed

+34
-0
lines changed

docs/developer/metrics/metrics.md

Lines changed: 34 additions & 0 deletions
@@ -111,6 +111,40 @@ Run:ai exports other metrics emitted by NVIDIA and Kubernetes packages, as follo

For additional information, see Kubernetes [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics){target=_blank} and NVIDIA [dcgm exporter](https://github.com/NVIDIA/gpu-monitoring-tools){target=_blank}.

## Metrics APIs

Starting in version 2.17, Run:ai metrics are available through API endpoints. Using the API endpoints is more efficient and makes it easier to retrieve metrics from any application. The following table lists the metrics that changed between versions 2.16 and 2.17.

| Metric in 2.16 | Change in 2.17 | API endpoint |
| --- | --- | --- |
| runai\_active\_job\_cpu\_requested\_cores | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_REQUEST" metricType |
| runai\_active\_job\_memory\_requested\_bytes | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_REQUEST" metricType |
| runai\_cluster\_cpu\_utilization | changed to API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics ; with "CPU\_UTILIZATION" metricType |
| runai\_cluster\_memory\_utilization | changed to API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics ; with "CPU\_MEMORY\_UTILIZATION" metricType |
| runai\_gpu\_utilization\_non\_fractional\_jobs | no longer available | |
| runai\_allocated\_gpu\_count\_per\_workload | labels changed | |
| runai\_gpu\_utilization\_per\_pod\_per\_gpu | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics ; with "GPU\_UTILIZATION\_PER\_GPU" metricType |
| runai\_gpu\_utilization\_per\_workload | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_UTILIZATION" metricType |
| runai\_job\_image | no longer available | |
| runai\_job\_requested\_gpu\_memory | changed to API + renamed to: "runai\_requested\_gpu\_memory\_mb\_per\_workload" with different labels | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_MEMORY\_REQUEST" metricType |
| runai\_job\_requested\_gpus | renamed to: "runai\_requested\_gpus\_per\_workload" with different labels | |
| runai\_job\_total\_runtime | renamed to: "runai\_run\_time\_seconds\_per\_workload" with different labels | |
| runai\_job\_total\_wait\_time | renamed to: "runai\_wait\_time\_seconds\_per\_workload" with different labels | |
| runai\_gpu\_memory\_used\_mebibytes\_per\_workload | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_MEMORY\_USAGE" metricType |
| runai\_gpu\_memory\_used\_mebibytes\_per\_pod\_per\_gpu | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics ; with "GPU\_MEMORY\_USAGE\_PER\_GPU" metricType |
| runai\_node\_gpu\_used\_memory\_bytes | renamed and changed units: "runai\_gpu\_memory\_used\_mebibytes\_per\_node" | |
| runai\_node\_total\_memory\_bytes | renamed and changed units: "runai\_gpu\_memory\_total\_mebibytes\_per\_node" | |
| runai\_project\_info | labels changed | |
| runai\_active\_job\_cpu\_limits | changed to API + renamed to: "runai\_cpu\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_LIMIT" metricType |
| runai\_job\_cpu\_usage | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_USAGE" metricType |
| runai\_active\_job\_memory\_limits | changed to API + renamed to: "runai\_memory\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_LIMIT" metricType |
| runai\_running\_job\_memory\_requested\_bytes | was a duplicate of "runai\_active\_job\_memory\_requested\_bytes", see above | |
| runai\_job\_memory\_used\_bytes | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_USAGE" metricType |
| runai\_job\_swap\_memory\_used\_bytes | no longer available | |
| runai\_gpu\_count\_per\_node | added labels | |
| runai\_last\_gpu\_utilization\_time\_per\_workload | labels changed | |
| runai\_gpu\_idle\_time\_per\_workload | renamed to: "runai\_gpu\_idle\_seconds\_per\_workload" with different labels | |

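As a rough illustration of how a 2.16 metric name could be translated into a 2.17 API request, the sketch below maps a few rows of the table to their endpoint and metricType and builds the request URL. The helper name `metrics_url`, the `MIGRATED` lookup, and the choice to pass the type as a query parameter named `metricType` are assumptions made for this example, not confirmed API details. A real request will likely also need authentication and possibly other parameters, which are not shown here.

```python
from urllib.parse import urlencode

# Endpoint templates copied from the table above.
WORKLOAD_METRICS = "https://app.run.ai/api/v1/workloads/{workloadId}/metrics"
CLUSTER_METRICS = "https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics"

# A few 2.16 metric names mapped to (endpoint template, metricType), per the table.
# This lookup is hypothetical and covers only three rows for illustration.
MIGRATED = {
    "runai_active_job_cpu_requested_cores": (WORKLOAD_METRICS, "CPU_REQUEST"),
    "runai_job_cpu_usage": (WORKLOAD_METRICS, "CPU_USAGE"),
    "runai_cluster_cpu_utilization": (CLUSTER_METRICS, "CPU_UTILIZATION"),
}


def metrics_url(old_metric: str, **ids: str) -> str:
    """Build the URL that replaces a 2.16 metric.

    Assumption: the metric type is sent as a query parameter
    named "metricType"; check the API reference before relying on it.
    """
    template, metric_type = MIGRATED[old_metric]
    path = template.format(**ids)  # fill in workloadId / clusterUuid
    return f"{path}?{urlencode({'metricType': metric_type})}"


print(metrics_url("runai_job_cpu_usage", workloadId="1234"))
# https://app.run.ai/api/v1/workloads/1234/metrics?metricType=CPU_USAGE
```

Keeping the mapping in one dictionary makes it easy to extend to the remaining rows of the table and to spot metrics that have no API replacement.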
## Create custom dashboards

To create custom dashboards based on the above metrics, please contact Run:ai customer support.
