Commit 7298a40

Merge pull request #763 from lavianalon/v2.17
Some additions to the metrics change
2 parents fda6454 + 51dc279 commit 7298a40

1 file changed (+31 -31 lines)

docs/developer/metrics/metrics.md

@@ -112,37 +112,37 @@ For additional information, see Kubernetes [kube-state-metrics](https://github.c

## Changed metrics and API mapping

Removed (old lines 115-145):

Starting in version 2.17, Run:ai metrics are available as API endpoints. Using the API endpoints is more efficient and provides an easier way of retrieving metrics in any application. The following table lists the metrics that were changed.

| Metric name in version 2.16 | 2.17 Change Description | 2.17 API Endpoint |
| --- | --- | --- |
| runai\_active\_job\_cpu\_requested\_cores | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_REQUEST" metricType |
| runai\_active\_job\_memory\_requested\_bytes | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_REQUEST" metricType |
| runai\_cluster\_cpu\_utilization | changed to API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics ; with "CPU\_UTILIZATION" metricType |
| runai\_cluster\_memory\_utilization | changed to API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics ; with "CPU\_MEMORY\_UTILIZATION" metricType |
| runai\_gpu\_utilization\_non\_fractional\_jobs | no longer available | |
| runai\_allocated\_gpu\_count\_per\_workload | labels changed | |
| runai\_gpu\_utilization\_per\_pod\_per\_gpu | changed to API | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics ; with "GPU\_UTILIZATION\_PER\_GPU" metricType |
| runai\_gpu\_utilization\_per\_workload | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_UTILIZATION" metricType |
| runai\_job\_image | no longer available | |
| runai\_job\_requested\_gpu\_memory | changed to API + renamed to: "runai\_requested\_gpu\_memory\_mb\_per\_workload" with different labels | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_MEMORY\_REQUEST" metricType |
| runai\_job\_requested\_gpus | renamed to: "runai\_requested\_gpus\_per\_workload" with different labels | |
| runai\_job\_total\_runtime | renamed to: "runai\_run\_time\_seconds\_per\_workload" with different labels | |
| runai\_job\_total\_wait\_time | renamed to: "runai\_wait\_time\_seconds\_per\_workload" with different labels | |
| runai\_gpu\_memory\_used\_mebibytes\_per\_workload | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "GPU\_MEMORY\_USAGE" metricType |
| runai\_gpu\_memory\_used\_mebibytes\_per\_pod\_per\_gpu | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics ; with "GPU\_MEMORY\_USAGE\_PER\_GPU" metricType |
| runai\_node\_gpu\_used\_memory\_bytes | renamed and changed units: "runai\_gpu\_memory\_used\_mebibytes\_per\_node" | |
| runai\_node\_total\_memory\_bytes | renamed and changed units: "runai\_gpu\_memory\_total\_mebibytes\_per\_node" | |
| runai\_project\_info | labels changed | |
| runai\_active\_job\_cpu\_limits | changed to API + renamed to: "runai\_cpu\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_LIMIT" metricType |
| runai\_job\_cpu\_usage | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_USAGE" metricType |
| runai\_active\_job\_memory\_limits | changed to API + renamed to: "runai\_memory\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_LIMIT" metricType |
| runai\_running\_job\_memory\_requested\_bytes | was a duplication of "runai\_active\_job\_memory\_requested\_bytes", see above | |
| runai\_job\_memory\_used\_bytes | changed to API + labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics ; with "CPU\_MEMORY\_USAGE" metricType |
| runai\_job\_swap\_memory\_used\_bytes | no longer available | |
| runai\_gpu\_count\_per\_node | added labels | |
| runai\_last\_gpu\_utilization\_time\_per\_workload | labels changed | |
| runai\_gpu\_idle\_time\_per\_workload | renamed to: "runai\_gpu\_idle\_seconds\_per\_workload" with different labels | |

Added (new lines 115-145):

Starting in cluster version 2.17, some metric names have changed. In addition, some Run:ai metrics are also available as API endpoints. Using the API endpoints is more efficient and provides an easier way to retrieve metrics from any application. The following table lists the metrics that changed.

| Metric name in version 2.16 | 2.17 Change Description | 2.17 API Endpoint |
| --- | --- | --- |
| runai\_active\_job\_cpu\_requested\_cores | also available via API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_REQUEST" |
| runai\_active\_job\_memory\_requested\_bytes | also available via API | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_MEMORY\_REQUEST" |
| runai\_cluster\_cpu\_utilization | also available via API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics with metricType "CPU\_UTILIZATION" |
| runai\_cluster\_memory\_utilization | also available via API | https://app.run.ai/api/v2/clusters/{clusterUuid}/metrics with metricType "CPU\_MEMORY\_UTILIZATION" |
| runai\_gpu\_utilization\_non\_fractional\_jobs | no longer available | |
| runai\_allocated\_gpu\_count\_per\_workload | labels changed | |
| runai\_gpu\_utilization\_per\_pod\_per\_gpu | also available via API | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics with metricType "GPU\_UTILIZATION\_PER\_GPU" |
| runai\_gpu\_utilization\_per\_workload | also available via API; labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "GPU\_UTILIZATION" |
| runai\_job\_image | no longer available | |
| runai\_job\_requested\_gpu\_memory | also available via API; renamed to "runai\_requested\_gpu\_memory\_mb\_per\_workload" with different labels | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "GPU\_MEMORY\_REQUEST" |
| runai\_job\_requested\_gpus | renamed to "runai\_requested\_gpus\_per\_workload" with different labels | |
| runai\_job\_total\_runtime | renamed to "runai\_run\_time\_seconds\_per\_workload" with different labels | |
| runai\_job\_total\_wait\_time | renamed to "runai\_wait\_time\_seconds\_per\_workload" with different labels | |
| runai\_gpu\_memory\_used\_mebibytes\_per\_workload | also available via API; labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "GPU\_MEMORY\_USAGE" |
| runai\_gpu\_memory\_used\_mebibytes\_per\_pod\_per\_gpu | also available via API; labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/pods/{podId}/metrics with metricType "GPU\_MEMORY\_USAGE\_PER\_GPU" |
| runai\_node\_gpu\_used\_memory\_bytes | renamed to "runai\_gpu\_memory\_used\_mebibytes\_per\_node"; units changed from bytes to mebibytes | |
| runai\_node\_total\_memory\_bytes | renamed to "runai\_gpu\_memory\_total\_mebibytes\_per\_node"; units changed from bytes to mebibytes | |
| runai\_project\_info | labels changed | |
| runai\_active\_job\_cpu\_limits | also available via API; renamed to "runai\_cpu\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_LIMIT" |
| runai\_job\_cpu\_usage | also available via API; labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_USAGE" |
| runai\_active\_job\_memory\_limits | also available via API; renamed to "runai\_memory\_limits\_per\_active\_workload" | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_MEMORY\_LIMIT" |
| runai\_running\_job\_memory\_requested\_bytes | was a duplicate of "runai\_active\_job\_memory\_requested\_bytes" (see above) | |
| runai\_job\_memory\_used\_bytes | also available via API; labels changed | https://app.run.ai/api/v1/workloads/{workloadId}/metrics with metricType "CPU\_MEMORY\_USAGE" |
| runai\_job\_swap\_memory\_used\_bytes | no longer available | |
| runai\_gpu\_count\_per\_node | labels added | |
| runai\_last\_gpu\_utilization\_time\_per\_workload | labels changed | |
| runai\_gpu\_idle\_time\_per\_workload | renamed to "runai\_gpu\_idle\_seconds\_per\_workload" with different labels | |
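
When migrating dashboards or scripts from 2.16, the metrics table can be treated as a lookup. The Python sketch below encodes its rows: `rename_in_query` rewrites old Prometheus metric names in a PromQL string, and `api_url` builds a request URL for metrics that are also exposed through the API. All helper names (`API_METRICS`, `RENAMED`, `rename_in_query`, `api_url`) are hypothetical, and the `start`/`end` query parameters are an assumption not stated in the table; check the Run:ai API reference for your version before relying on them.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlencode

# Host as written in the table; a self-hosted control plane would use its own host.
BASE_URL = "https://app.run.ai"

WORKLOAD = "/api/v1/workloads/{workloadId}/metrics"
POD = "/api/v1/workloads/{workloadId}/pods/{podId}/metrics"
CLUSTER = "/api/v2/clusters/{clusterUuid}/metrics"


@dataclass(frozen=True)
class ApiMetric:
    path: str         # endpoint path with {placeholders}
    metric_type: str  # value sent as the metricType query parameter


# 2.16 metric name -> 2.17 API endpoint and metricType (from the table).
API_METRICS = {
    "runai_active_job_cpu_requested_cores": ApiMetric(WORKLOAD, "CPU_REQUEST"),
    "runai_active_job_memory_requested_bytes": ApiMetric(WORKLOAD, "CPU_MEMORY_REQUEST"),
    "runai_cluster_cpu_utilization": ApiMetric(CLUSTER, "CPU_UTILIZATION"),
    "runai_cluster_memory_utilization": ApiMetric(CLUSTER, "CPU_MEMORY_UTILIZATION"),
    "runai_gpu_utilization_per_pod_per_gpu": ApiMetric(POD, "GPU_UTILIZATION_PER_GPU"),
    "runai_gpu_utilization_per_workload": ApiMetric(WORKLOAD, "GPU_UTILIZATION"),
    "runai_job_requested_gpu_memory": ApiMetric(WORKLOAD, "GPU_MEMORY_REQUEST"),
    "runai_gpu_memory_used_mebibytes_per_workload": ApiMetric(WORKLOAD, "GPU_MEMORY_USAGE"),
    "runai_gpu_memory_used_mebibytes_per_pod_per_gpu": ApiMetric(POD, "GPU_MEMORY_USAGE_PER_GPU"),
    "runai_active_job_cpu_limits": ApiMetric(WORKLOAD, "CPU_LIMIT"),
    "runai_job_cpu_usage": ApiMetric(WORKLOAD, "CPU_USAGE"),
    "runai_active_job_memory_limits": ApiMetric(WORKLOAD, "CPU_MEMORY_LIMIT"),
    "runai_job_memory_used_bytes": ApiMetric(WORKLOAD, "CPU_MEMORY_USAGE"),
}

# 2.16 Prometheus name -> 2.17 name. Most of these also changed labels,
# so label matchers in existing queries still need manual review.
RENAMED = {
    "runai_job_requested_gpu_memory": "runai_requested_gpu_memory_mb_per_workload",
    "runai_job_requested_gpus": "runai_requested_gpus_per_workload",
    "runai_job_total_runtime": "runai_run_time_seconds_per_workload",
    "runai_job_total_wait_time": "runai_wait_time_seconds_per_workload",
    "runai_node_gpu_used_memory_bytes": "runai_gpu_memory_used_mebibytes_per_node",  # bytes -> MiB
    "runai_node_total_memory_bytes": "runai_gpu_memory_total_mebibytes_per_node",  # bytes -> MiB
    "runai_active_job_cpu_limits": "runai_cpu_limits_per_active_workload",
    "runai_active_job_memory_limits": "runai_memory_limits_per_active_workload",
    "runai_gpu_idle_time_per_workload": "runai_gpu_idle_seconds_per_workload",
}

# \b on both sides stops a name from matching inside a longer metric name.
_RENAME_RE = re.compile(r"\b(" + "|".join(map(re.escape, RENAMED)) + r")\b")


def rename_in_query(promql: str) -> str:
    """Replace 2.16 metric names in a PromQL string with their 2.17 names."""
    return _RENAME_RE.sub(lambda m: RENAMED[m.group(1)], promql)


def api_url(old_name: str, start: str, end: str, **ids: str) -> str:
    """Build the 2.17 API URL for a metric that is also available via the API.

    Assumption: the endpoint accepts start/end timestamps as query
    parameters alongside metricType.
    """
    metric = API_METRICS[old_name]
    path = metric.path.format(**ids)  # raises KeyError if a required ID is missing
    query = urlencode({"metricType": metric.metric_type, "start": start, "end": end})
    return f"{BASE_URL}{path}?{query}"
```

For example, `rename_in_query("sum(runai_job_total_runtime)")` returns `"sum(runai_run_time_seconds_per_workload)"`. Renaming alone is not a complete migration: most renamed metrics also changed labels, and the two node GPU memory metrics now report mebibytes instead of bytes, so any thresholds written in bytes need to be divided by 1,048,576 (2^20).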

## Create custom dashboards
