Commit 1f4f9b5

pyek-bot, kolchfa-aws, and natebower authored
[MLCommons] Add details about metrics integration (#10120)
* feat: add details about metrics framework in ml-commons

Signed-off-by: Pavan Yekbote <pybot@amazon.com>

* Doc review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Fix links

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* Update _monitoring-your-cluster/metrics/getting-started.md

Signed-off-by: Nathan Bower <nbower@amazon.com>

---------

Signed-off-by: Pavan Yekbote <pybot@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent 362e090 commit 1f4f9b5

2 files changed: +22 -5 lines changed

_ml-commons-plugin/api/stats.md

Lines changed: 3 additions & 0 deletions
@@ -7,6 +7,9 @@ nav_order: 110
77

88
# Stats
99

10+
The Stats API provides basic statistics about ML Commons, such as the number of running tasks. To monitor machine learning workflows using more detailed time-series metrics, see [Monitoring machine learning workflows]({{site.url}}{{site.baseurl}}/monitoring-your-cluster/metrics/getting-started/#monitoring-machine-learning-workflows).
11+
{: .note }
12+
1013
Gets statistics related to the number of tasks.
1114

1215
## Endpoints

_monitoring-your-cluster/metrics/getting-started.md

Lines changed: 19 additions & 5 deletions
@@ -35,11 +35,11 @@ The `enable` flag is toggled using a Java Virtual Machine (JVM) parameter that i
3535
cd \path\to\opensearch
3636
```
3737

38-
2. Open your `opensearch.yaml` file.
39-
3. Add the following setting to `opensearch.yaml`:
38+
2. Open your `opensearch.yml` file.
39+
3. Add the following setting to `opensearch.yml`:
4040

41-
```bash
42-
opensearch.experimental.feature.telemetry.enabled=true
41+
```yaml
42+
opensearch.experimental.feature.telemetry.enabled: true
4343
```
4444
{% include copy.html %}
4545
@@ -73,7 +73,7 @@ export OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.telemetry.enabled
7373

7474
### Enable with Docker
7575

76-
If youre running OpenSearch using Docker, add the following line to `docker-compose.yml` under `environment`:
76+
If you're running OpenSearch using Docker, add the following line to `docker-compose.yml` under `environment`:
7777

7878
```bash
7979
OPENSEARCH_JAVA_OPTS="-Dopensearch.experimental.feature.telemetry.enabled=true"
@@ -105,3 +105,17 @@ The metrics framework feature supports the following metric types:
105105
2. **UpDown counters:** UpDown counters can be incremented with positive values or decremented with negative values. UpDown counters are well suited for tracking metrics like open connections, active requests, and other fluctuating quantities.
106106
3. **Histograms:** Histograms are valuable tools for visualizing the distribution of continuous data. Histograms offer insight into the central tendency, spread, skewness, and potential outliers that might exist in your metrics. Patterns such as normal distribution, skewed distribution, or bimodal distribution can be readily identified, making histograms ideal for analyzing latency metrics and assessing percentiles.
107107
4. **Asynchronous Gauges:** Asynchronous gauges capture the current value at the moment a metric is read. These metrics are non-additive and are commonly used to measure CPU utilization on a per-minute basis, memory utilization, and other real-time values.
108+
109+
## Monitoring machine learning workflows
110+
Introduced 3.1
111+
{: .label .label-purple }
112+
113+
OpenSearch provides enhanced observability for [machine learning (ML)]({{site.url}}{{site.baseurl}}/ml-commons-plugin/) workflows. Metrics related to ML operations are pushed directly to the core metrics registry, giving you improved visibility into model usage and performance. Additionally, every 5 minutes, a periodic job collects and exports state data, helping you monitor the health and activity of your ML workloads over time.
114+
115+
To enable ML observability, specify the following settings in `opensearch.yml`:
116+
117+
```yaml
118+
plugins.ml_commons.metrics_collection_enabled: true
119+
plugins.ml_commons.metrics_static_collection_enabled: true
120+
```
121+
{% include copy.html %}
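
As a quick sanity check on the two ML settings added in this hunk, the following minimal sketch scans the text of an `opensearch.yml` file and reports which of them are missing or not set to `true`. The helper names (`parse_flat_settings`, `missing_ml_metric_settings`) are hypothetical and not part of OpenSearch. The parser assumes flat, dotted `key: value` lines as shown in the diff; it is not a YAML parser and does not handle nested mappings or values that contain `#`.

```python
from typing import Dict, List

# The two settings named in the diff above, with the value the docs specify.
REQUIRED_ML_SETTINGS = {
    "plugins.ml_commons.metrics_collection_enabled": "true",
    "plugins.ml_commons.metrics_static_collection_enabled": "true",
}


def parse_flat_settings(text: str) -> Dict[str, str]:
    """Parse flat 'dotted.key: value' lines; a deliberately naive sketch."""
    settings: Dict[str, str] = {}
    for raw_line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_ml_metric_settings(text: str) -> List[str]:
    """Return the required ML metric settings that are absent or not 'true'."""
    settings = parse_flat_settings(text)
    return [
        key
        for key, expected in REQUIRED_ML_SETTINGS.items()
        if settings.get(key, "").lower() != expected
    ]


sample = """
opensearch.experimental.feature.telemetry.enabled: true
plugins.ml_commons.metrics_collection_enabled: true
"""
print(missing_ml_metric_settings(sample))
# prints ['plugins.ml_commons.metrics_static_collection_enabled']
```

The sample configuration deliberately omits the second setting, so the check flags it.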

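The three metric types visible in the `getting-started.md` hunk (UpDown counters, histograms, and asynchronous gauges) differ mainly in how values are recorded and read. The toy classes below are a sketch of those semantics only; the class and method names are hypothetical and are not the OpenSearch metrics framework API. The histogram assumes explicit bucket boundaries with inclusive upper bounds, which is one common convention.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence


class UpDownCounter:
    """Additive value that accepts positive and negative deltas."""

    def __init__(self) -> None:
        self.value = 0.0

    def add(self, delta: float) -> None:
        self.value += delta


class Histogram:
    """Counts observations per bucket; bucket i holds values <= bounds[i]."""

    def __init__(self, bounds: Sequence[float]) -> None:
        self.bounds: List[float] = sorted(bounds)
        # One extra bucket collects values above the largest bound.
        self.bucket_counts: List[int] = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        self.bucket_counts[bisect_left(self.bounds, value)] += 1


class AsyncGauge:
    """Non-additive: the callback is evaluated only when the metric is read."""

    def __init__(self, callback: Callable[[], float]) -> None:
        self._callback = callback

    def read(self) -> float:
        return self._callback()


# Open connections fluctuate, so they fit an UpDown counter.
connections = UpDownCounter()
connections.add(3)
connections.add(-1)

# Latencies in milliseconds, bucketed at 10, 50, and 100.
latency = Histogram([10, 50, 100])
for ms in (7, 42, 42, 250):
    latency.record(ms)

# The gauge reports whatever the value is at read time.
heap_used_mb = [512]
heap_gauge = AsyncGauge(lambda: heap_used_mb[0])
heap_used_mb[0] = 768

print(connections.value)      # 2.0
print(latency.bucket_counts)  # [1, 2, 0, 1]
print(heap_gauge.read())      # 768
```

The gauge reports 768 rather than 512 because nothing is stored between reads, which is why gauge values are described as non-additive.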