Commit 4a18b74

Some edits to #1584
1 parent 67d755d commit 4a18b74

1 file changed: +40 / -33 lines

site/prometheus.md

Lines changed: 40 additions & 33 deletions
@@ -499,12 +499,16 @@ prometheus.return_per_object_metrics = true
499499

500500
### <a id="per-object-endpoint" class="anchor" href="#per-object-endpoint">Prometheus endpoints: `/metrics/per-object`</a>
501501

502-
RabbitMQ offers a dedicated endpoint, `/metrics/per-object`, which always returns per-object metrics, regardless of the value of `prometheus.return_per_object_metrics`.
503-
You can therefore keep the default value of `prometheus.return_per_object_metrics`, which is `false`, and still scrape per-object metrics when necessary, by setting `metrics_path = /metrics/per-object` in the Prometheus target configuration (check [Prometheus Documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) for additional information).
502+
RabbitMQ offers a dedicated endpoint, `/metrics/per-object`, which always returns per-object metrics,
503+
regardless of the value of `prometheus.return_per_object_metrics`.
504+
You can therefore keep the default value of `prometheus.return_per_object_metrics`,
505+
which is `false`, and still scrape per-object metrics when necessary by setting `metrics_path: /metrics/per-object` in the Prometheus scrape configuration (see the [Prometheus documentation](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config) for details).
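As a minimal sketch, the Python snippet below builds a Prometheus `scrape_configs` entry that points at `/metrics/per-object`. It assumes the plugin listens on port 15692. The hostnames `rabbit-1` and `rabbit-2` and the job name are hypothetical. The snippet emits JSON, which YAML parsers generally accept as flow syntax. Treat it as an illustration of the `metrics_path` setting rather than a ready-made config.

```python
import json


def per_object_scrape_config(hosts, port=15692, job_name="rabbitmq-per-object"):
    """Build one Prometheus scrape_configs entry targeting /metrics/per-object.

    The default port and the job name are assumptions for illustration.
    """
    return {
        "job_name": job_name,
        # This path returns per-object metrics regardless of
        # prometheus.return_per_object_metrics on the node.
        "metrics_path": "/metrics/per-object",
        "static_configs": [
            {"targets": [f"{host}:{port}" for host in hosts]},
        ],
    }


# Hypothetical two-node cluster.
config = {"scrape_configs": [per_object_scrape_config(["rabbit-1", "rabbit-2"])]}
print(json.dumps(config, indent=2))
```

A separate job like this can sit next to the regular job that scrapes the default `/metrics` path, so per-object data is only collected where it is wanted.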
504506

505507
### <a id="detailed-endpoint" class="anchor" href="#detailed-endpoint">Prometheus endpoints: `/metrics/detailed`</a>
506508

507-
Those are metrics than can be explicitly requested via `/metrics/detailed` endpoint.
509+
Because [enabling per-object metrics can be very expensive](#metric-aggregation) yet is sometimes necessary,
510+
a separate endpoint, `GET /metrics/detailed`, provides access to them even if per-object
511+
metrics are disabled for the node.
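The sketch below builds a request URL for this endpoint. It assumes that metric groups (such as `queue_coarse_metrics` below) are selected with repeated `family` query parameters and optionally narrowed with repeated `vhost` parameters. It also assumes the default plugin port, 15692. The helper name `detailed_metrics_url` is hypothetical.

```python
from urllib.parse import urlencode


def detailed_metrics_url(base_url, families, vhosts=()):
    """Build a GET /metrics/detailed URL for the requested metric groups.

    Assumes groups are chosen via repeated `family` parameters and
    optionally filtered via repeated `vhost` parameters.
    """
    params = [("vhost", v) for v in vhosts] + [("family", f) for f in families]
    # urlencode percent-encodes values, so the default vhost "/" becomes %2F.
    return f"{base_url.rstrip('/')}/metrics/detailed?{urlencode(params)}"


url = detailed_metrics_url(
    "http://localhost:15692",
    families=["queue_coarse_metrics", "queue_consumer_count"],
    vhosts=["/"],
)
print(url)
# http://localhost:15692/metrics/detailed?vhost=%2F&family=queue_coarse_metrics&family=queue_consumer_count
```

Requesting only the groups you need keeps each scrape cheaper than pulling every per-object metric.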
508512

509513
#### Generic metrics
510514

@@ -513,7 +517,7 @@ queue/connection/etc.
513517

514518
##### Connection/channel/queue churn
515519

516-
Group `connection_churn_metrics`:
520+
Grouped under `connection_churn_metrics`:
517521

518522
| Metric | Description |
519523
|--------------------------------------------|--------------------------------------------------|
@@ -528,7 +532,7 @@ Group `connection_churn_metrics`:
528532

529533
##### Erlang VM/Disk IO via RabbitMQ
530534

531-
Group `node_coarse_metrics`:
535+
Grouped under `node_coarse_metrics`:
532536

533537
| Metric | Description |
534538
|-----------------------------------------------------------|-----------------------------------------------------------------------|
@@ -541,7 +545,7 @@ Group `node_coarse_metrics`:
541545
| rabbitmq_detailed_erlang_gc_reclaimed_bytes_total | Total number of bytes of memory reclaimed by Erlang garbage collector |
542546
| rabbitmq_detailed_erlang_scheduler_context_switches_total | Total number of Erlang scheduler context switches |
543547

544-
Group `node_metrics`:
548+
Grouped under `node_metrics`:
545549

546550
| Metric | Description |
547551
|----------------------------------------------------|----------------------------------------|
@@ -555,7 +559,7 @@ Group `node_metrics`:
555559
| rabbitmq_detailed_erlang_uptime_seconds | Node uptime |
556560

557561

558-
Group `node_persister_metrics`:
562+
Grouped under `node_persister_metrics`:
559563

560564
| Metric | Description |
561565
|-------------------------------------------------------|------------------------------------------------------|
@@ -578,9 +582,9 @@ Group `node_persister_metrics`:
578582
| rabbitmq_detailed_io_seek_time_seconds_total | Total I/O seek time |
579583

580584

581-
##### Raft metrics
585+
##### Raft-related metrics (quorum queues, streams)
582586

583-
Group `ra_metrics`:
587+
Grouped under `ra_metrics`:
584588

585589
| Metric | Description |
586590
|-----------------------------------------------------|--------------------------------------------|
@@ -593,7 +597,7 @@ Group `ra_metrics`:
593597

594598
##### Auth metrics
595599

596-
Group `auth_attempt_metrics`:
600+
Grouped under `auth_attempt_metrics`:
597601

598602
| Metric | Description |
599603
|-------------------------------------------------|----------------------------------------------------|
@@ -602,7 +606,7 @@ Group `auth_attempt_metrics`:
602606
| rabbitmq_detailed_auth_attempts_failed_total | Total number of failed authentication attempts |
603607

604608

605-
Group `auth_attempt_detailed_metrics` (when aggregated, it produces the same numbers as `auth_attempt_metrics` - so it's mutually exclusive with it in the aggregation mode):
609+
Grouped under `auth_attempt_detailed_metrics`. When aggregated, these add up to the same numbers as `auth_attempt_metrics`.
606610

607611
| Metric | Description |
608612
|----------------------------------------------------------|--------------------------------------------------------------------|
@@ -613,13 +617,15 @@ Group `auth_attempt_detailed_metrics` (when aggregated, it produces the same num
613617
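The snippet below illustrates what it means for `auth_attempt_detailed_metrics` to add up to the same numbers as `auth_attempt_metrics`. Summing a detailed counter over all of its label combinations should yield the corresponding aggregate total. The label names (`protocol`, `username`) and the sample values are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical samples of one detailed failed-attempts counter, keyed by
# label set. Label names and numbers are made up for illustration.
detailed_failed = {
    (("protocol", "amqp091"), ("username", "alice")): 3,
    (("protocol", "amqp091"), ("username", "bob")): 1,
    (("protocol", "mqtt"), ("username", "alice")): 2,
}


def aggregate(samples):
    """Sum a counter across all of its label combinations."""
    return sum(samples.values())


def aggregate_by(samples, label):
    """Sum a counter while keeping a single label (like PromQL `sum by`)."""
    totals = Counter()
    for labels, value in samples.items():
        totals[dict(labels)[label]] += value
    return dict(totals)


print(aggregate(detailed_failed))                 # 6
print(aggregate_by(detailed_failed, "protocol"))  # {'amqp091': 4, 'mqtt': 2}
```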

614618
#### Queue metrics
615619

616-
Each of metrics in this group refers to a single queue in its label. Amount of data and performance totally depends on the number of queues.
620+
Each metric in this group points to a single queue via its label.
621+
The size of the response therefore grows linearly with the number of queues hosted
622+
on the node.
617623

618-
They are listed from least expensive to collect to the most expensive.
624+
The metrics below are listed from the least expensive to collect to the most expensive.
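The sketch below shows why the response grows with the number of queues: every queue contributes its own series for each metric. It parses a simplified Prometheus text-format response and counts the series per `(vhost, queue)` pair. The `vhost` and `queue` label names, the metric names in the sample, and the sample values are assumptions for illustration. The regular expression ignores timestamps and escaped quotes.

```python
import re

# Simplified matcher for `name{labels} value` lines of the text format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            labels = dict(LABEL_RE.findall(m.group("labels")))
            yield m.group("name"), labels, float(m.group("value"))


def series_per_queue(text):
    """Count how many series each (vhost, queue) pair contributes."""
    counts = {}
    for _name, labels, _value in parse_samples(text):
        if "queue" in labels:
            key = (labels.get("vhost"), labels["queue"])
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical response: 2 metrics x 2 queues = 4 series.
scrape = """\
# TYPE rabbitmq_detailed_queue_messages_ready gauge
rabbitmq_detailed_queue_messages_ready{vhost="/",queue="orders"} 5
rabbitmq_detailed_queue_messages_ready{vhost="/",queue="billing"} 0
# TYPE rabbitmq_detailed_queue_messages_unacked gauge
rabbitmq_detailed_queue_messages_unacked{vhost="/",queue="orders"} 2
rabbitmq_detailed_queue_messages_unacked{vhost="/",queue="billing"} 1
"""
print(series_per_queue(scrape))
```

Adding a third queue to this sample would add two more series, one per metric.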
619625

620626
##### Queue coarse metrics
621627

622-
Group `queue_coarse_metrics`:
628+
Grouped under `queue_coarse_metrics`:
623629

624630
| Metric | Description |
625631
|--------------------------------------------------|--------------------------------------------------------------|
@@ -630,17 +636,18 @@ Group `queue_coarse_metrics`:
630636

631637
##### Per-queue consumer count
632638

633-
Group `queue_consumer_count`. This is a strict subset of `queue_metrics` which contains only a single metric (if both `queue_consumer_count` and `queue_metrics` are requested, the former will be automatically skipped):
639+
Grouped under `queue_consumer_count`. This is a subset of `queue_metrics` and is skipped when `queue_metrics` is requested as well:
634640

635641
| Metric | Description |
636642
|-----------------------------------|----------------------|
637643
| rabbitmq_detailed_queue_consumers | Consumers on a queue |
638644

639-
This is one of the more telling metrics, and having it separately allows to skip some expensive operations for extracting/exposing the other metrics from the same datasource.
645+
This metric is useful for quickly detecting issues with consumers (e.g. when there are no consumers online).
646+
It is exposed as a separate group so that it can be collected without the cost of producing the rest of `queue_metrics`.
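As an example of such a check, the sketch below lists queues that report zero consumers. It assumes the response came from requesting only this group (for instance with a `family=queue_consumer_count` query parameter). It also assumes each sample carries `vhost` and `queue` labels. The function name and the sample data are hypothetical.

```python
import re

# Simplified matcher for lines such as:
#   rabbitmq_detailed_queue_consumers{vhost="/",queue="orders"} 3
LINE_RE = re.compile(r'^rabbitmq_detailed_queue_consumers\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def queues_without_consumers(text):
    """Return (vhost, queue) pairs whose consumer count is zero."""
    idle = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m and float(m.group(2)) == 0:
            labels = dict(LABEL_RE.findall(m.group(1)))
            idle.append((labels.get("vhost"), labels.get("queue")))
    return idle


# Hypothetical scrape output for two queues.
scrape = """\
rabbitmq_detailed_queue_consumers{vhost="/",queue="orders"} 3
rabbitmq_detailed_queue_consumers{vhost="/",queue="billing"} 0
"""
print(queues_without_consumers(scrape))  # [('/', 'billing')]
```

In practice the same condition is usually expressed as a Prometheus alerting rule. The Python version only makes the logic explicit.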
640647

641648
##### Detailed queue metrics
642649

643-
Group `queue_metrics` contains all the metrics for every queue, and can be relatively expensive to produce:
650+
Grouped under `queue_metrics`. This group contains all the metrics for every queue, and can be relatively expensive to produce:
644651

645652
| Metric | Description |
646653
|---------------------------------------------------|------------------------------------------------------------|
@@ -663,23 +670,24 @@ Group `queue_metrics` contains all the metrics for every queue, and can be relat
663670
| rabbitmq_detailed_queue_disk_reads_total | Total number of times queue read messages from disk |
664671
| rabbitmq_detailed_queue_disk_writes_total | Total number of times queue wrote messages to disk |
665672

666-
Tests show that performance difference between it and `queue_consumer_count` is approximately 8 times. E.g. on a test broker with 10k queues/producers/consumers, scrape time was ~8 second and ~1 respectively. So while it's expensive, it's not prohibitively so - especially compared to other metrics from per-connection/channel groups.
667-
668673
#### Connection/channel metrics
669674

670-
All of those include Erlang PID in their label, which is rarely useful when ingested into Prometheus. And they are most expensive to produce, the most resources are spent by `/metrics/per-object` on these.
675+
All of these include the Erlang process ID of the connection or channel in their label. This data is not particularly useful
676+
and is only present to distinguish metrics of separate connections and channels.
677+
678+
These metrics are the most expensive to produce.
671679

672680
##### Connection metrics
673681

674-
Group `connection_coarse_metrics`:
682+
Grouped under `connection_coarse_metrics`:
675683

676684
| Metric | Description |
677685
|-------------------------------------------------------|------------------------------------------------|
678686
| rabbitmq_detailed_connection_incoming_bytes_total | Total number of bytes received on a connection |
679687
| rabbitmq_detailed_connection_outgoing_bytes_total | Total number of bytes sent on a connection |
680688
| rabbitmq_detailed_connection_process_reductions_total | Total number of connection process reductions |
681689

682-
Group `connection_metrics`:
690+
Grouped under `connection_metrics`:
683691

684692
| Metric | Description |
685693
|-----------------------------------------------------|------------------------------------------------------|
@@ -690,7 +698,7 @@ Group `connection_metrics`:
690698

691699
##### General channel metrics
692700

693-
Group `channel_metrics`:
701+
Grouped under `channel_metrics`:
694702

695703
| Metric | Description |
696704
|------------------------------------------------|-----------------------------------------------------------------------|
@@ -703,7 +711,7 @@ Group `channel_metrics`:
703711
| rabbitmq_detailed_channel_prefetch | Total limit of unacknowledged messages for all consumers on a channel |
704712

705713

706-
Group `channel_process_metrics`:
714+
Grouped under `channel_process_metrics`:
707715

708716
| Metric | Description |
709717
|----------------------------------------------------|--------------------------------------------|
@@ -712,7 +720,7 @@ Group `channel_process_metrics`:
712720

713721
##### Channel metrics with queue/exchange breakdowns
714722

715-
Group `channel_exchange_metrics`:
723+
Grouped under `channel_exchange_metrics`:
716724

717725
| Metric | Description |
718726
|--------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
@@ -721,7 +729,7 @@ Group `channel_exchange_metrics`:
721729
| rabbitmq_detailed_channel_messages_unroutable_returned_total | Total number of messages published as mandatory into an exchange and returned to the publisher as unroutable |
722730
| rabbitmq_detailed_channel_messages_unroutable_dropped_total | Total number of messages published as non-mandatory into an exchange and dropped as unroutable |
723731

724-
Group `channel_queue_metrics`:
732+
Grouped under `channel_queue_metrics`:
725733

726734
| Metric | Description |
727735
|--------------------------------------------------------|-----------------------------------------------------------------------------------|
@@ -733,7 +741,7 @@ Group `channel_queue_metrics`:
733741
| rabbitmq_detailed_channel_messages_acked_total | Total number of messages acknowledged by consumers |
734742
| rabbitmq_detailed_channel_get_empty_total | Total number of times basic.get operations fetched no message |
735743

736-
Group `channel_queue_exchange_metrics`:
744+
Grouped under `channel_queue_exchange_metrics`:
737745

738746
| Metric | Description |
739747
|--------------------------------------------------|----------------------------------------------|
@@ -742,23 +750,22 @@ Group `channel_queue_exchange_metrics`:
742750
#### Virtual hosts and exchange metrics
743751

744752
These additional metrics can be useful when virtual hosts or exchanges are
745-
created on a shared cluster in a self-service way. They are different
746-
from the rest of the metrics: they are cluster-wide and not node-local.
747-
These metrics **must not** be aggregated across cluster nodes.
753+
created in a shared cluster. **These metrics are cluster-wide and not node-local**.
754+
Every node reports the same cluster-wide values, so these metrics **must not be aggregated** across cluster nodes.
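The sketch below shows what goes wrong when a cluster-wide metric is summed across nodes. Every node reports the same exchanges, so a per-node sum overcounts. Deduplicating the label sets gives the real number. The node names and the `(vhost, exchange)` label pairs are hypothetical stand-ins for `rabbitmq_cluster_exchange_name` samples.

```python
def naive_exchange_count(per_node):
    """Wrong for cluster-wide metrics: adds up every node's view."""
    return sum(len(names) for names in per_node.values())


def cluster_exchange_count(per_node):
    """Count each (vhost, exchange) pair once, however many nodes report it."""
    distinct = set()
    for names in per_node.values():
        distinct.update(names)
    return len(distinct)


# Hypothetical three-node cluster: each node reports the same cluster-wide list.
exchanges = {("/", "amq.direct"), ("/", "orders")}
per_node = {"rabbit-1": exchanges, "rabbit-2": exchanges, "rabbit-3": exchanges}

print(naive_exchange_count(per_node))    # 6 (overcounted)
print(cluster_exchange_count(per_node))  # 2
```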
748755

749-
Group `vhost_status`:
756+
Grouped under `vhost_status`:
750757

751758
| Metric | Description |
752759
|-------------------------------|----------------------------------|
753760
| rabbitmq_cluster_vhost_status | Whether a given vhost is running |
754761

755-
Group `exchange_names`:
762+
Grouped under `exchange_names`:
756763

757764
| Metric | Description |
758765
|--------------------------------|----------------------------------------------------------------------------------------------------------------------------|
759766
| rabbitmq_cluster_exchange_name | Enumerates exchanges without any additional info. This value is cluster-wide. A cheaper alternative to `exchange_bindings` |
760767

761-
Group `exchange_bindings`:
768+
Grouped under `exchange_bindings`:
762769

763770
| Metric | Description |
764771
|------------------------------------|-----------------------------------------------------------------|
