Quickwit exposes key metrics in the [Prometheus](https://prometheus.io/) format on the `/metrics` endpoint. You can use any front-end that supports Prometheus to examine the behavior of Quickwit visually.

:::tip

Workloads with a large number of indexes generate high cardinality metrics for the label `index`. Set the environment variable `QW_DISABLE_PER_INDEX_METRICS=true` to disable that label if this is problematic for your metrics database.

:::
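You can inspect the raw output yourself and check whether the `index` label is driving up cardinality. One way is to fetch `/metrics` and parse the Prometheus text format directly. This sketch makes a few assumptions:

- A node is listening on `http://localhost:7280`, Quickwit's default REST port.
- Exported series are named `<namespace>_<metric name>`.
- `fetch_metrics`, `parse_metrics` and `index_cardinality` are hypothetical helpers.
- The regex-based parser deliberately ignores escaped quotes in label values.

```python
import re
import urllib.request
from collections import defaultdict

# One sample line of the Prometheus text format, e.g.
#   quickwit_indexing_processed_docs_total{index="logs",docs_processed_status="valid"} 42
# Timestamps after the value are ignored; escaped quotes in label values are not handled.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def fetch_metrics(base_url="http://localhost:7280"):
    """Download the raw text served on `/metrics` (default port assumed)."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Return (name, labels, value) tuples, skipping HELP/TYPE comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))  # float() accepts "+Inf" and "NaN"
    return samples


def index_cardinality(samples):
    """Number of distinct `index` label values seen per metric name."""
    indexes_per_metric = defaultdict(set)
    for name, labels, _value in samples:
        if "index" in labels:
            indexes_per_metric[name].add(labels["index"])
    return {name: len(indexes) for name, indexes in indexes_per_metric.items()}
```

For example, `index_cardinality(parse_metrics(fetch_metrics()))` maps each metric name that carries an `index` label to the number of distinct indexes it reports.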
## Cache Metrics
Quickwit exposes several metrics for every cache. The cache type is given by the `component_name` label, whose values are `fastfields`, `shortlived`, `splitfooter`, `fd`, `partial_request`, and `searcher_split`.
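Hit and miss counts are cumulative counters, so a cache hit ratio is most meaningful over a window between two scrapes. This sketch assumes you have already read the hit and miss counter values for each `component_name` at two points in time. The exact metric names are left to the caller, and `window_hit_ratio` and `counter_delta` are hypothetical helpers.

```python
def counter_delta(previous, current):
    """Increase of a counter between two samples, tolerating a process restart."""
    # Counters only go up; a smaller value means the counter was reset to zero.
    return current - previous if current >= previous else current


def window_hit_ratio(before, after):
    """Per-cache hit ratio between two scrapes.

    `before` and `after` map a `component_name` value (e.g. "fastfields") to a
    (hits, misses) tuple read from the cache hit and miss counters.
    """
    ratios = {}
    for component, (hits_after, misses_after) in after.items():
        hits_before, misses_before = before.get(component, (0.0, 0.0))
        hits = counter_delta(hits_before, hits_after)
        misses = counter_delta(misses_before, misses_after)
        lookups = hits + misses
        # No lookups during the window: a ratio would be meaningless.
        ratios[component] = hits / lookups if lookups > 0 else None
    return ratios
```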
## Control Plane Metrics

| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_control_plane`|`indexes_total`| Number of indexes ||`gauge`|
|`quickwit_control_plane`|`restart_total`| Number of control plane restarts ||`counter`|
|`quickwit_control_plane`|`schedule_total`| Number of control plane schedule operations ||`counter`|
|`quickwit_control_plane`|`apply_total`| Number of control plane apply plan operations ||`counter`|
|`quickwit_control_plane`|`metastore_error_aborted`| Number of aborted metastore transactions (do not trigger a control plane restart) ||`counter`|
|`quickwit_control_plane`|`metastore_error_maybe_executed`| Number of metastore transactions with an uncertain outcome (do trigger a control plane restart) ||`counter`|
|`quickwit_control_plane`|`open_shards_total`| Number of open shards per source |[`index_id`]|`gauge`|
|`quickwit_control_plane`|`shards`| Number of (remote/local) shards in the indexing plan |[`locality`]|`gauge`|
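Errors counted by `metastore_error_maybe_executed` trigger a control plane restart. Comparing that counter with `restart_total` between two scrapes is therefore a cheap way to tell whether recent restarts coincide with uncertain metastore transactions. This is only a sketch:

- `ControlPlaneSnapshot`, `increase` and `explain_restarts` are hypothetical names.
- Restarts may have other causes.

```python
from dataclasses import dataclass


@dataclass
class ControlPlaneSnapshot:
    """Hypothetical container for two control plane counters read from /metrics."""
    restart_total: float
    metastore_error_maybe_executed: float


def increase(previous, current):
    # A counter that went down was reset (e.g. the node restarted); count from zero.
    return current - previous if current >= previous else current


def explain_restarts(before, after):
    """Summarize control plane restarts observed between two snapshots."""
    restarts = increase(before.restart_total, after.restart_total)
    uncertain = increase(
        before.metastore_error_maybe_executed, after.metastore_error_maybe_executed
    )
    if restarts == 0:
        return "no restart"
    if uncertain > 0:
        return f"{restarts:g} restart(s), {uncertain:g} metastore transaction(s) with uncertain outcome"
    return f"{restarts:g} restart(s), no uncertain metastore transaction in the window"
```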
## gRPC Metrics

The following subsystems expose gRPC metrics: `cluster`, `control_plane`, `indexing`, `ingest`, `metastore`.

| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_{subsystem}`|`grpc_requests_total`| Total number of gRPC requests processed |[`kind`, `rpc`, `status`]|`counter`|
|`quickwit_{subsystem}`|`grpc_requests_in_flight`| Number of gRPC requests in flight |[`kind`, `rpc`]|`gauge`|
|`quickwit_{subsystem}`|`grpc_request_duration_seconds`| Duration of gRPC requests in seconds |[`kind`, `rpc`, `status`]|`histogram`|

## Indexing Metrics

| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit`|`write_bytes`| Number of bytes written by a given component in [`indexer`, `merger`, `deleter`, `split_downloader_{merge,delete}`]|[`index`, `component`]|`counter`|
|`quickwit_indexing`|`processed_docs_total`| Number of processed docs by index and processed status |[`index`, `docs_processed_status`]|`counter`|
|`quickwit_indexing`|`processed_bytes`| Number of bytes of processed documents by index and processed status |[`index`, `docs_processed_status`]|`counter`|
|`quickwit_indexing`|`backpressure_micros`| Amount of time spent in backpressure, in microseconds |[`actor_name`]|`counter`|
|`quickwit_indexing`|`concurrent_upload_available_permits_num`| Number of available concurrent upload permits by component |[`component`]|`gauge`|
|`quickwit_indexing`|`split_builders`| Number of existing index writer instances ||`gauge`|
|`quickwit_indexing`|`ongoing_merge_operations`| Number of ongoing merge operations ||`gauge`|
|`quickwit_indexing`|`pending_merge_operations`| Number of pending merge operations ||`gauge`|
|`quickwit_indexing`|`pending_merge_bytes`| Number of bytes pending merge ||`gauge`|
|`quickwit_indexing`|`kafka_rebalance_total`| Number of Kafka rebalances ||`counter`|
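`backpressure_micros` is a counter. The share of wall-clock time an actor spent blocked over a window is therefore its increase divided by the window length in microseconds. This is a minimal sketch, assuming you have the counter value per `actor_name` from two scrapes taken `elapsed_seconds` apart. The helper name and the actor names in the test data are hypothetical.

```python
def backpressure_share(before, after, elapsed_seconds):
    """Fraction of the window each actor spent in backpressure.

    `before`/`after` map an `actor_name` label value to the
    `backpressure_micros` counter value at each scrape.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    window_micros = elapsed_seconds * 1_000_000
    shares = {}
    for actor, current in after.items():
        previous = before.get(actor, 0.0)
        # Counter reset (e.g. a process restart): count from zero.
        delta = current - previous if current >= previous else current
        shares[actor] = delta / window_micros
    return shares
```

Several actor instances on one node may report under the same `actor_name`. In that case their times add up, and the share can exceed 1.0.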
## Ingest Metrics
| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_ingest`|`docs_total`| Total number of docs ingested, measured on the leader ingester |[`validity`]|`counter`|
|`quickwit_ingest`|`docs_bytes_total`| Total size of the docs ingested in bytes, measured on the leader ingester |[`validity`]|`counter`|
|`quickwit_ingest`|`ingest_result_total`| Number of ingest requests by result |[`result`]|`counter`|
|`quickwit_ingest`|`reset_shards_operations_total`| Total number of reset shards operations performed |[`status`]|`counter`|
|`quickwit_ingest`|`shards`| Number of shards hosted by the ingester |[`state`]|`gauge`|
|`quickwit_ingest`|`shard_lt_throughput_mib`| Shard long-term throughput as reported through chitchat ||`histogram`|
|`quickwit_ingest`|`shard_st_throughput_mib`| Shard short-term throughput as reported through chitchat ||`histogram`|
|`quickwit_ingest`|`wal_acquire_lock_requests_in_flight`| Number of acquire lock requests in-flight |[`operation`, `type`]|`gauge`|
|`quickwit_ingest`|`wal_acquire_lock_request_duration_secs`| Duration of acquire lock requests in seconds |[`operation`, `type`]|`histogram`|
|`quickwit_ingest`|`wal_disk_used_bytes`| WAL disk space used in bytes ||`gauge`|
|`quickwit_ingest`|`wal_memory_used_bytes`| WAL memory used in bytes ||`gauge`|
<!-- uncomment when replication is released
| `quickwit_ingest` | `replicated_num_bytes_total` | Total size in bytes of the replicated docs | | `counter` |
| `quickwit_ingest` | `replicated_num_docs_total` | Total number of docs replicated | | `counter` |
-->
Note that the legacy ingest (V1) only records the `docs_total` and `docs_bytes_total` metrics. The `validity` label is always set to `valid` because it doesn't parse the documents at ingest time. Invalid documents are discarded asynchronously in the indexing pipeline's doc processor.
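With ingest V2, the `validity` label lets you estimate what fraction of incoming documents were rejected. `docs_bytes_total` gives a throughput figure once you take the difference between two scrapes. A sketch under these assumptions:

- `docs_by_validity` maps each `validity` value to its `docs_total` counter value.
- Any value other than `valid` is treated as invalid.
- Both helper names are hypothetical.

With the legacy ingest (V1), every document is counted as `valid`, so `invalid_share` always returns 0 there.

```python
def invalid_share(docs_by_validity):
    """Fraction of documents whose `validity` label is anything other than "valid".

    `docs_by_validity` maps each `validity` label value to the value of the
    `docs_total` counter (or its increase over a window).
    """
    total = sum(docs_by_validity.values())
    if total == 0:
        return 0.0
    invalid = sum(n for validity, n in docs_by_validity.items() if validity != "valid")
    return invalid / total


def bytes_per_second(previous, current, elapsed_seconds):
    """Average ingest throughput from two `docs_bytes_total` samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # A counter that decreased was reset (e.g. a restart); count from zero.
    delta = current - previous if current >= previous else current
    return delta / elapsed_seconds
```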
## Janitor Metrics
## Memory Metrics

| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_memory`|`active_bytes`| Total number of bytes in active pages allocated by the application, as reported by jemalloc `stats.active`||`gauge`|
|`quickwit_memory`|`allocated_bytes`| Total number of bytes allocated by the application, as reported by jemalloc `stats.allocated`||`gauge`|
|`quickwit_memory`|`resident_bytes`| Total number of bytes in physically resident data pages mapped by the allocator, as reported by jemalloc `stats.resident`||`gauge`|
|`quickwit_memory`|`in_flight_data_bytes`| Amount of data in-flight in various buffers in bytes |[`component`]|`gauge`|
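## HTTP Metrics

| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit`|`http_requests_total`| Total number of HTTP requests processed |[`method`, `status_code`]|`counter`|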
|`quickwit`|`request_duration_secs`| Response time in seconds |[`method`, `status_code`]|`histogram`|
|`quickwit`|`ongoing_requests`| Number of ongoing requests on specific endpoint groups |[`endpoint_group`]|`gauge`|
|`quickwit`|`pending_requests`| Number of pending requests on specific endpoint groups |[`endpoint_group`]|`gauge`|
## Search Metrics
| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_search`|`root_search_requests_total`| Total number of root search gRPC requests processed |[`status`]|`counter`|
|`quickwit_search`|`root_search_request_duration_seconds`| Duration of root search gRPC requests in seconds |[`status`]|`histogram`|
|`quickwit_search`|`root_search_targeted_splits`| Number of splits targeted per root search gRPC request |[`status`]|`histogram`|
|`quickwit_search`|`leaf_search_requests_total`| Total number of leaf search gRPC requests processed |[`status`]|`counter`|
|`quickwit_search`|`leaf_search_request_duration_seconds`| Duration of leaf search gRPC requests in seconds |[`status`]|`histogram`|
|`quickwit_search`|`leaf_search_targeted_splits`| Number of splits targeted per leaf search gRPC request |[`status`]|`histogram`|
|`quickwit_search`|`leaf_searches_splits_total`| Number of leaf searches (count of splits) started ||`counter`|
|`quickwit_search`|`leaf_search_split_duration_secs`| Number of seconds required to run a leaf search over a single split. The timer starts after the semaphore is obtained ||`histogram`|
|`quickwit_search`|`leaf_search_single_split_tasks`| Number of single split search tasks pending or ongoing |[`status`]|`gauge`|
|`quickwit_search`|`leaf_search_single_split_warmup_num_bytes`| Size of the short-lived cache for a single split once the warmup is done ||`histogram`|
|`quickwit_search`|`job_assigned_total`| Number of jobs assigned to searchers, per affinity rank |[`affinity`]|`counter`|
|`quickwit_search`|`searcher_local_kv_store_size_bytes`| Size of the searcher's local key-value store in bytes. This store is used to cache scroll contexts ||`gauge`|
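The duration metrics in this section, like `grpc_request_duration_seconds`, are Prometheus histograms. They are exposed as cumulative `_bucket` series keyed by the `le` label. PromQL's `histogram_quantile` estimates a percentile by interpolating linearly inside the bucket that contains it, and the sketch below reproduces that idea for ad hoc analysis. It is simplified: it expects the `+Inf` bucket to be present and skips some of Prometheus's edge-case handling. The `histogram_quantile` function here is a hypothetical local helper, not a Quickwit API, and the bucket counts in the test are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative Prometheus histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs taken from the
    `_bucket` series (the `le` label), including the `+Inf` bucket. Like
    Prometheus, it interpolates linearly inside the bucket holding the quantile.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the open-ended bucket: report the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / in_bucket
        lower_bound, lower_count = upper_bound, count
    return lower_bound
```

For example, with buckets `[(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]`, the median falls exactly at the top of the first bucket, 0.1.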
## Storage Metrics
| Namespace | Metric Name | Description | Labels | Type |
| --------- | ----------- | ----------- | ------ | ---- |
|`quickwit_storage`|`get_slice_timeout_outcome`| Outcome of `get_slice` operations. `success_after_1_timeout` means the operation succeeded after a retry caused by a timeout |[`outcome`]|`counter`|
|`quickwit_storage`|`object_storage_requests_total`| Number of requests to the object store, by action and status. Requests are recorded when the response headers are returned |[`action`, `status`]|`counter`|
|`quickwit_storage`|`object_storage_request_duration`| Duration until the response headers are returned from the object store, by action and status |[`action`, `status`]|`histogram`|
|`quickwit_storage`|`object_storage_download_num_bytes`| Amount of data downloaded from object storage |[`status`]|`counter`|
|`quickwit_storage`|`object_storage_download_errors`| Number of download requests that received successful response headers but failed during download |[`status`]|`counter`|
|`quickwit_storage`|`object_storage_upload_num_bytes`| Amount of data uploaded to object storage. The value recorded for failed and aborted uploads is the full payload size |[`status`]|`counter`|
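`object_storage_request_duration` is a histogram, so besides its buckets it exposes `_sum` and `_count` series. Dividing their increases over a window gives the mean request duration. The sketch assumes you have read both values per `action` from two scrapes, in whatever unit the histogram records (the metric name does not state one). `mean_latency_by_action` and the action names in the test are hypothetical.

```python
def mean_latency_by_action(before, after):
    """Mean request duration per `action` label over a window.

    `before`/`after` map an action to a (sum, count) pair read from the
    `_sum` and `_count` series of `object_storage_request_duration`.
    """
    means = {}
    for action, (sum_after, count_after) in after.items():
        sum_before, count_before = before.get(action, (0.0, 0))
        if count_after < count_before:  # counter reset: restart from zero
            sum_before, count_before = 0.0, 0
        requests = count_after - count_before
        means[action] = (sum_after - sum_before) / requests if requests else None
    return means
```

You can keep failed requests from skewing the mean by filtering on `status` first or by keying on `(action, status)`.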
## CLI Metrics
| Namespace | Metric Name | Description | Type |
| --------- | ----------- | ----------- | ---- |
|`quickwit_cli`|`thread_unpark_duration_microseconds`| Duration for which a thread of the main tokio runtime is unparked |`histogram`|