Commit 5cfa163

adding pending tasks, stats and reroute cluster apis docs (#10311) (#10325)
1 parent c6c5b17 commit 5cfa163

3 files changed: +424 -0 lines changed

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
---
layout: default
title: Cluster pending tasks
nav_order: 45
parent: Cluster APIs
has_children: false
---

# Cluster pending tasks

The `/_cluster/pending_tasks` API returns a list of cluster-level changes that have not yet completed. These pending tasks are typically queued cluster state updates, such as index creation, template updates, and shard allocation changes.

This API is useful for monitoring the cluster state update queue and diagnosing delays in cluster state updates, especially when tasks are backed up or stuck.

## Endpoint

```json
GET /_cluster/pending_tasks
```

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| ---------------- | --------- | ----------------------------------------------------------------------------------------------------------------------- |
| `local` | Boolean | Whether to return information from the local node only instead of the elected cluster manager node. Default is `false`. |
| `cluster_manager_timeout` | Time | Specifies the timeout for connecting to the cluster manager node. Default is `30s`. |
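
The following Python sketch shows one way to call this endpoint using only the standard library. It assumes an unsecured cluster reachable at `http://localhost:9200`. The helper names `pending_tasks_url` and `fetch_pending_tasks` are hypothetical, and a secured cluster would also need TLS and authentication settings that are not shown here.

```python
import json
import urllib.parse
import urllib.request


def pending_tasks_url(base_url, local=False, cluster_manager_timeout=None):
    """Build a pending tasks URL (hypothetical helper, not a client API).

    Only parameters that differ from their defaults are added.
    """
    params = {}
    if local:
        # Boolean query parameters are sent as lowercase strings.
        params["local"] = "true"
    if cluster_manager_timeout is not None:
        # Time values include a unit, for example "10s" or "1m".
        params["cluster_manager_timeout"] = cluster_manager_timeout
    url = base_url.rstrip("/") + "/_cluster/pending_tasks"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_pending_tasks(base_url="http://localhost:9200", **params):
    """GET the endpoint and return the "tasks" list from the JSON response.

    Assumes an unsecured cluster at base_url (no TLS or authentication).
    """
    with urllib.request.urlopen(pending_tasks_url(base_url, **params)) as resp:
        return json.load(resp)["tasks"]
```

For example, `pending_tasks_url("http://localhost:9200", local=True, cluster_manager_timeout="10s")` returns `http://localhost:9200/_cluster/pending_tasks?local=true&cluster_manager_timeout=10s`.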

## Example request

The following request returns the list of currently pending cluster state update tasks:

```json
GET /_cluster/pending_tasks
```
{% include copy-curl.html %}

## Example response

```json
{
  "tasks": [
    {
      "insert_order": 1234,
      "priority": "HIGH",
      "source": "create-index [logs-2025.07.15]",
      "executing": false,
      "time_in_queue_millis": 28,
      "time_in_queue": "28ms"
    },
    {
      "insert_order": 1235,
      "priority": "URGENT",
      "source": "shard-started shard id [logs-2025.07.15][0]",
      "executing": true,
      "time_in_queue_millis": 3,
      "time_in_queue": "3ms"
    }
  ]
}
```

In a healthy cluster, this API usually returns an empty `tasks` array because cluster state updates are typically processed too quickly to appear in the response.
{: .note}

## Response fields

The following table lists all response fields.

| Field | Data type | Description |
| ------------------------------- | --------- | ------------------------------------------------------------------ |
| `tasks` | Array | The list of pending cluster state update tasks. |
| `tasks[n].insert_order` | Integer | The order in which the task was added to the queue. |
| `tasks[n].priority` | String | The priority level of the task: `IMMEDIATE`, `URGENT`, `HIGH`, `NORMAL`, `LOW`, or `LANGUID`. |
| `tasks[n].source` | String | A description of the operation that submitted the task. |
| `tasks[n].executing` | Boolean | Whether the task is currently being executed. |
| `tasks[n].time_in_queue_millis` | Integer | The amount of time the task has been waiting in the queue, in milliseconds. |
| `tasks[n].time_in_queue` | String | A human-readable version of `time_in_queue_millis`. |
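
The response is a flat list of tasks, so a monitoring script can summarize it directly. The following is a minimal Python sketch. The function name `summarize_pending_tasks` and the default 1-second threshold for "slow" tasks are arbitrary choices for illustration and are not part of the API.

```python
from collections import Counter


def summarize_pending_tasks(response, slow_threshold_ms=1000):
    """Summarize a pending tasks response (hypothetical helper).

    Returns the number of tasks, how many are executing, a count per
    priority, and the sources of tasks queued for at least
    slow_threshold_ms, in insertion order.
    """
    tasks = response.get("tasks", [])
    by_priority = Counter(task["priority"] for task in tasks)
    slow_sources = [
        task["source"]
        for task in sorted(tasks, key=lambda t: t["insert_order"])
        if task["time_in_queue_millis"] >= slow_threshold_ms
    ]
    return {
        "total": len(tasks),
        "executing": sum(1 for task in tasks if task["executing"]),
        "by_priority": dict(by_priority),
        "slow_sources": slow_sources,
    }
```

Applied to the example response with `slow_threshold_ms=10`, this sketch reports two tasks, one of which is executing. It flags only the `create-index [logs-2025.07.15]` task, which has been queued for 28 ms.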
Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
---
layout: default
title: Cluster reroute
nav_order: 46
parent: Cluster APIs
has_children: false
---

# Cluster reroute

The `/_cluster/reroute` API lets you manually control the allocation of individual shards in the cluster. You can use it to move shards, allocate unassigned shards, and cancel shard allocations. It is typically used in advanced scenarios, such as manual recovery or custom load balancing.

Shard movements are subject to the cluster's allocation deciders. Always test reroute commands using `dry_run=true` before applying them in a production environment. To understand why a particular reroute command is or is not allowed, use the `explain=true` parameter to get detailed information about the allocation decisions. If shard allocation failed because of earlier issues or cluster instability, you can retry it using the `retry_failed=true` parameter.

For more information regarding shard distribution and cluster health, see [Cluster health]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/) and [Cluster allocation explain]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-allocation/).

## Endpoints

```json
POST /_cluster/reroute
```

## Query parameters

The following table lists the available query parameters. All query parameters are optional.

| Parameter | Data type | Description |
| ---------------- | --------- | -------------------------------------------------------------------------------------------------- |
| `dry_run` | Boolean | If `true`, validates and simulates the reroute request without applying it. Default is `false`. |
| `explain` | Boolean | If `true`, returns an explanation of why each command was accepted or rejected. Default is `false`. |
| `retry_failed` | Boolean | If `true`, retries allocation of shards whose allocation is blocked because of too many previous failures. Default is `false`. |
| `metric` | String | Limits the cluster state information returned in the response. See [Metric options](#metric-options) for a list of available options. Default is `_all`. |
| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| `timeout` | Time | The amount of time to wait for a response. Default is `30s`. |

### Metric options

The `metric` parameter filters the cluster state values returned by the Reroute API. This is useful for reducing response size or inspecting specific parts of the cluster state. This parameter supports the following values:

- `_all` _(Default)_: Returns all available cluster state sections.
- `blocks`: Includes information about read- and write-level blocks in the cluster.
- `cluster_manager_node`: Shows which node is currently acting as the cluster manager.
- `metadata`: Returns index settings, mappings, and aliases. If specific indexes are targeted, only their metadata is returned.
- `nodes`: Includes all nodes in the cluster and their metadata.
- `routing_table`: Returns the routing information for all shards and replicas.
- `version`: Displays the cluster state version number.

You can combine values in a comma-separated list, such as `metric=metadata,nodes,routing_table`.
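
The following minimal Python sketch builds a reroute URL from these query parameters. It validates `metric` against the values listed in this section and joins multiple values with commas. The helper `reroute_url` is hypothetical. This sketch rejects any value not listed above, even if a particular OpenSearch version accepts it.

```python
import urllib.parse

# Values listed under "Metric options"; other values are rejected by this
# sketch even if a given OpenSearch version accepts them.
METRIC_OPTIONS = {
    "_all", "blocks", "cluster_manager_node", "metadata",
    "nodes", "routing_table", "version",
}


def reroute_url(base_url, dry_run=False, explain=False,
                retry_failed=False, metrics=None):
    """Build a /_cluster/reroute URL (hypothetical helper, not a client API)."""
    params = {}
    flags = (("dry_run", dry_run), ("explain", explain),
             ("retry_failed", retry_failed))
    for name, enabled in flags:
        if enabled:
            # Boolean query parameters are sent as lowercase strings.
            params[name] = "true"
    if metrics:
        unknown = set(metrics) - METRIC_OPTIONS
        if unknown:
            raise ValueError(f"unsupported metric values: {sorted(unknown)}")
        # Several metric values are combined in one comma-separated list.
        params["metric"] = ",".join(metrics)
    url = base_url.rstrip("/") + "/_cluster/reroute"
    if params:
        # safe="," keeps the commas readable instead of encoding them as %2C.
        url += "?" + urllib.parse.urlencode(params, safe=",")
    return url
```

For example, `reroute_url("http://localhost:9200", dry_run=True, metrics=["metadata", "nodes", "routing_table"])` returns `http://localhost:9200/_cluster/reroute?dry_run=true&metric=metadata,nodes,routing_table`.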
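
## Request body fields

The request body contains a `commands` array. Each element of the array is an object with a single key, the command name, whose value contains that command's parameters. You can include multiple commands in one request. The following commands are supported.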

### Move

The `move` command moves a started shard, either a primary or a replica, from one node to another. You can use it to balance load or to drain a node before maintenance. The shard must be in the `STARTED` state.

The `move` command requires the following parameters:

* `index`: The name of the index.
* `shard`: The shard number.
* `from_node`: The name of the node to move the shard from.
* `to_node`: The name of the node to move the shard to.
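
Each entry in the `commands` array is an object keyed by the command name, so request bodies are straightforward to generate programmatically. The following is a minimal Python sketch. `move_command` and `reroute_body` are hypothetical helpers. Their client-side checks (a non-negative shard number and different source and target nodes) are illustrative only, because the cluster's allocation deciders make the actual decision.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Return one `move` command object (hypothetical helper)."""
    # Reject bool explicitly, because bool is a subclass of int in Python.
    if isinstance(shard, bool) or not isinstance(shard, int) or shard < 0:
        raise ValueError("shard must be a non-negative shard number")
    if from_node == to_node:
        raise ValueError("from_node and to_node must be different nodes")
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def reroute_body(*commands):
    """Wrap one or more command objects in the `commands` array."""
    return {"commands": list(commands)}


# Same request body as the "Moving a shard" example on this page.
body = reroute_body(move_command("test-cluster-index", 0, "node1", "node2"))
print(json.dumps(body, indent=2))
```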

### Cancel

The `cancel` command cancels the allocation of a shard, including any ongoing recovery. After the allocation is canceled, the cluster reinitializes the shard through the standard recovery process. You can use this to force a replica to resynchronize with its primary. Replica allocations can be canceled by default. Canceling a primary shard allocation requires setting `allow_primary` to `true` in the command, which prevents accidental data disruption.

The `cancel` command accepts the following parameters:

* `index`: The name of the index.
* `shard`: The shard number.
* `node`: The name or node ID of the node to perform the action on.
* `allow_primary` _(Optional)_: If `true`, allows cancellation of primary shard allocations. Default is `false`.
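
The following minimal Python sketch builds a `cancel` command. It emits `allow_primary` only when explicitly requested, so the default of `false` applies otherwise. The helper `cancel_command` and the node name `node2` are hypothetical.

```python
import json


def cancel_command(index, shard, node, allow_primary=False):
    """Return one `cancel` command object (hypothetical helper).

    allow_primary is emitted only when True; otherwise the API default
    (false) applies and a primary shard allocation cannot be canceled.
    """
    params = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        params["allow_primary"] = True
    return {"cancel": params}


# Cancel the copy of shard 0 allocated to the hypothetical node "node2".
body = {"commands": [cancel_command("test-cluster-index", 0, "node2")]}
print(json.dumps(body, indent=2))
```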

### Allocate replica

The `allocate_replica` command assigns an unassigned replica to a specified node. This operation respects allocation deciders. Use this command to manually trigger allocation of replicas when automatic allocation fails.

The `allocate_replica` command requires the following parameters:

* `index`: The name of the index.
* `shard`: The shard number.
* `node`: The name or node ID of the node to perform the action on.
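
The following Python sketch builds, but does not send, a complete `allocate_replica` request using the standard library's `urllib.request.Request`. The helper name `allocate_replica_request`, the base URL, and the node name are assumptions for illustration. The optional `dry_run` flag lets the allocation deciders validate the command before you apply it for real.

```python
import json
import urllib.request


def allocate_replica_request(base_url, index, shard, node, dry_run=False):
    """Build (but do not send) a POST request that allocates a replica.

    Hypothetical helper; send the result with urllib.request.urlopen(req)
    only against a cluster you control.
    """
    body = {"commands": [{"allocate_replica": {
        "index": index, "shard": shard, "node": node}}]}
    url = base_url.rstrip("/") + "/_cluster/reroute"
    if dry_run:
        url += "?dry_run=true"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = allocate_replica_request("http://localhost:9200", "test-cluster-index",
                               0, "node2", dry_run=True)
```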

### Allocate stale primary

The `allocate_stale_primary` command force-allocates a primary shard to a node that holds a stale copy.

Use this command with extreme caution. It bypasses safety checks and can cause **data loss**, especially if a more recent copy of the shard exists on a node that is temporarily offline. If that node later rejoins the cluster, its more recent copy is deleted or replaced by the stale copy that was forcibly promoted.
{: .warning}

Use this command only when no up-to-date copies are available and you have no way to restore the original data.
{: .tip}

The `allocate_stale_primary` command requires the following parameters:

* `index`: The name of the index.
* `shard`: The shard number.
* `node`: The name or node ID of the node to perform the action on.
* `accept_data_loss`: Must be set to `true`.

### Allocate empty primary

The `allocate_empty_primary` command force-allocates a new, empty primary shard to a node. The new primary shard is initialized without any existing data.

Any previous data for the shard is **permanently lost**. If a node with valid data for that shard later rejoins the cluster, its copy is erased. This command is intended for disaster recovery when **no valid shard copies exist** and recovery from a backup or snapshot is not possible.
{: .warning}

The `allocate_empty_primary` command requires the following parameters:

* `index`: The name of the index.
* `shard`: The shard number.
* `node`: The name or node ID of the node to perform the action on.
* `accept_data_loss`: Must be set to `true`.
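
The `allocate_stale_primary` and `allocate_empty_primary` commands take the same parameters, and both require `accept_data_loss`. A minimal Python sketch using the hypothetical helper `force_primary_command` can make the data-loss acknowledgment harder to set by accident. It refuses to build either command unless the caller explicitly passes `accept_data_loss=True`.

```python
# Commands that force-allocate a primary and therefore require accept_data_loss.
FORCE_PRIMARY_COMMANDS = ("allocate_stale_primary", "allocate_empty_primary")


def force_primary_command(command, index, shard, node, accept_data_loss=False):
    """Return an allocate_stale_primary or allocate_empty_primary command.

    Hypothetical helper: raises ValueError unless accept_data_loss is
    exactly True, so the acknowledgment cannot be set by accident.
    """
    if command not in FORCE_PRIMARY_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    if accept_data_loss is not True:
        raise ValueError(f"{command} requires accept_data_loss=True")
    return {command: {"index": index, "shard": shard, "node": node,
                      "accept_data_loss": True}}
```

For example, `force_primary_command("allocate_empty_primary", "test-cluster-index", 0, "node1", accept_data_loss=True)` returns a command object that can be placed in the `commands` array. Calling it without `accept_data_loss=True` raises `ValueError`.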

## Examples

The following are examples of using the Cluster Reroute API.

### Moving a shard

Create a sample index:

```json
PUT /test-cluster-index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
```
{% include copy-curl.html %}

Run the following reroute command to move shard `0` of the index `test-cluster-index` from node `node1` to node `node2` (replace these node names with the names of nodes in your cluster):

```json
POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Simulating a reroute

To simulate a reroute without executing it, set `dry_run=true`:

```json
POST /_cluster/reroute?dry_run=true
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    }
  ]
}
```
{% include copy-curl.html %}

### Retrying failed allocations

If some shards failed to allocate because of previous issues, you can reattempt allocation:

```json
POST /_cluster/reroute?retry_failed=true
```
{% include copy-curl.html %}

### Explaining reroute decisions

To understand why a reroute command is accepted or rejected, add `explain=true`:

```json
POST /_cluster/reroute?explain=true
{
  "commands": [
    {
      "move": {
        "index": "test-cluster-index",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node3"
      }
    }
  ]
}
```
{% include copy-curl.html %}

The response includes an `explanations` array. For each command, this array contains a `decisions` array with the result returned by each allocation decider. The following excerpt shows part of a `decisions` array:

```json
"decisions": [
  {
    "decider": "max_retry",
    "decision": "YES",
    "explanation": "shard has no previous failures"
  },
  {
    "decider": "replica_after_primary_active",
    "decision": "YES",
    "explanation": "shard is primary and can be allocated"
  },
  ...
  {
    "decider": "remote_store_migration",
    "decision": "YES",
    "explanation": "[none migration_direction]: primary shard copy can be relocated to a non-remote node for strict compatibility mode"
  }
]
```
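
When the list of decisions is long, it can help to filter for deciders that did not return `YES` (for example, `NO` or `THROTTLE`). The following Python sketch assumes that the response contains an `explanations` array whose entries hold a `command` name and a `decisions` array shaped like the excerpt above. `rejecting_deciders` is a hypothetical helper name.

```python
def rejecting_deciders(response):
    """List (command, decider, explanation) for every decision that is not YES.

    Assumes the explain=true response has an "explanations" array whose
    entries contain "command" and a "decisions" array of objects with
    "decider", "decision", and "explanation" keys.
    """
    rejected = []
    for entry in response.get("explanations", []):
        for decision in entry.get("decisions", []):
            if decision.get("decision") != "YES":
                rejected.append((entry.get("command"),
                                 decision.get("decider"),
                                 decision.get("explanation")))
    return rejected
```

An empty result means that every decider returned `YES` for every command.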

## Response body fields

The response includes the resulting cluster state and, if `explain=true` is specified, an `explanations` array.

| Field | Data type | Description |
| ---------------------------- | --------- | ----------------------------------------------------------------------- |
| `acknowledged` | Boolean | Whether the reroute request was acknowledged. |
| `state.cluster_uuid` | String | The unique identifier of the cluster. |
| `state.version` | Integer | The version of the cluster state. |
| `state.state_uuid` | String | The UUID of this specific cluster state version. |
| `state.master_node` | String | The same value as `cluster_manager_node`, maintained for backward compatibility. |
| `state.cluster_manager_node` | String | The ID of the elected cluster manager node. |
| `state.blocks` | Object | Any global or index-level cluster blocks. |
| `state.nodes` | Object | Metadata for each node in the cluster, including its name and address. |
| `state.routing_table` | Object | The shard routing information for each index. |
| `state.routing_nodes` | Object | The shard allocation, organized by node. |
| `commands` | Array | A list of processed reroute commands. |
| `explanations` | Array | Returned only if `explain=true`. Contains detailed explanations of the outcome of each command. |
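
To check whether a `move` command has taken effect, you can inspect `state.routing_table` in the response. While a move is in progress, the source copy is typically reported in the `RELOCATING` state with a `relocating_node`. The following Python sketch makes several assumptions. It assumes that the routing table has the shape `{"indices": {<index>: {"shards": {"0": [<copy>, ...]}}}}`, as in the Cluster State API, and that `metric` was not used to exclude `routing_table`. It also assumes that `node` values are node IDs. `shard_copies` is a hypothetical helper name.

```python
def shard_copies(response, index):
    """Yield (shard, primary, state, node, relocating_node) for one index.

    Assumes state.routing_table has the shape
    {"indices": {index: {"shards": {"0": [copy, ...]}}}}; verify this
    against your own response before relying on it.
    """
    shards = response["state"]["routing_table"]["indices"][index]["shards"]
    # Shard IDs are JSON object keys (strings), so sort them numerically.
    for shard_id, copies in sorted(shards.items(), key=lambda kv: int(kv[0])):
        for copy in copies:
            yield (int(shard_id), copy["primary"], copy["state"],
                   copy.get("node"), copy.get("relocating_node"))
```

Filtering the yielded tuples for the `RELOCATING` state shows which copies are currently being moved and where they are going.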