| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Cluster reroute |
| 4 | +nav_order: 46 |
| 5 | +parent: Cluster APIs |
| 6 | +has_children: false |
| 7 | +--- |
| 8 | + |
| 9 | +# Cluster reroute |
| 10 | + |
| 11 | +The `/_cluster/reroute` API allows you to manually control the allocation of individual shards within the cluster. This includes moving, allocating, or canceling shard allocations. It's typically used for advanced scenarios, such as manual recovery or custom load balancing. |
| 12 | + |
| 13 | +Shard movement is subject to cluster allocation deciders. Always test reroute commands using `dry_run=true` before applying them in production environments. Use the `explain=true` parameter to see each allocation decision in detail, which helps explain why a reroute request was accepted or rejected. If shard allocation failed because of prior issues or cluster instability, you can reattempt allocation using the `retry_failed=true` parameter. |
| 14 | + |
| 15 | +For more information regarding shard distribution and cluster health, see [Cluster health]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-health/) and [Cluster allocation explain]({{site.url}}{{site.baseurl}}/api-reference/cluster-api/cluster-allocation/). |
| 16 | + |
| 17 | +## Endpoints |
| 18 | + |
| 19 | +```json |
| 20 | +POST /_cluster/reroute |
| 21 | +``` |
| 22 | + |
| 23 | +## Query parameters |
| 24 | + |
| 25 | +| Parameter | Data type | Description | |
| 26 | +| ---------------- | --------- | -------------------------------------------------------------------------------------------------- | |
| 27 | +| `dry_run` | Boolean | If `true`, validates and simulates the reroute request without applying it. Default is `false`. | |
| 28 | +| `explain` | Boolean | If `true`, returns an explanation of why the command was accepted or rejected. Default is `false`. | |
| 29 | +| `retry_failed` | Boolean | If `true`, retries allocation of shards that previously failed. Default is `false`. | |
| 30 | +| `metric` | String | Limits the returned metadata. See [Metric options](#metric-options) for a list of available options. Default is `_all`. | |
| 31 | +| `cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. |
| 32 | +| `timeout` | Time | The overall request timeout. Default is `30s`. | |
| 33 | + |
| 34 | +### Metric options |
| 35 | + |
| 36 | +The `metric` parameter filters the cluster state values returned by the Cluster Reroute API. This is useful for reducing response size or inspecting specific parts of the cluster state. This parameter supports the following values: |
| 37 | + |
| 38 | +- `_all` _(Default)_: Returns all available cluster state sections. |
| 39 | +- `blocks`: Includes information about read- and write-level blocks in the cluster. |
| 40 | +- `cluster_manager_node`: Shows which node is currently acting as the cluster manager. |
| 41 | +- `metadata`: Returns index settings, mappings, and aliases. If specific indexes are targeted, only their metadata is returned. |
| 42 | +- `nodes`: Includes all nodes in the cluster and their metadata. |
| 43 | +- `routing_table`: Returns the routing information for all shards and replicas. |
| 44 | +- `version`: Displays the cluster state version number. |
| 45 | + |
| 46 | +You can combine values in a comma-separated list, such as `metric=metadata,nodes,routing_table`. |
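|  | + |
|  | +As a minimal sketch, the following request combines `dry_run=true` with a reduced `metric` list so that the simulated response contains only the `metadata` and `routing_table` sections. The parameter combination is an illustrative assumption: |
|  | + |
|  | +```json |
|  | +POST /_cluster/reroute?dry_run=true&metric=metadata,routing_table |
|  | +``` |
|  | +{% include copy-curl.html %} |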
| 47 | + |
| 48 | +## Request body fields |
| 49 | + |
| 50 | +The `commands` array in the request body defines actions to apply to shard allocation. It supports the following actions. |
| 51 | + |
| 52 | +### Move |
| 53 | + |
| 54 | +The `move` command moves a started shard from one node to another. Both primary and replica shards can be moved, but the shard must be in the `STARTED` state. Use this command to balance load or drain a node before maintenance. |
| 55 | + |
| 56 | +The `move` command requires the following parameters: |
| 57 | + |
| 58 | +* `index`: The name of the index. |
| 59 | +* `shard`: The shard number. |
| 60 | +* `from_node`: The name of the node to move the shard from. |
| 61 | +* `to_node`: The name of the node to move the shard to. |
| 62 | + |
| 63 | +### Cancel |
| 64 | + |
| 65 | +The `cancel` command cancels the allocation of a shard, including any recovery in progress. Canceling an existing allocation forces resynchronization because the cluster then reinitializes the shard. Replica shard allocations can be canceled by default, but canceling a primary shard requires `allow_primary=true` to prevent accidental data disruption. |
| 66 | + |
| 67 | +The `cancel` command requires the following parameters: |
| 68 | + |
| 69 | +* `index`: The name of the index. |
| 70 | +* `shard`: The shard number. |
| 71 | +* `node`: The name or node ID of the node to perform the action on. |
| 72 | +* `allow_primary` _(Optional)_: If `true`, allows cancellation of primary shard allocations. Default is `false`. |
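|  | + |
|  | +The parameters above can be sketched as the following request, which cancels the allocation of replica shard `0` of `test-cluster-index` on `node2`. The index and node names are placeholders, and `allow_primary` is shown with its default value only for illustration: |
|  | + |
|  | +```json |
|  | +POST /_cluster/reroute |
|  | +{ |
|  | +  "commands": [ |
|  | +    { |
|  | +      "cancel": { |
|  | +        "index": "test-cluster-index", |
|  | +        "shard": 0, |
|  | +        "node": "node2", |
|  | +        "allow_primary": false |
|  | +      } |
|  | +    } |
|  | +  ] |
|  | +} |
|  | +``` |
|  | +{% include copy-curl.html %} |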
| 73 | + |
| 74 | +### Allocate replica |
| 75 | + |
| 76 | +The `allocate_replica` command assigns an unassigned replica to a specified node. This operation respects allocation deciders. Use this command to manually trigger allocation of replicas when automatic allocation fails. |
| 77 | + |
| 78 | +The `allocate_replica` command requires the following parameters: |
| 79 | + |
| 80 | +* `index`: The name of the index. |
| 81 | +* `shard`: The shard number. |
| 82 | +* `node`: The name or node ID of the node to perform the action on. |
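|  | + |
|  | +As a minimal sketch, the following request assigns an unassigned replica of shard `0` of `test-cluster-index` to `node2`. The index and node names are placeholders, and the request succeeds only if the allocation deciders permit the assignment: |
|  | + |
|  | +```json |
|  | +POST /_cluster/reroute |
|  | +{ |
|  | +  "commands": [ |
|  | +    { |
|  | +      "allocate_replica": { |
|  | +        "index": "test-cluster-index", |
|  | +        "shard": 0, |
|  | +        "node": "node2" |
|  | +      } |
|  | +    } |
|  | +  ] |
|  | +} |
|  | +``` |
|  | +{% include copy-curl.html %} |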
| 83 | + |
| 84 | +### Allocate stale primary |
| 85 | + |
| 86 | +The `allocate_stale_primary` command force-allocates a primary shard to a node that holds a stale copy. |
| 87 | + |
| 88 | +This command should be used with extreme caution. It bypasses safety checks and may lead to **data loss**, especially if a more recent shard copy exists on another node that is temporarily offline. If that node rejoins the cluster later, its data will be deleted or replaced by the stale copy that was forcefully promoted. |
| 89 | +{: .warning} |
| 90 | + |
| 91 | +Use this command only when no up-to-date copies are available and you have no way to restore the original data. |
| 92 | +{: .tip} |
| 93 | + |
| 94 | +The `allocate_stale_primary` command requires the following parameters: |
| 95 | + |
| 96 | +* `index`: The name of the index. |
| 97 | +* `shard`: The shard number. |
| 98 | +* `node`: The name or node ID of the node to perform the action on. |
| 99 | +* `accept_data_loss`: Must be set to `true`. |
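|  | + |
|  | +The following sketch shows the shape of an `allocate_stale_primary` request. It is paired with `dry_run=true` so that the request is only simulated, and the index and node names are placeholders: |
|  | + |
|  | +```json |
|  | +POST /_cluster/reroute?dry_run=true |
|  | +{ |
|  | +  "commands": [ |
|  | +    { |
|  | +      "allocate_stale_primary": { |
|  | +        "index": "test-cluster-index", |
|  | +        "shard": 0, |
|  | +        "node": "node1", |
|  | +        "accept_data_loss": true |
|  | +      } |
|  | +    } |
|  | +  ] |
|  | +} |
|  | +``` |
|  | +{% include copy-curl.html %} |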
| 100 | + |
| 101 | +### Allocate empty primary |
| 102 | + |
| 103 | +The `allocate_empty_primary` command force-allocates a new empty primary shard to a node. This operation initializes a new primary shard without any existing data. |
| 104 | + |
| 105 | +Any previous data for the shard will be **permanently lost**. If a node with valid data for that shard later rejoins the cluster, its copy will be erased. This command is intended for disaster recovery when **no valid shard copies exist** and recovery from backup or a snapshot is not possible. |
| 106 | +{: .warning} |
| 107 | + |
| 108 | +The `allocate_empty_primary` command requires the following parameters: |
| 109 | + |
| 110 | +* `index`: The name of the index. |
| 111 | +* `shard`: The shard number. |
| 112 | +* `node`: The name or node ID of the node to perform the action on. |
| 113 | +* `accept_data_loss`: Must be set to `true`. |
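|  | + |
|  | +The following sketch shows the shape of an `allocate_empty_primary` request. It is paired with `dry_run=true` so that the request is only simulated, and the index and node names are placeholders: |
|  | + |
|  | +```json |
|  | +POST /_cluster/reroute?dry_run=true |
|  | +{ |
|  | +  "commands": [ |
|  | +    { |
|  | +      "allocate_empty_primary": { |
|  | +        "index": "test-cluster-index", |
|  | +        "shard": 0, |
|  | +        "node": "node1", |
|  | +        "accept_data_loss": true |
|  | +      } |
|  | +    } |
|  | +  ] |
|  | +} |
|  | +``` |
|  | +{% include copy-curl.html %} |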
| 114 | + |
| 115 | +## Example requests |
| 116 | + |
| 117 | +The following are examples of using the Cluster Reroute API. |
| 118 | + |
| 119 | +### Moving a shard |
| 120 | + |
| 121 | +Create a sample index: |
| 122 | + |
| 123 | +```json |
| 124 | +PUT /test-cluster-index |
| 125 | +{ |
| 126 | + "settings": { |
| 127 | + "number_of_shards": 1, |
| 128 | + "number_of_replicas": 1 |
| 129 | + } |
| 130 | +} |
| 131 | +``` |
| 132 | +{% include copy-curl.html %} |
| 133 | + |
| 134 | +Run the following reroute command to move shard `0` of the index `test-cluster-index` from node `node1` to node `node2`: |
| 135 | + |
| 136 | +```json |
| 137 | +POST /_cluster/reroute |
| 138 | +{ |
| 139 | + "commands": [ |
| 140 | + { |
| 141 | + "move": { |
| 142 | + "index": "test-cluster-index", |
| 143 | + "shard": 0, |
| 144 | + "from_node": "node1", |
| 145 | + "to_node": "node2" |
| 146 | + } |
| 147 | + } |
| 148 | + ] |
| 149 | +} |
| 150 | +``` |
| 151 | +{% include copy-curl.html %} |
| 152 | + |
| 153 | +### Simulating a reroute |
| 154 | + |
| 155 | +To simulate a reroute without executing it, set `dry_run=true`: |
| 156 | + |
| 157 | +```json |
| 158 | +POST /_cluster/reroute?dry_run=true |
| 159 | +{ |
| 160 | + "commands": [ |
| 161 | + { |
| 162 | + "move": { |
| 163 | + "index": "test-cluster-index", |
| 164 | + "shard": 0, |
| 165 | + "from_node": "node1", |
| 166 | + "to_node": "node2" |
| 167 | + } |
| 168 | + } |
| 169 | + ] |
| 170 | +} |
| 171 | +``` |
| 172 | +{% include copy-curl.html %} |
| 173 | + |
| 174 | +### Retrying failed allocations |
| 175 | + |
| 176 | +If some shards failed to allocate because of previous issues, you can reattempt allocation: |
| 177 | + |
| 178 | +```json |
| 179 | +POST /_cluster/reroute?retry_failed=true |
| 180 | +``` |
| 181 | + |
| 182 | +{% include copy-curl.html %} |
| 183 | + |
| 184 | +### Explaining reroute decisions |
| 185 | + |
| 186 | +To understand why a reroute command is accepted or rejected, add `explain=true`: |
| 187 | + |
| 188 | +```json |
| 189 | +POST /_cluster/reroute?explain=true |
| 190 | +{ |
| 191 | + "commands": [ |
| 192 | + { |
| 193 | + "move": { |
| 194 | + "index": "test-cluster-index", |
| 195 | + "shard": 0, |
| 196 | + "from_node": "node1", |
| 197 | + "to_node": "node3" |
| 198 | + } |
| 199 | + } |
| 200 | + ] |
| 201 | +} |
| 202 | +``` |
| 203 | +{% include copy-curl.html %} |
| 204 | + |
| 205 | +The response includes a `decisions` array in which each allocation decider reports its verdict and the reason for it. The following excerpt is abbreviated: |
| 206 | + |
| 207 | +```json |
| 208 | +"decisions": [ |
| 209 | + { |
| 210 | + "decider": "max_retry", |
| 211 | + "decision": "YES", |
| 212 | + "explanation": "shard has no previous failures" |
| 213 | + }, |
| 214 | + { |
| 215 | + "decider": "replica_after_primary_active", |
| 216 | + "decision": "YES", |
| 217 | + "explanation": "shard is primary and can be allocated" |
| 218 | + }, |
| 219 | + ... |
| 220 | + { |
| 221 | + "decider": "remote_store_migration", |
| 222 | + "decision": "YES", |
| 223 | + "explanation": "[none migration_direction]: primary shard copy can be relocated to a non-remote node for strict compatibility mode" |
| 224 | + } |
| 225 | + ] |
| 226 | +``` |
| 227 | + |
| 228 | +## Response body fields |
| 229 | + |
| 230 | +The response includes cluster state metadata and, if `explain=true` was used, an `explanations` list containing the `decisions` for each command. |
| 231 | + |
| 232 | +| Field | Data type | Description | |
| 233 | +| ---------------------------- | --------- | ----------------------------------------------------------------------- | |
| 234 | +| `acknowledged` | Boolean | States whether the reroute request was acknowledged. | |
| 235 | +| `state.cluster_uuid` | String | The unique identifier of the cluster. | |
| 236 | +| `state.version` | Integer | The version of the cluster state. | |
| 237 | +| `state.state_uuid` | String | The UUID for this specific state version. | |
| 238 | +| `state.master_node` | String | The same value as `state.cluster_manager_node`. Maintained for backward compatibility. |
| 239 | +| `state.cluster_manager_node` | String | The ID of the elected cluster manager node. | |
| 240 | +| `state.blocks` | Object | Any global or index-level cluster blocks. | |
| 241 | +| `state.nodes` | Object | Metadata for each node in the cluster, including its name and address. |
| 242 | +| `state.routing_table` | Object | The shard routing information for each index. | |
| 243 | +| `state.routing_nodes` | Object | The shard allocation organized by node. | |
| 244 | +| `commands` | List | A list of processed reroute commands. | |
| 245 | +| `explanations` | List | If `explain=true`, includes detailed explanations of the outcomes. | |
| 246 | + |