|
1 | 1 | ## Pruning deployments
|
2 | 2 |
|
3 |
| -Pruning is an operation that deletes data from a deployment that is only |
4 |
| -needed to respond to queries at block heights before a certain block. In |
5 |
| -GraphQL, those are only queries with a constraint `block { number: <n> } }` |
6 |
| -or a similar constraint by block hash where `n` is before the block to |
7 |
| -which the deployment is pruned. Queries that are run at a block height |
8 |
| -greater than that are not affected by pruning, and there is no difference |
9 |
| -between running these queries against an unpruned and a pruned deployment. |
| 3 | +Subgraphs, by default, store a full version history for entities, allowing |
| 4 | +consumers to query the subgraph as of any historical block. Pruning is an |
| 5 | +operation that deletes entity versions from a deployment older than a |
| 6 | +certain block, so it is no longer possible to query the deployment as of |
| 7 | +prior blocks. In GraphQL, those are only queries with a constraint `block { |
| 8 | +number: <n> }` or a similar constraint by block hash where `n` is before |
| 9 | +the block to which the deployment is pruned. Queries that are run at a |
| 10 | +block height greater than that are not affected by pruning, and there is no |
| 11 | +difference between running these queries against an unpruned and a pruned |
| 12 | +deployment. |
10 | 13 |
|
11 | 14 | Because pruning reduces the amount of data in a deployment, it reduces the
|
12 | 15 | amount of storage needed for that deployment, and is beneficial for both
|
@@ -54,14 +57,28 @@ existing tables into new tables and then replaces the existing tables with
|
54 | 57 | these much smaller tables. Which strategy to use is determined for each
|
55 | 58 | table individually, and governed by the settings for
|
56 | 59 | `GRAPH_STORE_HISTORY_REBUILD_THRESHOLD` and
|
57 |
| -`GRAPH_STORE_HISTORY_DELETE_THRESHOLD`: if we estimate that we will remove |
58 |
| -more than `REBUILD_THRESHOLD` of the table, the table will be rebuilt. If |
59 |
| -we estimate that we will remove a fraction between `REBUILD_THRESHOLD` and |
60 |
| -`DELETE_THRESHOLD` of the table, unneeded entity versions will be |
61 |
| -deleted. If we estimate to remove less than `DELETE_THRESHOLD`, the table |
62 |
| -is not changed at all. With both strategies, operations are broken into |
63 |
| -batches that should each take `GRAPH_STORE_BATCH_TARGET_DURATION` seconds |
64 |
| -to avoid causing very long-running transactions. |
| 60 | +`GRAPH_STORE_HISTORY_DELETE_THRESHOLD`, both numbers between 0 and 1: if we |
| 61 | +estimate that we will remove more than `REBUILD_THRESHOLD` of the table, |
| 62 | +the table will be rebuilt. If we estimate that we will remove a fraction |
| 63 | +between `REBUILD_THRESHOLD` and `DELETE_THRESHOLD` of the table, unneeded |
| 64 | +entity versions will be deleted. If we estimate that we will remove less than |
| 65 | +`DELETE_THRESHOLD`, the table is not changed at all. With both strategies, |
| 66 | +operations are broken into batches that should each take |
| 67 | +`GRAPH_STORE_BATCH_TARGET_DURATION` seconds to avoid causing very |
| 68 | +long-running transactions. |
| 69 | + |
| 70 | +Pruning, in most cases, runs in parallel with indexing and does not block |
| 71 | +it. When the rebuild strategy is used, pruning does block indexing while it |
| 72 | +copies non-final entities from the existing table to the new table. |
| 73 | + |
| 74 | +The initial prune started by `graphman prune` prints a progress report on |
| 75 | +the console. For the ongoing prune runs that are periodically performed, |
| 76 | +the following information is logged: a message `Start pruning historical |
| 77 | +entities` which includes the earliest and latest block, a message `Analyzed |
| 78 | +N tables`, and a message `Finished pruning entities` with details about how |
| 79 | +much was deleted or copied and how long that took. Pruning analyzes tables, |
| 80 | +if that seems necessary, because its estimates of how much of a table is |
| 81 | +likely not needed are based on Postgres statistics. |
65 | 82 |
|
66 | 83 | ### Caveats
|
67 | 84 |
|
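The per-table strategy choice described in the hunk above (rebuild, delete, or leave unchanged) can be sketched as a small function. This is only an illustration of the rule as the text states it, not graph-node's implementation: the names `PruneStrategy` and `choose_strategy` are hypothetical, the threshold values in the usage lines are made up for the example, and treating an estimate exactly equal to a threshold as falling into the "delete" range is an assumption, since the text does not say how boundaries are handled.

```python
from enum import Enum


class PruneStrategy(Enum):
    """Hypothetical labels for the three outcomes described in the text."""
    REBUILD = "rebuild"  # copy the needed rows into a new table, then swap tables
    DELETE = "delete"    # delete unneeded entity versions in place
    SKIP = "skip"        # leave the table unchanged


def choose_strategy(removal_fraction: float,
                    rebuild_threshold: float,
                    delete_threshold: float) -> PruneStrategy:
    """Pick a strategy from the estimated fraction of a table to be removed.

    All three arguments are fractions between 0 and 1. The function assumes
    delete_threshold <= rebuild_threshold, which the text implies by speaking
    of a fraction "between" the two thresholds.
    """
    if not 0.0 <= delete_threshold <= rebuild_threshold <= 1.0:
        raise ValueError("expected 0 <= delete_threshold <= rebuild_threshold <= 1")
    if not 0.0 <= removal_fraction <= 1.0:
        raise ValueError("removal_fraction must be between 0 and 1")

    if removal_fraction > rebuild_threshold:
        return PruneStrategy.REBUILD
    # Assumption: an estimate exactly at a threshold counts as "between".
    if removal_fraction >= delete_threshold:
        return PruneStrategy.DELETE
    return PruneStrategy.SKIP


# Illustrative thresholds only; consult the configuration for real values.
print(choose_strategy(0.80, rebuild_threshold=0.5, delete_threshold=0.05))
print(choose_strategy(0.20, rebuild_threshold=0.5, delete_threshold=0.05))
print(choose_strategy(0.01, rebuild_threshold=0.5, delete_threshold=0.05))
```

Because the decision is made for each table individually, a single prune run may rebuild some tables, delete from others, and skip the rest.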