Commit b95f984

andrejtonev and matea16 authored
Create snapshot parallelization (#1243)
* touch
* parallel flags and info
* Apply suggestions from code review

---------

Co-authored-by: Matea Pesic <80577904+matea16@users.noreply.github.com>
1 parent c74b4fb commit b95f984

File tree

3 files changed: +31 -0 lines changed

pages/database-management/configuration.mdx

Lines changed: 2 additions & 0 deletions
@@ -455,6 +455,8 @@ in Memgraph.
455455
| `--storage-snapshot-interval="300"` | Define periodic snapshot schedule via cron expression or as a period in seconds. Set to empty string to disable. | `[string]` |
456456
| `--storage-snapshot-on-exit=true` | Controls whether the storage creates another snapshot on exit. | `[bool]` |
457457
| `--storage-snapshot-retention-count=3` | The number of snapshots that should always be kept. | `[uint64]` |
458+
| `--storage-parallel-snapshot-creation=false` | Controls whether the snapshot creation can be done in a multi-threaded fashion. | `[bool]` |
459+
| `--storage-snapshot-thread-count` | The number of threads used to create snapshots. Defaults to the system's maximum thread count. | `[uint64]` |
458460
| `--storage-wal-enabled=true` | Controls whether the storage uses write-ahead-logging. To enable WAL, periodic snapshots must be enabled. | `[bool]` |
459461
| `--storage-wal-file-flush-every-n-tx=100000` | Issue a 'fsync' call after this amount of transactions are written to the WAL file. Set to 1 for fully synchronous operation. | `[uint64]` |
460462
| `--storage-wal-file-size-kib=20480` | Minimum file size of each WAL file. | `[uint64]` |

pages/fundamentals/data-durability.mdx

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,8 @@ on the value of the `--storage-snapshot-on-exit` configuration flag. When a
8787
snapshot creation is triggered, the entire data storage is written to the drive.
8888
Nodes and relationships are divided into groups called batches.
8989

90+
Snapshot creation can be made faster by using **multiple threads**. See [Parallelized execution](#parallelized-execution) for more information.
91+
9092
On startup, the database state is recovered from the most recent snapshot file.
9193
Memgraph can read the data and build the indexes on multiple threads, using
9294
batches as a parallelization unit: each thread will recover one batch at a time
@@ -155,6 +157,15 @@ storage mode is changed to `IN_MEMORY_TRANSACTIONAL` storage mode.
155157
Snapshots and WAL files are presently not compatible between Memgraph versions.
156158
</Callout>
157159

160+
### Parallelized execution
161+
162+
Snapshot creation in Memgraph can run on multiple threads, which can reduce the time required to create snapshots of large datasets.
163+
164+
This behavior can be controlled using the following flags:
165+
- `--storage-parallel-snapshot-creation`: This flag determines whether snapshot creation is performed in a multi-threaded fashion. By default, it is set to `false`. To enable parallelized execution, set this flag to `true`.
166+
- `--storage-snapshot-thread-count`: This flag specifies the number of threads to be used for snapshot creation. By default, Memgraph uses the system's maximum thread count. You can override this value to fine-tune performance based on your system's resources.
167+
168+
When parallelized execution is enabled, Memgraph divides the data into batches, where the batch size is defined via `--storage-items-per-batch`. The optimal batch size and thread count may vary depending on the dataset size and system configuration.
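
The flags above can be combined on the command line. The following is a minimal sketch, not a tuning recommendation: the thread count of `4` and the batch size of `1000000` are illustrative assumptions, and the `THREAD_COUNT`, `ITEMS_PER_BATCH` and `MEMGRAPH_FLAGS` variable names are hypothetical helpers introduced only for this example. The script prints the resulting command instead of starting Memgraph.

```bash
# Illustrative values only (assumptions); tune them for your hardware and dataset.
THREAD_COUNT=4
ITEMS_PER_BATCH=1000000

# Enable parallel snapshot creation and set the thread count and batch size.
MEMGRAPH_FLAGS="--storage-parallel-snapshot-creation=true --storage-snapshot-thread-count=${THREAD_COUNT} --storage-items-per-batch=${ITEMS_PER_BATCH}"

# Print the command for review; run it directly once the values look right.
echo "memgraph ${MEMGRAPH_FLAGS}"
```

Leaving out `--storage-snapshot-thread-count` keeps the default described above, which is the system's maximum thread count.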
158169

159170
## Storage modes
160171

pages/help-center/errors/snapshots.mdx

Lines changed: 18 additions & 0 deletions
@@ -82,6 +82,24 @@ for security reasons, it can't automatically create a new disk copy when you
8282
use `CREATE SNAPSHOT` in Memgraph. So, while the command creates a snapshot
8383
locally, it doesn't trigger a new snapshot in the Cloud interface.
8484

85+
## Why am I seeing corrupt snapshot files named `_edge_part_` and `_vertex_part_`?
86+
87+
These files are partial results from the multi-threaded execution of snapshot creation.
88+
When Memgraph creates snapshots using multiple threads, it divides the data into smaller parts. Each thread processes a specific part and writes intermediate results to files named with the `_edge_part_` and `_vertex_part_` patterns.
89+
90+
If the snapshot creation process is interrupted or fails, these partial files may remain on disk and appear as corrupt.
91+
Memgraph cannot load these incomplete files during startup, as they do not represent a valid snapshot.
92+
93+
### How to resolve this issue?
94+
95+
To resolve this issue, you can safely delete the partial files and restart Memgraph. The database will attempt to recover its state using the most recent valid snapshot and the write-ahead log (WAL) files.
96+
97+
```bash
98+
rm /var/lib/memgraph/snapshots/*_edge_part_*
99+
rm /var/lib/memgraph/snapshots/*_vertex_part_*
100+
```
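
Before deleting anything, it may help to list which partial files are actually present. The sketch below assumes the snapshot directory used in the commands above; `list_partial_snapshot_files` is a hypothetical helper name introduced for this example, and it only prints matching files without removing them.

```bash
# Hypothetical helper: list leftover partial snapshot files without deleting them.
# The first argument is the snapshot directory; the default matches the path used above.
list_partial_snapshot_files() {
  dir="${1:-/var/lib/memgraph/snapshots}"
  find "$dir" -type f \( -name '*_edge_part_*' -o -name '*_vertex_part_*' \) -print
}
```

Calling `list_partial_snapshot_files` with no argument inspects the default directory; pass a different path if your snapshots are stored elsewhere.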
101+
102+
85103
---
86104

87105
<CommunityLinks/>
