Create snapshot parallelization #1243

Merged: 4 commits merged on Apr 18, 2025.

2 changes: 2 additions & 0 deletions pages/database-management/configuration.mdx
@@ -455,6 +455,8 @@ in Memgraph.
| `--storage-snapshot-interval="300"` | Defines the periodic snapshot schedule as a cron expression or as a period in seconds. Set to an empty string to disable. | `[string]` |
| `--storage-snapshot-on-exit=true` | Controls whether the storage creates another snapshot on exit. | `[bool]` |
| `--storage-snapshot-retention-count=3` | The number of snapshots that should always be kept. | `[uint64]` |
| `--storage-parallel-snapshot-creation=false` | Controls whether snapshots are created using multiple threads. | `[bool]` |
| `--storage-snapshot-thread-count` | The number of threads used to create snapshots. Defaults to the system's maximum thread count. | `[uint64]` |
| `--storage-wal-enabled=true` | Controls whether the storage uses write-ahead logging. To enable WAL, periodic snapshots must be enabled. | `[bool]` |
| `--storage-wal-file-flush-every-n-tx=100000` | Issue an `fsync` call after this number of transactions has been written to the WAL file. Set to 1 for fully synchronous operation. | `[uint64]` |
| `--storage-wal-file-size-kib=20480` | Minimum file size of each WAL file. | `[uint64]` |
11 changes: 11 additions & 0 deletions pages/fundamentals/data-durability.mdx
@@ -87,6 +87,8 @@ on the value of the `--storage-snapshot-on-exit` configuration flag. When a
snapshot creation is triggered, the entire data storage is written to the drive.
Nodes and relationships are divided into groups called batches.

Snapshot creation can be made faster by using **multiple threads**. See [Parallelized execution](#parallelized-execution) for more information.

On startup, the database state is recovered from the most recent snapshot file.
Memgraph can read the data and build the indexes on multiple threads, using
batches as a parallelization unit: each thread will recover one batch at a time
@@ -155,6 +157,15 @@ storage mode is changed to `IN_MEMORY_TRANSACTIONAL` storage mode.
Snapshots and WAL files are presently not compatible between Memgraph versions.
</Callout>

### Parallelized execution

Snapshot creation in Memgraph can be sped up by using multiple threads, which can substantially reduce the time needed to create snapshots of large datasets.

This behavior can be controlled using the following flags:
- `--storage-parallel-snapshot-creation`: This flag determines whether snapshot creation is performed in a multi-threaded fashion. By default, it is set to `false`. To enable parallelized execution, set this flag to `true`.
- `--storage-snapshot-thread-count`: This flag specifies the number of threads to be used for snapshot creation. By default, Memgraph uses the system's maximum thread count. You can override this value to fine-tune performance based on your system's resources.
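For example, the following is a minimal sketch of enabling parallel snapshot creation when running Memgraph in Docker. It assumes the standard `memgraph/memgraph` image, and the thread count of 8 is an arbitrary value chosen only for illustration.

```bash
# Illustrative only: start Memgraph in Docker with multi-threaded snapshot
# creation enabled and an arbitrary thread count of 8.
docker run -p 7687:7687 memgraph/memgraph \
  --storage-parallel-snapshot-creation=true \
  --storage-snapshot-thread-count=8
```

Omitting `--storage-snapshot-thread-count` leaves it at its default, the system's maximum thread count.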

When parallelized execution is enabled, Memgraph divides the data into batches whose size is set by `--storage-items-per-batch`. The best batch size and thread count depend on the dataset size and the system configuration.
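To reason about how batch size interacts with thread count, the hypothetical bash helper below (`batch_plan` is not part of Memgraph) counts how many batches a given number of nodes or relationships would be split into, and how many threads could have work at the same time. It assumes each thread writes one batch at a time, mirroring the recovery behavior described earlier; Memgraph's actual scheduling during snapshot creation may differ.

```bash
# Hypothetical helper, not part of Memgraph. Assumes one thread handles
# one batch at a time; real snapshot scheduling may differ.
batch_plan() {
  local items=$1       # number of nodes or relationships to write
  local per_batch=$2   # value of --storage-items-per-batch
  local threads=$3     # value of --storage-snapshot-thread-count
  # Ceiling division: a partially filled last batch still counts as a batch.
  local batches=$(( (items + per_batch - 1) / per_batch ))
  # Threads beyond the number of batches would have nothing to process.
  local busy=$(( batches < threads ? batches : threads ))
  echo "batches=${batches} busy_threads=${busy}"
}

batch_plan 2500000 1000000 8   # prints: batches=3 busy_threads=3
```

With these example numbers, only 3 of the 8 threads would have work. Under the same assumption, lowering `--storage-items-per-batch` produces more batches, which lets more threads run in parallel.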

## Storage modes

18 changes: 18 additions & 0 deletions pages/help-center/errors/snapshots.mdx
@@ -82,6 +82,24 @@ for security reasons, it can't automatically create a new disk copy when you
use `CREATE SNAPSHOT` in Memgraph. So, while the command creates a snapshot
locally, it doesn't trigger a new snapshot in the Cloud interface.

## Why am I seeing corrupt snapshot files with `_edge_part_` or `_vertex_part_` in their names?

These files are partial results from the multi-threaded execution of snapshot creation.
When Memgraph creates snapshots using multiple threads, it divides the data into smaller parts. Each thread processes one part and writes its intermediate results to files whose names contain `_edge_part_` or `_vertex_part_`.

If the snapshot creation process is interrupted or fails, these partial files may remain on disk and appear as corrupt.
Memgraph cannot load these incomplete files during startup, as they do not represent a valid snapshot.
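Before deleting anything, it can help to list the leftover partial files. The following is a minimal bash sketch: `find_partial_snapshots` is a hypothetical helper, and its default directory `/var/lib/memgraph/snapshots` assumes the standard data directory `/var/lib/memgraph`.

```bash
# Hypothetical helper: list leftover partial snapshot files in a directory.
# The default path assumes the standard data directory (/var/lib/memgraph);
# pass another path if --data-directory points elsewhere.
find_partial_snapshots() {
  local dir=${1:-/var/lib/memgraph/snapshots}
  # Only look at files directly inside the snapshot directory.
  find "$dir" -maxdepth 1 -type f \
    \( -name '*_edge_part_*' -o -name '*_vertex_part_*' \) | sort
}
```

Calling `find_partial_snapshots` with no argument checks the default directory; pass a path to check a different one. An empty result means no partial files were left behind.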

### How to resolve this issue?

To resolve this issue, stop Memgraph, delete the partial files, and start Memgraph again. The database will attempt to recover its state using the most recent valid snapshot and the write-ahead log (WAL) files.

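```bash
# Default snapshot directory; adjust if --data-directory differs. -f avoids an error when no files match.
rm -f /var/lib/memgraph/snapshots/*_edge_part_*
rm -f /var/lib/memgraph/snapshots/*_vertex_part_*
```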


---

<CommunityLinks/>