Description
Describe the bug
Running a full sync replication from an SSD-tiered master node (with no replica attached) crashes or hangs the master.
To Reproduce
Steps to reproduce the behavior:
- Run a single master node (no replica) with SSD tiering enabled.
- Write 20,000,000 to 30,000,000 string key-value pairs to the master, with 32-byte keys and values of about 2 KB each; this consumes about 36 GB of SSD storage.
- Start another node with the same configuration as the master.
- On the new node, run REPLICAOF <master ip> <master port>.
- After a short time, the master crashes or hangs.
- If the master hangs, any operation on the replica returns:
(error) LOADING Dragonfly is loading the dataset in memory
- If the master crashes, its stderr shows:
*** SIGFPE received at time=1745151458 on cpu 2 ***
PC: @ 0x5b216091c605 (unknown) mi_free_generic_mt
or
*** SIGSEGV received at time=1745156745 on cpu 1 ***
PC: @ 0x569e90c3df04 (unknown) util::fb2::EventCount::await<>()
Expected behavior
Full sync completes normally, after which the replica switches to partial replication and keeps working.
Environment
- OS: Ubuntu 24.04
- Kernel (hang case):
Linux ubuntu2404152192 6.8.0-57-generic #59-Ubuntu SMP PREEMPT_DYNAMIC Sat Mar 15 17:40:59 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Kernel (crash case):
6.6.72+ #1 SMP PREEMPT_DYNAMIC Sun Mar 30 09:01:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
- Containerized?: the hang was tested in QEMU on IDC Metal; the crash was tested on a Google Cloud VM
- Dragonfly Version: 1.28.2
Reproducible Code Snippet
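The data-loading step from the reproduction list can be scripted. This is a minimal sketch, not the tooling used for the original test: it assumes the keys are written with plain SET commands streamed through redis-cli --pipe, and the key scheme (a "key:" prefix plus a zero-padded index) and the exact 2048-byte filler value are hypothetical choices that only match the sizes given in the steps (32-byte keys, values of about 2 KB).

```python
import sys

KEY_LEN = 32      # key length from the report
VALUE_LEN = 2048  # "about 2KB"; the exact size is an assumption


def resp_set(key: bytes, value: bytes) -> bytes:
    """Encode one SET command as a RESP array of bulk strings."""
    parts = [b"SET", key, value]
    out = [b"*%d\r\n" % len(parts)]
    for p in parts:
        out.append(b"$%d\r\n%s\r\n" % (len(p), p))
    return b"".join(out)


def make_key(i: int) -> bytes:
    # Hypothetical key scheme: 4-byte "key:" prefix + 28 zero-padded digits = 32 bytes.
    return b"key:%028d" % i


def make_value(i: int) -> bytes:
    # Deterministic filler: the index, a colon, then 'x' padding up to VALUE_LEN bytes.
    return (b"%d:" % i).ljust(VALUE_LEN, b"x")


def commands(count: int):
    """Yield one RESP-encoded SET command per key."""
    for i in range(count):
        yield resp_set(make_key(i), make_value(i))


# Only stream data when a key count is given on the command line.
if __name__ == "__main__" and len(sys.argv) == 2:
    out = sys.stdout.buffer
    for cmd in commands(int(sys.argv[1])):
        out.write(cmd)
```

Example usage, assuming the script is saved under the hypothetical name gen_kv.py, redis-cli is installed, and Dragonfly accepts the pipelined RESP stream the same way Redis does:

python3 gen_kv.py 20000000 | redis-cli -h <master ip> -p <master port> --pipe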