
SSD tier: full-sync replication crashes or hangs the master #4965

Closed
@zhyhang

Description


Describe the bug
Starting a full-sync replication from an SSD-tier master node (one with no replica yet) crashes or hangs the master.

To Reproduce
Steps to reproduce the behavior:

  1. Run a single master node (no replica) with SSD tiering enabled.
  2. Write 20,000,000 to 30,000,000 string key-value pairs to the master, with 32-byte keys and values of about 2 KB each. This consumes about 36 GB of SSD storage.
  3. Start another node with the same configuration as the master.
  4. On the new node, run REPLICAOF <master ip> <master port>.
  5. After a short time, the master crashes or hangs.
  6. If it hangs, commands on the replica return: (error) LOADING Dragonfly is loading the dataset in memory
  7. If it crashes, the master's stderr shows:

*** SIGFPE received at time=1745151458 on cpu 2 ***
PC: @ 0x5b216091c605 (unknown) mi_free_generic_mt
or
*** SIGSEGV received at time=1745156745 on cpu 1 ***
PC: @ 0x569e90c3df04 (unknown) util::fb2::EventCount::await<>()
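The data-loading part of step 2 can be approximated with a small load generator. This sketch is not part of the original report: it assumes exactly 32-byte zero-padded decimal keys and exactly 2048-byte filler values, and it writes RESP-encoded SET commands to stdout so they can be bulk-loaded with redis-cli --pipe (Dragonfly speaks the Redis protocol). The helper names (resp_command, make_key, generate) are hypothetical.

```python
"""Hypothetical load generator for step 2 of the reproduction.

Assumptions (not from the original report): keys are zero-padded decimal
strings of exactly 32 bytes and every value is exactly 2048 bytes of filler.
"""
import sys

KEY_LEN = 32      # key length from the report
VALUE_LEN = 2048  # "about 2KB" in the report, assumed here to be exactly 2048 bytes


def resp_command(*args: bytes) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        parts.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(parts)


def make_key(i: int) -> bytes:
    """Return a zero-padded key of exactly KEY_LEN bytes."""
    return str(i).zfill(KEY_LEN).encode()


def generate(count: int, out=None) -> None:
    """Write `count` SET commands to `out` (defaults to stdout as bytes)."""
    if out is None:
        out = sys.stdout.buffer
    value = b"x" * VALUE_LEN
    for i in range(count):
        out.write(resp_command(b"SET", make_key(i), value))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only generate data when a key count is passed explicitly.
    generate(int(sys.argv[1]))
```

Hypothetical usage: python gen_load.py 20000000 | redis-cli -h <master ip> -p <master port> --pipe, then run REPLICAOF <master ip> <master port> on the second node as in step 4.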

Expected behavior
Full sync completes normally, then the replica switches to partial replication and keeps working.


Environment:

  • OS: Ubuntu 24.04
  • Kernel (hang case): Linux ubuntu2404152192 6.8.0-57-generic #59-Ubuntu SMP PREEMPT_DYNAMIC Sat Mar 15 17:40:59 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • Kernel (crash case): 6.6.72+ #1 SMP PREEMPT_DYNAMIC Sun Mar 30 09:01:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: the hang was reproduced in a QEMU VM on IDC bare metal; the crash was reproduced in a Google Cloud VM
  • Dragonfly Version: 1.28.2


Additional context

flags-6679.txt

Labels: bug
