Skip to content

SSD-tier: full sync replication crash or hang master #4965

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zhyhang opened this issue Apr 20, 2025 · 1 comment
Open

SSD-tier: full sync replication crash or hang master #4965

zhyhang opened this issue Apr 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@zhyhang
Copy link

zhyhang commented Apr 20, 2025

Describe the bug
make full sync replication from a ssd-tier master node (no replica) will crash or hang it.

To Reproduce
Steps to reproduce the behavior:

  1. run a single master node (no replica) with ssd-tier opening.
  2. write 20,000,000 ~ 30,000,000 to master, string type kv, key len 32 and value 2KB about that will consume 36G ssd storage.
  3. start another node (same config as master).
  4. replicaof (master ip:port).
  5. run not long time, the master will crash or hang.
  6. if hang, operation on the replica will output: (error) LOADING Dragonfly is loading the dataset in memory
  7. if crash, the master stderr output:

*** SIGFPE received at time=1745151458 on cpu 2 ***
PC: @ 0x5b216091c605 (unknown) mi_free_generic_mt
or
*** SIGSEGV received at time=1745156745 on cpu 1 ***
PC: @ 0x569e90c3df04 (unknown) util::fb2::EventCount::await<>()

Expected behavior
full sync normally then replica transfer to partial replication and working well

Screenshots

Environment (please complete the following information):

  • OS: [ubuntu 24.04]
  • hang Kernel: Linux ubuntu2404152192 6.8.0-57-generic #59-Ubuntu SMP PREEMPT_DYNAMIC Sat Mar 15 17:40:59 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • crash Kernal: 6.6.72+ #1 SMP PREEMPT_DYNAMIC Sun Mar 30 09:01:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
  • Containerized?: [hang testing in IDC Metal's QEMU, crash testing in google cloud vm ]
  • Dragonfly Version: [1.28.2]

Reproducible Code Snippet

# Minimal code snippet to reproduce this bug

Additional context

flags-6679.txt

@zhyhang zhyhang added the bug Something isn't working label Apr 20, 2025
@vyavdoshenko
Copy link
Contributor

@zhyhang
Thanks for reporting the issue.

@romange romange changed the title make full sync replication crash or hang master SSD-tier: full sync replication crash or hang master Apr 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants