Skip to content

Update clusterMoveNodeSlots to also move importing slots and migrating slots #2370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 28, 2025

Conversation

enjoy-binbin
Copy link
Member

@enjoy-binbin enjoy-binbin commented Jul 21, 2025

In #2301, we added clusterMoveNodeSlots to implement the logic of
moving slots from old primary to new primary, when myself receives
the replica (old primary) message first and the new primary message
later in a shard failover.

However due to this, when myself receives the new primary message
later next time, there is no way to call clusterUpdateSlotsConfigWith,
because we have already updated the slots of the new primary before.
This result in, for example, importing slots and migrating slots
not being updated, see #445.

In this commit, we also make clusterMoveNodeSlots to move importing
slots and migrating slots.

Fixes #2363.

…g slots

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin
Copy link
Member Author

@PingXie Please take a look before we moving forward.

Copy link

codecov bot commented Jul 21, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.44%. Comparing base (663dac9) to head (baaa0ec).
⚠️ Report is 8 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2370      +/-   ##
============================================
+ Coverage     71.30%   71.44%   +0.13%     
============================================
  Files           123      123              
  Lines         67122    67156      +34     
============================================
+ Hits          47859    47977     +118     
+ Misses        19263    19179      -84     
Files with missing lines Coverage Δ
src/cluster_legacy.c 86.66% <100.00%> (-0.21%) ⬇️

... and 15 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@enjoy-binbin enjoy-binbin added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jul 22, 2025
Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin
Copy link
Member Author

A test log example:

37491:M 22 Jul 2025 11:23:37.405 # DEBUG LOG: ========== I am primary 1 ==========
37491:M 22 Jul 2025 11:23:38.716 * Cluster state changed: ok
### Starting test The role change and the slot ownership change should be an atomic operation in tests/unit/cluster/manual-failover.tcl

# cluster setslot
# Move slot 0 from R0 to R1. Move slot 5462 from R1 to R0.
37491:M 22 Jul 2025 11:23:47.123 * Importing slot 0 from node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 ()
37491:M 22 Jul 2025 11:23:47.124 * Migrating slot 5462 to node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 ()

# Failover occurred in R0/R3 shard.
# Move importing-source and migrating-target to R3.
37491:M 22 Jul 2025 11:23:47.160 * Failover occurred in migration source. Update importing source for slot 0 to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.
37491:M 22 Jul 2025 11:23:47.161 * Failover occurred in migration target. Slot 5462 is now being migrated to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.

# Move importing-source and migrating-target to R3.
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0; node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 () lost 5462 slot(s) and failed over to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () with a config epoch of 4
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in migration source. Update importing source of 1 slot(s) to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.
37491:M 22 Jul 2025 11:23:47.161 * A failover occurred in migration target. Update migrating target of 1 slot(s) to node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0.

37491:M 22 Jul 2025 11:23:47.161 * Node 2792b5e80e85df6d5ced4ef2ccbaab3f80c73cf4 () is now a replica of node b4bc7817919ab80ce412d34ed7ab5f2f72944117 () in shard 3637c7bb6b2cd9f5018a27b8639346ac881d31c0

@enjoy-binbin enjoy-binbin requested a review from PingXie July 22, 2025 03:27
Signed-off-by: Binbin <binloveplay1314@qq.com>
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @enjoy-binbin

Signed-off-by: Binbin <binloveplay1314@qq.com>
@PingXie
Copy link
Member

PingXie commented Jul 27, 2025

Hmm, after reading through #2363, I realize that we now have duplicated logic in two places clusterUpdateSlotsConfigWith and clusterMoveNodeSlots. need to think about this some more...

@PingXie
Copy link
Member

PingXie commented Jul 28, 2025

I realize that we now have duplicated logic in two places clusterUpdateSlotsConfigWith and clusterMoveNodeSlots. need to think about this some more...

ok clusterMoveNodeSlots is a special case of clusterUpdateSlotsConfigWith. I played with the code a bit but didn't see a clean way to single source the two. I think your implementation is good.

the only minor thing is that the same logging statements in clusterUpdateSlotsConfigWith are at NOTICE so I'd suggest lowering them to VERBOSE as well to be consistent, for the same-shard path only. we can keep the logging statements for the different-shard path at NOTICE.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin requested a review from PingXie July 28, 2025 03:05
Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@enjoy-binbin enjoy-binbin merged commit a481fe2 into valkey-io:unstable Jul 28, 2025
57 of 61 checks passed
@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 8.0 Jul 28, 2025
@enjoy-binbin enjoy-binbin moved this to To be backported in Valkey 8.1 Jul 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
Status: To be backported
Status: To be backported
Status: Done
Development

Successfully merging this pull request may close these issues.

[test-failure] Slot-migration related
3 participants