The smaller config epoch primary will become the replica when two primaries in the same shard #2279
base: unstable
Conversation
The smaller config epoch primary will become the replica when two primaries in the same shard
Signed-off-by: Binbin <binloveplay1314@qq.com>
Overall, the fix looks good to me.
Liked the GPT-generated (slightly modified) ASCII sequence diagram for this; pasting it here for others as well.
+------+            +------+            +------+
|Node A|            |Node B|            |Node C|
+------+            +------+            +------+
   |                   |                   |
   | <Primary>         | <Replica>         |
   |                   |                   |
   X (fails)           |                   |
   |------------------>|                   |
                       | (promoted to primary)
                       |                   |
                       X (fails)           |
                       |------------------>|
                                           | (promoted to primary)
   | (rejoins)                             |
   | (no extensions shared)                |
   |-------------------------------------->|
   | (assigns random shard_id)             |
   |<--------------------------------------|
   | (direct ping: slot info)              |
   | Updates C's shard_id correctly        |
   | Detects slot conflict                 |
   |
   |
+-----------------------------------------------+
| Outcome: Nodes A & C both see themselves as   |
| primary temporarily until shard_id is updated |
| and converged correctly based on epoch value  |
+-----------------------------------------------+
Signed-off-by: Binbin <binloveplay1314@qq.com>
I see the test
myself cluster nodes:
The problem seems to be that a primary used to have a larger config epoch, then it became a replica and updated its shard_id, and then it did CLUSTER RESET SOFT and became an empty primary while keeping the larger config epoch and its shard_id, which causes this problem. It is the same issue, two primaries in the same shard, but here the empty primary has the larger config epoch. See #2283. I think when a replica does CLUSTER RESET SOFT, we should generate a new shard_id for it.
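A minimal sketch of that suggestion, assuming the soft-reset path can reuse getRandomHexChars() and the shard_id field from cluster_legacy.c; the helper name below is hypothetical and is not part of this PR:

/* Hypothetical helper (illustration only, not the actual fix): when a replica
 * is soft-reset into an empty primary, give it a fresh random shard_id so the
 * stale one, together with its larger config epoch, cannot collide with its
 * old shard. */
static void clusterSoftResetRegenerateShardId(void) {
    if (!nodeIsReplica(myself)) return;
    getRandomHexChars(myself->shard_id, CLUSTER_NAMELEN);
    clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG);
}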
This test case doesn't use CLUSTER RESET SOFT. ❓
LGTM in general.
# it lost all its slots to Node C, which is in another shard.
# 8. Node A then updates the actual `shard_id` of Node C while processing `shard_id` in ping extensions.
# 9. Node A and Node C end up being primaries in the same shard while Node C continues to own slots.
start_cluster 3 2 {tags {external:skip cluster}} { |
We can put any config we want for all nodes here.
- start_cluster 3 2 {tags {external:skip cluster}} {
+ start_cluster 3 2 {tags {external:skip cluster} overrides {allow-replica-migration no}} {
Yes, we can; here I am only explicitly setting it for R A (I will apply this change later).
This is the other test case; I handle it in #2283.
Signed-off-by: Binbin <binloveplay1314@qq.com>
src/cluster_legacy.c
Outdated
/* If after processing everything, we find that myself and sender are in the
 * same shard and are both primaries, and myself's config epoch is smaller,
 * make myself a replica. */
if (sender && nodeIsPrimary(myself) && nodeIsPrimary(sender) && areInSameShard(myself, sender) &&
    nodeEpoch(sender) > nodeEpoch(myself)) {
    clusterHandlePrimariesSameShardCollision(sender);
}
The new test added in #2283 will currently fail here:
86095:M 05 Jul 2025 12:57:44.202 # === ASSERTION FAILED ===
86095:M 05 Jul 2025 12:57:44.202 # ==> cluster_legacy.c:6020 'myself->numslots == 0' is not true
------ STACK TRACE ------
Backtrace:
0 valkey-server 0x000000010227faac updateShardId + 0
1 valkey-server 0x000000010227d12c clusterReadHandler + 7736
2 valkey-server 0x00000001022f8040 connSocketEventHandler + 180
3 valkey-server 0x00000001021bedd8 aeProcessEvents + 372
4 valkey-server 0x00000001021e8f50 main + 26864
5 dyld 0x0000000180b13154 start + 2476
That is because we dropped link->support_extension in #2283 (comment).
The shard_id was not updated in time, which led to a misjudgment. So we either need to support link->support_extension or add a node->numslots check here.
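A minimal sketch of the second option, reusing only the helpers quoted in the diff above plus a numslots guard; this is one possible shape of the check, not necessarily the final change:

/* Sketch: additionally require that myself no longer owns any slots before
 * demoting it, so the 'myself->numslots == 0' assertion in the shard-update
 * path cannot trip while slot ownership is still converging. */
if (sender && nodeIsPrimary(myself) && nodeIsPrimary(sender) && areInSameShard(myself, sender) &&
    nodeEpoch(sender) > nodeEpoch(myself) && myself->numslots == 0) {
    clusterHandlePrimariesSameShardCollision(sender);
}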
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
… faster during handshake (#2310) Add support_extension to the cluster link, so that when myself receives a message from the sender, even if myself does not know the sender (for example, during handshake the sender node is still treated as a random node), the packet we reply with to the sender can also carry the extension, since the sender link explicitly indicates that it supports the extension. By doing this, we can speed up the spread of the extension. Also see #2283 and #2279 for more details.
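A rough sketch of that mechanism; the support_extension field comes from the commit message above, while the message flag and node-flag helper (CLUSTERMSG_FLAG0_EXT_DATA, nodeSupportsExtensions) are assumptions based on the existing extension support and are not taken from this PR:

/* Sketch: remember on the link itself whether the peer's packets carry
 * extensions, so a reply over that link can include extensions even when the
 * sender node is still unknown (e.g. during handshake). */
static void clusterLinkRecordExtensionSupport(clusterLink *link, clusterMsg *hdr) {
    if (hdr->mflags[0] & CLUSTERMSG_FLAG0_EXT_DATA) link->support_extension = 1;
}

/* Later, when building a reply on this link, attach extensions if either the
 * known sender supports them or the link has indicated support. */
static int canSendExtensions(clusterNode *sender, clusterLink *link) {
    return (sender && nodeSupportsExtensions(sender)) || link->support_extension;
}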
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2279 +/- ##
============================================
+ Coverage 71.42% 71.46% +0.03%
============================================
Files 123 123
Lines 67135 67147 +12
============================================
+ Hits 47952 47987 +35
+ Misses 19183 19160 -23
Signed-off-by: Binbin <binloveplay1314@qq.com>
In #2261, there is a scenario that results in two primaries in a shard.
Let's say a shard has a primary (A) and a replica (B). Also let's assume that
cluster-allow-replica-migration is disabled.
… `shard_id`.
a. Node C advertises the same set of slots that Node A was earlier owning.
b. Since Node A assigns a random shard_id to Node C, Node A thinks that it is still a
primary and it lost all its slots to Node C, which is in another shard.
8. Node A then updates the actual `shard_id` of Node C while processing `shard_id`
in ping extensions.

So in step 8, Node A and Node C have the same shard_id and both report themselves
as primaries, but Node C has a bigger config epoch. I guess in this case we can try
to make Node A become a replica of Node C.
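A minimal sketch of that rule, using only the helpers that appear in the change quoted earlier in this thread; the wrapper function name is illustrative and not part of the PR:

/* Sketch: the primary with the smaller config epoch should step down when two
 * primaries turn out to share a shard. The actual demotion is performed by
 * clusterHandlePrimariesSameShardCollision() in the change above. */
static int shouldStepDownForShardCollision(clusterNode *self, clusterNode *other) {
    return nodeIsPrimary(self) && nodeIsPrimary(other) &&
           areInSameShard(self, other) && nodeEpoch(other) > nodeEpoch(self);
}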
Fixes #2261.
An ASCII seq diagram: