
stop master on AOF short write if there are enough good replicas #2375


Open · wants to merge 4 commits into unstable from f/stop_master

Conversation

@kronwerk (Contributor) commented Jul 23, 2025

When the primary's disk fills up during AOF writes, it can end up stalled in that state forever. This PR adds a config option to stop the primary on an AOF short write when it has enough good replicas, reusing the recently added failover attempt from the cluster version.

Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>

codecov bot commented Jul 24, 2025

Codecov Report

Attention: Patch coverage is 47.36842% with 10 lines in your changes missing coverage. Please review.

Project coverage is 71.41%. Comparing base (a739531) to head (e431225).

Files with missing lines | Patch % | Lines
src/aof.c  | 0.00%  | 7 Missing ⚠️
src/eval.c | 50.00% | 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##           unstable    #2375   +/-   ##
=========================================
  Coverage     71.41%   71.41%           
=========================================
  Files           123      123           
  Lines         67139    67153   +14     
=========================================
+ Hits          47947    47959   +12     
- Misses        19192    19194    +2     
Files with missing lines | Coverage Δ
src/config.c          | 78.47% <ø> (ø)
src/replication.c     | 86.73% <100.00%> (-0.07%) ⬇️
src/server.h          | 100.00% <ø> (ø)
src/valkey-benchmark.c | 61.49% <ø> (+0.21%) ⬆️
src/eval.c            | 87.43% <50.00%> (-0.68%) ⬇️
src/aof.c             | 80.12% <0.00%> (-0.41%) ⬇️

... and 13 files with indirect coverage changes


Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>
@kronwerk kronwerk force-pushed the f/stop_master branch 3 times, most recently from 223d7a7 to 6489e52 Compare July 24, 2025 15:12
Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>
@kronwerk kronwerk marked this pull request as ready for review July 24, 2025 17:25
@murphyjacob4 (Contributor) left a comment


Thanks for the PR! I think there is definitely a gap here - thanks for identifying it and putting together a solution. I have a few comments about the approach, please take a look!

listIter li;
listNode *ln;
int good = 0;

if (!server.repl_min_replicas_to_write || !server.repl_min_replicas_max_lag) return;

listRewind(server.replicas, &li);
while ((ln = listNext(&li))) {
client *replica = ln->value;
time_t lag = server.unixtime - replica->repl_data->repl_ack_time;
Lag here means "how long (in seconds) since the last REPLCONF ACK response".

If we are testing lag==0, that doesn't necessarily guarantee that the replica is fully caught up. It just means that we have gotten a REPLCONF ACK on the current second (this could have been up to 999ms ago in the worst case).

Should we compare the replication offset instead?

@kronwerk (Contributor, Author) commented Jul 25, 2025

@murphyjacob4 hmmm, this check was introduced 12 years ago, so maybe it could be improved.
[screenshot of the original code]

I'm not sure about the right way to do it, though. In my opinion, repl_min_replicas_max_lag is a way to allow considering a replica "good" even if there is a user-configured amount of lag.

If we want to switch to offsets, what should we do? Ignore repl_min_replicas_max_lag, deprecate it, and introduce repl_min_replicas_max_offset? Would it be convenient and clear for a user which offset is enough for the same use case? Or should repl_min_replicas_max_offset always be 0 (equal offsets for primary and replica)? In that case we lose the ability to consider a lagged replica "good"; is that acceptable for us?

Signed-off-by: kronwerk <kronwerk@users.noreply.github.com>