Race condition during leader stepdown #168

1. Node 1 is leader. It has 20 entries in its log, the first 10 of which are committed. It has pushed all 20 to Node 2, but let's assume Nodes 3, 4, and 5 haven't seen any of these uncommitted entries.
2. Somehow Node 3 gets elected while Node 1 isn't looking. It only has 10 entries in its log.
3. Node 3 sends Node 1 an AppendEntries with a higher term number, and Node 1 steps down. It truncates its log to entry 10.
4. In a separate goroutine, Node 1 prepares to send an AppendEntries request to Node 2 (that goroutine has no idea a step-down is in progress). It notes that the last entry it replicated to Node 2 was entry 20, and attempts to retrieve all entries in the log from 20 onwards.
5. `getEntriesAfter` panics with the message `"raft: Index is beyond end of log: 10 20"`
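The interleaving above can be replayed deterministically in a few lines. This is only a sketch: the `Log` type, `truncate`, and `replay` are hypothetical stand-ins written for illustration, not goraft's actual code, and the bounds check in `getEntriesAfter` is assumed from the panic message quoted in step 5.

```go
package main

import "fmt"

// Log is a hypothetical, simplified stand-in for a Raft log (not goraft's type).
// entries[i] holds the entry with index i+1.
type Log struct {
	entries []int
}

// truncate drops every entry after the given index (step 3).
func (l *Log) truncate(index int) {
	l.entries = l.entries[:index]
}

// getEntriesAfter returns the entries with an index greater than the given one.
// It panics when the index is beyond the end of the log, mirroring the
// message from the report.
func (l *Log) getEntriesAfter(index int) []int {
	if index > len(l.entries) {
		panic(fmt.Sprintf("raft: Index is beyond end of log: %d %d", len(l.entries), index))
	}
	return l.entries[index:]
}

// replay runs steps 3 and 4 back to back and returns the recovered panic message.
func replay() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	log := &Log{entries: make([]int, 20)} // Node 1 starts with 20 entries
	prevLogIndex := 20                    // what the replication goroutine remembers for Node 2
	log.truncate(10)                      // step 3: step-down truncates to entry 10
	log.getEntriesAfter(prevLogIndex)     // step 4: the stale index is now past the end
	return ""
}

func main() {
	fmt.Println(replay())
}
```

In the real server the two calls happen on different goroutines, so the failure only shows up when the truncation lands between the moment the index is noted and the moment the entries are fetched.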

In general, the locking around log operations is super sketchy. I suspect, for instance, that the entirety of https://github.com/goraft/raft/blob/master/server.go#L904-920 ought to be protected by a single lock acquisition / release.
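To illustrate what "a single lock acquisition / release" buys, here is a rough sketch in which the bounds check and the read happen inside one critical section, and truncation takes the same lock. It is not a patch for the linked lines: `Log`, `truncate`, and `entriesAfter` are hypothetical names, and returning `false` instead of panicking is an assumption about how a caller might want to handle a stale index.

```go
package main

import (
	"fmt"
	"sync"
)

// Log is a hypothetical log guarded by one mutex (not goraft's type).
type Log struct {
	mu      sync.Mutex
	entries []int
}

// truncate drops every entry after index while holding the lock.
func (l *Log) truncate(index int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if index >= 0 && index < len(l.entries) {
		l.entries = l.entries[:index]
	}
}

// entriesAfter checks the index and copies the entries under a single lock
// acquisition, so a concurrent truncate cannot slip in between the two.
// It reports false for an out-of-range index instead of panicking.
func (l *Log) entriesAfter(index int) ([]int, bool) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if index < 0 || index > len(l.entries) {
		return nil, false
	}
	return append([]int(nil), l.entries[index:]...), true
}

func main() {
	log := &Log{entries: make([]int, 20)}
	var wg sync.WaitGroup
	var ok bool
	wg.Add(2)
	go func() { defer wg.Done(); log.truncate(10) }()             // step-down path
	go func() { defer wg.Done(); _, ok = log.entriesAfter(20) }() // replication path
	wg.Wait()
	// Either ordering is possible; neither one panics.
	fmt.Println("fetch succeeded:", ok)
}
```

The lock alone does not make the stale index valid. It only guarantees the replication goroutine sees a consistent log, so the caller still has to notice the failed fetch and stop replicating once the node is no longer leader.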

