apfitzge

Problem

  • Alpenglow plans to allow us to switch parent mid-slot.
  • That would lead to a race condition:
      1. A worker begins executing txs on a bank with slot N.
      2. A parent switch initiates.
      3. "poh" shuts down the channel, flushes, and switches bank.
      4. The worker finishes execution and goes to record, which succeeds because the slot is the same.
  • The issue here is that PoH only checks equivalence on Slot.

Summary of Changes

  • Use BankId instead of Slot for checks on poh channels && recorder
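
The four race steps above, and the effect of gating on BankId instead, can be sketched roughly as follows. This is a simplified model, not the actual PoH recorder: `WorkingBank`, `Recorder`, `RecordError`, `record_by_slot`, `record_by_bank_id`, `switch_parent`, and `replay_race` are all hypothetical names, and the sketch assumes a mid-slot parent switch produces a new bank with the same slot but a different bank id.

```rust
// Hypothetical, simplified model of the race; not the actual PoH recorder API.
type Slot = u64;
type BankId = u64;

#[derive(Clone, Copy, Debug, PartialEq)]
struct WorkingBank {
    slot: Slot,
    bank_id: BankId,
}

#[derive(Debug, PartialEq)]
enum RecordError {
    // The caller executed against a bank that is no longer the working bank.
    StaleBank,
}

struct Recorder {
    working_bank: WorkingBank,
}

impl Recorder {
    // Slot-only gate: cannot tell apart two banks built for the same slot.
    fn record_by_slot(&self, slot: Slot) -> Result<(), RecordError> {
        if slot == self.working_bank.slot {
            Ok(())
        } else {
            Err(RecordError::StaleBank)
        }
    }

    // BankId gate: rejects records from any bank other than the current one.
    fn record_by_bank_id(&self, bank_id: BankId) -> Result<(), RecordError> {
        if bank_id == self.working_bank.bank_id {
            Ok(())
        } else {
            Err(RecordError::StaleBank)
        }
    }

    // Models a mid-slot parent switch (assumption: same slot, new bank id).
    fn switch_parent(&mut self, new_bank_id: BankId) {
        self.working_bank = WorkingBank {
            slot: self.working_bank.slot,
            bank_id: new_bank_id,
        };
    }
}

// Replays the race and returns (slot-gate result, bank-id-gate result)
// for the worker that executed on the old bank.
fn replay_race() -> (Result<(), RecordError>, Result<(), RecordError>) {
    let mut recorder = Recorder {
        working_bank: WorkingBank { slot: 10, bank_id: 1 },
    };
    let executed_on = recorder.working_bank; // step 1: worker starts on slot N
    recorder.switch_parent(2); // steps 2-3: parent switch, bank replaced
    (
        recorder.record_by_slot(executed_on.slot), // step 4 with the slot-only gate
        recorder.record_by_bank_id(executed_on.bank_id), // step 4 with the bank id gate
    )
}
```

In this model the slot-only gate accepts the stale worker's record, while the bank id gate rejects it.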

Fixes #

codecov-commenter commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (ee1b5d0) to head (a69a22f).
⚠️ Report is 39 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8433   +/-   ##
=======================================
  Coverage    83.2%    83.2%           
=======================================
  Files         839      839           
  Lines      367808   367815    +7     
=======================================
+ Hits       306072   306082   +10     
+ Misses      61736    61733    -3     

@apfitzge apfitzge marked this pull request as ready for review October 13, 2025 21:39
@apfitzge
Author

As an FYI, afaict this is not a new race introduced by the recent poh changes, since we've always gated poh-recording only on the working bank's slot.

 pub fn record(
     &self,
-    bank_slot: Slot,
+    bank_id: Slot,

I suppose it's all u64 in the end, but should this be BankId?

Author

🫠 I actually wish these were different types because it's really annoying that they can be passed/confused for one another.

Yes.
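
For illustration, one way to get distinct types is the newtype pattern. This is only a sketch: the wrapper structs and the `is_working_bank` helper below are hypothetical and are not how the codebase defines `Slot` or `BankId` (per the comment above, both are u64 in the end).

```rust
// Hypothetical newtype wrappers, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Slot(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct BankId(pub u64);

// Hypothetical gate that only accepts bank ids.
pub fn is_working_bank(working: BankId, candidate: BankId) -> bool {
    working == candidate
}

// With distinct types, passing a slot where a bank id is expected is a
// compile-time type mismatch rather than a silent bug:
// is_working_bank(BankId(7), Slot(7)); // does not compile
```

The trade-off is that every call site has to wrap and unwrap the inner u64 explicitly.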

Author

Cool, I think there are no more mismatches:

$ rg "Slot" | rg "bank_id"
runtime/src/prioritization_fee_cache.rs:    pub fn finalize_priority_fee(&self, slot: Slot, bank_id: BankId) {
accounts-db/src/accounts_db.rs:    pub fn purge_slot(&self, slot: Slot, bank_id: BankId, is_serialized_with_abs: bool) {
accounts-db/src/accounts_index.rs:    SlotRemoved { slot: Slot, bank_id: BankId },

3 participants