Skip to content

Conversation

roryharr
Copy link

@roryharr roryharr commented Oct 9, 2025

Problem

  • Fastboot does not support Obsolete accounts as the obsolete accounts are not persisted. Since the storages used are the same files that were being used prior to restarting, the obsolete account information needs to be saved and restored as well.

Summary of Changes

  • Added functions to serialize obsolete accounts and deserialize them

Note: After this change there will be one more PR to hook up save/restore of obsolete accounts, remove the legacy completers, rev the fastboot version and remove validator constraints.

Fixes #

/// Returns the accounts that were marked obsolete as of the passed in slot
/// or earlier. Returned data includes the slots that the accounts were marked
/// obsolete at
pub fn obsolete_accounts_at_slot(&self, slot: Slot) -> ObsoleteAccounts {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to create a new API for serialization that also returns the slots that accounts are marked obsolete at (other use cases don't include the slot).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying you have to do this here, but when creating one-off pub functions like this, I've found that naming who is calling the fn, to be helpful.

For here, that'd be something like obsolete_accounts_for_snapshots().

The main purpose being to inform others they should not call this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

bank_snapshots.into_iter().next_back()
}

pub fn write_obsolete_accounts_to_snapshot(
Copy link
Author

@roryharr roryharr Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might look a little bit strange at first: The accounts are serialized with write_obsolete_accounts_to_snapshot and deserialized with deserialize_obsolete_accounts.

This is because the translation they are handling is as follows:
AccountStorageEntry -> SerdesObsoleteAccounts -> file -> SerdesObsoleteAccounts

serialize_obsolete_accounts/deserialize_obsolete_accounts handles SerdesObsoleteAccounts -> file -> SerdesObsoleteAccounts, which are matching functions

Another approach is separating collect_obsolete_accounts and serialize at the caller, but that bled the internals of SerdesObsoleteAccounts into core which I want to avoid.

Combining write_obsolete_accounts_to_snapshot and serialize_obsolete_accounts and just calling it serialize_obsolete_accounts made for difficult unit test writing.

Ok(consumed_size)
}

#[allow(dead_code)]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will be used in the next PR.

@codecov-commenter
Copy link

codecov-commenter commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 95.93220% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.2%. Comparing base (a9eee69) to head (94d0722).
⚠️ Report is 62 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #8414    +/-   ##
========================================
  Coverage    83.2%    83.2%            
========================================
  Files         846      847     +1     
  Lines      369251   369533   +282     
========================================
+ Hits       307229   307621   +392     
+ Misses      62022    61912   -110     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@roryharr roryharr marked this pull request as ready for review October 9, 2025 17:52
Copy link

@brooksprumo brooksprumo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just over the serde types, which we need to modify. I'll do another pass to look at the actual serialization and deserialization code.

/// Returns the accounts that were marked obsolete as of the passed in slot
/// or earlier. Returned data includes the slots that the accounts were marked
/// obsolete at
pub fn obsolete_accounts_at_slot(&self, slot: Slot) -> ObsoleteAccounts {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying you have to do this here, but when creating one-off pub functions like this, I've found that naming who is calling the fn, to be helpful.

For here, that'd be something like obsolete_accounts_for_snapshots().

The main purpose being to inform others they should not call this function.

Comment on lines 54 to 65
#[derive(Debug, Default, Serialize, Deserialize)]
pub(crate) struct SerdeObsoleteAccounts {
/// The ID of the associated account file. Used for verification to ensure the restored
/// obsolete accounts correspond to the correct account file
pub id: SerializedAccountsFileId,
/// The number of obsolete bytes in the storage. These bytes are removed during archive
/// serialization/deserialization but are present when restoring from directories. This value
/// is used to validate the size when creating the accounts file.
pub bytes: u64,
/// A list of accounts that are obsolete in the storage being restored.
pub accounts: ObsoleteAccounts,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here's where we need to add Serialize and Deserialize. Plus the frozen ABI stuff.

e.g.

#[cfg_attr(
    feature = "frozen-abi",
    derive(AbiExample),
    frozen_abi(digest = "TODO")
)]
#[derive(Debug, Clone, PartialEq, Default, Serialize, Deserialize)]
struct SerdeObsoleteAccounts{
    id: SerializedAccountsFileId,
    bytes: u64,
    accounts: Vec<(Offset, usize, Slot)>
}

Note, we could leave the usize in the obsolete accounts tuple, or change to a u64. Either is probably fine (I think the mapping to u64 would end up being a noop once compiled?).

In either case, we'll want to set the bincode options to use fixed int encoded.

e.g.:

bincode::DefaultOptions::new().with_fixint_encoding()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't implemented the abi stuff yet. I'll take a look tomorrow.

@brooksprumo brooksprumo self-requested a review October 9, 2025 18:56
/// is used to validate the size when creating the accounts file.
pub bytes: u64,
/// A list of accounts that are obsolete in the storage being restored.
pub accounts: ObsoleteAccounts,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use Vec<(Offset, usize, slot)> directly here and remove Serialize/Deserialize from ObsoleteAccounts.

let obsolete_accounts_file = fs::File::create(&obsolete_accounts_path)?;
let mut file_stream = BufWriter::new(obsolete_accounts_file);

for (slot, obsolete_accounts) in obsolete_accounts_map {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashmap has non-deterministic order.
how about sort before serializing to ensure deterministic snapshot files?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not recommend this. We have many hashmaps in the snapshot already, and we don't sort those.

But we should not be serializing item by item. There are serde helpers for serializing the whole hashmap. See solana_runtime::serde_snapshot::utils::serialize_iter_as_seq() et al.

@HaoranYi
Copy link

HaoranYi commented Oct 9, 2025

overall LGTM. The main fix is brook's comment for Ser/Der type boundary. And a minor suggestion for deterministic order for reproducible snapshots.

pub use {
buffered_reader::large_file_buf_reader, file_io::validate_memlock_limit_for_disk_io,
obsolete_accounts::ObsoleteAccounts,
obsolete_accounts::{ObsoleteAccounts, ObsoleteAccountItem},
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to make this public for access in storage.rs


#[derive(Debug, Clone, PartialEq)]
struct ObsoleteAccountItem {
pub struct ObsoleteAccountItem {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, public for access in storage.rs

#[test_case(0, 0)]
#[test_case(1, 0)]
#[test_case(10, 15)]
fn test_serialize_and_deserialize_obsolete_accounts(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to test in the latest review.

(serializable_bank, serializable_accounts_db, extra_fields).serialize(serializer)
}

pub(crate) fn serialize_obsolete_accounts<W>(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really clear where the right place for this to live is, the ser/des functions are spread all over the place. Any feedback?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... it's not great... We could create a new file runtime/src/serde_snapshot/obsolete_accounts.rs and put everything in there. The Serde struct, the serialize/deserialize functions, and the tests.

I kind of like that. Wdyt?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, makes sense.

let bincode = bincode::DefaultOptions::new().with_fixint_encoding();
bincode.serialize_into(
stream,
&utils::serialize_iter_as_tuple(obsolete_accounts_map.iter()),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to serialize as tuple

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the serialize_status_cache(), and I think we can pass obsoelte_accounts_map to serialize_into() and it'll just work?

Suggested change
&utils::serialize_iter_as_tuple(obsolete_accounts_map.iter()),
obsolete_accounts_map,

Minimally clippy is happy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If clippy is happy, it probably works!

Copy link
Author

@roryharr roryharr Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it out, it doesn't work:
It is because it is being serialized as a hashmap and deserialized as a dashmap.
I could use a dashmap on the serialization side, but dashmaps do not suppot serialize/deseralize, so it would need something like serialize_iter_as_tuple anyways in that situation.

Dashmap is definitely needed on the deserialize size as performance testing found. It could be deseralized as a hashmap and then converted to a dashmap, but that seems less efficient.

Thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a serde feature for DashMap. I haven't used it before. Maybe it'll work here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But otherwise, yeah I agree with you. Probably don't want to do extra allocations/conversions unnecessarily.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passes testing. I will check performance in a bit

@roryharr roryharr requested a review from HaoranYi October 9, 2025 23:44
@HaoranYi
Copy link

HaoranYi commented Oct 10, 2025

Do we need the slots info to ser/des, i.e. Vec<(Offset, usize, slot)> --> Vec<(Offset, usize)>?
I think we can only ser/des a vec of obsolete accounts without slot. At restart, we load the vec and add snapshot slot to them, then insert it into the map. This can save 1/3 of the space.

#[test_case(10, 15)]
fn test_serialize_and_deserialize_obsolete_accounts(
num_storages: u64,
num_accounts_per_storage: usize,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt

Suggested change
num_accounts_per_storage: usize,
num_obsolete_accounts_per_storage: usize,

// Deserialize the obsolete accounts
let cursor = Cursor::new(buf.as_slice());
let mut reader = BufReader::new(cursor);
let deserialized_accounts = deserialize_obsolete_accounts(&mut reader).unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt

Suggested change
let deserialized_accounts = deserialize_obsolete_accounts(&mut reader).unwrap();
let deserialized_obsolete_accounts = deserialize_obsolete_accounts(&mut reader).unwrap();

obsolete_accounts.id,
));
}
let accounts = ObsoleteAccounts {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt

Suggested change
let accounts = ObsoleteAccounts {
let obsolete_accounts = ObsoleteAccounts {

let bincode = bincode::DefaultOptions::new().with_fixint_encoding();
bincode.serialize_into(
stream,
&utils::serialize_iter_as_tuple(obsolete_accounts_map.iter()),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the serialize_status_cache(), and I think we can pass obsoelte_accounts_map to serialize_into() and it'll just work?

Suggested change
&utils::serialize_iter_as_tuple(obsolete_accounts_map.iter()),
obsolete_accounts_map,

Minimally clippy is happy.

Comment on lines 1333 to 1334
let consumed_size = file_stream.stream_position()?;
Ok(consumed_size)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to check the size and set some limit, similar to serialize_snapshot_data_file_capped().

Likely the same on the deserialize side too (arguably more important on the deserialize side).

Comment on lines +3622 to +3633
let mut snapshot_storages = Vec::new();
for i in 0..num_storages {
let storage = Arc::new(AccountStorageEntry::new(
&PathBuf::new(),
i, // Incrementing slot
i as u32, // Incrementing id
1024,
AccountsFileProvider::AppendVec,
StorageAccess::File,
));
snapshot_storages.push(storage);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let snapshot_storages: Vec<_> = (0..num_storages).map(|i| {
    // create storage
})
.collect();

- Added Type SerdeObsoleteAccountsMap
- Consolidated some function under SerdeObsoleteAccountsMap
- Simplified serdes functions. Added serdes_snapshot::serialize_into
@roryharr
Copy link
Author

roryharr commented Oct 14, 2025

Changing to a dashmap ended up being a big change:

  • Frozen_ABI had to move onto the dashmap, necessitating a struct to use with frozen_abi
  • With the new type, some functions were moved onto the new type.
  • serde implementations went away, as the generic serde functions were now sufficient

Not sure they best way to review this. Might be best to review the entire thing.

frozen_abi(digest = "12qimMBghYs9dL4nhws7Xe1B5MXWBV75VTMTGLHxevYE")
)]
#[derive(Serialize, Deserialize, Debug)]
pub(crate) struct SerdeObsoleteAccountsMap {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new type added, and the frozen_abi


impl SerdeObsoleteAccountsMap {
/// Creates a new `SerdeObsoleteAccountsMap` from a list of storage entries and a snapshot slot.
pub(crate) fn new_from_storages(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this new and pushed the complexity of creating it into here.

}

/// Removes and returns the obsolete accounts data for a given slot.
pub(crate) fn remove(&self, slot: &Slot) -> Option<(ObsoleteAccounts, AccountsFileId, usize)> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the transformation to a more general type here as well.

.deserialize_from::<R, T>(reader)
}

pub fn serialize_into<W, T>(writer: W, value: &T) -> bincode::Result<()>
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this function and made the above one public.
It likely makes sense to have the with_limit parameter be passed in, but I think that would be a separate refactor.

let updated_len = current_len + obsolete_accounts.bytes as usize;
let id = id as SerializedAccountsFileId;
if obsolete_accounts.id != id {
let updated_len = current_len + obsolete_accounts.2;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more clear if you do a diff to the last two commits, but there was a translation from the SerdeObsoleteAccounts to ObsoleteAccounts performed in this function. I moved the translation into obsolete_accounts.rs.

Generally I would like to avoid the .0, .1, .2 calls here, but the other options I see are

  1. passing it in as 3 individual options. This seems very confusing as they are tied together.
  2. Adding a struct just for this function call...

@roryharr roryharr requested a review from brooksprumo October 14, 2025 20:19
snapshot_slot: Slot,
) -> Self {
let map = DashMap::with_capacity(snapshot_storages.len());
snapshot_storages.par_iter().for_each(|storage| {
Copy link
Author

@roryharr roryharr Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to par_iter. Providing large benefits:

without par_iter

./agave-validator.log:[2025-10-14T18:44:49.214644854Z INFO  solana_core::snapshot_packager_service] Saving obsolete accounts...
./agave-validator.log:[2025-10-14T18:44:49.497557413Z INFO  solana_core::snapshot_packager_service] Saving obsolete accounts... Done in 282.908062ms

With par_iter

./agave-validator.log:[2025-10-14T19:36:40.040171417Z INFO  solana_core::snapshot_packager_service] Saving obsolete accounts...
./agave-validator.log:[2025-10-14T19:36:40.141304000Z INFO  solana_core::snapshot_packager_service] Saving obsolete accounts... Done in 101.127826ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants