Conversation

Contributor

@emmazzz emmazzz commented Dec 14, 2024

Description

Added prune implementations for the pipelines in the indexer-alt schema, built on top of Will's cp mapping PR.

Test plan

Will add tests.


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:


vercel bot commented Dec 14, 2024

The latest updates on your projects.

Name        Status    Updated (UTC)
sui-docs    ✅ Ready  Jan 9, 2025 5:01pm

3 Skipped Deployments

Name                  Status      Updated (UTC)
multisig-toolkit      ⬜️ Ignored  Jan 9, 2025 5:01pm
sui-kiosk             ⬜️ Ignored  Jan 9, 2025 5:01pm
sui-typescript-docs   ⬜️ Ignored  Jan 9, 2025 5:01pm

@emmazzz emmazzz temporarily deployed to sui-typescript-aws-kms-test-env December 14, 2024 04:11 — with GitHub Actions Inactive
@emmazzz emmazzz marked this pull request as ready for review December 14, 2024 08:01
@emmazzz emmazzz temporarily deployed to sui-typescript-aws-kms-test-env December 14, 2024 08:01 — with GitHub Actions Inactive
async fn prune(range: PrunableRange, conn: &mut db::Connection<'_>) -> Result<usize> {
    let (from, to) = range.containing_epochs();
    let filter = kv_epoch_starts::table
        .filter(kv_epoch_starts::epoch.between(from as i64, to as i64 - 1));
Contributor Author

@emmazzz emmazzz Dec 14, 2024


I have not fully convinced myself that pruning epoch-grained tables will just work like this, even when our watermark and retention are checkpoint-grained. I will add a test tomorrow to make sure.

Contributor


Indeed -- and I would say that if this implementation doesn't behave correctly, it would be good to change the helper function in PrunableRange so that all prune impls can follow the same pattern (regardless of whether they are epoch-, checkpoint-, or transaction-grained), rather than change the epoch-grained impls to have a slightly different structure.
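For illustration, a minimal sketch of the uniform pattern being proposed here, assuming a hypothetical PrunableRange whose fields and helpers are invented for this sketch (they are not the crate's actual API): each granularity exposes a half-open interval, and every prune impl consumes it the same way, with an empty interval simply deleting nothing.

use std::ops::Range;

// Hypothetical stand-in for the real PrunableRange; all names here are
// illustrative assumptions.
struct PrunableRange {
    from_cp: u64,            // first checkpoint to prune (inclusive)
    to_cp_exclusive: u64,    // first checkpoint to keep
    from_epoch: u64,         // epoch containing `from_cp`
    to_epoch_exclusive: u64, // epoch containing `to_cp_exclusive`
}

impl PrunableRange {
    // Half-open checkpoint interval to prune.
    fn checkpoints(&self) -> Range<u64> {
        self.from_cp..self.to_cp_exclusive
    }

    // Half-open epoch interval to prune; empty when both ends of the
    // checkpoint range fall inside the same epoch.
    fn epochs(&self) -> Range<u64> {
        self.from_epoch..self.to_epoch_exclusive
    }
}

fn main() {
    let range = PrunableRange {
        from_cp: 100,
        to_cp_exclusive: 200,
        from_epoch: 2,
        to_epoch_exclusive: 2, // both ends in epoch 2
    };
    // Same shape at every granularity: delete rows in the half-open interval.
    assert_eq!(range.checkpoints().count(), 100);
    assert!(range.epochs().is_empty());
}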

@emmazzz emmazzz requested review from amnn and wlmyng December 14, 2024 08:04
Contributor

@amnn amnn left a comment


This looks great, thanks @emmazzz. There are some suggested changes on @wlmyng's PR that may affect this one (how the epoch helpers are implemented might introduce an off-by-one difference in this PR, and there is also a suggestion about changing the PrunableRange interface to move the responsibility to translate bounds into the individual prune impls).

I'll leave it with you and @wlmyng to coordinate those changes and then land both PRs!

Contributor


Given we can now delete by tx sequence number, should we get rid of the index on cp_sequence_number and write this prune impl based on the tx_interval?

Contributor


Do we actually have an index on cp_sequence_number? It seems to be a field on the table, but with no corresponding index. We might as well proceed with an implementation based on tx_interval.

Contributor


Ah, I must've missed something ... it looks like kv_transactions has only the cp_sequence_number field, so we'd need to backfill tx_sequence_number. And since the primary key is on tx_digest, we'd need to introduce an index on tx_sequence_number.


Contributor

@amnn amnn left a comment


I think this PR will again change because of comments on the earlier PR, but those changes should be mechanical, so accepting to unblock!

@wlmyng wlmyng force-pushed the indexer-alt-cp-mapping-for-pruning branch 3 times, most recently from 92c75e5 to 87504e7 on December 26, 2024 17:11
@wlmyng wlmyng force-pushed the indexer-alt-prune-impls branch from b07a86e to 4726c70 on December 26, 2024 17:12
@wlmyng wlmyng temporarily deployed to sui-typescript-aws-kms-test-env December 26, 2024 17:12 — with GitHub Actions Inactive
@wlmyng wlmyng force-pushed the indexer-alt-prune-impls branch from 4726c70 to e8252bf on December 26, 2024 18:45
@wlmyng wlmyng temporarily deployed to sui-typescript-aws-kms-test-env December 26, 2024 18:45 — with GitHub Actions Inactive
    .await?)
}

async fn prune(from: u64, to: u64, conn: &mut db::Connection<'_>) -> Result<usize> {
Contributor


Could you rename `to` to `to_exclusive` in all implementations?
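As a toy illustration of the suggested rename (the body here is invented; only the signature shape matters), the `_exclusive` suffix makes the half-open [from, to_exclusive) contract visible at every call site:

// Toy sketch of the rename; the body is illustrative only.
fn prune(from: u64, to_exclusive: u64) -> u64 {
    // Number of rows pruned from the half-open range [from, to_exclusive).
    to_exclusive.saturating_sub(from)
}

fn main() {
    assert_eq!(prune(5, 5), 0); // empty range: nothing pruned
    assert_eq!(prune(0, 3), 3); // prunes 0, 1, and 2
}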

} = tx_interval(conn, from..to).await?;

let filter = ev_emit_mod::table
    .filter(ev_emit_mod::tx_sequence_number.between(from_tx as i64, to_tx as i64 - 1));
Contributor


Why `- 1`?

Contributor


`to_tx` is the first tx of the `to_exclusive` checkpoint and should not be pruned.
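In other words (a small sketch, under the assumption that tx_interval maps the half-open checkpoint range [from, to_exclusive) to tx sequence numbers): SQL's BETWEEN is inclusive on both ends, so the exclusive upper bound to_tx becomes the inclusive bound to_tx - 1.

// Sketch: convert a half-open tx range [from_tx, to_tx) into the inclusive
// bounds BETWEEN expects; `to_tx`, the first tx of the `to_exclusive`
// checkpoint, must survive pruning.
fn between_bounds(from_tx: u64, to_tx: u64) -> (i64, i64) {
    (from_tx as i64, to_tx as i64 - 1)
}

fn main() {
    // Pruning txs [10, 25) becomes BETWEEN 10 AND 24, leaving tx 25 untouched.
    assert_eq!(between_bounds(10, 25), (10, 24));
}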

@wlmyng wlmyng force-pushed the indexer-alt-cp-mapping-for-pruning branch from 87504e7 to 4e0e03b on January 8, 2025 00:17
Base automatically changed from indexer-alt-cp-mapping-for-pruning to main January 8, 2025 00:54
@wlmyng wlmyng force-pushed the indexer-alt-prune-impls branch from 9fd872e to 5ef7c04 on January 9, 2025 16:56
@wlmyng wlmyng temporarily deployed to sui-typescript-aws-kms-test-env January 9, 2025 16:56 — with GitHub Actions Inactive
@wlmyng wlmyng merged commit a114f51 into main Jan 9, 2025
50 of 51 checks passed
@wlmyng wlmyng deleted the indexer-alt-prune-impls branch January 9, 2025 19:17
Contributor

@amnn amnn left a comment


Just some post-land questions/notes!

    start: from_epoch,
    end: to_epoch,
} = epoch_interval(conn, from..to_exclusive).await?;
if from_epoch < to_epoch {
Contributor


Is this check strictly necessary? If they were equal, the between would contain conflicting constraints and nothing would be captured by it, right?

The reason I'm asking is that it piqued my interest that this impl had this check, but the previous ones didn't. If there was a correctness reason to have this check (i.e. that we will get some sort of error by having the bounds passed to BETWEEN inverted), then we are relying on that situation not coming up in practice because we produce checkpoints (and therefore transactions) more often than we prune -- that makes me nervous because we could definitely end up falling foul of this in test scenarios.

If there is not a correctness reason to have this check, then the only other reason I could think of was performance, but this is one of the cheapest tables we have.

Contributor


I don't think Postgres has an issue with BETWEEN being inverted; this was just something explicit to handle a possibility unique to epochs. Given [from, to) checkpoints, both may fall in the same epoch, and conceptually we don't want to prune that epoch:

pruning from chkpt 0 to 1
from_epoch: 0, to_epoch: 0
skipping because from_epoch >= to_epoch
pruning from chkpt 1 to 2
from_epoch: 0, to_epoch: 1
pruning from 0 to 0
pruning from chkpt 2 to 3
from_epoch: 1, to_epoch: 1
skipping because from_epoch >= to_epoch
pruning from chkpt 3 to 4
from_epoch: 1, to_epoch: 1
skipping because from_epoch >= to_epoch

If you do run a query ... delete between 0 and 0 - 1, Postgres returns no rows.
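A minimal sketch of the guard in question (assuming epoch_interval returns the epochs of the from and to_exclusive checkpoints): the check encodes the semantics above rather than working around BETWEEN, since an epoch shared by both ends of the checkpoint range may still contain live checkpoints.

use std::ops::Range;

// Only prune epochs strictly below the epoch of the exclusive upper
// checkpoint; equal bounds mean the whole range sits inside one epoch that
// must be kept.
fn epochs_to_prune(from_epoch: u64, to_epoch: u64) -> Option<Range<u64>> {
    (from_epoch < to_epoch).then(|| from_epoch..to_epoch)
}

fn main() {
    assert_eq!(epochs_to_prune(0, 0), None);       // chkpts [0, 1): skip
    assert_eq!(epochs_to_prune(0, 1), Some(0..1)); // chkpts [1, 2): prune epoch 0
    assert_eq!(epochs_to_prune(1, 1), None);       // chkpts [2, 3): skip
}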

Comment on lines +77 to +78
// TODO: use tx_interval. `tx_sequence_number` needs to be added to this table, and an index
// created as its primary key is on `tx_digest`.
Contributor


No, let's keep this table as it is. Its schema and indices need to match what is offered in the KV store as much as possible.
