-
Notifications
You must be signed in to change notification settings - Fork 11.6k
[gql] Graphql transactions queries can handle db with pruning enabled #19237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Skipped Deployments
|
a694df0
to
381b792
Compare
3b078e1
to
0714a5c
Compare
0714a5c
to
81178c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi_cp_timestamp_ms: 1, | ||
epoch: 0, | ||
lo_cp: 0, | ||
lo_tx: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we record hi_tx
as well? It would seem to be symmetric and we do need to use that for the transaction queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we care about the transaction upperbound for whatever the checkpoint upperbound is, so it doesn't really help to track hi_tx
here.
// If we've entered this function, we already fetched `checkpoint_viewed_at` from the | ||
// `Watermark`, and so we must be able to retrieve `lo_cp` as well. | ||
let Watermark { lo_cp, lo_tx, .. } = *ctx.data_unchecked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pull this out next to the other calls that fetch from the context.
return Err(Error::Client( | ||
"Requested data is pruned and no longer available".to_string(), | ||
)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @lxfind mentioned that it would be better to treat pruned data the same as if it had never existed -- that seems like a reasonable position for transactions given that the timeframe is measured in months rather than minutes.
That should simplify the logic here as well, because I think you can skip this check entirely.
let tx_lo = match lo_record.1 { | ||
Some(lo) => lo, | ||
// Ostensibly this shouldn't happen in production, but should it occur, we can use | ||
// `network_total_transactions` to exclude checkpoints < this one | ||
None => lo_record.2, | ||
} as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this an internal error -- I want to know if something should never happen in production, and yet it does.
// SAFETY: we can unwrap because of the `Some(checkpoint_viewed_at) | ||
let cp_hi = min_option([cp_before_inclusive, cp_at, Some(checkpoint_viewed_at)]).unwrap(); | ||
|
||
// Read from the `checkpoints` table rather than the `pruner_cp_watermark` table, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an instance at which we read from pruner_cp_watermark
? (In other words, can we get rid of that table?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it's used by the indexer, although just proposed a change in data platform channel that could allow us to get rid of this outright
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed that these are duplicate info between pruner_cp_watermark and checkpoints, and I was looking at the de-dup, but prob no longer relevant given indexer-alt.
@amnn I plan to wait for the |
…l, while we have a closed interval due to shift away from using network_total_transactions
51ab464
to
4ddfb4d
Compare
…ark tracks lo and hi, because lo is not static - and won't be until we introduce watermarks table, we actually have to keep fetching the min cp no?
## Description - main changes from #19237 and this is the only unknown blocker - plus rebase, addressing comments and some testing changes. ## Test plan cargo nextest run --no-capture sui-graphql-e2e-tests::tests run_test::stable/prune --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [X] GraphQL: graphql changes to enable pruning, mainly around watermark - [ ] CLI: - [ ] Rust SDK: - [ ] REST API:
Description
Extend the watermark task to also keep track of the min unpruned checkpoint and its min tx. The watermark task reads the min and max checkpoints from the
checkpoints
table, since this is the first table pruned by the pruner.While
checkpoint_viewed_at
is inherited from a field's parent, the lower checkpoint bound will always be from the watermark, because any data < and potentially = to it must have been pruned from the db already.Consequently, rather than the current assumption that the inclusive lower bound for transactions queries starts from zero if the caller did not specify a starting point, we instead use the min unpruned checkpoint and tx from the watermark task.
Returns an error if a caller tries to fetch data outside of the unpruned range (or should we return an empty response? I mirrored this off objects queries - if someone queries outside the available range, they will get an explicit error). At epoch boundary when we prune data, callers may receive an empty response instead of an error if the request was made before pruning but completed during or after pruning.
Later on, a dedicated watermarks table will simplify the query patterns needed to support the watermark task on graphql. And we should expose the unpruned range info on graphql api as well
Test plan
pruning/transactions.move
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.