[indexer-alt] add prune impls for each pipeline #20635
Conversation
async fn prune(range: PrunableRange, conn: &mut db::Connection<'_>) -> Result<usize> {
    let (from, to) = range.containing_epochs();
    let filter = kv_epoch_starts::table
        .filter(kv_epoch_starts::epoch.between(from as i64, to as i64 - 1));
I have not fully convinced myself that pruning epoch-grained tables will just work like this, even when our watermark and retention are checkpoint-grained. I will add a test tomorrow to make sure.
Indeed -- and I would say that if this implementation doesn't behave correctly, it would be good to change the helper function in `PrunableRange` so that all `prune` impls can follow the same pattern (regardless of whether they are epoch-, checkpoint-, or transaction-grained), rather than change the epoch-grained impls to have a slightly different structure.
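To make the suggestion concrete, one possible shape for such a shared helper is sketched below. The struct fields, the `epoch_bounds` method, and the `epoch_of` parameter are all assumptions for illustration, not the PR's actual code:

```rust
/// A checkpoint-grained, half-open pruning range [from, to_exclusive).
/// Hypothetical sketch of the `PrunableRange` helper discussed above.
struct PrunableRange {
    from: u64,
    to_exclusive: u64,
}

impl PrunableRange {
    /// Translate the checkpoint range into inclusive epoch bounds that are
    /// safe to hand to SQL BETWEEN, or None when no whole epoch is covered.
    /// `epoch_of` maps a checkpoint sequence number to its epoch.
    fn epoch_bounds(&self, epoch_of: impl Fn(u64) -> u64) -> Option<(u64, u64)> {
        let from_epoch = epoch_of(self.from);
        let to_epoch = epoch_of(self.to_exclusive);
        // The epoch containing `to_exclusive` is still live, so stop one short.
        (from_epoch < to_epoch).then(|| (from_epoch, to_epoch - 1))
    }
}

fn main() {
    // Assume checkpoints 0..=2 are in epoch 0 and 3..=5 are in epoch 1.
    let epoch_of = |cp: u64| cp / 3;
    // Range [0, 2) stays inside epoch 0: no whole epoch to prune.
    assert_eq!(PrunableRange { from: 0, to_exclusive: 2 }.epoch_bounds(epoch_of), None);
    // Range [0, 3) covers all of epoch 0.
    assert_eq!(PrunableRange { from: 0, to_exclusive: 3 }.epoch_bounds(epoch_of), Some((0, 0)));
}
```

With a helper of this shape, each epoch-grained impl would only unpack the bounds and build its `BETWEEN` filter, keeping every pipeline structurally identical.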
This looks great, thanks @emmazzz. There are some suggested changes on @wlmyng's PR that may affect this one (how the epoch helpers are implemented might introduce an off-by-one difference in this PR, and there is also a suggestion about changing the `PrunableRange` interface to move the responsibility of translating bounds into the individual `prune` impls).
I'll leave it with you and @wlmyng to coordinate those changes and then land both PRs!
Given we can now delete by tx sequence number, should we get rid of the index on `cp_sequence_number` and write this `prune` impl based on the `tx_interval`?
Do we actually have an index on `cp_sequence_number`? It seems like it is a field on the table, but there is no corresponding index. We might as well proceed with an implementation based on `tx_interval`.
Ah, I must've missed something ... it looks like `kv_transactions` has only the `cp_sequence_number` field. So we'd need to backfill that. And since the primary key is on `tx_digest`, we'd need to introduce an index on `tx_sequence_number`.
I think again this PR would change because of comments on the earlier PR, but I think those changes should be mechanical, so accepting to unblock!
    .await?)
}

async fn prune(from: u64, to: u64, conn: &mut db::Connection<'_>) -> Result<usize> {
Could you update the `to` to `to_exclusive` in all implementations?
} = tx_interval(conn, from..to).await?;

let filter = ev_emit_mod::table
    .filter(ev_emit_mod::tx_sequence_number.between(from_tx as i64, to_tx as i64 - 1));
Why `- 1`?
The `to_tx` is the first transaction of the `to_exclusive` checkpoint and should not be pruned.
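In other words, the half-open transaction range `[from_tx, to_tx)` has to be narrowed to inclusive bounds before being handed to SQL's BETWEEN, which includes both endpoints. A minimal standalone sketch of that conversion (the function name is ours, for illustration):

```rust
/// Convert a half-open range [from_tx, to_tx) into the inclusive bounds that
/// SQL BETWEEN expects, or None when the range is empty.
fn between_bounds(from_tx: u64, to_tx: u64) -> Option<(i64, i64)> {
    if from_tx >= to_tx {
        return None; // empty range: nothing to prune
    }
    // `to_tx` is the first transaction of the `to_exclusive` checkpoint and
    // must survive pruning, hence the `- 1`.
    Some((from_tx as i64, to_tx as i64 - 1))
}

fn main() {
    assert_eq!(between_bounds(10, 20), Some((10, 19)));
    assert_eq!(between_bounds(5, 5), None);
}
```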
Just some post-land questions/notes!
    start: from_epoch,
    end: to_epoch,
} = epoch_interval(conn, from..to_exclusive).await?;
if from_epoch < to_epoch {
Is this check strictly necessary? If they were equal, the `between` would contain conflicting constraints and nothing would be captured by it, right?
The reason I'm asking is that it piqued my interest that this impl had this check, but the previous ones didn't. If there was a correctness reason to have this check (i.e. that we will get some sort of error by having the bounds passed to `BETWEEN` inverted), then we are relying on that situation not coming up in practice because we produce checkpoints (and therefore transactions) more often than we prune -- that makes me nervous because we could definitely end up falling foul of this in test scenarios.
If there is not a correctness reason to have this check, then the only other reason I could think of was performance, but this is one of the cheapest tables we have.
I don't think Postgres has an issue with `BETWEEN` being inverted, but this was just something explicit to handle a possibility unique to epochs. Given `[from, to)` checkpoints, they may both fall under the same epoch. Conceptually we don't want to prune that epoch:

pruning from checkpoint 0 to 1: from_epoch: 0, to_epoch: 0 -- skipping because from_epoch >= to_epoch
pruning from checkpoint 1 to 2: from_epoch: 0, to_epoch: 1 -- pruning epochs from 0 to 0
pruning from checkpoint 2 to 3: from_epoch: 1, to_epoch: 1 -- skipping because from_epoch >= to_epoch
pruning from checkpoint 3 to 4: from_epoch: 1, to_epoch: 1 -- skipping because from_epoch >= to_epoch

If you do run a query `... delete between 0 and 0 - 1`, Postgres returns no results from execution.
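The trace above can be reproduced with a small standalone sketch of the guard (a simplification we wrote for illustration; it is not the PR's code):

```rust
/// Given the epochs of the `from` and `to_exclusive` checkpoints, return the
/// inclusive epoch bounds to prune, or None when both fall in the same epoch.
fn prunable_epochs(from_epoch: u64, to_epoch: u64) -> Option<(u64, u64)> {
    if from_epoch < to_epoch {
        // The epoch of the `to_exclusive` checkpoint is still live; keep it.
        Some((from_epoch, to_epoch - 1))
    } else {
        None // skipping because from_epoch >= to_epoch
    }
}

fn main() {
    // pruning from checkpoint 0 to 1: both bounds in epoch 0 -> skip
    assert_eq!(prunable_epochs(0, 0), None);
    // pruning from checkpoint 1 to 2: crosses into epoch 1 -> prune epoch 0
    assert_eq!(prunable_epochs(0, 1), Some((0, 0)));
    // pruning from checkpoint 2 to 3: both in epoch 1 -> skip
    assert_eq!(prunable_epochs(1, 1), None);
}
```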
// TODO: use tx_interval. `tx_sequence_number` needs to be added to this table, and an index
// created as its primary key is on `tx_digest`.
No, let's keep this table as it is. Its schema and indices need to match what is offered in the KV store as much as possible.
Description
Added `prune` implementations for each pipeline inside the indexer-alt schema, built upon Will's cp mapping PR.

Test plan
Will add tests.
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.