-
Notifications
You must be signed in to change notification settings - Fork 89
Reset invocation #3186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Reset invocation #3186
Conversation
E2E test restatedev/e2e#377 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating this PR @slinkydeveloper. I think this is a very powerful feature that our users will like a lot :-) Before we can merge it we need to make sure that there isn't a problem with modifying the JournalMetadata
while a concurrent retry is happening in the invoker. I suspect that there is a possibility that the invoker might read the JournalMetadata
from before a trim operation and the journal from after a trim operation because it is not using a snapshot of RocksDB. Before this was never a problem because modifying the JournalMetadata
stopped the retry mechanism (via the JournalTracker
) but with the time travel command this might have changed.
1f8141e
to
c9e958a
Compare
Followup PRs will be:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for updating this PR @slinkydeveloper. The changes look good to me. I mainly had minor comments. The ones that need some clarification are the following:
- What's the backwards compatibility story if this feature gets activated in the next released version? Alternatively, we could make the feature opt-in by waiving the right to go back. For the latter, how would this be enforced?
- We seem to always remove signals that are beyond the trim point. Aren't signals used by awakeables to complete them? If yes, can't it happen that we truncate the completion for an awakeable that won't get truncated? If yes, then we might end up being stuck because we might not re-run the logic to tell some other service to complete the awakeable again.
- Maybe it's worth unifying a little bit the terminology we are using. The feature to reset the journal is called "reset". Reset has a "truncate_from" index which gets translated into a
TrimInvocation
command which causes the journal to be rewritten. Maybe we can unify reset/truncate/trim or truncate and trim?
e9fe815
to
1e75ba9
Compare
Will rebase this PR on #3227 when ready. |
After some internal conversations, we agreed on for now merging this feature without reverting state, and give it a try like this. This feature also needs to retain the old journals, something i'll do for #3347 primarily and then re-adapt this PR here. I'm gonna rebase few things, figure out retaining old journals, and then i'll ask another review for this PR. |
1e75ba9
to
4b946c5
Compare
4b946c5
to
1e4d204
Compare
… storing "archived" journals and statuses. These APIs will be used by the following commits.
…vocation/purge journal to selectively remove an invocation epoch
…an invariant that when you remove the last epoch, it removes all the old attempts too. This simplifies a bit the behavior and how we want to show it.
…t is not set. This is a new condition that can happen with reset, because we don't re-migrate the journal table back to v1.
1e4d204
to
a1f8211
Compare
Fix #2890, depends on #2904