Skip to content

Reset invocation #3186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

slinkydeveloper
Copy link
Contributor

@slinkydeveloper slinkydeveloper commented Apr 22, 2025

Fix #2890, depends on #2904

@slinkydeveloper slinkydeveloper changed the title [WIP] Trim and restart Time travel Apr 23, 2025
@slinkydeveloper
Copy link
Contributor Author

E2E test restatedev/e2e#377

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR @slinkydeveloper. I think this is a very powerful feature that our users will like a lot :-) Before we can merge it we need to make sure that there isn't a problem with modifying the JournalMetadata while a concurrent retry is happening in the invoker. I suspect that there is a possibility that the invoker might read the JournalMetadata from before a trim operation and the journal from after a trim operation because it is not using a snapshot of RocksDB. Before this was never a problem because modifying the JournalMetadata stopped the retry mechanism (via the JournalTracker) but with the time travel command this might have changed.

@slinkydeveloper
Copy link
Contributor Author

Followup PRs will be:

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for updating this PR @slinkydeveloper. The changes look good to me. I mainly had minor comments. The ones that need some clarification are the following:

  1. What's the backwards compatibility story if this feature gets activated in the next released version? Alternatively, we could make the feature opt-in by waiving the right to go back. For the latter, how would this be enforced?
  2. We seem to always remove signals that are beyond the trim point. Aren't signals used by awakeables to complete them? If yes, can't it happen that we truncate the completion for an awakeable that won't get truncated? If yes, then we might end up being stuck because we might not re-run the logic to tell some other service to complete the awakeable again.
  3. Maybe it's worth unifying a little bit the terminology we are using. The feature to reset the journal is called "reset". Reset has a "truncate_from" index which gets translated into a TrimInvocation command which causes the journal to be rewritten. Maybe we can unify reset/truncate/trim or truncate and trim?

@slinkydeveloper
Copy link
Contributor Author

Will rebase this PR on #3227 when ready.

@slinkydeveloper slinkydeveloper changed the title Reset invocation [WIP] Reset invocation May 26, 2025
@slinkydeveloper
Copy link
Contributor Author

After some internal conversations, we agreed on for now merging this feature without reverting state, and give it a try like this. This feature also needs to retain the old journals, something i'll do for #3347 primarily and then re-adapt this PR here. I'm gonna rebase few things, figure out retaining old journals, and then i'll ask another review for this PR.

@slinkydeveloper slinkydeveloper changed the title [WIP] Reset invocation Reset invocation Jun 5, 2025
@slinkydeveloper slinkydeveloper added the release-blocker Blocker for the next release label Jun 5, 2025
@slinkydeveloper
Copy link
Contributor Author

slinkydeveloper commented Jun 5, 2025

I rebased this PR on #3360, it also contains the changes from #3362 (1:1, so once that one is merged, it should disappear from this diff). This got a bit messy with the different branches :D For this feature specifically, only the last commit matters.

… storing "archived" journals and statuses. These APIs will be used by the following commits.
…vocation/purge journal to selectively remove an invocation epoch
…an invariant that when you remove the last epoch, it removes all the old attempts too. This simplifies a bit the behavior and how we want to show it.
@slinkydeveloper slinkydeveloper removed the release-blocker Blocker for the next release label Jun 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trimming journal
3 participants