[controller] admin tool command to clean execution IDs #1730

Open: wants to merge 2 commits into base: main

arjun4084346 (Contributor) commented Apr 25, 2025

Problem Statement

We store the successful execution IDs for each store in the ZK path //executionIds/succeededPerStore. This acts as a checkpoint recording how far we have successfully processed the admin messages for each store.
But when we delete a store, we do not remove its entry from this map. As a result, the ZNode can grow beyond the allowed limit of 1MB.
Once that happens, the checkpoint can no longer be updated, so admin messages cannot be consumed for ANY store in that cluster.

Solution

This admin tool command removes the entries for deleted stores, which reduces the size of this ZNode.
The ZNode can still reach the maximum size, however, which caps how many stores a Venice cluster can have. Note also that there is a limit on how many children a ZNode can have (100K), which likewise limits the number of Venice stores in a cluster.
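
The cleanup idea can be sketched as below. This is a minimal illustration, not the code in this PR: the class name `ExecutionIdCleanupSketch`, the method `pruneDeletedStores`, and the store names are hypothetical, and the sketch works on an in-memory map rather than reading or writing the ZNode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: all names here are illustrative and not taken from the Venice code base.
public class ExecutionIdCleanupSketch {
  /**
   * Returns a copy of the per-store execution ID map that keeps only
   * the stores still present in the cluster. The input map is not modified.
   */
  public static Map<String, Long> pruneDeletedStores(
      Map<String, Long> succeededPerStore,
      Set<String> liveStores) {
    Map<String, Long> pruned = new HashMap<>();
    for (Map.Entry<String, Long> entry : succeededPerStore.entrySet()) {
      // Keep the checkpoint only if the store still exists.
      if (liveStores.contains(entry.getKey())) {
        pruned.put(entry.getKey(), entry.getValue());
      }
    }
    return pruned;
  }

  public static void main(String[] args) {
    // Assumed sample data: one live store and one deleted store.
    Map<String, Long> checkpoint = new HashMap<>();
    checkpoint.put("live_store", 101L);
    checkpoint.put("deleted_store", 57L);

    Map<String, Long> pruned = pruneDeletedStores(checkpoint, Set.of("live_store"));
    System.out.println(pruned); // prints {live_store=101}
  }
}
```

Returning a new map instead of mutating the input keeps the original checkpoint available for comparison before anything is written back.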

Code changes

  • Added new code behind a config. If so list the config names and their default values in the PR description.
  • Introduced new log lines.
    • Confirmed if logs need to be rate limited to avoid excessive logging.

Concurrency-Specific Checks

Both reviewer and PR author to verify

  • Code has no race conditions or thread safety issues.
  • Proper synchronization mechanisms (e.g., synchronized, RWLock) are used where needed.
  • No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
  • Verified thread-safe collections are used (e.g., ConcurrentHashMap, CopyOnWriteArrayList).
  • Validated proper exception handling in multi-threaded code to avoid silent thread termination.

How was this PR tested?

  • New unit tests added.
  • New integration tests added.
  • Modified or extended existing tests.
  • Verified backward compatibility (if applicable).

Does this PR introduce any user-facing or breaking changes?

  • No. You can skip the rest of this section.
  • Yes. Clearly explain the behavior change and its impact.

arjun4084346 changed the title from "add test" to "[controller] admin tool command to clean execution IDs" on Apr 25, 2025