[ENH]: Leader election for SysDB #5104

Merged
merged 1 commit into from
Jul 17, 2025
Contributor

@tanujnay112 tanujnay112 commented Jul 14, 2025

Description of changes

This change introduces leader election in the SysDB service in preparation for scaling out this layer. After this change, only the leader monitors and updates the member list. Note that the leader election in question simply reuses the Kubernetes leader election API, so it is not strongly consistent and can lead to split-brain scenarios. These split-brain scenarios are not harmful here.

Since this change reuses the leader election code that's already in the log service, that code has been moved out of the log service directory.
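
The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Elector`, `MemberlistManager`, `RunGated`, and the two fake electors are hypothetical names, and the actual change relies on the Kubernetes leader election API rather than this stand-in interface. The callback shape loosely mirrors the started/stopped-leading callbacks offered by client-go's `leaderelection` package.

```go
package main

import (
	"context"
	"sync"
)

// Elector is a hypothetical stand-in for a leader-election client.
// Run blocks until the election loop ends, calling onStarted when this
// process acquires leadership and onStopped when it loses it.
type Elector interface {
	Run(ctx context.Context, onStarted func(ctx context.Context), onStopped func())
}

// MemberlistManager is a hypothetical placeholder for the component that
// monitors and updates the member list.
type MemberlistManager struct {
	mu      sync.Mutex
	running bool
	events  []string
}

// Start begins member list management; calling it twice is a no-op.
func (m *MemberlistManager) Start() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.running {
		m.running = true
		m.events = append(m.events, "start")
	}
}

// Stop halts member list management; calling it when stopped is a no-op.
func (m *MemberlistManager) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.running {
		m.running = false
		m.events = append(m.events, "stop")
	}
}

// Events returns a copy of the recorded start/stop transitions.
func (m *MemberlistManager) Events() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return append([]string(nil), m.events...)
}

// RunGated starts the manager only while this process holds leadership.
func RunGated(ctx context.Context, e Elector, m *MemberlistManager) {
	e.Run(ctx,
		func(leaderCtx context.Context) { m.Start() },
		func() { m.Stop() },
	)
}

// winsThenLoses simulates acquiring and then losing leadership.
type winsThenLoses struct{}

func (winsThenLoses) Run(ctx context.Context, onStarted func(context.Context), onStopped func()) {
	onStarted(ctx)
	onStopped()
}

// neverLeader simulates a replica that never wins the election.
type neverLeader struct{}

func (neverLeader) Run(ctx context.Context, onStarted func(context.Context), onStopped func()) {}

// Demo runs the manager under the given elector and returns its transitions.
func Demo(e Elector) []string {
	m := &MemberlistManager{}
	RunGated(context.Background(), e, m)
	return m.Events()
}
```

Calling `Demo(winsThenLoses{})` records a start followed by a stop, while `Demo(neverLeader{})` records nothing, which is the behavior a non-leader replica would be expected to show under this sketch.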

  • Improvements & Bug fixes
    • Gated member list service operations by a leader election.
  • New functionality
    • Described above. This should enable the SysDB service to scale out.

Test plan

Verified by observing log statements in a local Tilt environment.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

N/A. Before scaling out the SysDB layer, make sure this change is in the SysDB binary.

Observability plan

N/A

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@tanujnay112 tanujnay112 changed the title from "[ENH] Leader election for sysdb" to "[ENH] Leader election for SysDB" Jul 14, 2025

⚠️ The Helm chart was updated without a version bump. Your changes will only be published if the version field in k8s/distributed-chroma/Chart.yaml is updated.

@tanujnay112 tanujnay112 marked this pull request as ready for review July 14, 2025 22:40
@tanujnay112 tanujnay112 changed the title from "[ENH] Leader election for SysDB" to "[ENH]: Leader election for SysDB" Jul 14, 2025
@tanujnay112 tanujnay112 requested a review from sanketkedia July 14, 2025 22:40
Contributor

propel-code-bot bot commented Jul 14, 2025

Leader Election Support for SysDB and Codebase Refactor

This PR implements leader election in the SysDB service using the Kubernetes leader election API, ensuring that only the elected leader performs memberlist management. To enable this, the previously log-service-specific leader election code is refactored and moved to a shared package, with corresponding changes across Go service wiring and Kubernetes deployment files. Additional RBAC and environment variable settings are introduced for correct operation in Kubernetes, and the lease-watcher role is separated into its own YAML manifest.

Key Changes

• Added leader election for SysDB via shared 'pkg/leader' (moved from log service code).
• Refactored SysDB gRPC server startup to gate memberlist management on leadership acquisition.
• Modified Kubernetes manifests for SysDB: added POD_NAME and POD_NAMESPACE env vars and a dedicated lease-watcher Role and bindings.
• Updated logservice and sysdb deployments to align with new leader election logic.
• RBAC: Split lease-watcher Role from logservice.yaml into a reusable lease-watcher-role.yaml.
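
As a rough illustration of why the `POD_NAME` and `POD_NAMESPACE` variables matter, the sketch below reads them into an identity that an election client could use. It assumes the pod name serves as the candidate identity and the namespace locates the lease, which is a common pattern but is not confirmed by this summary; `LeaseIdentity`, `identityFrom`, and `IdentityFromEnv` are hypothetical names.

```go
package main

import (
	"errors"
	"os"
)

// LeaseIdentity is a hypothetical holder for the values a leader-election
// client typically needs: who this candidate is and where the lease lives.
type LeaseIdentity struct {
	PodName   string
	Namespace string
}

// identityFrom builds a LeaseIdentity from a lookup function so the logic
// can be exercised without touching the real process environment.
func identityFrom(getenv func(string) string) (LeaseIdentity, error) {
	name := getenv("POD_NAME")
	ns := getenv("POD_NAMESPACE")
	if name == "" || ns == "" {
		// Fail fast: without both values the candidate cannot be identified.
		return LeaseIdentity{}, errors.New("POD_NAME and POD_NAMESPACE must both be set")
	}
	return LeaseIdentity{PodName: name, Namespace: ns}, nil
}

// IdentityFromEnv reads the identity from the process environment.
func IdentityFromEnv() (LeaseIdentity, error) {
	return identityFrom(os.Getenv)
}
```

Failing at startup when either variable is missing is a design choice of this sketch; it surfaces a misconfigured manifest immediately instead of letting a replica join the election with an empty identity.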

Affected Areas

• SysDB server initialization and lifecycle (Go)
• Leader election utility code
• Kubernetes deployment YAML for SysDB and logservice
• RBAC and service account configuration

This summary was automatically generated by @propel-code-bot

@tanujnay112 tanujnay112 force-pushed the sysdb_leader_election branch from 3d5d50c to 41a9205 July 15, 2025 22:57
@tanujnay112 tanujnay112 force-pushed the sysdb_leader_election branch from 41a9205 to f5f71a3 July 15, 2025 23:00
@tanujnay112 tanujnay112 requested a review from sanketkedia July 16, 2025 18:24
Contributor

@sanketkedia sanketkedia left a comment

Should sync up with Evan/Jason on how to deploy the newly created/updated helm charts

@tanujnay112 tanujnay112 merged commit 65ad4c2 into main Jul 17, 2025
58 checks passed
@tanujnay112 tanujnay112 mentioned this pull request Jul 18, 2025
1 task
jasonvigil pushed a commit that referenced this pull request Jul 18, 2025
## Description of changes

Bump the Helm chart version. This should have been done as part
of [#5104](#5104).

## Test plan

N/A

- [ ] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Migration plan
N/A

## Observability plan

N/A

## Documentation Changes

N/A
2 participants