Skip to content

SOLR-17704: Refactor and use curator for distributed locks #3345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

heythm
Copy link
Contributor

@heythm heythm commented May 10, 2025

https://issues.apache.org/jira/browse/SOLR-17704

Description

This PR refactors Solr's distributed lock implementation to use Curator's ZK primitives for all ZooKeeper operations, while preserving the legacy Solr lock semantics. The new implementation, CuratorDistributedLocks, replaces the previous use of Curator's high-level lock recipes and closely matches the behavior of the old ZkDistributedLock, ensuring that read locks are only blocked by lower-numbered write locks and write locks are blocked by any lower-numbered lock. The lock acquisition strategy is documented and explained in the code.

Solution

  • Introduced a new CuratorDistributedLocks class that uses Curator's CuratorFramework for all ZK operations (node creation, deletion, children listing, and watchers).
  • The lock acquisition logic matches the legacy Solr semantics:
    • Read locks only block on lower-numbered write locks. Write locks block on any lower-numbered lock.
    • The implementation does not sort children or only watch the immediate predecessor, but instead watches the first blocking node found, which may cause more wakeups but is simpler and matches Solr's historical behavior.
  • Updated the lock factories to use this new implementation and removed the old nested lock logic
  • Improved documentation and code comments to clarify the design and trade-offs.

Tests

  • Existing distributed lock tests (e.g., CuratorDistributedLockFactoryTest) were run and pass, confirming that the new implementation preserves the expected semantics.
  • Manual verification that lock acquisition and release behave as before, including correct exception handling when calling isAcquired() after release().
  • No regressions observed in distributed locking behavior.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@github-actions github-actions bot removed documentation Improvements or additions to documentation dependencies Dependency upgrades test-framework labels May 10, 2025
Copy link
Contributor

@dsmiley dsmiley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing, Haythem!

I noticed that you renamed the classes to replace "Zk" with "Curator". But Curator is more of an implementation choice. Fundamentally, the locks are in ZooKeeper; the existing names made sense. As Solr uses Curator more and more, I don't think we need to follow this renaming pattern of Curator showing up in class names.

Overall, this PR isn't what I expected. I expected to see usage of Curator recipes that would thus save us from maintaining lower level code (whether it be ZK level of Curator "framework" level). But I don't see that at all; although a comment hints at it but I don't see it in the code. I just see some basic swap-outs of "framework" level Curator which is basically 1:1 to ZooKeeper, with barely any value-add. Consequently, there isn't much point to this PR as it stands. I take responsibility for not underscoring the intent in the vague JIRA description.

The PR description says a lot; it led me to believe that there is much more going on here than there is. It said things that were probably true previously. I suspect an LLM wrote it; and the LLM didn't recognize the refactoring nature here. Even if my hunch is correct, you are always responsible for the LLM's output, be it code, PRs, or documentation as if you wrote it yourself. If you had included an LLM generated disclaimer, with a horizontal line separating your commentary from the LLM, then that would help readers know the trustworthiness of the parts of the PR text.

@@ -47,7 +47,8 @@ public void monothreadedApiLockTests() throws Exception {
.build()) {
ConfigSetApiLockFactory apiLockFactory =
new ConfigSetApiLockFactory(
new ZkDistributedConfigSetLockFactory(zkClient, "/apiLockTestRoot"));
new CuratorDistributedConfigSetLockFactory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it was reasonable to pass SolrZkClient; consider as a convenience constructor. This relates to my view as Curator as an implementation detail.

Comment on lines +37 to +39
* <p>Each config set has its own lock directory under the root path. This is distinct from the
* collection lock factory, which supports hierarchical locking for collections, shards, and
* replicas.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to say this at all; I wouldn't.

* znode hierarchy, for use with the distributed Collection API implementation. The locks are
* implemented using ephemeral nodes placed below the "directory" nodes.
* A distributed lock factory for Solr collections, shards, and replicas, using Apache Curator's
* {@link org.apache.curator.framework.recipes.locks.InterProcessReadWriteLock} to manage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the usage of InterProcessReadWriteLock.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants