Skip to content

Solve issues with mutexes while running tests on MacOS M1 #677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 22, 2025

Conversation

JoanFM
Copy link
Contributor

@JoanFM JoanFM commented May 21, 2025

Describe the changes in the pull request

2 issues are handled in this PR:

1- An abort was identified while running a test related to PyBatchIterator. The abort happened when the test was being shutdown. The error comes from the deleterpassed to the PyBatchIterator shared_ptr which captures the PyIndexby reference. The problem is that there is no guarantee that the PyIndexlifetime will survive the Iterator. In that deleter a mutex is released (which could be in an invalid state). The solution is to have the mutexbe wrapped in a shared_ptr that will share the ownership with the PyBatchIterator deleter as it captures it by value (incrementing its reference count).

2- Deadlock found on tests/flow/test_hnsw_parallel.py::test_parallel_insert_batch_search:

  • Thread #N-1 has the indexGuard locked in shared mode, and tries to recover the GIL in the getNextResult
  • Thread #N tries to acquire indexGuard exclusively, while N-1 has the indexGuard in shared mode. (WAIT)
  • Thread #N+1 is another iterator that want indexGuard in shared mode (It goes in the queue) but cannot get it in shared mode (I guess is System dependant if they would schedule a reader to bypass the writer ) potential starvation. Thread N+1 has the GIL.

This is solved by releasing the GIL in Thread N+1 before asking for indexGuard.

@CLAassistant
Copy link

CLAassistant commented May 21, 2025

CLA assistant check
All committers have signed the CLA.

@JoanFM JoanFM changed the title refactor to use RAII when locking mutexes refactor to use RAII when locking mutexes + solve GIL deadlock May 21, 2025
@JoanFM JoanFM force-pushed the raii-mutex-lock branch from d2eb18a to 5afe288 Compare May 21, 2025 17:23
@JoanFM JoanFM marked this pull request as ready for review May 21, 2025 17:24
@JoanFM JoanFM changed the title refactor to use RAII when locking mutexes + solve GIL deadlock Solve issues with mutexes while running tests on MacOS M1 May 22, 2025
Copy link

codecov bot commented May 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.24%. Comparing base (e4fc3e7) to head (7ac126c).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #677   +/-   ##
=======================================
  Coverage   96.24%   96.24%           
=======================================
  Files         112      112           
  Lines        6278     6278           
=======================================
  Hits         6042     6042           
  Misses        236      236           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@JoanFM JoanFM requested a review from alonre24 May 22, 2025 11:18
@JoanFM JoanFM enabled auto-merge May 22, 2025 13:16
@JoanFM JoanFM added this pull request to the merge queue May 22, 2025
Merged via the queue into main with commit 205f6bc May 22, 2025
23 checks passed
@JoanFM JoanFM deleted the raii-mutex-lock branch May 22, 2025 16:05
Copy link

Backport failed for 0.7, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.7
git worktree add -d .worktree/backport-677-to-0.7 origin/0.7
cd .worktree/backport-677-to-0.7
git switch --create backport-677-to-0.7
git cherry-pick -x 205f6bc2913e19442801cb78350616cf80919306

Copy link

Backport failed for 0.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 0.8
git worktree add -d .worktree/backport-677-to-0.8 origin/0.8
cd .worktree/backport-677-to-0.8
git switch --create backport-677-to-0.8
git cherry-pick -x 205f6bc2913e19442801cb78350616cf80919306

Copy link

Successfully created backport PR for 8.0:

github-actions bot pushed a commit that referenced this pull request May 22, 2025
* refactor to use RAII when locking mutexes

* capture mutex shared pointer by value to ensure validity at del time

* Avoid deadlock involving GIL, and indexGuard

* revert change to barrier and guard scoped locks

* add small comment

(cherry picked from commit 205f6bc)
github-merge-queue bot pushed a commit that referenced this pull request May 26, 2025
Solve issues with mutexes while running tests on MacOS M1  (#677)

* refactor to use RAII when locking mutexes

* capture mutex shared pointer by value to ensure validity at del time

* Avoid deadlock involving GIL, and indexGuard

* revert change to barrier and guard scoped locks

* add small comment

(cherry picked from commit 205f6bc)

Co-authored-by: Joan Fontanals <jfontanalsmartinez@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants