
Parallelize query adapter code #154


Closed
ThomasDelsart wants to merge 3 commits

Conversation

ThomasDelsart (Contributor)

Related to this issue: #152

ThomasDelsart requested review from Copilot and lsorber on June 4, 2025 at 10:20.

Copilot AI left a comment


Pull Request Overview

This PR parallelizes the heavy per-evaluation loops in both the query adapter updater and the evaluation runner by adding a max_workers parameter and using ThreadPoolExecutor to process each eval concurrently (a minimal sketch of the pattern follows the list below).

  • Adds max_workers arguments to update_query_adapter and answer_evals to control parallelism.
  • Introduces process_eval helper functions with thread-local sessions to handle per-eval work.
  • Replaces sequential loops over evals with futures, progress bars, and result aggregation.
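
A minimal sketch of this pattern, assuming a tqdm progress bar and a toy process_eval stand-in for the new helpers; the actual raglite signatures may differ:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    from tqdm import tqdm

    def process_eval(eval_):
        # Hypothetical per-eval work; the real helper opens a thread-local session.
        return eval_ * 2

    evals = list(range(10))  # placeholder evals
    max_workers = 4
    results = [None] * len(evals)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit each eval together with its original index so that results can be
        # written back in the original order as futures complete.
        future_to_idx = {
            executor.submit(process_eval, eval_): idx for idx, eval_ in enumerate(evals)
        }
        for future in tqdm(as_completed(future_to_idx), total=len(evals), desc="Processing evals"):
            results[future_to_idx[future]] = future.result()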

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File                             Description
src/raglite/_query_adapter.py    Parallelize update_query_adapter by spawning threads per eval
src/raglite/_eval.py             Parallelize answer_evals by spawning threads per eval

Comments suppressed due to low confidence (3)

src/raglite/_query_adapter.py:47

  • The new max_workers parameter should be documented in the function's docstring (describe how it controls concurrency).
    max_workers: int | None = None,
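
For illustration, the parameter could be documented along these lines; the abbreviated signature and the wording are hypothetical, not copied from the PR:

    def update_query_adapter(*, max_workers: int | None = None, **kwargs):
        """Update the query adapter.

        The max_workers argument caps the number of worker threads used to process
        evals concurrently; None lets ThreadPoolExecutor choose a default derived
        from the CPU count.
        """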

src/raglite/_query_adapter.py:195

  • Variable N is not defined in process_eval before being passed to _optimize_query_target, leading to a NameError. You need to build N (negative embeddings) as you did for P.
    t = _optimize_query_target(q, P, N, α=optimize_gap)
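
A hypothetical sketch of the fix the reviewer is asking for: build N from the negative (irrelevant) chunk embeddings in the same way P is built from the positive ones. The embedding stand-ins and variable names below are illustrative, not raglite's:

    import numpy as np

    rng = np.random.default_rng(0)
    positive_embeddings = [rng.normal(size=8) for _ in range(3)]   # stand-ins for relevant chunks
    negative_embeddings = [rng.normal(size=8) for _ in range(3)]   # stand-ins for irrelevant chunks
    P = np.vstack(positive_embeddings)
    N = np.vstack(negative_embeddings)
    # t = _optimize_query_target(q, P, N, α=optimize_gap)  # as in the PR, once N is defined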

src/raglite/_eval.py:198

  • The new parallel branch controlled by max_workers is not covered by existing tests; add tests for answer_evals with max_workers > 1 to ensure correct ordering and error handling.
    max_workers: int | None = None,
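
A test along these lines could exercise the parallel branch; the import path, fixture name, exact answer_evals signature, and the assumption that it returns a DataFrame with a question column are guesses rather than details from the PR:

    from raglite import answer_evals

    def test_answer_evals_parallel_preserves_order(raglite_test_config):
        # Run the same evals sequentially and in parallel and check that the
        # resulting rows come back in the same (original) order.
        sequential = answer_evals(num_evals=4, max_workers=1, config=raglite_test_config)
        parallel = answer_evals(num_evals=4, max_workers=4, config=raglite_test_config)
        assert list(sequential["question"]) == list(parallel["question"])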

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit all tasks with their original index
        future_to_eval_idx = {
            executor.submit(process_eval, eval_): (eval_, idx)

Copilot AI Jun 4, 2025


[nitpick] Calling create_database_engine inside each thread for every eval can be expensive; consider creating the engine once outside and passing a shared session factory to avoid repeated engine initialization.

Suggested change:
    -   executor.submit(process_eval, eval_): (eval_, idx)
    +   executor.submit(process_eval, eval_, engine): (eval_, idx)
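
A minimal sketch of that suggestion, assuming SQLModel sessions and a placeholder SQLite URL in place of raglite's create_database_engine:

    from concurrent.futures import ThreadPoolExecutor
    from sqlalchemy import create_engine
    from sqlmodel import Session

    # Create the engine once, outside the worker threads.
    engine = create_engine("sqlite:///raglite.db")  # placeholder URL

    def process_eval(eval_, engine):
        # Each thread opens its own short-lived session on the shared engine.
        with Session(engine) as session:
            ...  # per-eval work

    evals = list(range(8))  # placeholder evals
    with ThreadPoolExecutor(max_workers=4) as executor:
        future_to_eval_idx = {
            executor.submit(process_eval, eval_, engine): (eval_, idx)
            for idx, eval_ in enumerate(evals)
        }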


lsorber (Member) commented on Jun 6, 2025

Superseded by #157.

lsorber closed this on Jun 6, 2025.