Make pipeline work with parallel runs #119

stellasia · 2024-09-04T08:05:22Z

Description

Initial issue

Component results were saved using the component name as the key, and component statuses were stored on the component itself. As a result, the following code does not behave as expected (it returns the same result for both runs):

    pipe = Pipeline()
    pipe.add_component(ComponentAdd(), "add")
    run_params = [[1, 20], [10, 2]]
    runs = []
    for a, b in run_params:
        runs.append(pipe.run({"add": {"number1": a, "number2": b}}))
    results = await asyncio.gather(*runs)
    print(results)

(note: this code has been turned into a unit test)
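
The collision can be reproduced without the library. The following is a minimal sketch, not the actual Pipeline code: `NaiveStore` and `run_add` are hypothetical stand-ins for a store keyed by component name and for a single pipeline run.

```python
import asyncio


class NaiveStore:
    """Hypothetical store that keys results by component name only."""

    def __init__(self) -> None:
        self.results: dict = {}


async def run_add(store: NaiveStore, a: int, b: int) -> int:
    # Save the result under the component name, as described above.
    store.results["add"] = a + b
    # Yield to the event loop so the other run can execute in between.
    await asyncio.sleep(0)
    # Read the result back: by now the other run may have overwritten it.
    return store.results["add"]


async def main() -> list:
    store = NaiveStore()
    return await asyncio.gather(run_add(store, 1, 20), run_add(store, 10, 2))


results = asyncio.run(main())
print(results)  # both entries are identical
```

Because both runs write to the same "add" key before either reads it back, the two runs report the same value.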

What's in this PR?

This PR introduces a run_id (a UUID) in the Orchestrator; this run ID is used to build the key for accessing the components' statuses and results.
With this change, the callback turned out to be unnecessary and was removed. All processing now happens in the Orchestrator.run_task method, which can call its own on_task_complete method without needing partial, a protocol, and so on.
To keep intermediate results available after a run is done, this PR introduces the ResultStore, which can look up the results for a given run ID and component. Only InMemoryStore is implemented so far. The Pipeline.run method now also returns the run_id in a new PipelineResult object (breaking change).
A possible, though not mandatory, follow-up PR would move the statuses into the store instead of keeping them on the Task.
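
The idea of keying results by run ID can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the `add`/`get` method names, the `run_id:component` key layout, the `PipelineResult` fields, and the `run_add` helper are all hypothetical.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


class InMemoryStore:
    """Toy result store; the real ResultStore interface may differ."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    @staticmethod
    def _key(run_id: str, component: str) -> str:
        # Hypothetical key layout: one entry per (run, component) pair.
        return f"{run_id}:{component}"

    def add(self, run_id: str, component: str, result: Any) -> None:
        self._data[self._key(run_id, component)] = result

    def get(self, run_id: str, component: str) -> Optional[Any]:
        return self._data.get(self._key(run_id, component))


@dataclass
class PipelineResult:
    # Field names are assumptions; the PR only states that run_id is returned.
    run_id: str
    result: Any


async def run_add(store: InMemoryStore, a: int, b: int) -> PipelineResult:
    run_id = str(uuid.uuid4())  # fresh ID for every run
    store.add(run_id, "add", a + b)
    await asyncio.sleep(0)  # let the other run interleave
    return PipelineResult(run_id=run_id, result=store.get(run_id, "add"))


async def main(store: InMemoryStore) -> list:
    return await asyncio.gather(run_add(store, 1, 20), run_add(store, 10, 2))


store = InMemoryStore()
first, second = asyncio.run(main(store))
print(first.result, second.result)
```

Since each run writes under its own run ID, interleaved runs no longer overwrite each other, and the caller can use the returned `run_id` to look up intermediate results afterwards.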

Type of Change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity: Medium

How Has This Been Tested?

Unit tests
E2E tests
Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

Documentation has been updated
Unit tests have been updated
E2E tests have been updated
Examples have been updated
New files have copyright header
CLA (https://neo4j.com/developer/cla/) has been signed
CHANGELOG.md updated if appropriate


willtai · 2024-09-06T09:11:42Z

examples/pipeline/rag.py



 class PromptTemplateComponent(Component):
    def __init__(self, prompt: PromptTemplate) -> None:
        self.prompt = prompt

-    async def run(self, query: str, context: list[str]) -> StringDataModel:
+    async def run(self, query: str, context: List[str]) -> ComponentResultDataModel:


Is it possible to use list[str] here?

It doesn't work with Python 3.8 when we do the introspection to find the expected inputs.
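
For context, subscripting the built-in `list` (as in `list[str]`) is only supported from Python 3.9 onwards, so annotations that must be resolved on 3.8 need `typing.List`. The sketch below shows one way annotations can be introspected; it is an assumption about the general approach, not the library's actual introspection code, and the standalone `run` function is hypothetical.

```python
import inspect
from typing import List, get_type_hints


async def run(query: str, context: List[str]) -> str:
    # typing.List[str] is subscriptable on every supported Python version,
    # whereas the built-in list[str] needs Python 3.9 or later.
    return query + " " + " ".join(context)


# Resolve the annotations, roughly as an input-discovery step might do.
hints = get_type_hints(run)
expected_inputs = {
    name: hints[name] for name in inspect.signature(run).parameters
}
print(expected_inputs)
```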


willtai

LGTM 🏔️

* Add failing test
* Define a "run_id" in Orchestrator - save results per run_id
* Make unit test work
* Make intermediate results accessible from outside pipeline for investigation
* Remove unused imports
* Update examples and CHANGELOG
* Cleaning: remove deprecated code
* Fix ruff
* Fix examples
* Fix examples again
* PR reviews
* Removing useless status assignment

stellasia added 23 commits June 25, 2024 09:36

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

67d430c

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

e965499

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

ed0baa7

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

ea232ff

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python into HEAD

43c7b3c

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

8367daa

Merge remote-tracking branch 'origin/main'

3c3c00e

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

7182523

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

212a5a3

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

32364c6

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

f481025

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python

56435bf

Add failing test

dc2fe93

Define a "run_id" in Orchestrator - save results per run_id

2fb6448

Make unit test work

8d48c5d

Make intermediate results accessible from outside pipeline for investigation

5b6a7e3

Remove unused imports

84f9b7f

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python into fix/async-pipeline

a140774

Update examples and CHANGELOG

f7d7d7d

Cleaning: remove deprecated code

fbc8391

Fix ruff

439c5ad

Fix examples

2156c65

Fix examples again

5184688

stellasia marked this pull request as ready for review September 4, 2024 15:12

stellasia requested review from alexthomas93 and willtai September 4, 2024 15:12

willtai reviewed Sep 5, 2024

examples/pipeline/rag.py (outdated)

willtai reviewed Sep 5, 2024

examples/pipeline/rag.py (outdated)

willtai reviewed Sep 5, 2024

src/neo4j_genai/experimental/pipeline/pipeline.py (outdated)

PR reviews

11a47b7

willtai reviewed Sep 6, 2024

tests/unit/experimental/pipeline/test_orchestrator.py (outdated)

stellasia added 2 commits September 6, 2024 11:47

Removing useless status assignment

16c8ff1

Merge branch 'main' of https://github.com/neo4j/neo4j-genai-python into fix/async-pipeline

d7baf2a

willtai approved these changes Sep 6, 2024


stellasia merged commit c284b08 into neo4j:main Sep 8, 2024
11 checks passed

stellasia mentioned this pull request Sep 8, 2024

Async pipeline improvements #123

Merged

15 tasks

stellasia deleted the fix/async-pipeline branch September 16, 2024 08:48
