Commit cc48eef

stellasia, willtai, and alexthomas93 authored
Feature/kg builder (neo4j#91)
* Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. * More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... 
* Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. * Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Adds a Text Splitter (neo4j#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Added a TextChunkEmbedder (neo4j#87) * Added a TextChunkEmbedder * Added the 
copyright header to test_embedder.py * Updated test_text_chunk_embedder_run * Adds a knowledge graph writer (neo4j#83) * Added copyright header to new files * Added copyright header to kg_writer.py * Added __future__ import to kg_writer.py for backwards compatibility of type hints * Added E2E test for Neo4jWriter * Added a copyright header to test_kg_builder_e2e.py * Added upsert_vector test for relationship embeddings * Moved KG writer and its tests * Moved Neo4jGraph and associated objects to a new file * Renamed KG builder fixture * Added unit tests for KG writer * Split upsert_vector into 2 functions * Fixed broken cypher query strings * Removed embedding creation from Neo4jWriter * Fixed setup_neo4j_for_kg_construction fixture * Added KGWriterModel class * Fixed minor mistake in test_weaviate_e2e.py * Renamed kg_construction folder to components * Updated unit tests with new folder structure * Fixed broken import * Fixed copyright headers * Added missing docstrings * Fixed typo * Add documentation for pipeline exceptions (neo4j#90) * Fixes and refactors the KG writer component (neo4j#92) * Fixes and refactors the KG writer component * Fixed mypy error * Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY * Add schema for kg builder (neo4j#88) * Add schema for kg builder and tests * Fixed mypy checks * Reverted kg builder example with schema * Revert to List and Dict due to Python3.8 issue with using get_type_hints * Added properties to Entity and Relation * Add test for missing properties * Fix type annotations in test * Add property types * Refactored entity, relation, and property types * Unused import * Moved schema to components/ (neo4j#96) * Add entity / Relation extraction component (neo4j#85) * Pipeline (neo4j#81) * First draft of pipeline/component architecture. Example using the RAG pipeline. 
* More complex implementation of pipeline to deal with branching and aggregations - no async yet * Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default * Test RAG with new Pipeline implementation * File refactoring * Pipeline orchestration with async support * Import sorting * Pipeline rerun + exception on cyclic graph (for now) * Mypy * Python version compat * Rename process->run for Components for consistency with Pipeline * Move components test in the example folder - add some tests * Race condition fix - documentation - ruff * Fix import sorting * mypy on tests * Mark test as async * Tests were not testing... * Ability to create Pipeline templates * Ruff * Future + header * Renaming + update import structure to make it more compatible with rest of the repo * Check input parameters before starting the pipeline * Introduce output model for component - Validate pipeline before running - More unit tests * Import.. * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... * Finally installed pre-commit hooks... and struggling with pydantic.. 
* Mypy on examples * Add missing header * Update doc * Fix import in doc * Update changelog * Update docs/source/user_guide_pipeline.rst Co-authored-by: willtai <wtaisen@gmail.com> * Refactor tests folder to match src structure * Move exceptions to separate file and rename them to make it clearer they are related to pipeline * Mypy * Rename def => config * Introduce generic type to remove most of the "type: ignore" comments * Remove unnecessary comment * Ruff * Document and test is_cyclic method * Remove find_all method from store (simplify data retrieval) * value is not a list anymore (or, if it is, it's on purpose) * Remove comments, fix example in doc * Remove core directory - move files to /pipeline * Expose stores from pipeline subpackage * Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input * Component subclasses can return DataModel * Add note on async + schema to illustrate parameter propagation --------- Co-authored-by: willtai <wtaisen@gmail.com> * Entity / Relation extraction component * Adds a Text Splitter (neo4j#82) * Added text splitter adapter class * Added copyright header to new files * Added __future__ import to text_splitters.py for backwards compatibility of type hints * Moved text splitter file and tests * Split text splitter adapter into 2 adapters * Added optional metadata to text chunks * Fixed typos * Moved text splitters inside of the components folder * Fixed Component import * Add tests * Keep it simple: remove deps to jinja for now * Update example with existing components * log config in example * Fix tests * Rm unused import * Add copyright headers * Rm debug code * Try and fix tests * Unused import * get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions * Return model is also conditioned to the existence of the run method => should raise an error if 
run is not implemented? * Log when we do not raise exception to keep track of the failure * Update prompt to match new KGwriter expected type * Fix test * Fix type for `examples` * Use SchemaConfig as input for the ER Extractor component * The "base" EntityRelationExtractor is an ABC that must be subclassed * Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp * Option to build lexical graph in the ERExtractor component * Fix one test * Fix some more tests * Fix some more tests * Remove "type: ignore" comments --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> * Update lock file after merge * Remove pipeline/components folder (again) * Updated component docs (neo4j#99) * Updated component docs * Removed weaviate test update * Updated pipeline user guide with link to components in the API section * Feature/kg builder e2e tests (neo4j#98) * End to end tests for KG builder pipeline * Adding chunk embedder to the pipeline and e2e tests * Fix how the chunk embedding is saved * Fix e2e tests * Fix mypy * mypy stuff :'( * WIP: update e2e tests * Check counts also here * Enable e2e tests on this PR only * Fix e2e tests (was not mocking the correct method for Embedder) * Revert CI to normal * Updated CHANGLOG and set max-parallel: 1 for E2E tests in pr-e2e-tests.yaml --------- Co-authored-by: willtai <wtaisen@gmail.com> Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com> Co-authored-by: willtai <william.tai@neo4j.com>
1 parent 242c77c commit cc48eef
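
The commit message mentions raising an exception on cyclic graphs and documenting an `is_cyclic` method. The check itself is not visible in this view, so the following is only a minimal sketch of how such a test is commonly written: a depth-first search with three node states. The function signature and the dict-of-lists graph representation are assumptions made for illustration, not the code from this commit.

```python
from typing import Dict, List


def is_cyclic(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed graph contains at least one cycle.

    `graph` maps a node name to the names of its downstream nodes.
    This representation is hypothetical and chosen for the sketch only.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, done
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    colour = {node: WHITE for node in nodes}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:
                # Reached a node that is still on the current path: cycle.
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)
```

A pipeline could run a check like this once during validation and refuse to start if it returns `True`, which matches the "exception on cyclic graph" behaviour described in the message.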

57 files changed: +6498 -1204 lines changed

.github/workflows/pr-e2e-tests.yaml

Lines changed: 2 additions & 0 deletions
@@ -13,6 +13,8 @@ concurrency:
 jobs:
   e2e-tests:
     runs-on: ubuntu-latest
+    strategy:
+      max-parallel: 1
     strategy:
       matrix:
         python-version: ['3.8', '3.12']

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -6,6 +6,9 @@
 
 ### Added
 - Add optional custom_prompt arg to the Text2CypherRetriever class.
+- Introduced support for Component/Pipeline flexible architecture.
+- Added new components for knowledge graph construction, including text splitters, schema builders, entity-relation extractors, and Neo4j writers.
+- Implemented end-to-end tests for the new knowledge graph builder pipeline.
 
 ### Changed
 - `GraphRAG.search` method first parameter has been renamed `query_text` (was `query`) for consistency with the retrievers interface.
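
The changelog entry above introduces a Component/Pipeline architecture, and the commit message describes asynchronous orchestration in which one component's output feeds the next. The following is a rough, self-contained sketch of that idea. Every name in it (`Component`, `SketchPipeline`, `add_component`, `connect`, `Splitter`, `Counter`, `demo`) is hypothetical and chosen for illustration. It is not the `neo4j_genai.pipeline` API, and unlike the behaviour described in the commit message it returns every component's output rather than only the leaves.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class Component:
    """Hypothetical base class: subclasses implement an async ``run``."""

    async def run(self, **kwargs: Any) -> Dict[str, Any]:
        raise NotImplementedError


class Splitter(Component):
    async def run(self, text: str) -> Dict[str, Any]:
        return {"chunks": text.split(". ")}


class Counter(Component):
    async def run(self, chunks: List[str]) -> Dict[str, Any]:
        return {"count": len(chunks)}


class SketchPipeline:
    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        # Each edge: (upstream, downstream, {downstream_param: upstream_key})
        self.edges: List[Tuple[str, str, Dict[str, str]]] = []

    def add_component(self, name: str, component: Component) -> None:
        self.components[name] = component

    def connect(self, start: str, end: str, mapping: Dict[str, str]) -> None:
        self.edges.append((start, end, mapping))

    async def _run_one(self, name, inputs, results):
        # Start from user-supplied inputs, then add mapped upstream outputs.
        kwargs = dict(inputs.get(name, {}))
        for start, end, mapping in self.edges:
            if end == name:
                for param, key in mapping.items():
                    kwargs[param] = results[start][key]
        return name, await self.components[name].run(**kwargs)

    async def run(self, inputs: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        pending = set(self.components)
        while pending:
            # A component is ready once all of its upstream results exist.
            ready = [
                n for n in pending
                if all(s in results for s, e, _ in self.edges if e == n)
            ]
            if not ready:
                raise ValueError("cyclic or unsatisfiable pipeline")
            # Components that are ready together run concurrently.
            done = await asyncio.gather(
                *(self._run_one(n, inputs, results) for n in ready)
            )
            for name, output in done:
                results[name] = output
                pending.discard(name)
        return results


def demo() -> Dict[str, Any]:
    pipe = SketchPipeline()
    pipe.add_component("splitter", Splitter())
    pipe.add_component("counter", Counter())
    pipe.connect("splitter", "counter", {"chunks": "chunks"})
    return asyncio.run(pipe.run({"splitter": {"text": "One. Two. Three"}}))
```

Calling `demo()` runs the splitter first and then passes its `chunks` output to the counter. Independent branches would be scheduled together by `asyncio.gather`, which is the main reason for making `run` asynchronous in a design like this.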

docs/source/api.rst

Lines changed: 100 additions & 11 deletions
@@ -3,14 +3,74 @@
 API Documentation
 #################
 
+.. _components-section:
+
+**********
+Components
+**********
+
+KGWriter
+========
+
+.. autoclass:: neo4j_genai.components.kg_writer.KGWriter
+    :members: run
+
+Neo4jWriter
+===========
+
+.. autoclass:: neo4j_genai.components.kg_writer.Neo4jWriter
+    :members: run
+
+TextSplitter
+============
+
+.. autoclass:: neo4j_genai.components.text_splitters.base.TextSplitter
+    :members: run
+
+LangChainTextSplitterAdapter
+============================
+
+.. autoclass:: neo4j_genai.components.text_splitters.langchain.LangChainTextSplitterAdapter
+    :members: run
+
+LlamaIndexTextSplitterAdapter
+=============================
+
+.. autoclass:: neo4j_genai.components.text_splitters.llamaindex.LlamaIndexTextSplitterAdapter
+    :members: run
+
+TextChunkEmbedder
+=================
+
+.. autoclass:: neo4j_genai.components.embedder.TextChunkEmbedder
+    :members: run
+
+SchemaBuilder
+=============
+
+.. autoclass:: neo4j_genai.components.schema.SchemaBuilder
+    :members: run
+
+EntityRelationExtractor
+=======================
+
+.. autoclass:: neo4j_genai.components.entity_relation_extractor.EntityRelationExtractor
+    :members: run
+
+LLMEntityRelationExtractor
+==========================
+
+.. autoclass:: neo4j_genai.components.entity_relation_extractor.LLMEntityRelationExtractor
+    :members: run
+
 .. _retrievers-section:
 
 **********
 Retrievers
 **********
 
 RetrieverInterface
-===================
+==================
 
 .. autoclass:: neo4j_genai.retrievers.base.Retriever
     :members:
@@ -70,39 +130,39 @@ PineconeNeo4jRetriever
     :members: search
 
 
-**********
+********
 Embedder
-**********
+********
 
 .. autoclass:: neo4j_genai.embedder.Embedder
     :members:
 
 SentenceTransformerEmbeddings
 ================================
 
-.. autoclass:: neo4j_genai.embeddings.SentenceTransformerEmbeddings
+.. autoclass:: neo4j_genai.embeddings.sentence_transformers.SentenceTransformerEmbeddings
     :members:
 
 **********
 Generation
 **********
 
 LLMInterface
-======================
+============
 
 .. autoclass:: neo4j_genai.llm.LLMInterface
     :members:
 
 
 OpenAILLM
-======================
+=========
 
 .. autoclass:: neo4j_genai.llm.OpenAILLM
     :members:
 
 
 PromptTemplate
-======================
+==============
 
 .. autoclass:: neo4j_genai.generation.prompts.PromptTemplate
     :members:
@@ -125,6 +185,8 @@ Database Interaction
 
 .. autofunction:: neo4j_genai.indexes.upsert_vector
 
+.. autofunction:: neo4j_genai.indexes.upsert_vector_on_relationship
+
 
 ******
 Errors
@@ -157,6 +219,12 @@ Errors
 
 * :class:`neo4j_genai.exceptions.LLMGenerationError`
 
+* :class:`neo4j_genai.pipeline.exceptions.PipelineDefinitionError`
+
+* :class:`neo4j_genai.pipeline.exceptions.PipelineMissingDependencyError`
+
+* :class:`neo4j_genai.pipeline.exceptions.PipelineStatusUpdateError`
+
 
 Neo4jGenAiError
 ===============
@@ -222,7 +290,7 @@ Neo4jVersionError
 
 
 Text2CypherRetrievalError
-==========================
+=========================
 
 .. autoclass:: neo4j_genai.exceptions.Text2CypherRetrievalError
     :show-inheritance:
@@ -236,21 +304,42 @@ SchemaFetchError
 
 
 RagInitializationError
-==========================
+======================
 
 .. autoclass:: neo4j_genai.exceptions.RagInitializationError
     :show-inheritance:
 
 
 PromptMissingInputError
-==========================
+=======================
 
 .. autoclass:: neo4j_genai.exceptions.PromptMissingInputError
     :show-inheritance:
 
 
 LLMGenerationError
-==========================
+==================
 
 .. autoclass:: neo4j_genai.exceptions.LLMGenerationError
     :show-inheritance:
+
+
+PipelineDefinitionError
+=======================
+
+.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineDefinitionError
+    :show-inheritance:
+
+
+PipelineMissingDependencyError
+==============================
+
+.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineMissingDependencyError
+    :show-inheritance:
+
+
+PipelineStatusUpdateError
+=========================
+
+.. autoclass:: neo4j_genai.pipeline.exceptions.PipelineStatusUpdateError
+    :show-inheritance:

docs/source/index.rst

Lines changed: 4 additions & 2 deletions
@@ -30,7 +30,8 @@ Python versions supported:
 Topics
 ******
 
-+ :ref:`user-guide`
++ :ref:`user-guide-rag`
++ :ref:`user-guide-pipeline`
 + :ref:`api-documentation`
 + :ref:`types-documentation`
 
@@ -39,7 +40,8 @@ Topics
     :caption: Contents:
     :hidden:
 
-    user_guide.rst
+    user_guide_rag.rst
+    user_guide_pipeline.rst
     api.rst
     types.rst

docs/source/types.rst

Lines changed: 55 additions & 5 deletions
@@ -5,30 +5,80 @@ Types
 *****
 
 RawSearchResult
-==================
+===============
 
 .. autoclass:: neo4j_genai.types.RawSearchResult
 
 
 RetrieverResult
-==================
+===============
 
 .. autoclass:: neo4j_genai.types.RetrieverResult
 
 
 RetrieverResultItem
-====================
+===================
 
 .. autoclass:: neo4j_genai.types.RetrieverResultItem
 
 
 LLMResponse
-====================
+===========
 
 .. autoclass:: neo4j_genai.llm.types.LLMResponse
 
 
 RagResultModel
-====================
+==============
 
 .. autoclass:: neo4j_genai.generation.types.RagResultModel
+
+TextChunk
+=========
+
+.. autoclass:: neo4j_genai.components.types.TextChunk
+
+TextChunks
+==========
+
+.. autoclass:: neo4j_genai.components.types.TextChunks
+
+Neo4jNode
+=========
+
+.. autoclass:: neo4j_genai.components.types.Neo4jNode
+
+Neo4jRelationship
+=================
+
+.. autoclass:: neo4j_genai.components.types.Neo4jRelationship
+
+Neo4jGraph
+==========
+
+.. autoclass:: neo4j_genai.components.types.Neo4jGraph
+
+KGWriterModel
+=============
+
+.. autoclass:: neo4j_genai.components.kg_writer.KGWriterModel
+
+SchemaProperty
+==============
+
+.. autoclass:: neo4j_genai.components.schema.SchemaProperty
+
+SchemaEntity
+============
+
+.. autoclass:: neo4j_genai.components.schema.SchemaEntity
+
+SchemaRelation
+==============
+
+.. autoclass:: neo4j_genai.components.schema.SchemaRelation
+
+SchemaConfig
+============
+
+.. autoclass:: neo4j_genai.components.schema.SchemaConfig
