Commit 4c36f6c
KG builder components user guide (#102)
* Pipeline (#81)
* First draft of pipeline/component architecture. Example using the RAG pipeline.
* More complex implementation of pipeline to deal with branching and aggregations - no async yet
* Introduce Store to add flexibility as where to store pipeline results - Only return the leaf components results by default
* Test RAG with new Pipeline implementation
* File refactoring
* Pipeline orchestration with async support
* Import sorting
* Pipeline rerun + exception on cyclic graph (for now)
* Mypy
* Python version compat
* Rename process->run for Components for consistency with Pipeline
* Move components test in the example folder - add some tests
* Race condition fix - documentation - ruff
* Fix import sorting
* mypy on tests
* Mark test as async
* Tests were not testing...
* Ability to create Pipeline templates
* Ruff
* Future + header
* Renaming + update import structure to make it more compatible with rest of the repo
* Check input parameters before starting the pipeline
* Introduce output model for component - Validate pipeline before running - More unit tests
* Import..
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks...
* Finally installed pre-commit hooks... and struggling with pydantic..
* Mypy on examples
* Add missing header
* Update doc
* Fix import in doc
* Update changelog
* Update docs/source/user_guide_pipeline.rst
Co-authored-by: willtai <wtaisen@gmail.com>
* Refactor tests folder to match src structure
* Move exceptions to separate file and rename them to make it clearer they are related to pipeline
* Mypy
* Rename def => config
* Introduce generic type to remove most of the "type: ignore" comments
* Remove unnecessary comment
* Ruff
* Document and test is_cyclic method
* Remove find_all method from store (simplify data retrieval)
* value is not a list anymore (or, if it is, it's on purpose)
* Remove comments, fix example in doc
* Remove core directory - move files to /pipeline
* Expose stores from pipeline subpackage
* Ability to pass the full output of one component to the next one - useful when a component accepts a pydantic model as input
* Component subclasses can return DataModel
* Add note on async + schema to illustrate parameter propagation
---------
Co-authored-by: willtai <wtaisen@gmail.com>
* Adds a Text Splitter (#82)
* Added text splitter adapter class
* Added copyright header to new files
* Added __future__ import to text_splitters.py for backwards compatibility of type hints
* Moved text splitter file and tests
* Split text splitter adapter into 2 adapters
* Added optional metadata to text chunks
* Fixed typos
* Moved text splitters inside of the components folder
* Fixed Component import
* Added a TextChunkEmbedder (#87)
* Added a TextChunkEmbedder
* Added the copyright header to test_embedder.py
* Updated test_text_chunk_embedder_run
* Adds a knowledge graph writer (#83)
* Added copyright header to new files
* Added copyright header to kg_writer.py
* Added __future__ import to kg_writer.py for backwards compatibility of type hints
* Added E2E test for Neo4jWriter
* Added a copyright header to test_kg_builder_e2e.py
* Added upsert_vector test for relationship embeddings
* Moved KG writer and its tests
* Moved Neo4jGraph and associated objects to a new file
* Renamed KG builder fixture
* Added unit tests for KG writer
* Split upsert_vector into 2 functions
* Fixed broken cypher query strings
* Removed embedding creation from Neo4jWriter
* Fixed setup_neo4j_for_kg_construction fixture
* Added KGWriterModel class
* Fixed minor mistake in test_weaviate_e2e.py
* Renamed kg_construction folder to components
* Updated unit tests with new folder structure
* Fixed broken import
* Fixed copyright headers
* Added missing docstrings
* Fixed typo
* Add documentation for pipeline exceptions (#90)
* Start documentation for KG construction pipeline
* Fixes and refactors the KG writer component (#92)
* Fixes and refactors the KG writer component
* Fixed mypy error
* Made start_node_id and end_node_id parameters in UPSERT_RELATIONSHIP_QUERY
* Add schema for kg builder (#88)
* Add schema for kg builder and tests
* Fixed mypy checks
* Reverted kg builder example with schema
* Revert to List and Dict due to Python3.8 issue with using get_type_hints
* Added properties to Entity and Relation
* Add test for missing properties
* Fix type annotations in test
* Add property types
* Refactored entity, relation, and property types
* Unused import
* Moved schema to components/ (#96)
* Add entity / Relation extraction component (#85)
* Entity / Relation extraction component
* Add tests
* Keep it simple: remove deps to jinja for now
* Update example with existing components
* log config in example
* Fix tests
* Rm unused import
* Add copyright headers
* Rm debug code
* Try and fix tests
* Unused import
* get_type_hints is failing for python 3.8/3.9, even when using __future__ annotations => back to the typing.Dict annotation which is compatible with all python versions
* Return model is also conditioned to the existence of the run method
=> should raise an error if run is not implemented?
* Log when we do not raise exception to keep track of the failure
* Update prompt to match new KGwriter expected type
* Fix test
* Fix type for `examples`
* Use SchemaConfig as input for the ER Extractor component
* The "base" EntityRelationExtractor is an ABC that must be subclassed
* Make node IDs unique across several runs of the pipeline by prefixing them with a timestamp
* Option to build lexical graph in the ERExtractor component
* Fix one test
* Fix some more tests
* Fix some more tests
* Remove "type: ignore" comments
---------
Co-authored-by: willtai <wtaisen@gmail.com>
Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com>
* Update lock file after merge
* Remove pipeline/components folder (again)
* Updated component docs (#99)
* Updated component docs
* Removed weaviate test update
* Updated pipeline user guide with link to components in the API section
* Feature/kg builder e2e tests (#98)
* End to end tests for KG builder pipeline
* Adding chunk embedder to the pipeline and e2e tests
* Fix how the chunk embedding is saved
* Fix e2e tests
* Fix mypy
* mypy stuff :'(
* WIP: update e2e tests
* Check counts also here
* Enable e2e tests on this PR only
* Fix e2e tests (was not mocking the correct method for Embedder)
* Revert CI to normal
* User guide for KG builder pipeline
* Update line length
* Review comments 1
* Address review comments - add missing file (image)
* Nicer lists
---------
Co-authored-by: willtai <wtaisen@gmail.com>
Co-authored-by: Alex Thomas <alexthomas93@users.noreply.github.com>
Co-authored-by: willtai <william.tai@neo4j.com>
1 parent 8b6cf43 commit 4c36f6c
4 files changed: +429 -2 lines changed
- docs/source
- images
+17 -2 lines changed: 17 additions & 2 deletions
docs/source/images/kg_builder_pipeline.png
136 KB
+2 lines changed: 2 additions & 0 deletions