Ability to create lexical graph only #127

Merged
merged 36 commits into from
Oct 23, 2024

Contributor

@stellasia stellasia commented Sep 11, 2024

Description

This PR adds a LexicalGraphBuilder component that enables users to create the lexical graph without entity/relation extraction (see added example). It also updates the ERExtraction component to use the new LexicalGraphBuilder.

Why

Pipelines that can be created after this PR:

  1. Pdf Loader
  2. Text splitter
  3. Chunk embedder
  4. Lexical Graph Builder
  5. Neo4j Writer
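The data flow of such a pipeline can be sketched in plain Python. This is a minimal illustration, not the library's API: every function and class below is a hypothetical stand-in, the input string stands in for the PDF loader's output, the in-memory `Graph` stands in for the Neo4j writer, and the labels and relationship types (`Document`, `Chunk`, `FROM_DOCUMENT`, `NEXT_CHUNK`) are assumed values used only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """In-memory stand-in for what a writer component would persist."""
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def split_text(text: str, chunk_size: int) -> list[str]:
    """Stand-in text splitter: fixed-size character windows."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(chunk: str) -> list[float]:
    """Stand-in embedder: a toy 2-dimensional vector, not a real model."""
    return [float(len(chunk)), float(sum(map(ord, chunk)) % 97)]


def build_lexical_graph(doc_path: str, chunks: list[str]) -> Graph:
    """Create one document node, one node per chunk, and the links between them.

    No entity or relation extraction happens here: the graph only
    describes the document structure.
    """
    graph = Graph()
    graph.nodes.append({"id": doc_path, "label": "Document"})  # assumed label
    for index, text in enumerate(chunks):
        chunk_id = f"{doc_path}:{index}"
        graph.nodes.append({
            "id": chunk_id,
            "label": "Chunk",  # assumed label
            "index": index,
            "text": text,
            "embedding": embed(text),
        })
        # Every chunk points back to its source document.
        graph.relationships.append((chunk_id, "FROM_DOCUMENT", doc_path))
        # Consecutive chunks are chained to preserve reading order.
        if index > 0:
            previous_id = f"{doc_path}:{index - 1}"
            graph.relationships.append((previous_id, "NEXT_CHUNK", chunk_id))
    return graph


text = "Lexical graphs link chunks to their source document."
chunks = split_text(text, chunk_size=20)
graph = build_lexical_graph("example.pdf", chunks)
print(len(graph.nodes), "nodes,", len(graph.relationships), "relationships")
```

The sketch shows why no LLM call is needed at this stage: the lexical graph is fully determined by the splitter output and the embeddings.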

A follow-up PR (#135) will implement the Neo4jChunkReader component, which will make it possible to start the ER extraction from chunks already saved in the database.

Configuration

All parameters for the lexical graph (chunk and document node labels, relationship types, property names) are stored in a LexicalGraphConfig object. This object is optional: default values are provided for all parameters.
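The "optional object with defaults" pattern described above could look roughly like the following. This is a hedged sketch: `LexicalGraphConfigSketch`, its field names, and its default values are hypothetical and chosen for illustration; they are not taken from the actual LexicalGraphConfig class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalGraphConfigSketch:
    # Hypothetical field names and defaults, for illustration only.
    document_node_label: str = "Document"
    chunk_node_label: str = "Chunk"
    chunk_to_document_relationship_type: str = "FROM_DOCUMENT"
    next_chunk_relationship_type: str = "NEXT_CHUNK"
    chunk_text_property: str = "text"
    chunk_embedding_property: str = "embedding"


def chunk_node(
    text: str,
    embedding: list,
    config: Optional[LexicalGraphConfigSketch] = None,
) -> dict:
    """Build a chunk node, falling back to the defaults when no config is given."""
    config = config or LexicalGraphConfigSketch()
    return {
        "label": config.chunk_node_label,
        "properties": {
            config.chunk_text_property: text,
            config.chunk_embedding_property: embedding,
        },
    }


# No config passed: every parameter takes its default value.
default_node = chunk_node("hello", [0.1, 0.2])

# Override only the parameters that differ; the rest keep their defaults.
custom_config = LexicalGraphConfigSketch(
    chunk_node_label="Passage",
    chunk_text_property="content",
)
custom_node = chunk_node("hello", [0.1, 0.2], custom_config)
print(default_node["label"], custom_node["label"])
```

Keeping all names in one object means the same configuration can be handed to every component that reads or writes the lexical graph, so labels and property names stay consistent.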

Future

We could consider deprecating the create_lexical_graph behavior in the EntityRelationExtractor and force users who want to build the lexical graph to use the LexicalGraphBuilder -> Neo4jWriter pattern.

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Complexity: Medium

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@stellasia stellasia marked this pull request as ready for review September 11, 2024 13:45
@stellasia stellasia marked this pull request as draft September 11, 2024 14:51
@stellasia stellasia marked this pull request as ready for review September 17, 2024 08:13
@stellasia stellasia marked this pull request as draft September 17, 2024 11:22
…into feature/lexical-graph-component

# Conflicts:
#	docs/source/user_guide_kg_builder.rst
#	examples/pipeline/kg_builder_from_text.py
…into feature/lexical-graph-component

# Conflicts:
#	examples/customize/build_graph/pipeline/lexical_graph_from_text.py
@stellasia stellasia marked this pull request as ready for review October 21, 2024 19:10
@stellasia stellasia requested a review from a team as a code owner October 21, 2024 19:10
         schema: Union[SchemaConfig, None] = None,
         examples: str = "",
         **kwargs: Any,
     ) -> Neo4jGraph:
-        """Perform entity and relation extraction for all chunks in a list."""
+        """Perform entity and relation extraction for all chunks in a list.
Contributor

Could we define 'lexical graph' either here or in the user guide?

Contributor Author

This is what I tried to do here. What is missing? I'll add a note in this docstring anyway to clarify the behavior of create_lexical_graph and lexical_graph_config.

Contributor

Perhaps how the term 'lexical graph' differs from any other knowledge graph. I think a newcomer would be confused by that.

Contributor Author

Added more details and a link to the user guide. Let me know if you think more information is needed.

Contributor

@willtai willtai left a comment

Nice work!

@stellasia stellasia merged commit 9391662 into neo4j:main Oct 23, 2024
7 checks passed
@stellasia stellasia mentioned this pull request Oct 23, 2024
15 tasks
@stellasia stellasia deleted the feature/lexical-graph-component branch December 11, 2024 14:36
2 participants