Add SpaCy Semantic match resolver for KG Builder #310

Conversation

NathalieCharbel
Contributor

Description

This PR introduces a new SpaCySemanticMatchResolver component that uses spaCy embeddings to merge nodes based on textual similarity. It extends the existing resolver framework, which was previously limited to exact matching, to support semantic comparison of a given set of textual properties (the "name" property remains the default) via cosine similarity of their embeddings. The main motivation is to provide a more robust entity resolution approach that avoids both missed merges (e.g., two Person nodes whose name properties are "John Smith" and "Jonathan Smith" respectively but whose SSN properties are the same) and false merges (e.g., two Person nodes with identical name properties but different SSN properties).
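To illustrate the idea (this is not the code in this PR), here is a minimal sketch of similarity-based pair selection over a configurable set of properties. Everything in it is an assumption made for illustration: `toy_embed` is a hypothetical character-trigram stand-in for a spaCy vector so the snippet runs without any model installed, and `similar_pairs`, `node_text`, and the `threshold` default are invented names and values, not the resolver's actual API.

```python
import math
from collections import Counter
from itertools import combinations


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding: character trigram counts."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def node_text(node: dict, properties=("name",)) -> str:
    """Concatenate the selected textual properties that the node has."""
    return " ".join(str(node[p]) for p in properties if p in node)


def similar_pairs(nodes, properties=("name",), threshold=0.8, embed=toy_embed):
    """Return index pairs whose combined property text is similar enough."""
    vectors = [embed(node_text(n, properties)) for n in nodes]
    return [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if cosine(vectors[i], vectors[j]) >= threshold
    ]


people = [
    {"name": "John Smith", "ssn": "1"},
    {"name": "John Smith", "ssn": "2"},
    {"name": "Acme Corp"},
]
# Comparing on "name" alone vs. on "name" and "ssn" together.
by_name = similar_pairs(people, properties=("name",), threshold=0.99)
by_name_and_ssn = similar_pairs(people, properties=("name", "ssn"), threshold=0.99)
```

In this sketch, including the SSN in the compared text lowers the similarity of the two "John Smith" nodes, which is the false-merge case described above. With real spaCy vectors the scores and a sensible threshold would differ.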

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity:

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

@NathalieCharbel NathalieCharbel requested a review from a team as a code owner March 18, 2025 11:08
try:
    return spacy.load(model_name)
except OSError as e:
    # The exact error message can differ slightly depending on spaCy version,
Contributor

I don't get the message here. Does that mean that the fallback does not work in some cases?

Contributor Author

If the user would like to use the spaCy resolver, two things are required:
1- installing the spaCy library
2- downloading the language model.
The first one is handled in Poetry. This function handles downloading the model. Maybe we can also trigger the download from Poetry? It's just that I found it easier this way. Any preference?
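For readers following the thread, the load-then-download fallback being discussed can be sketched roughly as below. This is not the function from the PR: `load_model_with_fallback` is a hypothetical name, and the loader and downloader are passed in as parameters so the snippet runs without spaCy installed. The commented usage assumes `spacy.load` and `spacy.cli.download`, with `en_core_web_sm` only as an example model name.

```python
from typing import Any, Callable


def load_model_with_fallback(
    model_name: str,
    load: Callable[[str], Any],
    download: Callable[[str], None],
) -> Any:
    """Try to load a model; if it is missing, download it once and retry."""
    try:
        return load(model_name)
    except OSError:
        # spaCy raises OSError when the model package cannot be found.
        # The exact message varies, so this sketch does not inspect it.
        download(model_name)
        return load(model_name)


# Assumed usage with spaCy (not executed here):
#   import spacy
#   from spacy.cli import download
#   nlp = load_model_with_fallback("en_core_web_sm", spacy.load, download)
```

Catching `OSError` without matching on the message keeps the sketch independent of spaCy version, at the cost of also retrying on unrelated I/O errors. If the second load fails, the error propagates to the caller.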

@NathalieCharbel
Contributor Author

Closing this PR, as it was handled by PR #319.

@NathalieCharbel NathalieCharbel deleted the kg-builder-entity-resolution-spacy branch April 3, 2025 09:12
2 participants