Add SpaCy Semantic match resolver for KG Builder #310
examples/customize/build_graph/components/resolvers/spacy_entity_resolver_pre_filter.py
try:
    return spacy.load(model_name)
except OSError as e:
    # The exact error message can differ slightly depending on spaCy version,
I don't get the message here. Does that mean the fallback does not work in some cases?
If the user would like to use the spaCy resolver, two things are required:
1. Installing the spaCy library.
2. Downloading the language model.
The first one is handled by Poetry. This function handles downloading the model. Maybe we could also trigger that from Poetry? I just found it easier this way. Any preference?
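For illustration, the load-then-download fallback discussed in this thread could look roughly like the sketch below. It is not the code from the PR: the function name `load_spacy_model`, the default model `en_core_web_sm`, and the injectable `loader`/`downloader` parameters are assumptions made so the logic can be exercised without spaCy installed. It relies on `spacy.load` raising `OSError` when the model package is missing and on `spacy.cli.download` to fetch it.

```python
from typing import Any, Callable, Optional


def load_spacy_model(
    model_name: str = "en_core_web_sm",  # hypothetical default, not taken from the PR
    loader: Optional[Callable[[str], Any]] = None,
    downloader: Optional[Callable[[str], None]] = None,
) -> Any:
    """Load a spaCy model, downloading it once if it is not installed."""
    if loader is None or downloader is None:
        # Import lazily so spaCy is only required when the defaults are used.
        import spacy
        from spacy.cli import download

        loader = loader or spacy.load
        downloader = downloader or download

    try:
        return loader(model_name)
    except OSError:
        # spacy.load raises OSError when the model package cannot be found.
        downloader(model_name)
        # Retry once; a second failure propagates to the caller.
        return loader(model_name)
```

Calling `load_spacy_model()` with no arguments uses spaCy directly. Passing a custom `loader` and `downloader` makes the fallback path easy to unit test.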
Closing this PR as it was handled by PR#319.
Description
This PR introduces a new `SpaCySemanticMatchResolver` component that leverages spaCy embeddings to merge nodes based on textual similarity. It expands upon the existing resolver framework (which was previously limited to exact matching) to support semantic comparisons of a given set of textual properties (the "name" property is still the default choice) via cosine similarity of embeddings. The main motivation is to provide a more robust entity resolution approach that avoids both missed merges (e.g., two `Person` nodes whose `name` properties are "John Smith" and "Jonathan Smith" respectively and whose `SSN` properties are the same) and false merges (e.g., two `Person` nodes with identical `name` properties but different `SSN` properties).
Type of Change
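As a rough illustration of the cosine-similarity comparison outlined in the Description above, the sketch below picks candidate node pairs from precomputed embedding vectors. It is not the resolver's actual implementation: the function names, the 0.8 threshold, and the plain-list vectors are assumptions. In a spaCy-based resolver the vectors would presumably come from the loaded model (for example `nlp(text).vector`).

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_merges(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the PR
) -> List[Tuple[str, str]]:
    """Return node-id pairs whose embeddings are at least `threshold` similar."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine_similarity(embeddings[a], embeddings[b]) >= threshold
    ]
```

Comparing every pair is quadratic in the number of nodes, so a real resolver would likely restrict comparisons to nodes sharing a label or pre-filter the candidates first.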
Complexity
Complexity:
How Has This Been Tested?
Checklist
The following requirements should have been met (depending on the changes in the branch):