Add Fuzzy match resolver for KG builder #319
Merged
Description
This PR extends the work introduced in PR#310 by adding a new fuzzy match resolver based on RapidFuzz.
Main changes:
- BasePropertySimilarityResolver, whose main purpose is to resolve entities that share the same label and have a similar set of textual properties, whether the similarity measure is computed with spaCy or with fuzzy matching.
- FuzzyMatchResolver for fuzzy string matching. This resolver uses RapidFuzz's weighted ratio (WRatio) function, which combines multiple methods (including token sort, token set, and partial ratios) to produce a composite similarity score. It seems a convenient measure that is robust across a variety of string differences. The resolver also applies text preprocessing (via RapidFuzz's default processor), so that text is normalized (e.g., lowercasing, whitespace stripping), and rescales the score so that it always lies between 0 and 1.
Type of Change
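For illustration, the fuzzy matching idea described in the main changes could be sketched roughly as follows. This is not the code from this PR: the `similarity` and `find_similar_pairs` helpers, the entity dict layout (`id`, `label`, `text`) and the `0.8` threshold are all assumptions made for the example. It only shows WRatio being called with RapidFuzz's default processor and rescaled from 0-100 to 0-1, then used to compare entities that share a label. If RapidFuzz is not installed, the sketch falls back to a `difflib` stand-in, which is not equivalent to WRatio.

```python
from collections import defaultdict
from itertools import combinations

try:
    from rapidfuzz import fuzz, utils

    def similarity(a: str, b: str) -> float:
        """WRatio with RapidFuzz's default preprocessing, rescaled from 0-100 to 0-1."""
        return fuzz.WRatio(a, b, processor=utils.default_process) / 100.0

except ImportError:
    # Stand-in so the sketch still runs without RapidFuzz; NOT equivalent to WRatio.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def find_similar_pairs(entities, threshold=0.8):
    """Return (id_a, id_b, score) for same-label entities whose text is similar.

    `entities` is a hypothetical list of dicts with "id", "label" and "text"
    keys, where "text" stands for the entity's combined textual properties.
    """
    # Only entities with the same label are candidates for merging.
    by_label = defaultdict(list)
    for entity in entities:
        by_label[entity["label"]].append(entity)

    pairs = []
    for group in by_label.values():
        for a, b in combinations(group, 2):
            score = similarity(a["text"], b["text"])
            if score >= threshold:
                pairs.append((a["id"], b["id"], score))
    return pairs


entities = [
    {"id": 1, "label": "Person", "text": "  Alice Smith "},
    {"id": 2, "label": "Person", "text": "alice smith"},
    {"id": 3, "label": "Person", "text": "Bob Jones"},
    {"id": 4, "label": "Company", "text": "Alice Smith"},
]
print(find_similar_pairs(entities))
```

In this sketch, entities 1 and 2 differ only in case and surrounding whitespace, so preprocessing makes them identical. Entity 4 has the same text but a different label, so it is never compared with them.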
Complexity
Complexity:
How Has This Been Tested?
Checklist
The following requirements should have been met (depending on the changes in the branch):