
Add Fuzzy match resolver for KG builder #319


Merged


NathalieCharbel (Contributor)

Description

This PR extends the work introduced in PR #310 by adding a new fuzzy match resolver based on RapidFuzz.

Main changes:

  • New base class BasePropertySimilarityResolver, whose main purpose is to resolve entities that share the same label and have a similar set of textual properties, whether the similarity is computed with spaCy or with fuzzy matching.
  • FuzzyMatchResolver for fuzzy string matching. This resolver uses RapidFuzz's weighted ratio (WRatio) function, which combines multiple methods (including token sort, token set, and partial ratios) into a composite similarity score. It appears to be a convenient measure that is robust to a variety of string differences. The resolver also preprocesses text with RapidFuzz's default processor, which normalizes it (e.g., lowercasing, whitespace stripping), and rescales the score (WRatio returns a value between 0 and 100) so that it always lies between 0 and 1.
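The resolution logic described above can be sketched as follows. This is a minimal, hypothetical illustration of the structure, not the library's implementation: the class names `PropertySimilarityResolverSketch` and `FuzzyMatchResolverSketch`, the `Entity` dataclass, the `resolve_properties` and `threshold` parameters, and the union-find grouping are all assumptions. To stay dependency-free, it swaps in the standard library's `difflib.SequenceMatcher.ratio()` for RapidFuzz's WRatio and approximates the default processor with a regex. With RapidFuzz installed, `similarity` could instead return `fuzz.WRatio(a, b, processor=utils.default_process) / 100` (after `from rapidfuzz import fuzz, utils`).

```python
from __future__ import annotations

import difflib
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Hypothetical node representation: an id, a label, and textual properties.
    id: int
    label: str
    properties: dict[str, str] = field(default_factory=dict)


def default_process(text: str) -> str:
    # Rough approximation of RapidFuzz's default processor: lowercase,
    # replace runs of non-alphanumeric characters with a space, then trim.
    return " ".join(re.sub(r"[\W_]+", " ", text.lower()).split())


class PropertySimilarityResolverSketch(ABC):
    """Hypothetical base: group same-label entities with similar properties."""

    def __init__(self, resolve_properties: list[str], threshold: float = 0.8):
        self.resolve_properties = resolve_properties
        self.threshold = threshold

    @abstractmethod
    def similarity(self, text_a: str, text_b: str) -> float:
        """Return a similarity score in [0, 1]; subclasses pick the measure."""

    def _text(self, entity: Entity) -> str:
        # Concatenate the configured textual properties into one string.
        return " ".join(entity.properties.get(p, "") for p in self.resolve_properties)

    def resolve(self, entities: list[Entity]) -> list[set[int]]:
        # Union-find over entity ids so transitive matches share one group.
        parent = {e.id: e.id for e in entities}

        def find(x: int) -> int:
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for i, a in enumerate(entities):
            for b in entities[i + 1:]:
                if a.label != b.label:
                    continue  # only entities with the same label are compared
                if self.similarity(self._text(a), self._text(b)) >= self.threshold:
                    parent[find(a.id)] = find(b.id)

        groups: dict[int, set[int]] = {}
        for e in entities:
            groups.setdefault(find(e.id), set()).add(e.id)
        # Report only groups that merge two or more entities.
        return [g for g in groups.values() if len(g) > 1]


class FuzzyMatchResolverSketch(PropertySimilarityResolverSketch):
    def similarity(self, text_a: str, text_b: str) -> float:
        # Stand-in for RapidFuzz WRatio (0-100) divided by 100:
        # SequenceMatcher.ratio() already returns a value in [0, 1].
        a, b = default_process(text_a), default_process(text_b)
        return difflib.SequenceMatcher(None, a, b).ratio()


entities = [
    Entity(1, "Person", {"name": "Alice Smith"}),
    Entity(2, "Person", {"name": "alice  smith!"}),
    Entity(3, "Person", {"name": "Bob Jones"}),
    Entity(4, "Company", {"name": "Alice Smith"}),
]
resolver = FuzzyMatchResolverSketch(resolve_properties=["name"], threshold=0.8)
print(resolver.resolve(entities))  # [{1, 2}]
```

Comparing only entities that share a label keeps the sketch from merging, say, a `Person` and a `Company` with the same name, and the union-find step makes matches transitive: if A matches B and B matches C, all three end up in one group.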

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Note

Please provide an estimated complexity of this PR of either Low, Medium or High

Complexity:

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

NathalieCharbel requested a review from a team as a code owner on April 1, 2025, 13:51.

stellasia (Contributor) left a comment:

Nice one! Some minor comments above, but it looks great!

stellasia (Contributor) left a comment:

LGTM!

NathalieCharbel force-pushed the kg-builder-entity-resolution-fuzzy branch from 5ad7160 to b8fb899 on April 2, 2025, 16:00.
NathalieCharbel force-pushed the kg-builder-entity-resolution-fuzzy branch from b8fb899 to 0381f3b on April 2, 2025, 16:26.
NathalieCharbel merged commit 2d62e4c into neo4j:main on April 3, 2025 (7 checks passed).
NathalieCharbel deleted the kg-builder-entity-resolution-fuzzy branch on April 3, 2025, 09:12.

2 participants