Commit 7232d1c

Update change log, docs and examples
1 parent 3c9606c commit 7232d1c

6 files changed, 57 additions and 6 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 
 ### Added
 - Added a new semantic match resolver to the KG Builder for entity resolution based on spaCy embeddings and cosine similarities so that nodes with similar textual properties get merged.
+- Added a new fuzzy match resolver to the KG Builder for entity resolution based on RapidFuzz fuzzy string matching.
 
 ## 1.6.0
 

docs/source/api.rst

Lines changed: 7 additions & 1 deletion
@@ -105,11 +105,17 @@ SinglePropertyExactMatchResolver
     :members: run
 
 SpaCySemanticMatchResolver
-================================
+==========================
 
 .. autoclass:: neo4j_graphrag.experimental.components.resolver.SpaCySemanticMatchResolver
     :members: run
 
+FuzzyMatchResolver
+==================
+
+.. autoclass:: neo4j_graphrag.experimental.components.resolver.FuzzyMatchResolver
+    :members: run
+
 .. _pipeline-section:
 
 *********

docs/source/index.rst

Lines changed: 2 additions & 0 deletions
@@ -101,6 +101,8 @@ List of extra dependencies:
     - Warning: this requires `pygraphviz`. Installation instructions can be found `here <https://pygraphviz.github.io/documentation/stable/install.html>`_.
 - nlp:
     - **spaCy**: load spaCy trained models for nlp pipelines, used by `SpaCySemanticMatchResolver` component from the Knowledge Graph creation pipelines.
+- fuzzy-matching:
+    - **rapidfuzz**: apply fuzzy matching using string similarity, used by `FuzzyMatchResolver` component from the Knowledge Graph creation pipelines.
 
 ********
 Examples
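
Both extras are optional, so a resolver that relies on one of them can fail at import time if the extra was not installed. Below is a minimal sketch of a pre-flight check. It assumes the packages are importable as `spacy` and `rapidfuzz`; the `missing_modules` helper is hypothetical and is not part of `neo4j_graphrag`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the "nlp" and "fuzzy-matching" extras.
    missing = missing_modules(["spacy", "rapidfuzz"])
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
    else:
        print("All optional resolver dependencies are installed.")
```

The check only looks up whether each module can be located; it does not import the packages or verify their versions.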

docs/source/user_guide_kg_builder.rst

Lines changed: 10 additions & 5 deletions
@@ -1028,17 +1028,19 @@ without making assumptions about entity similarity. The Entity Resolver
 is responsible for refining the created knowledge graph by merging entity
 nodes that represent the same real-world object.
 
-In practice, this package implements two resolvers:
+In practice, this package implements three resolvers:
 
 - a simple resolver that merges nodes with the same label and identical "name" property;
-- a semantic match resolver that merges nodes with the same label and similar set of textual properties (by default it uses the "name" property).
-  So far, the semantic matching is based on spaCy embeddings and cosine similarities of embedding vectors.
+- two similarity-based resolvers that merge nodes with the same label and a similar set of textual properties (by default they use the "name" property):
+
+    - a semantic match resolver, which is based on spaCy embeddings and cosine similarities of embedding vectors;
+    - a fuzzy match resolver, which is based on RapidFuzz, a library for fast fuzzy string matching using the Levenshtein distance.
 
 .. warning::
 
-    - The `SinglePropertyExactMatchResolver` and `SpaCySemanticMatchResolver` **replace** the nodes created by the KG writer.
+    - The `SinglePropertyExactMatchResolver`, `SpaCySemanticMatchResolver`, and `FuzzyMatchResolver` **replace** the nodes created by the KG writer.
 
-    - Check the :ref:`installation` section to make sure you have the required dependencies installed when using `SpaCySemanticMatchResolver`.
+    - Check the :ref:`installation` section to make sure you have the required dependencies installed when using `SpaCySemanticMatchResolver` or `FuzzyMatchResolver`.
 
 
 The resolvers can be used like this:
@@ -1047,9 +1049,12 @@ The resolvers can be used like this:
 
     from neo4j_graphrag.experimental.components.resolver import (
         SinglePropertyExactMatchResolver,
+        # SpaCySemanticMatchResolver,
+        # FuzzyMatchResolver,
     )
     resolver = SinglePropertyExactMatchResolver(driver)  # exact match resolver
     # resolver = SpaCySemanticMatchResolver(driver)  # semantic match with spaCy
+    # resolver = FuzzyMatchResolver(driver)  # fuzzy match with RapidFuzz
     res = await resolver.run()
 
 .. warning::
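
The user guide text in this file says the semantic match resolver compares embedding vectors by cosine similarity. The sketch below shows only that comparison step on plain Python lists. The vectors are made-up stand-ins for spaCy embeddings, and the 0.8 threshold and the `is_semantic_match` helper are assumptions for illustration, not the resolver's actual code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantic_match(a: Sequence[float], b: Sequence[float], threshold: float = 0.8) -> bool:
    """Hypothetical helper: treat two embeddings as the same entity above the threshold."""
    return cosine_similarity(a, b) >= threshold


if __name__ == "__main__":
    # Toy vectors standing in for the embeddings of two node names.
    print(is_semantic_match([1.0, 2.0, 0.0], [2.0, 4.0, 0.1]))
    print(is_semantic_match([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
```

Cosine similarity ignores vector length, so two embeddings that point in the same direction score close to 1 regardless of their magnitude.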

examples/README.md

Lines changed: 1 addition & 0 deletions
@@ -126,6 +126,7 @@ are listed in [the last section of this file](#customize).
     - [Neo4j writer](./customize/build_graph/components/writers/neo4j_writer.py)
     - [Custom](./customize/build_graph/components/writers/custom_writer.py)
 - Entity Resolver:
+    - [FuzzyMatchResolver](./customize/build_graph/components/resolvers/fuzzy_match_entity_resolver_pre_filter.py)
     - [SinglePropertyExactMatchResolver with pre-filter](./customize/build_graph/components/resolvers/simple_entity_resolver_pre_filter.py)
     - [SpaCySemanticMatchResolver with pre-filter](./customize/build_graph/components/resolvers/spacy_entity_resolver_pre_filter.py)
     - [Custom resolver](./customize/build_graph/components/resolvers/custom_resolver.py)
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+"""The FuzzyMatchResolver merges nodes with the same label
+and similar textual properties (by default using the "name" property) based on RapidFuzz
+for string matching.
+
+If the resolution is intended to be applied only on some nodes, for instance nodes that
+belong to a specific document, a "WHERE" query can be added. The only variable in the
+query scope is "entity".
+
+WARNING: this process is destructive, initial nodes are deleted and replaced
+by the resolved ones, but all relationships are kept.
+See apoc.refactor.mergeNodes documentation for more details.
+"""
+
+from neo4j_graphrag.experimental.components.resolver import (
+    FuzzyMatchResolver,
+)
+from neo4j_graphrag.experimental.components.types import ResolutionStats
+
+import neo4j
+
+
+async def main(driver: neo4j.Driver) -> None:
+    resolver = FuzzyMatchResolver(
+        driver,
+        # let's filter all entities that belong to a certain docId
+        filter_query="WHERE (entity)-[:FROM_CHUNK]->(:Chunk)-[:FROM_DOCUMENT]->"
+        "(:Document {id: 'docId'})",
+        # optionally, change the properties used for resolution (default is "name")
+        # resolve_properties=["name", "ssn"],
+        # the similarity threshold (default is 0.8)
+        # similarity_threshold=0.9,
+        # and the neo4j database where data is updated
+        # neo4j_database="neo4j",
+    )
+    res: ResolutionStats = await resolver.run()
+    print(res)
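
To make the `similarity_threshold` parameter in this example more concrete, here is a pure-Python sketch of a normalized Levenshtein similarity compared against the 0.8 default mentioned above. It is an illustration under stated assumptions only: the commit does not show which RapidFuzz scorer `FuzzyMatchResolver` uses, so the functions below are hypothetical and their scores may differ from the resolver's.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep when equal)
                )
            )
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


if __name__ == "__main__":
    threshold = 0.8  # default stated in the example's comments
    for left, right in [("Alice Smith", "Alice Smyth"), ("Alice", "Bob")]:
        score = normalized_similarity(left, right)
        print(left, right, round(score, 3), score >= threshold)
```

Under this sketch, a one-letter spelling variant in an eleven-character name clears the threshold, while unrelated names score far below it.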
