Commit 9732246

Update CHANGELOG, docs, and examples
1 parent 13a89e6 commit 9732246

7 files changed

+89
-6
lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,9 @@
22

33
## Next
44

5+
### Added
6+
- Added a new semantic match resolver to the KG Builder for entity resolution based on spaCy embeddings and cosine similarities so that nodes with similar textual properties get merged.
7+
58
## 1.6.0
69

710
### Added

docs/source/api.rst

Lines changed: 5 additions & 0 deletions
@@ -104,6 +104,11 @@ SinglePropertyExactMatchResolver
104104
.. autoclass:: neo4j_graphrag.experimental.components.resolver.SinglePropertyExactMatchResolver
105105
:members: run
106106

107+
SpaCySemanticMatchResolver
108+
================================
109+
110+
.. autoclass:: neo4j_graphrag.experimental.components.resolver.SpaCySemanticMatchResolver
111+
:members: run
107112

108113
.. _pipeline-section:
109114

docs/source/index.rst

Lines changed: 2 additions & 1 deletion
@@ -99,7 +99,8 @@ List of extra dependencies:
9999
- **qdrant**: store vectors in Qdrant
100100
- **experimental**: experimental features mainly from the Knowledge Graph creation pipelines.
101101
- Warning: this requires `pygraphviz`. Installation instructions can be found `here <https://pygraphviz.github.io/documentation/stable/install.html>`_.
102-
102+
- nlp:
103+
- **spaCy**: loads trained spaCy models for NLP pipelines; used by the `SpaCySemanticMatchResolver` component of the Knowledge Graph creation pipelines.
103104

104105
********
105106
Examples

docs/source/user_guide_kg_builder.rst

Lines changed: 11 additions & 5 deletions
@@ -1028,22 +1028,28 @@ without making assumptions about entity similarity. The Entity Resolver
10281028
is responsible for refining the created knowledge graph by merging entity
10291029
nodes that represent the same real-world object.
10301030

1031-
In practice, this package implements a simple resolver that merges nodes
1032-
with the same label and identical "name" property.
1031+
In practice, this package implements two resolvers:
1032+
1033+
- a simple resolver that merges nodes with the same label and identical "name" property;
1034+
- a semantic match resolver that merges nodes with the same label and a similar set of textual properties (by default it uses the "name" property).
1035+
So far, the semantic matching is based on spaCy embeddings and cosine similarities of embedding vectors.
10331036

10341037
.. warning::
10351038

1036-
The `SinglePropertyExactMatchResolver` **replaces** the nodes created by the KG writer.
1039+
- The `SinglePropertyExactMatchResolver` and `SpaCySemanticMatchResolver` **replace** the nodes created by the KG writer.
1040+
1041+
- Check the :ref:`installation` section to make sure you have the required dependencies installed when using `SpaCySemanticMatchResolver`.
10371042

10381043

1039-
It can be used like this:
1044+
The resolvers can be used like this:
10401045

10411046
.. code:: python
10421047
10431048
from neo4j_graphrag.experimental.components.resolver import (
10441049
SinglePropertyExactMatchResolver,
10451050
)
1046-
resolver = SinglePropertyExactMatchResolver(driver)
1051+
resolver = SinglePropertyExactMatchResolver(driver) # exact match resolver
1052+
# resolver = SpaCySemanticMatchResolver(driver)  # semantic match with spaCy (import SpaCySemanticMatchResolver from the same module first)
10471053
res = await resolver.run()
10481054
10491055
.. warning::

examples/README.md

Lines changed: 2 additions & 0 deletions
@@ -128,6 +128,8 @@ are listed in [the last section of this file](#customize).
128128
- Entity Resolver:
129129
- [SinglePropertyExactMatchResolver](./customize/build_graph/components/resolvers/simple_entity_resolver.py)
130130
- [SinglePropertyExactMatchResolver with pre-filter](./customize/build_graph/components/resolvers/simple_entity_resolver_pre_filter.py)
131+
- [SpaCySemanticMatchResolver](./customize/build_graph/components/resolvers/spacy_entity_resolver.py)
132+
- [SpaCySemanticMatchResolver with pre-filter](./customize/build_graph/components/resolvers/spacy_entity_resolver_pre_filter.py)
131133
- [Custom resolver](./customize/build_graph/components/resolvers/custom_resolver.py)
132134
- [Custom component](./customize/build_graph/components/custom_component.py)
133135

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
"""The SpaCySemanticMatchResolver merges nodes with the same label
and similar textual properties (by default using the "name" property) based on spaCy
embeddings and cosine similarities of embedding vectors.

WARNING: this process is destructive, initial nodes are deleted and replaced
by the resolved ones, but all relationships are kept.
See apoc.refactor.mergeNodes documentation for more details.
"""

import neo4j
from neo4j_graphrag.experimental.components.resolver import (
    SpaCySemanticMatchResolver,
)
from neo4j_graphrag.experimental.components.types import ResolutionStats


async def main(driver: neo4j.Driver) -> None:
    resolver = SpaCySemanticMatchResolver(
        driver,
        # optionally, change the properties used for resolution (default is "name")
        # resolve_properties=["name", "ssn"],
        # the similarity threshold (default is 0.8)
        # similarity_threshold=0.9,
        # the spaCy trained model (default is "en_core_web_lg")
        # spacy_model="en_core_web_sm",
        # and the neo4j database where data is updated
        # neo4j_database="neo4j",
    )
    res: ResolutionStats = await resolver.run()
    print(res)
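
The docstring above names cosine similarity of embedding vectors and a default threshold of 0.8. A minimal sketch of that comparison follows, assuming hand-written toy vectors in place of spaCy embeddings. The helper names `cosine_similarity` and `similar_pairs` are hypothetical and are not part of `neo4j_graphrag`. Whether the resolver compares with `>=` or `>` is not stated in this commit, so `>=` here is also an assumption.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(
    vectors: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str]]:
    """List every pair of names whose vectors reach the similarity threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine_similarity(vectors[a], vectors[b]) >= threshold
    ]


# Toy vectors (assumption): real embeddings would come from a spaCy model.
toy_vectors = {
    "Acme Corp": [0.9, 0.1, 0.0],
    "Acme Corporation": [0.8, 0.2, 0.0],
    "Zebra Ltd": [0.0, 0.1, 0.9],
}
pairs = similar_pairs(toy_vectors)
print(pairs)
```

Raising the threshold makes merging more conservative, which matters here because the merge is destructive.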
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
"""The SpaCySemanticMatchResolver merges nodes with the same label
and similar textual properties (by default using the "name" property).

If the resolution is intended to be applied only on some nodes, for instance nodes that
belong to a specific document, a "WHERE" query can be added. The only variable in the
query scope is "entity".

WARNING: this process is destructive, initial nodes are deleted and replaced
by the resolved ones, but all relationships are kept.
See apoc.refactor.mergeNodes documentation for more details.
"""

import neo4j
from neo4j_graphrag.experimental.components.resolver import (
    SpaCySemanticMatchResolver,
)
from neo4j_graphrag.experimental.components.types import ResolutionStats


async def main(driver: neo4j.Driver) -> None:
    resolver = SpaCySemanticMatchResolver(
        driver,
        # let's filter all entities that belong to a certain docId
        filter_query="WHERE (entity)-[:FROM_CHUNK]->(:Chunk)-[:FROM_DOCUMENT]->"
        "(:Document {id: 'docId'})",
        # optionally, change the properties used for resolution (default is "name")
        # resolve_properties=["name", "ssn"],
        # the similarity threshold (default is 0.8)
        # similarity_threshold=0.9,
        # the spaCy trained model (default is "en_core_web_lg")
        # spacy_model="en_core_web_sm",
        # and the neo4j database where data is updated
        # neo4j_database="neo4j",
    )
    res: ResolutionStats = await resolver.run()
    print(res)
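
The filter in this example is a plain string split across two adjacent literals, so an unbalanced parenthesis or a malformed property map is easy to miss. As a sketch only, a small helper can build the clause for a given document id and keep the pattern in one place. `document_filter` is a hypothetical name, not part of `neo4j_graphrag`, and the escaping below is a simplification. Whether `filter_query` accepts query parameters is not shown in this commit, so the sketch assumes it does not.

```python
def document_filter(doc_id: str) -> str:
    """Build a WHERE clause limiting resolution to entities of one document.

    Hypothetical helper: it only references the `entity` variable, which the
    example's docstring says is the only variable in the query scope.
    """
    # Escape backslashes and single quotes so the id stays inside the Cypher string literal.
    escaped = doc_id.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "WHERE (entity)-[:FROM_CHUNK]->(:Chunk)"
        f"-[:FROM_DOCUMENT]->(:Document {{id: '{escaped}'}})"
    )


clause = document_filter("docId")
print(clause)
```

The result could then be passed as `filter_query=document_filter("docId")` when constructing the resolver.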
