Fix issue requiring to install spacy and rapidfuzz even if not used #337

Merged: 2 commits, May 13, 2025
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## Next

### Fixed

- Fixed a bug where `spacy` and `rapidfuzz` needed to be installed even if not using the relevant entity resolvers.

## 1.7.0

### Added
6 changes: 2 additions & 4 deletions docs/source/index.rst
@@ -98,10 +98,8 @@ List of extra dependencies:
- **pinecone**: store vectors in Pinecone
- **qdrant**: store vectors in Qdrant
- **experimental**: experimental features mainly from the Knowledge Graph creation pipelines.
- nlp:
- **spaCy**: load spaCy trained models for nlp pipelines, used by `SpaCySemanticMatchResolver` component from the Knowledge Graph creation pipelines.
- fuzzy-matching:
- **rapidfuzz**: apply fuzzy matching using string similarity, used by `FuzzyMatchResolver` component from the Knowledge Graph creation pipelines.
- **nlp**: installs **spaCy** for NLP pipelines; used by the `SpaCySemanticMatchResolver` component from the Knowledge Graph creation pipelines.
- **fuzzy-matching**: installs **rapidfuzz** for fuzzy matching based on string similarity; used by the `FuzzyMatchResolver` component from the Knowledge Graph creation pipelines.
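To see which of these two extras are usable in a given environment, one option is to look up the module each extra is expected to provide. The following is a minimal sketch, not part of the package: the `EXTRA_MODULES` mapping and the `available_extras` helper are hypothetical names introduced here for illustration.

```python
from importlib.util import find_spec

# Hypothetical mapping (an assumption for this sketch): extra name -> the
# top-level module that the extra is expected to install.
EXTRA_MODULES = {"nlp": "spacy", "fuzzy-matching": "rapidfuzz"}


def available_extras(extra_modules=EXTRA_MODULES):
    """Report which extras look installed, without importing the modules."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, installed in available_extras().items():
        print(f"{extra}: {'installed' if installed else 'missing'}")
```

`find_spec` only locates the module, so the check stays cheap even for a heavy package such as spaCy.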

********
Examples
65 changes: 53 additions & 12 deletions src/neo4j_graphrag/experimental/components/resolver.py
@@ -12,18 +12,36 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import annotations

import abc
import logging
from itertools import combinations
from typing import Any, List, Optional
from typing import Any, List, Optional, TYPE_CHECKING


try:
    from rapidfuzz import fuzz
    from rapidfuzz import utils

    IS_RAPIDFUZZ_INSTALLED = True
except ImportError:
    IS_RAPIDFUZZ_INSTALLED = False

try:
    import spacy
    from spacy.cli.download import download as spacy_download
    from spacy.language import Language
    import numpy as np

    IS_SPACY_INSTALLED = True
except ImportError:
    IS_SPACY_INSTALLED = False

import numpy as np
import rapidfuzz.fuzz
import spacy
from numpy.typing import NDArray
from rapidfuzz import utils
from spacy.cli.download import download as spacy_download
from spacy.language import Language

if TYPE_CHECKING:
    import numpy as np
    from numpy.typing import NDArray

import neo4j
from neo4j_graphrag.experimental.components.types import ResolutionStats
@@ -334,6 +352,11 @@ def __init__(
        spacy_model: str = "en_core_web_lg",
        neo4j_database: Optional[str] = None,
    ) -> None:
        if not IS_SPACY_INSTALLED:
            raise ImportError("""`spacy` python module needs to be installed to use
            the SpaCySemanticMatchResolver. Install it with:
            `pip install "neo4j-graphrag[nlp]"`
            """)
        super().__init__(
            driver,
            filter_query,
@@ -398,6 +421,27 @@ class FuzzyMatchResolver(BasePropertySimilarityResolver):
and 1.
"""

    def __init__(
        self,
        driver: neo4j.Driver,
        filter_query: Optional[str] = None,
        resolve_properties: Optional[List[str]] = None,
        similarity_threshold: float = 0.8,
        neo4j_database: Optional[str] = None,
    ) -> None:
        if not IS_RAPIDFUZZ_INSTALLED:
            raise ImportError("""`rapidfuzz` python module needs to be installed to use
            the FuzzyMatchResolver. Install it with:
            `pip install "neo4j-graphrag[fuzzy-matching]"`
            """)
        super().__init__(
            driver,
            filter_query,
            resolve_properties,
            similarity_threshold,
            neo4j_database,
        )

    async def run(self) -> ResolutionStats:
        return await super().run()
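The two constructors above share one pattern: record at import time whether the optional dependency is available, and raise only when a component that needs it is instantiated. A self-contained sketch of that pattern follows. The `optional_import` helper, the `PlainResolver` and `FancyResolver` classes, and the module name `fancy_lib_that_is_not_installed` are all hypothetical and chosen only so the example runs without any third-party package.

```python
import importlib


def optional_import(module_name):
    """Return (module, True) if the module imports, else (None, False)."""
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        return None, False


# Hypothetical module name, picked so that the import fails in this sketch.
_fancy, IS_FANCY_INSTALLED = optional_import("fancy_lib_that_is_not_installed")


class PlainResolver:
    """Works without the optional dependency."""

    def resolve(self, names):
        return sorted(set(names))


class FancyResolver:
    """Needs the optional dependency, so the check lives in __init__."""

    def __init__(self):
        if not IS_FANCY_INSTALLED:
            raise ImportError(
                "`fancy_lib_that_is_not_installed` needs to be installed "
                "to use FancyResolver."
            )


if __name__ == "__main__":
    print(PlainResolver().resolve(["b", "a", "b"]))
    try:
        FancyResolver()
    except ImportError as exc:
        print(f"ImportError: {exc}")
```

Importing the module that defines both classes succeeds either way; only constructing `FancyResolver` fails when the dependency is missing, which is the behavior the changelog entry describes.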

@@ -406,7 +450,4 @@ def compute_similarity(self, text_a: str, text_b: str) -> float:
        # normalize the input strings before the comparison is done (processor=utils.default_process)
        # e.g., lowercase the text, strip whitespace, and remove punctuation
        # normalize the score to the 0..1 range
        return (
            rapidfuzz.fuzz.WRatio(text_a, text_b, processor=utils.default_process)
            / 100.0
        )
        return fuzz.WRatio(text_a, text_b, processor=utils.default_process) / 100.0
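The score normalization in `compute_similarity` can be tried outside the resolver. The sketch below divides the 0..100 `WRatio` score by 100 when `rapidfuzz` is importable. When it is not, it falls back to `difflib.SequenceMatcher` from the standard library, which is a different algorithm swapped in purely so the example runs anywhere, so scores will differ between the two paths. The `similarity` function name is an assumption of this sketch.

```python
import difflib

try:
    from rapidfuzz import fuzz, utils

    HAS_RAPIDFUZZ = True
except ImportError:
    HAS_RAPIDFUZZ = False


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity score in the 0..1 range."""
    if HAS_RAPIDFUZZ:
        # WRatio returns 0..100; default_process lowercases the text,
        # strips whitespace, and removes punctuation before comparing.
        return fuzz.WRatio(text_a, text_b, processor=utils.default_process) / 100.0
    # Stand-in fallback (NOT the same algorithm as WRatio): simple
    # normalization followed by difflib's ratio, already in 0..1.
    a, b = text_a.strip().lower(), text_b.strip().lower()
    return difflib.SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    print(similarity("Neo4j", "  neo4j "))
    print(similarity("graph", "database"))
```

A score at or above the resolver's `similarity_threshold` (0.8 by default in the signature shown above) is what would mark two values as matching.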