Commit a414ca6

Refactors index functions (#269)
* Refactors _get_hybrid_query
* Added effective search ratio to vector and hybrid searches
* Removes id warning from get_search_query
* Removed 'elementId(node) AS id' from tests
* Refactors vector and full text index search queries to support relationships
* get_search_query refactoring
* Reverted embedding_node_property change
* More reversions
* Reverted _get_hybrid_query
* Small bug fix
* Added vector relationship query tests
* Added use_parallel_runtime to _get_filtered_vector_query
* Added remove_lucene_chars
* Fixed remove_lucene_chars bugs
* Renamed remove lucene chars function
* Added retrieve_vector_index_info and retrieve_fulltext_index_info
* Fixed get_search_query bug
* Docstring updates
* Adds upsert_texts_and_vectors utility function
* Updates docs
* Added NODES_MISSING_EMBEDDINGS_QUERY
* Renamed upsert_texts_and_vectors
* Removes indexes_2
* Removes upsert_texts_and_embeddings
* Renamed IndexType to EntityType
* Renaming params for get_search_query
* Removed DistanceStrategy and updated CHANGELOG
* Hybrid retriever no longer sanitises text before embedding it
* Re-added id parameter to retrievers
* Readded deprecation warning
* Removed the _remove_lucene_chars function
* Removed unneeded lines
* Small bug fix
1 parent 970158f commit a414ca6

13 files changed, with 752 additions and 99 deletions.

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -2,6 +2,15 @@

 ## Next

+### Added
+
+- Utility functions to retrieve metadata for vector and full-text indexes.
+- Support for effective_search_ratio parameter in vector and hybrid searches.
+
+### Changed
+
+- Refactored index-related functions for improved compatibility and functionality.
+
 ## 1.4.3

 ### Added
@@ -18,6 +27,7 @@
 - Refactored schema creation code to reduce duplication and improve maintainability.

 ### Fixed
+
 - Removed the `uuid` package from dependencies (not needed with Python 3).
 - Fixed a bug in the `AnthropicLLM` class preventing it from being used in `GraphRAG` pipeline.

docs/source/api.rst

Lines changed: 4 additions & 0 deletions
@@ -390,6 +390,10 @@ Database Interaction

 .. autofunction:: neo4j_graphrag.indexes.async_upsert_vector_on_relationship

+.. autofunction:: neo4j_graphrag.indexes.retrieve_vector_index_info
+
+.. autofunction:: neo4j_graphrag.indexes.retrieve_fulltext_index_info
+
 .. autofunction:: neo4j_graphrag.schema.get_structured_schema

 .. autofunction:: neo4j_graphrag.schema.get_schema

src/neo4j_graphrag/indexes.py

Lines changed: 106 additions & 1 deletion
@@ -15,7 +15,7 @@
 from __future__ import annotations

 import logging
-from typing import Literal, Optional
+from typing import List, Literal, Optional

 import neo4j
 from pydantic import ValidationError
@@ -469,3 +469,108 @@ async def async_upsert_vector_on_relationship(
         raise Neo4jInsertionError(
             f"Upserting vector to Neo4j failed: {e.message}"
         ) from e
+
+
+def _sort_by_index_name(
+    records: List[neo4j.Record], index_name: str
+) -> List[neo4j.Record]:
+    """
+    Sorts the provided list of dictionaries containing index information so
+    that any item whose 'name' key matches the given 'index_name' appears at
+    the front of the list.
+
+    Args:
+        records (List[Dict[str, Any]]): The list of records containing index
+            information to sort.
+        index_name (str): The index name to match against the 'name' key of
+            each dictionary.
+
+    Returns:
+        List[Dict[str, Any]]: A newly sorted list with items matching
+            'index_name' placed first.
+    """
+    return sorted(records, key=lambda x: x.get("name") != index_name)
+
+
+def retrieve_vector_index_info(
+    driver: neo4j.Driver, index_name: str, label_or_type: str, embedding_property: str
+) -> Optional[neo4j.Record]:
+    """
+    Check if a vector index exists in a Neo4j database and return its
+    information. If no matching index is found, returns None.
+
+    Args:
+        driver (neo4j.Driver): Neo4j Python driver instance.
+        index_name (str): The name of the index to look up.
+        label_or_type (str): The label (for nodes) or type (for relationships)
+            of the index.
+        embedding_property (str): The name of the property containing the
+            embeddings.
+
+    Returns:
+        Optional[Dict[str, Any]]:
+            A dictionary containing the first matching index's information if found,
+            or None otherwise.
+    """
+    result = driver.execute_query(
+        query_=(
+            "SHOW INDEXES YIELD name, type, entityType, labelsOrTypes, "
+            "properties, options WHERE type = 'VECTOR' AND (name = $index_name "
+            "OR (labelsOrTypes[0] = $label_or_type AND "
+            "properties[0] = $embedding_property)) "
+            "RETURN name, type, entityType, labelsOrTypes, properties, options"
+        ),
+        parameters_={
+            "index_name": index_name,
+            "label_or_type": label_or_type,
+            "embedding_property": embedding_property,
+        },
+    )
+    index_information = _sort_by_index_name(result.records, index_name)
+    if len(index_information) > 0:
+        return index_information[0]
+    else:
+        return None
+
+
+def retrieve_fulltext_index_info(
+    driver: neo4j.Driver,
+    index_name: str,
+    label_or_type: str,
+    text_properties: List[str] = [],
+) -> Optional[neo4j.Record]:
+    """
+    Check if a full text index exists in a Neo4j database and return its
+    information. If no matching index is found, returns None.
+
+    Args:
+        driver (neo4j.Driver): Neo4j Python driver instance.
+        index_name (str): The name of the index to look up.
+        label_or_type (str): The label (for nodes) or type (for relationships)
+            of the index.
+        text_properties (List[str]): The names of the text properties indexed.
+
+    Returns:
+        Optional[Dict[str, Any]]:
+            A dictionary containing the first matching index's information if found,
+            or None otherwise.
+    """
+    result = driver.execute_query(
+        query_=(
+            "SHOW INDEXES YIELD name, type, entityType, labelsOrTypes, properties, options "
+            "WHERE type = 'FULLTEXT' AND (name = $index_name "
+            "OR (labelsOrTypes = [$label_or_type] AND "
+            "properties = $text_properties)) "
+            "RETURN name, type, entityType, labelsOrTypes, properties, options"
+        ),
+        parameters_={
+            "index_name": index_name,
+            "label_or_type": label_or_type,
+            "text_properties": text_properties,
+        },
+    )
+    index_information = _sort_by_index_name(result.records, index_name)
+    if len(index_information) > 0:
+        return index_information[0]
+    else:
+        return None
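
Reviewer note: the two lookup functions added in `indexes.py` filter in Cypher and then reorder the results in Python, so the precedence rules are easy to miss when reading the diff. The following is a minimal pure-Python sketch of that selection logic as read from the queries above. It is not code from this commit: plain dicts stand in for `neo4j.Record`, the helper names `matches_vector`, `matches_fulltext`, and `pick_first` are hypothetical, and the sample index names are made up for illustration.

```python
from typing import Any, Dict, List, Optional

# Stand-in for neo4j.Record in this sketch (assumption: only key lookup is needed).
IndexInfo = Dict[str, Any]


def matches_vector(
    idx: IndexInfo, index_name: str, label_or_type: str, embedding_property: str
) -> bool:
    """Mirror of the WHERE clause used for vector indexes."""
    if idx["type"] != "VECTOR":
        return False
    by_name = idx["name"] == index_name
    # Only the FIRST label/type and the FIRST property are compared.
    by_schema = (
        idx["labelsOrTypes"][:1] == [label_or_type]
        and idx["properties"][:1] == [embedding_property]
    )
    return by_name or by_schema


def matches_fulltext(
    idx: IndexInfo, index_name: str, label_or_type: str, text_properties: List[str]
) -> bool:
    """Mirror of the WHERE clause used for full-text indexes."""
    if idx["type"] != "FULLTEXT":
        return False
    by_name = idx["name"] == index_name
    # The WHOLE lists are compared, so property order matters here.
    by_schema = (
        idx["labelsOrTypes"] == [label_or_type]
        and idx["properties"] == text_properties
    )
    return by_name or by_schema


def pick_first(candidates: List[IndexInfo], index_name: str) -> Optional[IndexInfo]:
    """Same ordering trick as _sort_by_index_name, then take the head."""
    # The key is False for a name match and True otherwise; False sorts first,
    # and sorted() is stable, so the remaining items keep their original order.
    ordered = sorted(candidates, key=lambda x: x.get("name") != index_name)
    return ordered[0] if ordered else None


# Hypothetical catalogue of indexes, as SHOW INDEXES might report them.
indexes: List[IndexInfo] = [
    {"name": "other_idx", "type": "VECTOR",
     "labelsOrTypes": ["Document"], "properties": ["embedding"]},
    {"name": "doc_idx", "type": "VECTOR",
     "labelsOrTypes": ["Chunk"], "properties": ["vec"]},
    {"name": "text_idx", "type": "FULLTEXT",
     "labelsOrTypes": ["Document"], "properties": ["title", "body"]},
]

candidates = [
    i for i in indexes if matches_vector(i, "doc_idx", "Document", "embedding")
]
chosen = pick_first(candidates, "doc_idx")
```

In this sketch both `other_idx` (matched by label and property) and `doc_idx` (matched by name) pass the filter, and the name match is chosen. That suggests a caller may get back an index whose label or property differs from the ones requested, so comparing the returned `labelsOrTypes` and `properties` against the expected values looks worthwhile.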
