Neo4jVector doesn't work well with HuggingFaceEmbeddings when reusing the graph #24295
-
Example Code

```python
from langchain_community.vectorstores import Neo4jVector
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)

# uri, username and password hold the Neo4j connection settings
existing_graph_parts = Neo4jVector.from_existing_graph(
    embedding=embeddings,
    url=uri,
    username=username,
    password=password,
    node_label="part",
    text_node_properties=["name"],
    embedding_node_property="embedding",
)
```

Description

Sorry for my poor English! When I run the code above and every node already has its embedding, it raises an error.
I think the error happens because all nodes already have their embeddings. The library runs this loop:

```python
# excerpt from Neo4jVector.from_existing_graph (langchain_community)
while True:
    fetch_query = (
        f"MATCH (n:`{node_label}`) "
        f"WHERE n.{embedding_node_property} IS null "
        "AND any(k in $props WHERE n[k] IS NOT null) "
        f"RETURN elementId(n) AS id, reduce(str='',"
        "k IN $props | str + '\\n' + k + ':' + coalesce(n[k], '')) AS text "
        "LIMIT 1000"
    )
    data = store.query(fetch_query, params={"props": text_node_properties})
    text_embeddings = embedding.embed_documents([el["text"] for el in data])
```

This query fetches only nodes that do not have `embedding_node_property` yet. Since every node in my Neo4j database already has an embedding, `data` is an empty list, so `embed_documents` is called with `[]`. The sentence-transformers `encode` call behind `HuggingFaceEmbeddings` then runs:

```python
# excerpt from SentenceTransformer.encode (sentence-transformers)
elif convert_to_numpy:
    if not isinstance(all_embeddings, np.ndarray):
        if all_embeddings[0].dtype == torch.bfloat16:
            all_embeddings = np.asarray([emb.float().numpy() for emb in all_embeddings])
        else:
            all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])
```

That's where the error happens: `all_embeddings` is empty, so `all_embeddings[0]` fails. Thanks!

System Info

"pip langchain"
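The failure mode can be reproduced without Neo4j, PyTorch, or a model. The sketch below is a minimal stdlib imitation, not sentence-transformers' actual code: `FakeTensor`, `to_lists_unguarded`, and `to_lists_guarded` are hypothetical names, with `FakeTensor` standing in for a torch tensor and the string `"bfloat16"` standing in for `torch.bfloat16`. The unguarded function inspects `all_embeddings[0]` just like the excerpt, so an empty batch raises `IndexError`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeTensor:
    # Hypothetical stand-in for a torch tensor: just a dtype name and values
    dtype: str
    values: List[float]


def to_lists_unguarded(all_embeddings: List[FakeTensor]) -> List[List[float]]:
    # Mirrors the excerpt: the first element is inspected unconditionally
    if all_embeddings[0].dtype == "bfloat16":
        return [[float(v) for v in emb.values] for emb in all_embeddings]
    return [list(emb.values) for emb in all_embeddings]


def to_lists_guarded(all_embeddings: List[FakeTensor]) -> List[List[float]]:
    # An empty batch short-circuits to an empty result
    if not all_embeddings:
        return []
    return to_lists_unguarded(all_embeddings)


try:
    to_lists_unguarded([])
except IndexError as exc:
    print("unguarded:", exc)  # unguarded: list index out of range
print("guarded:", to_lists_guarded([]))  # guarded: []
```

The guard costs nothing for non-empty batches and turns the empty case into a no-op, which is the behavior the calling loop needs.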
Replies: 5 comments 2 replies
-
Hello, @SeeleZaych! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

To handle the `IndexError` that occurs when all nodes in your Neo4j database already have their embeddings, you can check the length of the `data` list before proceeding with further operations. Here is an example of how you can modify the code to include this check:

```python
while True:
    fetch_query = (
        f"MATCH (n:`{node_label}`) "
        f"WHERE n.{embedding_node_property} IS null "
        "AND any(k in $props WHERE n[k] IS NOT null) "
        f"RETURN elementId(n) AS id, reduce(str='',"
        "k IN $props | str + '\\n' + k + ':' + coalesce(n[k], '')) AS text "
        "LIMIT 1000"
    )
    data = store.query(fetch_query, params={"props": text_node_properties})
    # Check if data is empty to avoid IndexError
    if not data:
        break
    text_embeddings = embedding.embed_documents([el["text"] for el in data])
    params = {
        "data": [
            {"id": el["id"], "embedding": vector}
            for el, vector in zip(data, text_embeddings)
        ]
    }
    # Write the new vectors back onto the matching nodes
    store.query(
        "UNWIND $data AS row "
        f"MATCH (n:`{node_label}`) "
        "WHERE elementId(n) = row.id "
        f"CALL db.create.setVectorProperty(n, "
        f"'{embedding_node_property}', row.embedding) "
        "YIELD node RETURN count(*)",
        params=params,
    )
    # A batch smaller than the LIMIT means every node has been processed
    if len(data) < 1000:
        break
```

This modification ensures that if the `data` list is empty, the loop exits before `embed_documents` is called, so the `IndexError` is never raised.
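To see why the `if not data: break` guard is enough, here is a self-contained simulation of the loop. `InMemoryStore`, `CountingEmbedder`, and `backfill` are hypothetical stand-ins for Neo4j, the embedding model, and the library loop; the batch size is shrunk from 1000 to 2 so the batching is visible. `CountingEmbedder` deliberately raises on an empty batch to imitate the reported failure.

```python
from typing import Dict, List

BATCH_SIZE = 2  # the library uses LIMIT 1000; shrunk here for illustration


class InMemoryStore:
    """Hypothetical stand-in for Neo4j: node id -> properties."""

    def __init__(self, nodes: Dict[str, dict]):
        self.nodes = nodes

    def fetch_unembedded(self, limit: int) -> List[dict]:
        # Like the Cypher query: nodes with a name but no embedding yet
        rows = [
            {"id": node_id, "text": "\nname:" + props["name"]}
            for node_id, props in self.nodes.items()
            if props.get("embedding") is None and props.get("name") is not None
        ]
        return rows[:limit]

    def set_embeddings(self, rows: List[dict]) -> None:
        for row in rows:
            self.nodes[row["id"]]["embedding"] = row["embedding"]


class CountingEmbedder:
    """Hypothetical embedder that fails on empty batches, like the reported bug."""

    def __init__(self):
        self.calls = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        if not texts:
            raise IndexError("list index out of range")
        self.calls += 1
        return [[float(len(t))] for t in texts]


def backfill(store: InMemoryStore, embedder: CountingEmbedder) -> None:
    while True:
        data = store.fetch_unembedded(BATCH_SIZE)
        # The guard from the fix: nothing left to embed, so stop before encoding
        if not data:
            break
        vectors = embedder.embed_documents([el["text"] for el in data])
        store.set_embeddings(
            [{"id": el["id"], "embedding": v} for el, v in zip(data, vectors)]
        )
        # A short batch means this was the last one
        if len(data) < BATCH_SIZE:
            break


store = InMemoryStore({
    "a": {"name": "bolt", "embedding": [1.0]},
    "b": {"name": "nut", "embedding": [2.0]},
})
embedder = CountingEmbedder()
backfill(store, embedder)  # every node is already embedded
print(embedder.calls)  # 0
```

In this simulation the guard also matters when only some nodes lack embeddings: if their number is an exact multiple of `BATCH_SIZE`, the last full batch does not trigger the `len(data) < BATCH_SIZE` exit, so the next iteration fetches an empty list. The same reasoning applies to the original loop with its `LIMIT 1000`.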
-
I encountered exactly the same error in the same context. It worked before I reconfigured my virtual environment (which is strange). What @dosubot provided does work, but it means hard-coding a change into the installed library, as mentioned above. I suggest the LangChain community team add this fix to their next release.
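If patching the installed library is not an option, one possible workaround is to wrap the embeddings object so that an empty batch returns an empty list without reaching the model. This is a sketch under assumptions: `EmptySafeEmbeddings`, `SupportsEmbeddings`, and `StrictEmbedder` are hypothetical names, and it assumes `from_existing_graph` only calls `embed_documents` and `embed_query` on the object it receives. In real code you would probably subclass `langchain_core.embeddings.Embeddings` instead; that base class is left out here so the sketch runs without LangChain installed.

```python
from typing import List, Protocol


class SupportsEmbeddings(Protocol):
    # The two methods the wrapper forwards to
    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...
    def embed_query(self, text: str) -> List[float]: ...


class EmptySafeEmbeddings:
    """Hypothetical wrapper: never sends an empty batch to the inner embedder."""

    def __init__(self, inner: SupportsEmbeddings):
        self.inner = inner

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Nothing to embed: answer directly instead of calling the model
        if not texts:
            return []
        return self.inner.embed_documents(texts)

    def embed_query(self, text: str) -> List[float]:
        return self.inner.embed_query(text)


class StrictEmbedder:
    """Toy inner embedder that fails on empty input, like the reported encode path."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        if not texts:
            raise IndexError("list index out of range")
        return [[float(len(t))] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))]


safe = EmptySafeEmbeddings(StrictEmbedder())
print(safe.embed_documents([]))  # []
print(safe.embed_documents(["bolt"]))  # [[4.0]]
```

With LangChain installed, you would pass something like `EmptySafeEmbeddings(embeddings)` as the `embedding=` argument, assuming the store accepts any object with these two methods. For an empty batch, the library loop then gets `[]` back, writes nothing, and exits through its existing `len(data) < 1000` check.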
-
I hit the same issue when trying to use fine-tuned Hugging Face embeddings. I hope it will be fixed soon. Thanks.
-
I am currently encountering the same issue. Are there any fixes?
-
The bug has been fixed in pull request #24912.