Neo4jVector doesn't work well with HuggingFaceEmbeddings when reusing the graph #24295

TripleCamellya · 2024-07-16T06:58:39Z

TripleCamellya
Jul 16, 2024

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

from langchain_community.vectorstores import Neo4jVector
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
# Excerpt from GraphRag.__init__; uri, username, and password are defined elsewhere.
self.existing_graph_parts = Neo4jVector.from_existing_graph(
    embedding=embeddings,
    url=uri,
    username=username,
    password=password,
    node_label="part",
    text_node_properties=["name"],
    embedding_node_property="embedding",
)

Description

Sorry for my poor English!

When I run the code above and every node already has an embedding, it raises an error.

Traceback (most recent call last):
  File "D:\graph_rag.py", line 133, in <module>
    graph_rag = GraphRag()
                ^^^^^^^^^^
  File "D:\graph_rag.py", line 44, in __init__
    self.existing_graph_parts = Neo4jVector.from_existing_graph(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\syh\AppData\Local\Programs\Python\Python312\Lib\site-packages\langchain_community\vectorstores\neo4j_vector.py", line 1431, in from_existing_graph
    text_embeddings = embedding.embed_documents([el["text"] for el in data])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\syh\AppData\Local\Programs\Python\Python312\Lib\site-packages\langchain_huggingface\embeddings\huggingface.py", line 87, in embed_documents
    embeddings = self.client.encode(
                 ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\syh\AppData\Local\Programs\Python\Python312\Lib\site-packages\sentence_transformers\SentenceTransformer.py", line 565, in encode
    if all_embeddings[0].dtype == torch.bfloat16:
       ~~~~~~~~~~~~~~^^^
IndexError: list index out of range

I think the error occurs because every node already has its embedding when the library code below runs:
<file: langchain_community\vectorstores\neo4j_vector.py>
from line 1421

        while True:
            fetch_query = (
                f"MATCH (n:`{node_label}`) "
                f"WHERE n.{embedding_node_property} IS null "
                "AND any(k in $props WHERE n[k] IS NOT null) "
                f"RETURN elementId(n) AS id, reduce(str='',"
                "k IN $props | str + '\\n' + k + ':' + coalesce(n[k], '')) AS text "
                "LIMIT 1000"
            )
            data = store.query(fetch_query, params={"props": text_node_properties})
            text_embeddings = embedding.embed_documents([el["text"] for el in data])

This code fetches the nodes that don't yet have embedding_node_property set. Since all nodes in my Neo4j database already have an embedding, data is an empty list.
Then, in the code below, all_embeddings is empty as well, so index 0 is out of range.
<file: sentence_transformers\SentenceTransformer.py>
from line 563

        elif convert_to_numpy:
            if not isinstance(all_embeddings, np.ndarray):
                if all_embeddings[0].dtype == torch.bfloat16:
                    all_embeddings = np.asarray([emb.float().numpy() for emb in all_embeddings])
                else:
                    all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])

That's where the error happened.
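
The failure mode can be reproduced without Neo4j or a model. This is a minimal sketch, assuming a hypothetical `encode_like` function that mimics only the indexing step quoted above; it is not the real sentence-transformers code.

```python
def encode_like(sentences):
    """Hypothetical stand-in for the tail of SentenceTransformer.encode."""
    # One fake "embedding" per input sentence; empty input gives an empty list.
    all_embeddings = [[float(len(s))] for s in sentences]
    # Mirrors `all_embeddings[0].dtype`: reading element 0 assumes at least one result.
    _ = all_embeddings[0]
    return all_embeddings


data = []  # what the fetch query returns when every node already has an embedding
try:
    encode_like([el["text"] for el in data])
    outcome = "ok"
except IndexError as exc:
    outcome = f"IndexError: {exc}"

print(outcome)  # IndexError: list index out of range
```

Any embedding backend that returns an empty list for empty input, rather than indexing into its results, would pass through this step without raising.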

Thanks!

System Info

"pip langchain"
windows 11
python3.12

Answered by dosubot[bot]


@dosu · 2024-07-16T07:07:50Z

dosubot[bot]
bot Jul 16, 2024

Hello, @SeeleZaych! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's tackle this together!

To handle the IndexError that occurs when all nodes in your Neo4j database already have their embeddings, you can add a check for the length of the data list before proceeding with further operations. Here is an example of how you can modify the code to include this check:

while True:
    fetch_query = (
        f"MATCH (n:`{node_label}`) "
        f"WHERE n.{embedding_node_property} IS null "
        "AND any(k in $props WHERE n[k] IS NOT null) "
        f"RETURN elementId(n) AS id, reduce(str='',"
        "k IN $props | str + '\\n' + k + ':' + coalesce(n[k], '')) AS text "
        "LIMIT 1000"
    )
    data = store.query(fetch_query, params={"props": text_node_properties})
    
    # Check if data is empty to avoid IndexError
    if not data:
        break
    
    text_embeddings = embedding.embed_documents([el["text"] for el in data])
    params = {
        "data": [
            {"id": el["id"], "embedding": embedding}
            for el, embedding in zip(data, text_embeddings)
        ]
    }

    store.query(
        "UNWIND $data AS row "
        f"MATCH (n:`{node_label}`) "
        "WHERE elementId(n) = row.id "
        f"CALL db.create.setVectorProperty(n, "
        f"'{embedding_node_property}', row.embedding) "
        "YIELD node RETURN count(*)",
        params=params,
    )
    
    # If embedding calculation should be stopped
    if len(data) < 1000:
        break

This modification ensures that if the data list is empty, the loop will break, preventing the IndexError from occurring ^[1].
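
The effect of the guard can be checked in isolation. The sketch below replaces the Neo4j store with an in-memory list of dicts and uses a hypothetical `strict_embedder` that fails on empty input, as in the traceback; `backfill_embeddings` and the node layout are assumptions made for illustration, not LangChain APIs.

```python
BATCH_SIZE = 1000  # same page size as the LIMIT in the query above


def backfill_embeddings(nodes, embed_documents, batch_size=BATCH_SIZE):
    """Embed nodes that lack an embedding, batch by batch; return how many were embedded."""
    embedded = 0
    while True:
        # Stand-in for the Cypher fetch: nodes without an embedding, capped at batch_size.
        data = [n for n in nodes if n.get("embedding") is None][:batch_size]
        if not data:
            break  # nothing left to embed, so never call the embedder with []
        vectors = embed_documents([n["text"] for n in data])
        for node, vector in zip(data, vectors):
            node["embedding"] = vector
        embedded += len(data)
        if len(data) < batch_size:
            break  # a short page means this was the last batch
    return embedded


def strict_embedder(texts):
    """Hypothetical embedder that raises IndexError on empty input."""
    vectors = [[float(len(t))] for t in texts]
    _ = vectors[0]
    return vectors


nodes = [
    {"text": "name:bolt", "embedding": [1.0]},
    {"text": "name:nut", "embedding": [2.0]},
]
print(backfill_embeddings(nodes, strict_embedder))  # 0
```

The guard also covers the case where the number of pending nodes is an exact multiple of the batch size: the final fetch returns an empty page, and the loop exits before the embedder is called.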


1 reply

TripleCamellya Jul 16, 2024
Author

Hmm, but the code you modified is library code.
OpenAIEmbeddings does not hit this bug; it only happens when using HuggingFaceEmbeddings.
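
One way to avoid editing the library is to wrap the embedding object so that an empty batch never reaches the model. This is only a sketch: `EmptySafeEmbeddings` is a hypothetical name, `StrictEmbeddings` is a stand-in for a backend that fails on empty input, and it assumes the vector store calls nothing beyond `embed_documents` and `embed_query`.

```python
class EmptySafeEmbeddings:
    """Hypothetical wrapper: delegates to `inner`, but short-circuits empty batches."""

    def __init__(self, inner):
        self.inner = inner

    def embed_documents(self, texts):
        texts = list(texts)
        if not texts:
            return []  # skip the backend entirely when there is nothing to embed
        return self.inner.embed_documents(texts)

    def embed_query(self, text):
        return self.inner.embed_query(text)


class StrictEmbeddings:
    """Stand-in backend that raises IndexError on empty input."""

    def embed_documents(self, texts):
        vectors = [[float(len(t))] for t in texts]
        _ = vectors[0]
        return vectors

    def embed_query(self, text):
        return [float(len(text))]


safe = EmptySafeEmbeddings(StrictEmbeddings())
print(safe.embed_documents([]))  # []
```

In the original example this would mean passing `embedding=EmptySafeEmbeddings(embeddings)` to `Neo4jVector.from_existing_graph`. If the library insists on a real `Embeddings` instance, the wrapper would need to subclass `langchain_core.embeddings.Embeddings` instead of relying on duck typing.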

JianhengHou · 2024-07-18T08:55:43Z

JianhengHou
Jul 18, 2024

I encountered exactly the same error in the same context. It worked fine before I reconfigured my virtual environment, which is strange.

What @dosubot provided does work. However, as mentioned above, it means hand-editing the library code.

I suggest the LangChain community team add this fix to their next release.

0 replies

YuffieHuang · 2024-07-20T03:03:27Z

YuffieHuang
Jul 20, 2024

I hit the same issue when trying to use a fine-tuned Hugging Face embedding model. I hope the issue will be fixed soon. Thanks.

0 replies

Astroa7m · 2024-08-10T22:46:23Z

Astroa7m
Aug 10, 2024

I am currently encountering the same issue. Are there any fixes?

1 reply

TripleCamellya Aug 10, 2024
Author

The bug has been fixed in pull request 24912. Thanks for reminding me to close the discussion!
#24912

TripleCamellya · 2024-08-10T23:24:25Z

TripleCamellya
Aug 10, 2024
Author

The bug has been fixed in pull request 24912.
#24912

0 replies
