Set Dimensions on Ollama Embeddings for Query #17613
-
🤖 Hey @stealthier-ai, nice to see you diving deep into the bits and bytes again! 🤖

Based on the information you've provided and the context from the LangChain repository, the issue you're encountering is a mismatch between the dimensions of the embeddings generated by the `embed_documents` method (used when documents are added) and the `embed_query` method (used at search time). This discrepancy could be due to using different models or configurations for the two operations. Here's the relevant code from the `LanceDB` class in the LangChain repository:

```python
# In the add_texts method
embeddings = self._embedding.embed_documents(list(texts))

# In the similarity_search method
embedding = self._embedding.embed_query(query)
```

As for the default dimension size, it is determined by the embedding model itself rather than by the `LanceDB` wrapper. For more details, you can refer to the `LanceDB` class in the LangChain repository: libs/community/langchain_community/vectorstores/lancedb.py

Please let me know if you need further assistance or if there's anything else I can help with.
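A quick way to confirm whether the two code paths really produce vectors of the same size is to compare their lengths directly. This is a minimal sketch, not part of the original reply; it assumes `langchain_community` is installed and an Ollama server is running locally with the `nomic-embed-text` model pulled (substitute whichever model you actually use):

```python
from langchain_community.embeddings import OllamaEmbeddings

# One shared instance for both indexing and querying.
# "nomic-embed-text" is an assumption; use your actual model name.
emb = OllamaEmbeddings(model="nomic-embed-text")

doc_vectors = emb.embed_documents(["a sample chunk of text"])
query_vector = emb.embed_query("a sample query")

# Both lengths must match each other *and* the dimension the
# LanceDB index column was created with.
print(len(doc_vectors[0]), len(query_vector))
```

If the two numbers differ, or if they differ from the index column size, the `ValueError` above is expected.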
-
I hope this helps. The dimension size is a property of the embedding model itself. I typically pick an embedding model, look up its output dimension in the model's documentation, and then create a field and an index in my vector store with that value. It may also be configurable via the client.
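For LanceDB specifically, one way to pin the dimension is to declare a fixed-size vector column in the table schema. A minimal sketch, assuming the `lancedb` and `pyarrow` packages; the dimension of 768 and the path are placeholders for whatever your chosen model actually produces:

```python
import lancedb
import pyarrow as pa

DIM = 768  # assumption: replace with your embedding model's output dimension

schema = pa.schema([
    pa.field("vector", pa.list_(pa.float32(), DIM)),  # fixed-size vector column
    pa.field("text", pa.string()),
])

db = lancedb.connect("./lancedb")  # illustrative path
table = db.create_table("docs", schema=schema)
```

With a fixed-size column, an insert or query vector of the wrong length fails immediately at the table boundary, which makes dimension mismatches much easier to localize.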
-
I found a similar unsolved discussion that might be relevant to your issue: Chromadb: InvalidDimensionException: Embedding dimension 1024 does not match collection dimensionality 384 [1].

Regarding your specific issue, there appears to be a mismatch in the embedding dimensions between the documents and the query. Unfortunately, I couldn't find specific information on how to set or configure the embedding dimensions for `OllamaEmbeddings`. However, you can check the following points to troubleshoot the issue:

- Confirm that the exact same embedding model (and model configuration) is used both when the documents are embedded and when the query is embedded; a setup that shares one embedding object between the two is sketched after this list.
- Verify that the dimension of the vectors the model returns matches the dimension of the LanceDB index column; your error reports 8192 versus 384, which suggests the index was built with vectors from a different model.
- If the index was created with a different model, drop and rebuild it using the model you query with.

If the issue persists, you might want to look into the specific configuration or parameters of the Ollama model you are using, and consult the LangChain and Ollama documentation for more detail.
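As an illustration of the first point, here is a minimal sketch (not from the original reply) that shares a single `OllamaEmbeddings` instance between indexing and querying through the LangChain `LanceDB` wrapper. The model name and path are assumptions, and the `connection=` parameter follows recent `langchain_community` versions; older versions expected a LanceDB table object instead:

```python
import lancedb
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import LanceDB

# One embedding object used by both code paths, so document vectors and
# query vectors are guaranteed to have the same dimension.
emb = OllamaEmbeddings(model="nomic-embed-text")  # assumed model

db = lancedb.connect("./lancedb")                 # illustrative path
store = LanceDB.from_texts(
    ["first chunk", "second chunk"],
    embedding=emb,
    connection=db,
)

# The query is embedded with the same `emb`, so its size matches the index.
results = store.similarity_search("a question about the chunks", k=2)
```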
-
Example Code
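(The original post left this section empty. The following is a hedged reconstruction of the setup described under Description; the model name, file path, and chunking parameters are all assumptions.)

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import LanceDB
from langchain_text_splitters import RecursiveCharacterTextSplitter

emb = OllamaEmbeddings(model="llama2")  # assumed model

docs = TextLoader("document.txt").load()  # assumed source file
chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_documents(docs)

# Index the chunks with the Ollama embeddings.
store = LanceDB.from_documents(chunks, embedding=emb)

# Querying with the same embedding model reportedly raises:
# ValueError: Query vector size 8192 does not match index column size 384
results = store.similarity_search("What does the document say?")
```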
Description
I chunked a bunch of documents using the Ollama model embedding option and inserted the vectors into LanceDB. However, when I query the data using the exact same Ollama embedding model, I receive the following error: "ValueError: Query vector size 8192 does not match index column size 384". If I am using the exact same embedding model for the documents in RAG, why is my query vector a different size than the vector in the index?
System Info
Linux Debian