Dealing with duplicate documents to create an AzureSearch with a vector store #8861

pietz · 2023-08-07T11:26:36Z


I want to create an Azure Cognitive Search application that uses a vector store for document embeddings, plus additional metadata I can filter on. It's basically the second-to-last example in LangChain's Python docs on Azure Search.

There is one twist I have to deal with: a large number of metadata combinations point to the same document. If I use LangChain's AzureSearch and provide an embedding function to create the embeddings, it will run through the same documents many times and produce a very large number of duplicate embeddings. Given that the OpenAI embedding API is not cheap for large databases, I really want to avoid that.

Any ideas how I can get around that?
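One way to avoid paying for the same document repeatedly is to embed each distinct text only once and then reuse that vector for every metadata combination that points to it. Below is a minimal sketch of that idea, assuming duplicates share byte-identical text. `embed_unique` and `fake_embed` are hypothetical names used for illustration: `fake_embed` stands in for a real (paid) embedding call, and this is not LangChain's or Azure's API.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Record = Tuple[str, dict]  # (document text, metadata)


def _key(text: str) -> str:
    # SHA-256 of the text, so identical documents map to the same key.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def embed_unique(
    records: Sequence[Record],
    embed_fn: Callable[[List[str]], List[Vector]],
) -> List[Tuple[str, Vector, dict]]:
    """Embed each distinct text once; reuse its vector for every metadata combination."""
    # Collapse duplicate texts, keeping first-seen order.
    unique: Dict[str, str] = {}
    for text, _ in records:
        unique.setdefault(_key(text), text)

    # A single batched call that covers only the distinct texts.
    keys = list(unique)
    vectors = embed_fn([unique[k] for k in keys])
    by_key = dict(zip(keys, vectors))

    # Fan the computed vectors back out to every (text, metadata) record.
    return [(text, by_key[_key(text)], meta) for text, meta in records]


if __name__ == "__main__":
    calls: List[int] = []

    def fake_embed(texts: List[str]) -> List[Vector]:
        # Hypothetical stand-in for an embedding API; records each batch size.
        calls.append(len(texts))
        return [[float(len(t))] for t in texts]

    records = [
        ("contract A", {"region": "EU", "year": 2022}),
        ("contract A", {"region": "US", "year": 2023}),
        ("contract B", {"region": "EU", "year": 2023}),
    ]
    rows = embed_unique(records, fake_embed)
    print(calls)      # [2]: two distinct texts embedded in one batch
    print(len(rows))  # 3: every metadata combination still gets a vector
```

How the precomputed vectors are then written to the Azure Search index depends on the vector store's ingestion API and is not shown here. If the same documents are re-ingested across runs, persisting the hash-to-vector map (rather than keeping it in memory) would extend the same saving.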

nadworny · 2024-03-11T16:17:57Z


Hi @pietz, have you found a solution for that?
