Using AzureSearch with embeddings.embed_documents instead of embeddings.embed_query #28964
-
To batch-process documents faster, you can use the `add_texts` method:

```python
def add_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[dict]] = None,
    *,
    keys: Optional[List[str]] = None,
    **kwargs: Any,
) -> List[str]:
    """Add texts data to an existing index."""
    if isinstance(self.embedding_function, Embeddings):
        try:
            embeddings = self.embedding_function.embed_documents(list(texts))
        except NotImplementedError:
            embeddings = [self.embedding_function.embed_query(x) for x in texts]
    else:
        embeddings = [self.embedding_function(x) for x in texts]
    if len(embeddings) == 0:
        logger.debug("Nothing to insert, skipping.")
        return []
    if (not keys) and ("ids" in kwargs) and (len(kwargs["ids"]) == len(embeddings)):
        keys = kwargs["ids"]
    return self.add_embeddings(zip(texts, embeddings), metadatas, keys=keys)
```
Additionally, LangChain provides asynchronous methods such as `aadd_texts`:

```python
async def aadd_texts(
    self,
    texts: Iterable[str],
    metadatas: Optional[List[dict]] = None,
    *,
    keys: Optional[List[str]] = None,
    **kwargs: Any,
) -> List[str]:
    if isinstance(self.embedding_function, Embeddings):
        try:
            embeddings = await self.embedding_function.aembed_documents(list(texts))
        except NotImplementedError:
            embeddings = [
                await self.embedding_function.aembed_query(x) for x in texts
            ]
    else:
        embeddings = [self.embedding_function(x) for x in texts]
    if len(embeddings) == 0:
        logger.debug("Nothing to insert, skipping.")
        return []
    if (not keys) and ("ids" in kwargs) and (len(kwargs["ids"]) == len(embeddings)):
        keys = kwargs["ids"]
    return await self.aadd_embeddings(zip(texts, embeddings), metadatas, keys=keys)
```
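A corresponding async sketch (the `asyncio` wrapper and the `index_batch` helper are illustrative assumptions, not from this thread; `vector_store` is the instance from the sync sketch above):

```python
import asyncio

from langchain_community.vectorstores.azuresearch import AzureSearch

async def index_batch(store: AzureSearch, texts: list[str]) -> list[str]:
    # aadd_texts embeds the whole batch via aembed_documents before uploading,
    # avoiding per-document aembed_query round trips.
    return await store.aadd_texts(texts)

# Reusing the vector_store constructed in the sync sketch above:
ids = asyncio.run(index_batch(vector_store, ["doc one", "doc two"]))
```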
Moreover, you can configure the embeddings with `AzureOpenAIEmbeddings` from `langchain_openai`:

```python
from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-large",
    # dimensions: Optional[int] = None,  # Can specify dimensions with new text-embedding-3 models
    # azure_endpoint="https://<your-endpoint>.openai.azure.com/",  # If not provided, read from env variable AZURE_OPENAI_ENDPOINT
    # api_key=...,  # Can provide an API key directly; if missing, read from env variable AZURE_OPENAI_API_KEY
    # openai_api_version=...,  # If not provided, read from env variable AZURE_OPENAI_API_VERSION
)
```

By using these methods and adjusting the batch size, you can efficiently batch-process and index documents in AzureSearch using LangChain's embedding functionalities [1][2][3].
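To illustrate the batch-size point, here is a hedged sketch of chunked indexing using the `vector_store` defined earlier (the batch size of 100 is an arbitrary example, not a recommended value):

```python
texts = [f"document {i}" for i in range(1_000)]
batch_size = 100  # example value; tune to your document sizes and service limits

# Each add_texts call issues one embed_documents request for its whole batch
# instead of one embed_query call per document.
for start in range(0, len(texts), batch_size):
    vector_store.add_texts(texts[start : start + batch_size])
```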
-
My question is: how do I correctly pass `aembed_documents` to the AzureSearch instance? When I make my embeddings function to be
-
I have an Azure Search vector store defined which uses `AzureOpenAIEmbeddings` to embed incoming documents, such that `batch` is a list of `Document` objects. I then use a script to index my batches. Instead of using `embed_query`, I would like to use OpenAIEmbeddings' `embed_documents` to embed each batch in less time. When I replace `embeddings.embed_query` with `embeddings.embed_documents`, I encounter an error. My ultimate goal is to batch-process my documents faster than I currently do. Can you help me incorporate `embed_documents` correctly? Are there any other LangChain functionalities I can use to speed up indexing for large documents?