Retrieval from Azure Index (RAG Implementation) #25281
To retrieve the top k chunks to use as context knowledge in your RAG implementation with an Azure AI Search index, you can load the document, split it into semantic chunks, embed and index the chunks, and then run a similarity search against the index. Here is the relevant code:

```python
import os

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader
from langchain_community.vectorstores.azuresearch import AzureSearch
from langchain_openai import AzureOpenAIEmbeddings
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Azure AI Document Intelligence credentials
doc_intelligence_endpoint = "<your Document Intelligence endpoint>"
doc_intelligence_key = "<your Document Intelligence key>"

# Load a document and split it into semantic chunks
loader = AzureAIDocumentIntelligenceLoader(
    file_path="<path to your file>",
    api_key=doc_intelligence_key,
    api_endpoint=doc_intelligence_endpoint,
    api_model="prebuilt-layout",
)
docs = loader.load()

# Split the document into chunks based on markdown headers
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
text_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
docs_string = docs[0].page_content
splits = text_splitter.split_text(docs_string)
print("Length of splits: " + str(len(splits)))

# Embed the split documents and insert them into the Azure AI Search vector store.
# Assumes AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are set in the environment.
aoai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment="<Azure OpenAI embeddings model>",
    openai_api_version="<Azure OpenAI API version>",  # e.g., "2023-07-01-preview"
)
vector_store_address: str = os.getenv("AZURE_SEARCH_ENDPOINT")
vector_store_password: str = os.getenv("AZURE_SEARCH_ADMIN_KEY")
index_name: str = "<your index name>"
vector_store: AzureSearch = AzureSearch(
    azure_search_endpoint=vector_store_address,
    azure_search_key=vector_store_password,
    index_name=index_name,
    embedding_function=aoai_embeddings.embed_query,
)
vector_store.add_documents(documents=splits)

# Retrieve relevant chunks based on the question
retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 3})
retrieved_docs = retriever.invoke("<your question>")
print(retrieved_docs[0].page_content)
```

This code demonstrates how to load a document, split it into chunks, embed and index those chunks, and finally retrieve the top k relevant chunks based on a user query [1].
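To then pass the retrieved chunks to a model as context, a minimal sketch could look like the following. This is not from the original answer: the chat deployment name and the prompt wording are illustrative placeholders, and it assumes an Azure OpenAI chat deployment plus the `retriever` defined above.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import AzureChatOpenAI

# Hypothetical chat deployment; replace with your own Azure OpenAI deployment name
llm = AzureChatOpenAI(
    azure_deployment="<your chat model deployment>",
    openai_api_version="<Azure OpenAI API version>",
)

rag_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "Answer the question using only the provided context.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ]
)

def format_docs(docs):
    # Concatenate the retrieved chunks into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("<your question>")
print(answer)
```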
Additionally, you can enhance your retrieval process by incorporating query analysis. This involves defining a query schema and using a function-calling model to convert a user question into a structured query, which can improve the relevance of the retrieved chunks. For example:

```python
from typing import List, Optional

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI


# Schema that the function-calling model fills in from the user's question
class Search(BaseModel):
    query: str = Field(..., description="Similarity search query applied to video transcripts.")
    publish_year: Optional[int] = Field(None, description="Year video was published")


system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.

If there are acronyms or words you are not familiar with, do not try to rephrase them."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm


def retrieval(search: Search) -> List[Document]:
    # `vectorstore` is the store from the query-analysis tutorial; swap in your own
    # vector store and adapt the filter syntax to whatever it expects
    if search.publish_year is not None:
        _filter = {"publish_year": {"$eq": search.publish_year}}
    else:
        _filter = None
    return vectorstore.similarity_search(search.query, filter=_filter)


retrieval_chain = query_analyzer | retrieval
results = retrieval_chain.invoke("RAG tutorial published in 2023")
```

This approach makes your retrieval process more robust and lets it handle specific query requirements, such as metadata constraints, effectively [2]. By combining these methods, you can manage retrieval efficiently and ensure that the most relevant chunks are used as context knowledge for your RAG implementation.
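If your chunks are indexed with metadata fields in Azure AI Search, you can combine the structured query above with the store's native filtering. The sketch below is not from the thread: it assumes a filterable `publish_year` field exists in your index and that your langchain-community version's `AzureSearch.similarity_search` accepts an OData filter string via the `filters` keyword argument.

```python
def azure_retrieval(search: Search) -> List[Document]:
    # Build an OData filter for Azure AI Search; "publish_year" is a
    # hypothetical filterable field used only for illustration
    filters = (
        f"publish_year eq {search.publish_year}"
        if search.publish_year is not None
        else None
    )
    return vector_store.similarity_search(search.query, k=3, filters=filters)


azure_retrieval_chain = query_analyzer | azure_retrieval
results = azure_retrieval_chain.invoke("RAG tutorial published in 2023")
```

The same pattern extends to other metadata fields, which is how multi-field constraints are typically expressed against an Azure AI Search index.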
Description
Hi All,
I have been working on integrating Azure with OpenAI, specifically focusing on using Azure Index as a large-scale vector database.
My Use Case:
Now, I have stored each chunk with its corresponding embedding in the index.
In the retrieval step, I use the `vector_store` retriever from LangChain. While it provides decent results for single-field queries, the results for multi-field queries are unsatisfactory. The large size of the documents also prevents me from directly uploading them with their embeddings, since that exceeds the context limit.
My Questions:
Thank you!
System Info
Windows
langchain=0.1.20
langchain-community=0.0.38
langchain-core=0.1.52
langchain-experimental=0.0.58
langchain-openai=0.1.6
langchain-text-splitters=0.0.2
azure-identity=1.17.1
azure-search-documents=11.4.0
openai