DocumentCompressorPipeline + BaseDocumentTransformer ---> store the embeddings for chunks before hand #25499

Aneeqasif · 2024-08-16T18:10:10Z

Aneeqasif
Aug 16, 2024

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

python ...



# Define the components for the compression pipeline
splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.66)

# Create the compression pipeline
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)


# Compress documents based on the query
compressed_docs = pipeline_compressor.compress_documents(
    top_1_retrieved_docs, query=query
)

pretty_print_docs(compressed_docs[:1])

Description

hi guyz. I am building a semantic search engine for small articles. i am using multivectorRetriever to retrieved the desired articles using the summary approach. Life is great till there.
Now i want to add a feature like google where google highlights with blue the main relevant chunk of the top result in it search results. I do not now what is the best way to do it with least latency. I avoided ContextualCompressionRetriever because it invokes the base retriever. and i don't want to invoke it again as i have already got the results from it. So I am using DocumentCompressorPipeline for it as given in my snippet and it is working but the issue is it is creating alot of latency because it splits --> transforms-->generate embedding--> compresses on the go. how can i reduce the latency. Probably the splitting and embedding creation part is taking alot of time how can i split ,embed and store it in seperate chroma db before hand so that in production when user inputs the query my script takes the top result articles , use the corresponding splits stored in Chroma and give me the most relevant chunk.

Or is there a totally different approach to do this task?
I am building it for my school project and i am really confused.
Please help me 🙏🙏🙏

System Info

pip freeze | grep langchain
langchain==0.2.12
langchain-chroma==0.1.2
langchain-community==0.2.11
langchain-core==0.2.29
langchain-experimental==0.0.64
langchain-openai==0.1.20
langchain-text-splitters==0.2.2

@Aneeqasif · 2024-08-16T18:14:10Z

dosubot[bot]
bot Aug 16, 2024

Hey @Aneeqasif! How can I assist you today? I'm here to help you with bugs, questions, and becoming a contributor.

To reduce latency by pre-processing and storing the embeddings in a separate Chroma DB, you can follow these steps:

Load the data using a DocumentLoader.
Split the data into smaller chunks using a TextSplitter.
Embed and store the chunks in a Chroma vector store.

Here's how you can achieve this:

Pre-processing and Storing Embeddings

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

# Step 1: Load the data
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

# Step 2: Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400, chunk_overlap=50, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

# Step 3: Embed and store the chunks in Chroma vector store
model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}
embeddings = QuantizedBiEncoderEmbeddings(model_name=model_name, encode_kwargs=encode_kwargs)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

Retrieving the Most Relevant Chunks in Production

In production, when a user inputs a query, you can retrieve the most relevant chunks as follows:

# Convert the vector store to a retriever
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

# Retrieve the most relevant documents for the user query
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

# Print the most relevant chunk
print(retrieved_docs[0].page_content)

This setup ensures that the embeddings are pre-processed and stored in Chroma DB, allowing for efficient retrieval and reduced latency during production queries ^[1]^[2].

Alternative Approach

If you are using other document sources like Google Drive, you can use the PsychicLoader to load documents and follow similar steps to convert them into embeddings and store them in Chroma:

from langchain_community.document_loaders import PsychicLoader
from psychicapi import ConnectorId
from langchain_chains import RetrievalQAWithSourcesChain
from langchain_chroma import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load documents from Google Drive
google_drive_loader = PsychicLoader(
    api_key="your_api_key",
    connector_id=ConnectorId.gdrive.value,
    connection_id="google-test",
)
documents = google_drive_loader.load()

# Split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Convert the chunks to embeddings and store in Chroma
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
chain = RetrievalQAWithSourcesChain.from_chain_type(
    OpenAI(temperature=0), chain_type="stuff", retriever=docsearch.as_retriever()
)
chain({"question": "what is psychic?"}, return_only_outputs=True)

This approach allows you to handle various document sources and ensures that your embeddings are efficiently stored and retrievable from Chroma DB ^[3]^[4]^[5].

To continue talking to Dosu, mention @dosu.

0 replies

Aneeqasif · 2024-08-16T18:45:52Z

Aneeqasif
Aug 16, 2024
Author

@dosu thanks but i donot want to get the chunks only i want to get the most relevant full articles as well as the most relevant chunk in the top relevant article

1 reply

dosubot[bot] bot Aug 16, 2024

To retrieve both the most relevant full articles and the most relevant chunk in the top relevant article with minimal latency, you can follow these steps:

Load and Pre-process Documents: Load your documents, split them into chunks, and store the embeddings in Chroma DB.
Retrieve Relevant Articles and Chunks: Use a retrieval chain to get the most relevant articles and chunks.

Here's how you can achieve this:

Pre-process and Store Embeddings

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings

# Step 1: Load the data
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

# Step 2: Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400, chunk_overlap=50, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)

# Step 3: Embed and store the chunks in Chroma vector store
model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}
embeddings = QuantizedBiEncoderEmbeddings(model_name=model_name, encode_kwargs=encode_kwargs)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

Retrieve Relevant Articles and Chunks

In production, when a user inputs a query, you can retrieve the most relevant articles and chunks as follows:

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryByteStore
import uuid

# Convert the vector store to a retriever
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})

# Retrieve the most relevant documents for the user query
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")

# Print the most relevant full article
print("Most Relevant Full Article:")
print(retrieved_docs[0].page_content)

# Retrieve the most relevant chunk in the top relevant article
store = InMemoryByteStore()
id_key = "doc_id"
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    byte_store=store,
    id_key=id_key,
)

doc_ids = [str(uuid.uuid4()) for _ in all_splits]
for i, doc in enumerate(all_splits):
    doc.metadata[id_key] = doc_ids[i]

retriever.vectorstore.add_documents(all_splits)
retriever.docstore.mset(list(zip(doc_ids, docs)))

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_prompt = hub.pull("rlm/rag-prompt")
llm = LlamaCpp(
    model_path="llama-2-7b-chat.Q8_0.gguf",
    n_gpu_layers=-1,
    n_batch=512,
    n_ctx=2048,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)

qa_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

result = qa_chain.invoke("What is Intel CCG revenue in Q1 2024?")
print("Most Relevant Chunk in the Top Relevant Article:")
print(result)

This setup ensures that the embeddings are pre-processed and stored in Chroma DB, allowing for efficient retrieval of both the most relevant full articles and the most relevant chunks with minimal latency ^[1]^[2]^[3]^[4]^[5].

To continue talking to Dosu, mention @dosu.

Aneeqasif · 2024-08-16T18:57:06Z

Aneeqasif
Aug 16, 2024
Author

Is there any kind human to understand my request and guide me please...
I donot want to make a llm call to get the most relevant chunk but i want use the embedding filter

1 reply

dosubot[bot] bot Aug 16, 2024

This is a tough nut to crack, could really use your help on this, @eyurtsev.

To continue talking to Dosu, mention @dosu.

Aneeqasif · 2024-08-22T11:57:20Z

Aneeqasif
Aug 22, 2024
Author

anyone here???

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DocumentCompressorPipeline + BaseDocumentTransformer ---> store the embeddings for chunks before hand #25499

Uh oh!

{{title}}

Uh oh!

Replies: 4 comments 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

DocumentCompressorPipeline + BaseDocumentTransformer ---> store the embeddings for chunks before hand #25499

Uh oh!

Aneeqasif Aug 16, 2024

Checked other resources

Commit to Help

Example Code

Description

System Info

Replies: 4 comments · 2 replies

Uh oh!

dosubot[bot] bot Aug 16, 2024

Pre-processing and Storing Embeddings

Retrieving the Most Relevant Chunks in Production

Alternative Approach

Uh oh!

Aneeqasif Aug 16, 2024 Author

Uh oh!

dosubot[bot] bot Aug 16, 2024

Pre-process and Store Embeddings

Retrieve Relevant Articles and Chunks

Uh oh!

Aneeqasif Aug 16, 2024 Author

Uh oh!

dosubot[bot] bot Aug 16, 2024

Uh oh!

Aneeqasif Aug 22, 2024 Author

Aneeqasif
Aug 16, 2024

Replies: 4 comments 2 replies

dosubot[bot]
bot Aug 16, 2024

Aneeqasif
Aug 16, 2024
Author

Aneeqasif
Aug 16, 2024
Author

Aneeqasif
Aug 22, 2024
Author