Replies: 4 comments 2 replies
-
Hey @Aneeqasif! How can I assist you today? I'm here to help you with bugs, questions, and becoming a contributor. To reduce latency, you can pre-process the documents and store their embeddings in a separate Chroma DB ahead of time. Here's how you can achieve this:

Pre-processing and Storing Embeddings

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_community.embeddings import QuantizedBiEncoderEmbeddings
# Step 1: Load the data
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()
# Step 2: Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=400, chunk_overlap=50, add_start_index=True
)
all_splits = text_splitter.split_documents(docs)
# Step 3: Embed and store the chunks in Chroma vector store
model_name = "Intel/bge-small-en-v1.5-rag-int8-static"
encode_kwargs = {"normalize_embeddings": True}
embeddings = QuantizedBiEncoderEmbeddings(model_name=model_name, encode_kwargs=encode_kwargs)
vectorstore = Chroma.from_documents(documents=all_splits, embedding=embeddings)

Retrieving the Most Relevant Chunks in Production

In production, when a user inputs a query, you can retrieve the most relevant chunks as follows:

# Convert the vector store to a retriever
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
# Retrieve the most relevant documents for the user query
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
# Print the most relevant chunk
print(retrieved_docs[0].page_content)

This setup ensures that the embeddings are pre-processed and stored in Chroma DB, allowing for efficient retrieval and reduced latency during production queries [1][2].

Alternative Approach

If you are using other document sources like Google Drive, you can use the PsychicLoader:

from langchain_community.document_loaders import PsychicLoader
from psychicapi import ConnectorId
from langchain.chains import RetrievalQAWithSourcesChain
from langchain_chroma import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
# Load documents from Google Drive
google_drive_loader = PsychicLoader(
api_key="your_api_key",
connector_id=ConnectorId.gdrive.value,
connection_id="google-test",
)
documents = google_drive_loader.load()
# Split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# Convert the chunks to embeddings and store in Chroma
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings)
chain = RetrievalQAWithSourcesChain.from_chain_type(
OpenAI(temperature=0), chain_type="stuff", retriever=docsearch.as_retriever()
)
chain({"question": "what is psychic?"}, return_only_outputs=True) This approach allows you to handle various document sources and ensures that your embeddings are efficiently stored and retrievable from Chroma DB [3][4][5]. |
-
@dosu thanks, but I don't want to get only the chunks. I want to get the most relevant full articles as well as the most relevant chunk within the top relevant article.
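One possible way to get both, sketched here as an assumption rather than a confirmed recipe: keep the existing MultiVectorRetriever for the full articles, and maintain a separate, pre-built Chroma index of article chunks in which every chunk carries its parent article's identifier in its metadata. The chunk_store and article_retriever names and the "article_id" metadata key below are hypothetical placeholders:

# Assumes `article_retriever` is the existing MultiVectorRetriever and `chunk_store`
# is a Chroma index built ahead of time, with each chunk tagged with the id of the
# article it came from (placeholder metadata key: "article_id").
query = "What are the approaches to Task Decomposition?"

# 1. Full articles, as before
top_articles = article_retriever.invoke(query)
top_article = top_articles[0]

# 2. Most relevant chunk *within* the top article, via a metadata-filtered search
best_chunks = chunk_store.similarity_search(
    query,
    k=1,
    filter={"article_id": top_article.metadata["article_id"]},
)
print(best_chunks[0].page_content)  # the passage to highlight

Because the chunks are already embedded, the only work at query time is embedding the query and running two similarity searches.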
-
Is there any kind human who can understand my request and guide me, please...
-
Anyone here???
-
Checked other resources
Commit to Help
Example Code
Description
Hi guys. I am building a semantic search engine for small articles. I am using MultiVectorRetriever to retrieve the desired articles using the summary approach. Life is great up to there.
Now I want to add a feature like Google's, where the most relevant chunk of the top result is highlighted in the search results. I do not know the best way to do this with the least latency. I avoided ContextualCompressionRetriever because it invokes the base retriever, and I don't want to invoke it again since I have already got the results from it. So I am using DocumentCompressorPipeline for this, as given in my snippet, and it works, but it creates a lot of latency because it splits --> transforms --> generates embeddings --> compresses on the fly. How can I reduce the latency? The splitting and embedding creation is probably taking most of the time. How can I split, embed, and store the chunks in a separate Chroma DB beforehand, so that in production, when a user inputs a query, my script takes the top-result articles, uses the corresponding splits already stored in Chroma, and gives me the most relevant chunk? Or is there a totally different approach to this task?
I am building it for my school project and I am really confused.
Please help me 🙏🙏🙏
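A possible build-time counterpart to the filtered lookup sketched earlier in the thread, assuming each article is a LangChain Document with a unique identifier in its metadata (the articles list, the "article_id" key, and the path/collection names below are placeholders): split and embed every article once, tag each chunk with its parent article's id, and persist the result so that nothing is split or embedded at query time.

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings  # or whichever embedding model you already use

splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
embeddings = OpenAIEmbeddings()

chunk_docs = []
for article in articles:  # `articles`: your list of Document objects, one per article
    chunks = splitter.split_documents([article])
    for chunk in chunks:
        # Placeholder metadata key linking each chunk back to its parent article
        chunk.metadata["article_id"] = article.metadata["article_id"]
    chunk_docs.extend(chunks)

# Embed once and persist; production only reopens this store and runs filtered queries
chunk_store = Chroma.from_documents(
    documents=chunk_docs,
    embedding=embeddings,
    collection_name="article_chunks",
    persist_directory="./chroma_article_chunks",
)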
System Info