how to get topk retrievals for ParentDocumentRetriever using Chroma? #17582

nithinreddyyyyyy · 2024-02-15T16:35:51Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

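%%time
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# `embeddings` (an embedding model) and `documents` (a list of Document
# objects) are assumed to be defined in an earlier cell.
# query = 'how many are injured and dead in christchurch Mosque?'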

# Define your prompt template
prompt_template = """Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else. If no context, then no answer.
Helpful Answer:"""

child_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=1200, chunk_overlap=300)

vectorstore = Chroma(
  collection_name="full_documents", embedding_function=embeddings)

store = InMemoryStore()

retriever = ParentDocumentRetriever(
  vectorstore=vectorstore,
  docstore=store,
  child_splitter=child_splitter,
  parent_splitter=parent_splitter
)

retriever.add_documents(documents, ids=None)

retriever.invoke("data related to cricket")

Description

In the code above, I invoke the ParentDocumentRetriever, and it returns only one document. How can I get the top-k documents?

System Info

Google Colab

dosubot[bot] · 2024-02-15T16:36:09Z

🤖

Hey @nithinreddyyyyyy! Great to see you diving into another intriguing aspect of LangChain. How's everything going on your end?

Based on the code you've provided, it seems like you're using the invoke method of the ParentDocumentRetriever class to retrieve a single document. However, the ParentDocumentRetriever class doesn't have a built-in way to return the top k documents.

To achieve this, you can modify your code to use the similarity_search or similarity_search_by_vector methods of the Chroma class, which can return the top k documents most similar to a given query or embedding vector. Here's an example of how you can use these methods:

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Initialize the embeddings and vectorstore
embeddings = OpenAIEmbeddings()
vectorstore = Chroma("full_documents", embeddings)

# Run a similarity search with a query
query = "data related to cricket"
k = 5  # Number of documents to return
documents = vectorstore.similarity_search(query, k)

# Or run a similarity search with an embedding vector
embedding = embeddings.embed_query(query)  # a vector with the model's dimension
documents = vectorstore.similarity_search_by_vector(embedding, k)

In this example, the similarity_search and similarity_search_by_vector methods return the top k documents most similar to the given query or embedding vector. The number of documents to return is specified by the k parameter.

Please note that this approach will return the top k documents based on the similarity to the query or embedding vector, not based on the parent-child document structure used by the ParentDocumentRetriever class. If you need to maintain the parent-child document structure, you may need to implement additional logic to retrieve the parent documents of the top k child documents.
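That additional logic is essentially what ParentDocumentRetriever already does internally after its vector search: collect the parent IDs stored in each child chunk's metadata, drop duplicates while preserving rank order, and look the parents up in the docstore. The pure-Python sketch below mirrors that step. `Doc`, `parents_for_children`, and the sample data are hypothetical stand-ins (a plain dict replaces the docstore), and it assumes the parent ID lives under `doc_id`, the retriever's default `id_key`. It also illustrates a likely cause of the single result in the question: if the top child chunks (four by default for Chroma's `similarity_search`) all come from one parent, they collapse into one document.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def parents_for_children(ranked_children, docstore, id_key="doc_id"):
    """Map ranked child chunks to their parent documents.

    Keeps parent IDs in order of first appearance, drops duplicates,
    then looks each ID up in the docstore and skips IDs that are missing.
    """
    parent_ids = []
    for child in ranked_children:
        pid = child.metadata.get(id_key)
        if pid is not None and pid not in parent_ids:
            parent_ids.append(pid)
    parents = [docstore.get(pid) for pid in parent_ids]
    return [p for p in parents if p is not None]


# Hypothetical data: two parent documents that were split into child chunks.
docstore = {
    "p1": Doc("Full article about a cricket match."),
    "p2": Doc("Full article about football."),
}
# Top-4 child hits for a cricket query, all cut from parent p1.
top_children = [Doc(f"cricket chunk {i}", {"doc_id": "p1"}) for i in range(4)]

print(len(parents_for_children(top_children, docstore)))  # 1
```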

You can find more information about the Chroma class and its methods in the Chroma source code, and about the ParentDocumentRetriever class in the ParentDocumentRetriever source code.

I hope this helps! If you have any further questions, feel free to ask.


nithinreddyyyyyy (Author) · Feb 15, 2024

In this code you're not even using ParentDocumentRetriever. I want to retrieve the documents using ParentDocumentRetriever.

schuemie · 2024-06-19T07:38:55Z

I agree it would be nice if the ParentDocumentRetriever's invoke() function allowed the same functionality as similarity_search(), such as specifying top k and any filters. Right now, the only way I have found to achieve that bypasses invoke() altogether:

# First search the child-chunk vector store:
query = "data related to cricket"
k = 5
sub_docs = vectorstore.similarity_search(query, k=k)

# Then retrieve the parent documents (deduplicated, in rank order):
keys = list(dict.fromkeys(doc.metadata["doc_id"] for doc in sub_docs))
documents = [d for d in store.mget(keys) if d is not None]

It seems ParentDocumentRetriever should handle this? Am I missing something? If I understand this old documentation correctly, I can switch out the child retriever and specify k there, but that still doesn't allow me to set it at search time, or add any filters.


jedick · 2025-07-11T01:40:25Z

ParentDocumentRetriever has a search_kwargs parameter whose contents are passed as keyword arguments to the vector store's search function (similarity_search by default).

retriever = ParentDocumentRetriever(
  vectorstore=vectorstore,
  docstore=store,
  child_splitter=child_splitter,
  parent_splitter=parent_splitter,
  # Search the top 5 child chunks; their parents are returned, deduplicated
  search_kwargs={"k": 5},
)
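Note that `k` here counts child chunks, not parents: the retriever searches the child vectors and then deduplicates their parent IDs, so `{"k": 5}` can still yield fewer than five parent documents. If exactly k distinct parents are needed, one possible approach (not a built-in LangChain option) is to over-fetch children and stop once k distinct parents are collected. The pure-Python sketch below assumes `search(query, n)` is any function returning ranked child chunks whose metadata holds the parent ID under `doc_id`; `top_k_parent_ids`, `fetch_factor`, and `fake_search` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def top_k_parent_ids(search, query, k, fetch_factor=4, id_key="doc_id"):
    """Return up to k distinct parent IDs, ordered by their best-ranked child.

    Asks `search` for k * fetch_factor child chunks, keeps the first
    occurrence of each parent ID, and stops once k IDs have been collected.
    """
    ids = []
    for child in search(query, k * fetch_factor):
        pid = child.metadata.get(id_key)
        if pid is not None and pid not in ids:
            ids.append(pid)
            if len(ids) == k:
                break
    return ids


def fake_search(query, n):
    """Hypothetical ranked child hits; each child records its parent under doc_id."""
    ranked = ["p1", "p1", "p1", "p2", "p1", "p3", "p2", "p4"]
    return [SimpleNamespace(metadata={"doc_id": pid}) for pid in ranked[:n]]


print(top_k_parent_ids(fake_search, "data related to cricket", k=3))  # ['p1', 'p2', 'p3']
print(top_k_parent_ids(fake_search, "data related to cricket", k=3, fetch_factor=1))  # ['p1']
```

In a real notebook, `search` could be `lambda q, n: vectorstore.similarity_search(q, k=n)`, and the resulting IDs passed to `store.mget(...)` as in schuemie's approach above. Because search_kwargs is forwarded to Chroma's `similarity_search`, a metadata filter should also fit there, e.g. `search_kwargs={"k": 5, "filter": {"source": "cricket.txt"}}`, where the `source` value is hypothetical.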
