Empty Nodes Returned by get_relevant_documents() in Vector Store for YouTube Transcripts #25354
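Hey @dhruv-scogo! 👋 I'm here to help you with bug fixes, answer questions, and guide you on becoming a contributor. To resolve the issue where the get_relevant_documents() method returns an empty list of nodes, check that the search methods of your vector store are implemented and actually return documents, as outlined below.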
Here is an example of how you might implement the search methods that get_relevant_documents() relies on:

from typing import Any, List, Tuple

from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore

class MyVectorStore(VectorStore):
    # Note: VectorStore also declares from_texts as abstract, so it must be
    # implemented too before this class can be instantiated.
    def similarity_search(self, query: str, k: int = 4, **kwargs: Any) -> List[Document]:
        # Implement the similarity search logic here
        # For example, use a pre-trained model to get embeddings and perform a vector search
        # Return a list of Document objects that are most similar to the query
        pass

    def similarity_search_with_relevance_scores(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[Tuple[Document, float]]:
        # Implement the similarity search with relevance scores logic here
        pass

    def max_marginal_relevance_search(self, query: str, k: int = 4, **kwargs: Any) -> List[Document]:
        # Implement the max marginal relevance search logic here
        pass
Here is an example of how you might test the get_relevant_documents() method:
# Create an instance of your vector store
vector_store = MyVectorStore()
# Add documents to the vector store
documents = [Document(page_content="Sample YouTube transcript", metadata={})]
vector_store.add_documents(documents)
# Set the search type and search kwargs
search_type = "similarity"
search_kwargs = {"k": 4}
# Create a query
query = "Sample query text"
# Get relevant documents
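# get_relevant_documents() is a retriever method, so wrap the store in a retriever first
retriever = vector_store.as_retriever(search_type=search_type, search_kwargs=search_kwargs)
relevant_docs = retriever.get_relevant_documents(query)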
# Print the relevant documents
print(relevant_docs)

By following these steps, you should be able to identify and resolve the issue with the get_relevant_documents() method. Additionally, ensure that the search method dispatches to the search type you expect:

def search(self, query: str, search_type: str, **kwargs: Any) -> List[Document]:
    """Return docs most similar to query using a specified search type.

    Args:
        query: Input text
        search_type: Type of search to perform. Can be "similarity",
            "mmr", or "similarity_score_threshold".
        **kwargs: Arguments to pass to the search method.

    Returns:
        List of Documents most similar to the query.

    Raises:
        ValueError: If search_type is not one of "similarity",
            "mmr", or "similarity_score_threshold".
    """
    if search_type == "similarity":
        return self.similarity_search(query, **kwargs)
    elif search_type == "similarity_score_threshold":
        docs_and_similarities = self.similarity_search_with_relevance_scores(query, **kwargs)
        return [doc for doc, _ in docs_and_similarities]
    elif search_type == "mmr":
        return self.max_marginal_relevance_search(query, **kwargs)
    else:
        raise ValueError(
            f"search_type of {search_type} not allowed. Expected "
            "search_type to be 'similarity', 'similarity_score_threshold' or 'mmr'."
        )

By ensuring that these methods are correctly implemented and tested, you should be able to resolve the issue with get_relevant_documents() returning an empty list.
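One way an otherwise healthy store can return nothing is a score threshold that no document clears. The sketch below illustrates that effect with a toy in-memory store. It is not LangChain code: ToyStore, its bag-of-words "embedding", and the score_threshold parameter are stand-ins invented for illustration, and a real embedding model will produce different scores.

```python
import math
from collections import Counter
from typing import List, Tuple


def _embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyStore:
    """Hypothetical in-memory store that mimics the search_type dispatch."""

    def __init__(self) -> None:
        self.texts: List[str] = []

    def add_texts(self, texts: List[str]) -> None:
        self.texts.extend(texts)

    def _scored(self, query: str) -> List[Tuple[str, float]]:
        # Score every stored text against the query, best match first.
        q = _embed(query)
        scored = [(t, _cosine(q, _embed(t))) for t in self.texts]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)

    def search(
        self, query: str, search_type: str, k: int = 4, score_threshold: float = 0.0
    ) -> List[str]:
        top = self._scored(query)[:k]
        if search_type == "similarity":
            return [t for t, _ in top]
        if search_type == "similarity_score_threshold":
            # Anything scoring below the threshold is dropped, possibly everything.
            return [t for t, s in top if s >= score_threshold]
        raise ValueError(f"search_type of {search_type} not allowed.")


store = ToyStore()
store.add_texts(["how to train a neural network", "cooking pasta at home"])

hits = store.search("train a network", "similarity", k=2)
strict = store.search(
    "train a network", "similarity_score_threshold", k=2, score_threshold=0.95
)
print(hits)
print(strict)
```

If plain "similarity" search returns documents but the thresholded search does not, the threshold (or the score scale of the embedding model) is the first thing to check. If both come back empty, the store most likely contains no indexed documents.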
@dosu
Checked other resources
Commit to Help
Example Code
Vector Store Retrieval: I use the get_relevant_documents() method to retrieve documents related to a specific query.
Expected Behavior:
The get_relevant_documents() method should return a list of relevant nodes containing documents related to the user_query.
Actual Behavior:
The method returns an empty list of nodes, despite the documents being successfully loaded and indexed in the vector store.
Additional Context:
The transcripts are stored in .txt format, loaded from a directory, split into chunks, and indexed with metadata containing YouTube links.
The issue persists across multiple queries, and the documents are confirmed to be loaded into the vector store.
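Since the original example code is not shown, the following is only a sketch of the pipeline described above (load .txt files from a directory, split them into chunks, attach the YouTube link as metadata), written with the standard library so each stage can be inspected before anything is indexed. The function names, the chunk sizes, the links mapping from filename to URL, and the example.com URL are all assumptions made for illustration, not the poster's actual code.

```python
from pathlib import Path
from typing import Dict, List


def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # Drop whitespace-only chunks: they carry no content to retrieve.
    return [chunk for chunk in chunks if chunk.strip()]


def load_transcript_chunks(
    directory: str,
    links: Dict[str, str],  # hypothetical mapping: filename -> YouTube URL
    chunk_size: int = 500,
    overlap: int = 50,
) -> List[dict]:
    """Read every .txt file in a directory and return chunk records with metadata."""
    records = []
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        for index, chunk in enumerate(split_text(text, chunk_size, overlap)):
            records.append(
                {
                    "page_content": chunk,
                    "metadata": {
                        "source": path.name,
                        "youtube_link": links.get(path.name, ""),
                        "chunk": index,
                    },
                }
            )
    return records


def summarize(records: List[dict]) -> Dict[str, int]:
    """Count chunks per source file, to spot transcripts that produced none."""
    counts: Dict[str, int] = {}
    for record in records:
        source = record["metadata"]["source"]
        counts[source] = counts.get(source, 0) + 1
    return counts
```

Calling summarize(load_transcript_chunks("transcripts/", {"talk.txt": "https://example.com/watch"})) before indexing shows whether every transcript yielded chunks. A file that is missing from the summary was empty or unreadable, which would leave the vector store with nothing to return for it.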
System Information
Package Information