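Yes, it is possible to apply the concept of MultiQueryRetriever to a pandas DataFrame. To implement this, you can create a custom retriever that processes DataFrame queries similarly to how MultiQueryRetriever processes queries against a vector store.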
Here is a sample implementation:

```python
from typing import List

import pandas as pd
from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI


# Output parser to split the LLM result into a list of queries
class LineListOutputParser(BaseOutputParser[List[str]]):
    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        # Strip surrounding whitespace and drop empty lines
        return [line.strip() for line in lines if line.strip()]


output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant data from a pandas DataFrame.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)
llm = ChatOpenAI(temperature=0)

# Chain: prompt -> LLM -> list of query strings
llm_chain = QUERY_PROMPT | llm | output_parser

# Sample DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 30, 22, 35],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}
df = pd.DataFrame(data)


# Custom retriever for DataFrame
class DataFrameRetriever:
    def __init__(self, df: pd.DataFrame):
        self.df = df

    def retrieve(self, query: str) -> pd.DataFrame:
        # Implement your query execution logic here.
        # For simplicity, return every column whose name appears in the query
        # text (case-insensitive), so "ages" matches the "Age" column.
        cols = [c for c in self.df.columns if c.lower() in query.lower()]
        return self.df[cols] if cols else pd.DataFrame()


# Multi-query wrapper for DataFrames. It deliberately does NOT subclass
# MultiQueryRetriever: that class is a pydantic model whose `retriever` field
# must be a LangChain BaseRetriever (DataFrameRetriever is not), it does not
# accept positional constructor arguments, and its generate_queries() also
# expects a run_manager argument.
class MultiQueryDataFrameRetriever:
    def __init__(self, retriever: DataFrameRetriever, llm_chain, include_original: bool = False):
        self.retriever = retriever
        self.llm_chain = llm_chain
        self.include_original = include_original

    def generate_queries(self, question: str) -> List[str]:
        queries = self.llm_chain.invoke({"question": question})
        if self.include_original:
            queries = [*queries, question]
        return queries

    def retrieve_documents(self, queries: List[str]) -> pd.DataFrame:
        results = [self.retriever.retrieve(query) for query in queries]
        if not results:
            return pd.DataFrame()
        return pd.concat(results).drop_duplicates()


# Instantiate the custom retriever
df_retriever = DataFrameRetriever(df)

# Instantiate the MultiQueryDataFrameRetriever
multi_query_retriever = MultiQueryDataFrameRetriever(
    retriever=df_retriever, llm_chain=llm_chain, include_original=True
)

# Example usage
question = "What are the ages of the people?"
queries = multi_query_retriever.generate_queries(question)
result_df = multi_query_retriever.retrieve_documents(queries)
print(result_df)
```

This code demonstrates how to set up a custom retriever for a pandas DataFrame and combine it with LLM-generated query variants, following the same idea as MultiQueryRetriever.
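A note on merging: the row-wise `pd.concat(results).drop_duplicates()` in the sample stacks the per-query results on top of each other and then compares values only, so two different people with the same age could collapse into one row. A minimal alternative sketch, assuming each query variant returns a column subset of the same DataFrame (as `DataFrameRetriever` does), places the results side by side on the shared row index and keeps one copy of each column. `merge_query_results` is a hypothetical helper name.

```python
from typing import List

import pandas as pd


def merge_query_results(results: List[pd.DataFrame]) -> pd.DataFrame:
    """Union the columns returned by each query variant, keeping original row labels."""
    non_empty = [r for r in results if not r.empty]
    if not non_empty:
        return pd.DataFrame()
    # Align side by side on the index: one row per row of the source DataFrame
    combined = pd.concat(non_empty, axis=1)
    # The same column may come back from several query variants; keep the first copy
    return combined.loc[:, ~combined.columns.duplicated()]


df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie", "David"],
        "Age": [24, 30, 22, 35],
        "City": ["New York", "Los Angeles", "Chicago", "Houston"],
    }
)

# Results as several query variants might return them
results = [df[["Age"]], df[["Name", "Age"]], pd.DataFrame()]
merged = merge_query_results(results)
print(merged)
```

Because rows are matched by index rather than by value, rows with identical values are never dropped; the trade-off is that this only makes sense when every result comes from the same source DataFrame.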
@dosu can you help with this error? I am getting an error on this statement...
@dosu the main problem is ... with the pandas DataFrame. I am getting this error with df_retriever from the code:
@how I got queries like the ones below... How can I refine them?
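The generated queries aren't shown above, but a frequent issue is that chat models number or bullet their alternatives and sometimes repeat one, so the split lines are not clean questions. A sketch of a stricter parsing step in plain Python: strip leading list markers, drop blank lines and case-insensitive duplicates, and cap the count. `clean_queries` is a hypothetical helper you could call from `LineListOutputParser.parse`; asking for "one question per line, no numbering" in the prompt can also help.

```python
import re
from typing import List

# Leading list markers such as "1.", "2)", "-", "*" or "•", plus following spaces
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def clean_queries(text: str, max_queries: int = 5) -> List[str]:
    """Split LLM output into unique, marker-free query strings."""
    seen = set()
    queries = []
    for line in text.splitlines():
        query = _MARKER.sub("", line).strip()
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            queries.append(query)
    return queries[:max_queries]


raw = """1. What are the ages of the people?
2) How old is each person?

- What are the ages of the people?
* List every person's age."""
print(clean_queries(raw))
```

Deduplicating here also saves one retrieval call per repeated query before the results are merged.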
Description
The concept I am interested in is described here:
https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/MultiQueryRetriever/#supplying-your-own-prompt
I like the concept of MultiQueryRetriever; however, most of my work involves structured data and utilizes the pandas DataFrame agent concept. My question is: is it possible to apply a similar concept to a DataFrame instead of a vector database? Thank you for your help.
System Info
Name: langchain
Version: 0.2.16