-
Example Code

```python
from langchain_experimental.agents import create_pandas_dataframe_agent
import os
import pandas as pd

agent = create_pandas_dataframe_agent(
    llm,
    df,
    agent_type="openai-tools",
    verbose=True,
    return_intermediate_steps=True,
    allow_dangerous_code=True,
    allow_dangerous_requests=True,
)

agent.invoke({"input": query_str})
```

Description

Are there any good packages, such as DSPy or native LangChain packages, that can help tune a prompt before passing it into `query_str` in the example above? How can I narrate my output in natural language after the agent provides its output? My objective is to extract data from the agent and pass it to another agent that can narrate the results in human-readable language, similar to how llama_index handles response synthesis. Please help.

System Info

Name: langchain
-
Yes, there are good packages, such as native LangChain components, that can help tune a prompt before passing it into `query_str`. Here is an example that uses LangChain's query-analysis pattern with few-shot examples to rewrite a user question into optimized search queries:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

class Search(BaseModel):
    query: str = Field(
        ...,
        description="Primary similarity search query applied to video transcripts.",
    )
    sub_queries: List[str] = Field(
        default_factory=list,
        description="List of sub-questions derived from the main query.",
    )
    publish_year: Optional[int] = Field(None, description="Year video was published")

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results.
If there are acronyms or words you are not familiar with, do not try to rephrase them."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

# Adding examples to the prompt
examples = []

question = "What's chat langchain, is it a langchain template?"
query = Search(
    query="What is chat langchain and is it a langchain template?",
    sub_queries=["What is chat langchain", "What is a langchain template"],
)
examples.append({"input": question, "tool_calls": [query]})

question = "How to build multi-agent system and stream intermediate steps from it"
query = Search(
    query="How to build multi-agent system and stream intermediate steps from it",
    sub_queries=[
        "How to build multi-agent system",
        "How to stream intermediate steps from multi-agent system",
        "How to stream intermediate steps",
    ],
)
examples.append({"input": question, "tool_calls": [query]})

question = "LangChain agents vs LangGraph?"
query = Search(
    query="What's the difference between LangChain agents and LangGraph? How do you deploy them?",
    sub_queries=[
        "What are LangChain agents",
        "What is LangGraph",
        "How do you deploy LangChain agents",
        "How do you deploy LangGraph",
    ],
)
examples.append({"input": question, "tool_calls": [query]})

# Helper function to convert examples to messages
import uuid
from typing import Dict
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    ToolMessage,
)

def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": tool_call.__class__.__name__,
                    "arguments": tool_call.json(),
                },
            }
        )
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    tool_outputs = example.get("tool_outputs") or [
        "You have correctly called this tool."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]

query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | structured_llm
)

# Using the query analyzer with examples
result = query_analyzer_with_examples.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)
```

To narrate your output in natural language after the agent provides its output, you can simply format the result into a human-readable string. Here is an example:

```python
def narrate_output(result):
    narration = f"The main query is: {result.query}\n"
    if result.sub_queries:
        narration += "The sub-queries are:\n"
        for sub_query in result.sub_queries:
            narration += f"- {sub_query}\n"
    if result.publish_year:
        narration += f"The video was published in the year: {result.publish_year}\n"
    return narration

output_narration = narrate_output(result)
print(output_narration)
```

This will provide a natural language narration of the output generated by the agent [1].
-
@dosu.... this is great. One challenge I frequently encounter in my program is case sensitivity discrepancies between the attribute names in my pandas DataFrame and the way they are queried. For example, if my DataFrame column 'country' records 'USA', but the query uses 'US' or 'usa', the program returns "no data found." Is there a way to check the unique values in the DataFrame before writing these query snippets to assist in the query tuning process? I could convert all DataFrame values to lowercase and adjust the queries accordingly, but my problem is more complex. Sometimes, users input 'US' and expect feedback on 'USA'. Is there any strategy to mitigate this issue?
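One lightweight way to mitigate this, sketched here as an assumption rather than an official LangChain feature, is to resolve the user's value against the column's actual unique values before building the query: try a hand-maintained alias table first (the `{"us": "USA"}` mapping below is a hypothetical example you would maintain yourself), then an exact case-insensitive match, then fuzzy matching via the standard library's `difflib`:

```python
import difflib
from typing import Optional

import pandas as pd

def resolve_value(user_value: str, df: pd.DataFrame, column: str,
                  aliases: Optional[dict] = None) -> Optional[str]:
    """Map a user-supplied value onto the closest value actually present
    in df[column], so 'usa' or 'US' can resolve to 'USA'."""
    lookup = {str(v).lower(): v for v in df[column].dropna().unique()}
    key = user_value.lower()
    # 1. Hand-maintained alias table for known abbreviations (hypothetical).
    if aliases and key in aliases:
        key = aliases[key].lower()
    # 2. Exact case-insensitive match.
    if key in lookup:
        return lookup[key]
    # 3. Fuzzy match as a last resort.
    close = difflib.get_close_matches(key, lookup.keys(), n=1, cutoff=0.6)
    return lookup[close[0]] if close else None

df = pd.DataFrame({"country": ["USA", "Canada", "Mexico"]})
print(resolve_value("usa", df, "country"))               # USA
print(resolve_value("US", df, "country", {"us": "USA"})) # USA
```

The resolved value can then be substituted into the query text before it reaches the agent, so the DataFrame filter always uses the canonical spelling.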
-
@dosu Can you help me inject the above function into my final query, please? I am new to Python as well as LangChain...
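For other readers, the wiring can be sketched as a small helper that runs the query analyzer first and feeds its tuned query into the pandas agent. This assumes the `query_analyzer_with_examples` and `agent` objects built earlier in the thread; the helper name is hypothetical:

```python
def run_tuned_query(query_analyzer, agent, question: str):
    """Tune the raw question with the query analyzer, then pass the
    rewritten query on to the pandas DataFrame agent."""
    search = query_analyzer.invoke(question)      # returns a Search object
    return agent.invoke({"input": search.query})  # the agent sees the tuned query

# Usage with the objects from the earlier snippets:
# response = run_tuned_query(query_analyzer_with_examples, agent, query_str)
```

The key point is that the analyzer's structured output exposes the rewritten text as `search.query`, which becomes the agent's `"input"`.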
-
@dosu everything is working great, thank you! However, the only thing that is not working is
-
@dosu I am just looking to narrate the response that I got from my Python agent. Can you help?
-
@dosu... One final question. The suggestion you provided is working excellently, and passing my query_str is yielding great results. However, I noticed that your BaseModel is primarily designed for YouTube transcripts, while I am mainly using a pandas DataFrame for agent queries. My question is: do I need to modify the `Search(BaseModel)` class and the system message accordingly? If so, could you please provide a similar snippet? The query you provided is below...
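One way to adapt the model for DataFrame queries is sketched below. This is an assumption, not part of the thread's original answer: the field names `columns` and `filters` are hypothetical and should be tailored to your own DataFrame, as should the system message:

```python
from typing import Dict, List

try:  # matches the import used earlier in this thread
    from langchain_core.pydantic_v1 import BaseModel, Field
except ImportError:  # fall back to plain pydantic if langchain_core is absent
    from pydantic import BaseModel, Field

class DataFrameSearch(BaseModel):
    """Query model aimed at a pandas DataFrame rather than video transcripts."""
    query: str = Field(..., description="Primary question to run against the DataFrame.")
    sub_queries: List[str] = Field(
        default_factory=list,
        description="Sub-questions derived from the main query.",
    )
    columns: List[str] = Field(
        default_factory=list,
        description="DataFrame columns the question is expected to touch.",
    )
    filters: Dict[str, str] = Field(
        default_factory=dict,
        description="Exact column=value filters extracted from the question.",
    )

system = """You are an expert at converting user questions into pandas DataFrame queries. \
You have access to a DataFrame whose column names and example values are described below. \
Given a question, return queries optimized to retrieve the most relevant rows.
If there are acronyms or words you are not familiar with, do not try to rephrase them."""
```

You would then use `llm.with_structured_output(DataFrameSearch)` in place of `Search`, and include your actual column names and sample values in the system message.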
To create embeddings for multiple columns (e.g., 'country_names' and 'city_names') in a DataFrame, store those embeddings in a vector database, and query against them, you can follow these steps. Here is the updated code:
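The code for this reply was cut off above. As a hedged reconstruction of the first step it describes, the columns can be combined into one text document per row before embedding; the vector-store part is shown only as commented LangChain calls, since model credentials are needed to run it:

```python
import pandas as pd

def columns_to_documents(df: pd.DataFrame, columns: list) -> list:
    """Combine several DataFrame columns into one text document per row,
    so each row can be embedded as a single vector."""
    return [
        "; ".join(f"{col}: {row[col]}" for col in columns)
        for _, row in df.iterrows()
    ]

df = pd.DataFrame({
    "country_names": ["USA", "Canada"],
    "city_names": ["New York", "Toronto"],
})
docs = columns_to_documents(df, ["country_names", "city_names"])
# docs[0] == "country_names: USA; city_names: New York"

# These texts could then be indexed and queried with a vector store, e.g.:
# from langchain_community.vectorstores import FAISS
# from langchain_openai import OpenAIEmbeddings
# vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())
# vectorstore.similarity_search("cities in the USA", k=2)
```

Embedding the concatenated row text keeps related column values (country and city) in the same vector, so a query mentioning either column can retrieve the row.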