Replies: 1 comment
-
Hello, @AlisonZa! I'm here to assist you with any bugs, questions, or contributions you have. To properly pass a DataFrame variable to the agent, you can use the following setup:

from langchain_openai import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent
import pandas as pd
from langchain_core.tools.structured import StructuredTool
from typing import Annotated, Literal
# Define the function
def get_top_k_values(
    df: pd.DataFrame,
    feature_name: Annotated[str, 'The name of the feature to sort by, must be the name of a column of the dataframe'],
    k: Annotated[int, 'The number of top values to return'],
    order: Annotated[Literal['ascending', 'descending'], 'Sorting order for the feature'] = 'descending'
) -> pd.DataFrame:
    """Return the top ``k`` rows of ``df`` sorted by ``feature_name``.

    Args:
        df: The DataFrame to query.
        feature_name: Column to sort by; must be a column of ``df``.
        k: Number of rows to return.
        order: ``'ascending'`` or ``'descending'`` (the default).

    Returns:
        A new DataFrame with the first ``k`` rows after sorting.

    Raises:
        ValueError: If ``order`` is neither ``'ascending'`` nor ``'descending'``.
    """
    # Validate first so a bad `order` fails fast instead of falling through
    # two branches; then one sort_values call replaces the duplicated pair.
    if order not in ('ascending', 'descending'):
        raise ValueError("Invalid value for 'order'. Use 'ascending' or 'descending'.")
    return df.sort_values(by=feature_name, ascending=(order == 'ascending')).head(k)
# Wrap the plain function as a LangChain StructuredTool; the schema is
# inferred from the function's signature and Annotated metadata.
get_top_k_values_tool = StructuredTool.from_function(get_top_k_values)
# Load the example DataFrame (Titanic passenger data from the pandas docs).
df = pd.read_csv("https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
# Initialize the language model (temperature=0 for deterministic tool calls).
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Create the Pandas DataFrame agent; `df` is passed to the agent here, so the
# model never has to supply the DataFrame as a tool argument itself.
agent = create_pandas_dataframe_agent(
llm,
df,
agent_type="tool-calling",
verbose=True,
extra_tools=[get_top_k_values_tool],
allow_dangerous_code=True # Required: the agent executes generated Python code
)
# Invoke the agent; it can now call `get_top_k_values` against the loaded df.
result = agent.invoke({"input": "What are the top 10 younger passengers?"})
print(result)

In this example:
This setup ensures that the DataFrame is properly passed and accessible to the agent, allowing it to execute the `get_top_k_values` tool correctly.
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I am trying to create a CSV/XLSX copilot. The agent decides on the right parameters, but it can't call the tool properly.
This is the Tool:
`def get_top_k_values(
df,
feature_name: Annotated[str, 'The name of the feature to sort by, must be the name of a collumn of the dataframe'],
k: Annotated[int, 'The number of top values to return'],
order: Annotated[Literal['ascending', 'descending'], 'Sorting order for the feature'] = 'descending'
) -> pd.DataFrame:
'''
Retrieves the top K values based on the specified feature from the user-provided DataFrame.
get_top_k_values = StructuredTool.from_function(get_top_k_values)`
This is the agent:
`df = pd.read_csv(
"https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv")
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), df, verbose=True, extra_tools = [get_top_k_values],allow_dangerous_code=True)`
This is the calling:
agent.invoke({"input":"What are the top 10 younger passengers: "})
Output:
> Entering new AgentExecutor chain... Thought: We need to sort the dataframe by age and then return the top 10 rows. Action: get_top_k_values Action Input: df, 'Age', 10, 'ascending'
Error:
`---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[50], line 1
----> 1 agent.invoke({"input":"What are the top 10 younger passengers: "})
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\chains\base.py:164, in Chain.invoke(self, input, config, **kwargs)
162 except BaseException as e:
163 run_manager.on_chain_error(e)
--> 164 raise e
165 run_manager.on_chain_end(outputs)
167 if include_run_info:
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\chains\base.py:154, in Chain.invoke(self, input, config, **kwargs)
151 try:
152 self._validate_inputs(inputs)
153 outputs = (
--> 154 self._call(inputs, run_manager=run_manager)
155 if new_arg_supported
156 else self._call(inputs)
157 )
159 final_outputs: Dict[str, Any] = self.prep_outputs(
160 inputs, outputs, return_only_outputs
161 )
162 except BaseException as e:
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\agents\agent.py:1608, in AgentExecutor._call(self, inputs, run_manager)
1606 # We now enter the agent loop (until it returns something).
1607 while self._should_continue(iterations, time_elapsed):
-> 1608 next_step_output = self._take_next_step(
1609 name_to_tool_map,
1610 color_mapping,
1611 inputs,
1612 intermediate_steps,
1613 run_manager=run_manager,
1614 )
1615 if isinstance(next_step_output, AgentFinish):
1616 return self._return(
1617 next_step_output, intermediate_steps, run_manager=run_manager
1618 )
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\agents\agent.py:1314, in AgentExecutor._take_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)
1305 def _take_next_step(
1306 self,
1307 name_to_tool_map: Dict[str, BaseTool],
(...)
1311 run_manager: Optional[CallbackManagerForChainRun] = None,
1312 ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
1313 return self._consume_next_step(
-> 1314 [
1315 a
1316 for a in self._iter_next_step(
1317 name_to_tool_map,
1318 color_mapping,
1319 inputs,
1320 intermediate_steps,
1321 run_manager,
1322 )
1323 ]
1324 )
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\agents\agent.py:1314, in (.0)
1305 def _take_next_step(
1306 self,
1307 name_to_tool_map: Dict[str, BaseTool],
(...)
1311 run_manager: Optional[CallbackManagerForChainRun] = None,
1312 ) -> Union[AgentFinish, List[Tuple[AgentAction, str]]]:
1313 return self._consume_next_step(
-> 1314 [
1315 a
1316 for a in self._iter_next_step(
1317 name_to_tool_map,
1318 color_mapping,
1319 inputs,
1320 intermediate_steps,
1321 run_manager,
1322 )
1323 ]
1324 )
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\agents\agent.py:1399, in AgentExecutor._iter_next_step(self, name_to_tool_map, color_mapping, inputs, intermediate_steps, run_manager)
1397 yield agent_action
1398 for agent_action in actions:
-> 1399 yield self._perform_agent_action(
1400 name_to_tool_map, color_mapping, agent_action, run_manager
1401 )
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain\agents\agent.py:1421, in AgentExecutor._perform_agent_action(self, name_to_tool_map, color_mapping, agent_action, run_manager)
1419 tool_run_kwargs["llm_prefix"] = ""
1420 # We then call the tool on the tool input to get an observation
-> 1421 observation = tool.run(
1422 agent_action.tool_input,
1423 verbose=self.verbose,
1424 color=color,
1425 callbacks=run_manager.get_child() if run_manager else None,
1426 **tool_run_kwargs,
1427 )
1428 else:
1429 tool_run_kwargs = self.agent.tool_run_logging_kwargs()
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain_core\tools\base.py:586, in BaseTool.run(self, tool_input, verbose, start_color, color, callbacks, tags, metadata, run_name, run_id, config, tool_call_id, **kwargs)
584 if error_to_raise:
585 run_manager.on_tool_error(error_to_raise)
--> 586 raise error_to_raise
587 output = _format_output(content, artifact, tool_call_id, self.name, status)
588 run_manager.on_tool_end(output, color=color, name=self.name, **kwargs)
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain_core\tools\base.py:549, in BaseTool.run(self, tool_input, verbose, start_color, color, callbacks, tags, metadata, run_name, run_id, config, tool_call_id, **kwargs)
547 context = copy_context()
548 context.run(_set_config_context, child_config)
--> 549 tool_args, tool_kwargs = self._to_args_and_kwargs(tool_input)
550 if signature(self._run).parameters.get("run_manager"):
551 tool_kwargs["run_manager"] = run_manager
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain_core\tools\base.py:472, in BaseTool._to_args_and_kwargs(self, tool_input)
471 def _to_args_and_kwargs(self, tool_input: Union[str, Dict]) -> Tuple[Tuple, Dict]:
--> 472 tool_input = self._parse_input(tool_input)
473 # For backwards compatibility, if run_input is a string,
474 # pass as a positional argument.
475 if isinstance(tool_input, str):
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\langchain_core\tools\base.py:420, in BaseTool.parse_input(self, tool_input)
418 if input_args is not None:
419 key = next(iter(input_args.fields.keys()))
--> 420 input_args.validate({key_: tool_input})
421 return tool_input
422 else:
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\pydantic\v1\main.py:711, in BaseModel.validate(cls, value)
708 value = cls._enforce_dict_if_root(value)
710 if isinstance(value, dict):
--> 711 return cls(**value)
712 elif cls.config.orm_mode:
713 return cls.from_orm(value)
File c:\Users\Marina\Desktop\Langchain Projects\1 - QA.venv\lib\site-packages\pydantic\v1\main.py:341, in BaseModel.init(pydantic_self, **data)
339 values, fields_set, validation_error = validate_model(pydantic_self.class, data)
340 if validation_error:
--> 341 raise validation_error
342 try:
343 object_setattr(pydantic_self, 'dict', values)
ValidationError: 2 validation errors for get_top_k_valuesSchema
feature_name
field required (type=value_error.missing)
k
field required (type=value_error.missing)`
As you can see, the action input is correct; however, I think that because the dataframe is a variable supplied from outside the function (it is not provided by the agent), the tool is having trouble running. Does anyone know how to properly give the agent a parameter like that?
Beta Was this translation helpful? Give feedback.
All reactions