PandasQueryEngine with Ollama #19171
Hi @TastyMoocow! I'm Dosu and I'm helping the llama_index team.

The error happens because PandasQueryEngine expects the LLM to return a Python expression (for example, `df.columns.tolist()`), but your Ollama model is returning a natural-language explanation instead. The output processor passes the LLM's response to `ast.parse`, so any extra text or explanation raises a SyntaxError.

To fix this, make sure the LLM returns only code. The default prompt used by PandasQueryEngine already instructs the LLM to output only executable Python code, but you can make the instruction more explicit with a custom prompt:

```python
from llama_index.core import PromptTemplate
from llama_index.experimental.query_engine.pandas import PandasQueryEngine

# The engine fills in {df_str} (the output of df.head()) and {query_str} (the question).
prompt_template = PromptTemplate(
    "Given the following pandas DataFrame, answer the user's question by returning ONLY executable Python code (no explanation, no markdown, no comments). "
    "For example, to get the column names, return: df.columns.tolist()\n"
    "DataFrame: {df_str}\n"
    "Question: {query_str}\n"
    "Python code:"
)
# A custom prompt is passed through the pandas_prompt argument.
query_engine = PandasQueryEngine(df=dfs[0], llm=llm, pandas_prompt=prompt_template)
```

If the LLM still returns extra text, you can post-process the output to extract just the code, or subclass/wrap the output processor to be more robust.

For your DataFrame validation logic, you can check whether the columns or shape match your expectations and, if not, trigger a re-extraction with an extended bounding box.

Best practices: always give clear instructions in your prompt about the expected output format, include examples, and use temperature=0 for deterministic results. More on this in the LlamaIndex prompt engineering guidance.
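The post-processing idea mentioned above could look roughly like the sketch below. It is only an illustration: `extract_python_expression` is a hypothetical helper, not part of llama_index, and it assumes that the model wraps its reasoning in `<think>` tags and that the DataFrame is referred to as `df`. How the cleaned string is handed back to the query engine is not shown here.

```python
import ast
import re

# A markdown code fence, built from single backticks so this example stays fence-safe.
FENCE = "`" * 3


def extract_python_expression(raw: str) -> str:
    """Return the first snippet in `raw` that parses as a Python expression using `df`."""
    # Assumption: reasoning models wrap their thoughts in <think>...</think>; drop them.
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)

    # Prefer the body of a fenced block if the model produced one.
    fenced = re.search(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, text, flags=re.DOTALL)
    candidates = [fenced.group(1)] if fenced else []
    # Fall back to individual lines, checking the last line first.
    candidates += reversed(text.strip().splitlines())

    for candidate in candidates:
        candidate = candidate.strip()
        if not candidate:
            continue
        try:
            tree = ast.parse(candidate, mode="eval")
        except SyntaxError:
            continue  # prose or an incomplete statement, keep looking
        # Require a reference to `df` so that one-word prose lines are not accepted.
        if any(isinstance(node, ast.Name) and node.id == "df" for node in ast.walk(tree)):
            return candidate
    raise ValueError("no Python expression found in LLM output")
```

Checking for a `df` reference is a deliberate design choice: a bare word such as `Sure` is itself a valid Python expression, so syntax alone is too weak a filter.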
-
Thank you for pointing out the need to use a coder-type model. After switching to qwen2.5-coder, it worked even without a prompt_template. Do you know which Ollama models work with PandasQueryEngine? I tried the codegemma:7b-code model, but it threw an error instead.
-
I'm trying to determine whether a pandas DataFrame is correctly generated from PyMuPDF's find_tables() function. The end goal is to automatically detect when the program needs to go back and call PyMuPDF again with an extended bounding box. When I run the code, I get the following error.
```
Traceback (most recent call last):
  File "D:\python_projects\FinanceReportRead\.venv\Lib\site-packages\llama_index\experimental\query_engine\pandas\output_parser.py", line 42, in default_output_processor
    tree = ast.parse(output)
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\Daniel\AppData\Local\Programs\Python\Python311\Lib\ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<unknown>", line 2
    Okay, let's see. The user is working with a pandas DataFrame called df. They provided the output of df.head(), which shows the columns as Assets, Col1, Col2, Col3, Col4, Col5. The question is asking for the column header names.
             ^
SyntaxError: unterminated string literal (detected at line 2)
```
Here's a cleaned-up version of my code.

```python
import pymupdf
from llama_index.experimental.query_engine import PandasQueryEngine
from llama_index.llms.ollama import Ollama

doc = pymupdf.open("sample_docs/Ascend_10Q_2025.pdf")
data = doc.load_page(5)
llm = Ollama(
    model="qwen3:8b",
    base_url="http://127.0.0.1:11434",
    request_timeout=180,
    keep_alive=1,
    temperature=0,
)

# Convert every table found on the page to a DataFrame.
tabs_finder = data.find_tables()
dfs = [tab.to_pandas() for tab in tabs_finder.tables]

query_engine = PandasQueryEngine(df=dfs[0], llm=llm)
response = query_engine.query("What are the column header names?")
```
Looking at the example in the documentation page, it seems like all I need to include is the LLM definition and a pandas DataFrame for the query. From the error message, it seems Ollama was called, since I see some kind of response from it.
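The end goal described above, deciding whether a table needs to be re-extracted, can also be approximated without an LLM. The sketch below is one possible heuristic and nothing more. `needs_reextraction` and its thresholds are hypothetical, and it assumes that placeholder names such as `Col1` (as seen in the traceback above) indicate header cells that could not be read.

```python
import re

# Assumption: unreadable header cells appear as placeholders like "Col1", "Col2", ...
PLACEHOLDER = re.compile(r"^Col\d+$")


def needs_reextraction(columns, n_rows, max_placeholder_ratio=0.5, min_rows=1):
    """Heuristically decide whether an extracted table looks truncated or misread."""
    # No columns or too few rows: the bounding box probably missed the table.
    if not columns or n_rows < min_rows:
        return True
    # Count headers that are only placeholders.
    placeholders = sum(1 for name in columns if PLACEHOLDER.match(str(name)))
    return placeholders / len(columns) > max_placeholder_ratio
```

With a DataFrame in hand, it could be called as `needs_reextraction(list(df.columns), len(df))`. The thresholds are arbitrary starting points and would need tuning against real reports.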