
Interpretable LLM-based Table Question Answering

by Giang Nguyen1, Ivan Brugere2, Shubham Sharma2, Sanjay Kariyappa3, Anh Totti Nguyen1*, Freddy Lecue2*

*Equal advising
1Auburn University, 2J.P.Morgan AI Research, 3NVIDIA

Transactions on Machine Learning Research

arXiv · Interface · License: MIT


📜 Abstract

Interpretability in Table Question Answering (Table QA) is critical, especially in high-stakes domains like finance and healthcare. While recent Table QA approaches based on Large Language Models (LLMs) achieve high accuracy, they often produce ambiguous explanations of how answers are derived. We propose Plan-of-SQLs (POS), a new Table QA method that makes the model's decision-making process interpretable. POS decomposes a question into a sequence of atomic steps, each directly translated into an executable SQL command on the table, thereby ensuring that every intermediate result is transparent. Through extensive experiments, we show that: First, POS generates the highest-quality explanations among compared methods, which markedly improves users' ability to simulate and verify the model's decisions. Second, when evaluated on standard Table QA benchmarks (TabFact, WikiTQ, and FeTaQA), POS achieves QA accuracy that is competitive with existing methods, while also offering greater efficiency—requiring significantly fewer LLM calls and table database queries (up to 25x fewer)—and more robust performance on large tables. Finally, we observe high agreement (up to 90.59% in forward simulation) between LLMs and human users when making decisions based on the same explanations, suggesting that LLMs could serve as an effective proxy for humans in evaluating Table QA explanations.
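
For intuition, here is a purely illustrative sketch (not taken from the paper or the code) of how a TabFact-style claim could decompose into atomic steps, each paired with an executable SQL command; the claim, step wording, and SQL are hypothetical.

# Hypothetical illustration of a POS-style plan: each atomic step maps to one SQL command.
# Claim: "Germany won more gold medals than France in 2012."
plan = [
    ("Keep only the rows for Germany and France in 2012",
     "SELECT * FROM t WHERE nation IN ('germany', 'france') AND year = 2012;"),
    ("Compare the gold counts of the two remaining rows",
     "SELECT (SELECT gold FROM t WHERE nation = 'germany') > "
     "(SELECT gold FROM t WHERE nation = 'france') AS label;"),
]

Each intermediate table produced by a step can be shown to the user, which is what makes the final TRUE/FALSE label verifiable.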


🗺️ Table of Contents

  1. Environment Setup
  2. Datasets & Benchmarks
  3. Improved Planning Algorithm
  4. Performance vs. Table Size
  5. Visualization & Evaluation
  6. Interactive Demo
  7. Citation

Environment Setup

# 1️⃣  Create and activate Conda env
conda create -n tabular-llms-openai python=3.10.13 -y
conda activate tabular-llms-openai

# 2️⃣  Install core dependencies
pip install openai==0.27.4 azure-identity==1.12.0 azure-cli==2.41.0 azure-mgmt-cognitiveservices

# 3️⃣  Authenticate with Azure (browser sign-in)
az login

# 4️⃣  Project-specific dependencies
python install.py
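
If the commands above succeed, a quick sanity check along these lines (a sketch assuming the Azure AD token flow; the endpoint and deployment name are placeholders, not values from this repo) confirms that the openai==0.27.4 client can reach your Azure OpenAI resource.

# Sanity-check sketch for the Azure OpenAI setup (openai==0.27.4 with Azure AD auth).
# The endpoint and deployment name are placeholders; substitute your own resource values.
import openai
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()        # reuses the `az login` session
token = credential.get_token("https://cognitiveservices.azure.com/.default")

openai.api_type = "azure_ad"
openai.api_key = token.token
openai.api_base = "https://<your-resource>.openai.azure.com/"    # placeholder endpoint
openai.api_version = "2023-05-15"

response = openai.ChatCompletion.create(
    engine="<your-deployment>",              # placeholder deployment name
    messages=[{"role": "user", "content": "Reply with the single word: ready"}],
)
print(response["choices"][0]["message"]["content"])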

Datasets & Benchmarks

TabFact

# 🔍 Step 1: Extract data
unzip data.zip

# ▶️ Step 2: Run evaluation
python run_tabfact_pos.py --use_subset True --load_dataset True

WikiTQ

# 🔍 Step 1: Clone Dater repo (official evaluation code)
git clone https://github.com/AlibabaResearch/DAMO-ConvAI.git
cd DAMO-ConvAI/dater

# 🔍 Step 2: Download the pre-processed data (see the Dater repo instructions) and extract dater_data.tar.gz

# ▶️ Step 3: Run POS on WikiTQ
mv saved/ code/
python run_wikitq_pos.py --load_dataset True --use_subset True

Improved Planning Algorithm

Why & how (click to expand)

We observed that many POS errors stem from the planning stage.
The original planner generates all steps up‑front, often missing crucial conditions.
Our dynamic planner instead generates one atomic step at a time, conditioned on the current intermediate table—leading to consistent accuracy gains across LLM backbones.

Try it by toggling self.planning_algorithm in helper.py between static and dynamic.
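
For reference, a minimal sketch of the dynamic-planning loop described above, assuming hypothetical helpers generate_next_step, step_to_sql, and execute_sql (the actual logic lives in helper.py):

# Sketch of dynamic planning: generate one atomic step at a time,
# conditioned on the current intermediate table. Helper names are hypothetical.
def dynamic_plan(question, table, llm, max_steps=10):
    for _ in range(max_steps):
        step = generate_next_step(llm, question, table)   # next atomic step, given the current table
        if step is None:                                  # planner signals completion
            break
        sql = step_to_sql(llm, step, table)               # translate the step into SQL
        table = execute_sql(sql, table)                   # every intermediate table stays inspectable
    return table                                          # final table encodes the answer

The static planner, by contrast, produces the full list of steps before any SQL is executed.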


Performance vs. Table Size

Generate the table-size analysis plots:

python run_table_size_analysis.py           # TabFact
python run_table_size_analysis_wikitq.py    # WikiTQ

Input: a JSON results file (see example below).

{
  "test-1385": {
    "input": { ... },
    "answer": "TRUE",
    "answer_plans": { "dynamic": 1 },
    "groundtruth": "TRUE",
    "table_token_count": 101
  },
  ...
}
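
As a rough guide, a results file in this format can be binned by table size with a short script like the sketch below (the file path and bucket width are arbitrary choices, not repo defaults):

# Sketch: accuracy per table-size bucket from a results file in the format shown above.
import json
from collections import defaultdict

with open("results.json") as f:                 # placeholder path
    results = json.load(f)

bins = defaultdict(lambda: [0, 0])              # token bucket -> [correct, total]
for example in results.values():
    bucket = example["table_token_count"] // 100 * 100
    bins[bucket][1] += 1
    if example["answer"] == example["groundtruth"]:
        bins[bucket][0] += 1

for bucket in sorted(bins):
    correct, total = bins[bucket]
    print(f"{bucket}-{bucket + 99} tokens: {correct / total:.2%} accuracy ({total} examples)")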

Visualization & Evaluation

Visualizing Table QA Explanations

cd visualization/script
sh vis_POS.sh

LLM-as-a-Judge

Prepare & run the three automatic XAI studies
# 1. Extract similar examples across XAI methods
cd xai_study/llm-judge/scripts
sh prepare_samples.sh

# 2. Experiments
# 2a. Preference (clearer reasoning?)
sh run_preference.sh     

# 2b. Forward Simulation (given explanation, predict model answer)
sh run_forward_sim.sh   

# 2c. Model Prediction Debugging (is prediction correct?)
sh run_debugging.sh      
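
Once the LLM-judge and human responses are collected, forward-simulation agreement can be scored along these lines (the file names and JSON layout are hypothetical placeholders, not the scripts' actual output format):

# Sketch: agreement between LLM-judge and human answers on the forward-simulation study.
import json

with open("llm_forward_sim.json") as f:
    llm_answers = json.load(f)        # {example_id: simulated model answer}
with open("human_forward_sim.json") as f:
    human_answers = json.load(f)      # {example_id: simulated model answer}

shared = set(llm_answers) & set(human_answers)
agreement = sum(llm_answers[i] == human_answers[i] for i in shared) / len(shared)
print(f"Forward-simulation agreement: {agreement:.2%} over {len(shared)} examples")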

Human Evaluation Interfaces


Interactive Demo

Explore POS explanations live: Interactive Tabular XAI.


Citation

@article{nguyen2024interpretable,
  title={Interpretable {LLM}-based Table Question Answering},
  author={Nguyen, Giang and Brugere, Ivan and Sharma, Shubham and Kariyappa, Sanjay and Nguyen, Anh Totti and Lecue, Freddy},
  journal={arXiv preprint arXiv:2412.12386},
  year={2024}
}

♥  Please reach out to nguyengiangbkhn@gmail.com for any inquiries  ♥
