🌐 Project Page | 📖 arXiv | 🤗 VisCode-200K | 🤗 VisCoder-3B | 🤗 VisCoder-7B
- 🔥 [2025-06-05] VisCoder and VisCode-200K are now publicly released! Check out our paper and collections.
VisCoder is an open-source large language model fine-tuned for Python visualization code generation and iterative self-correction. It is trained on VisCode-200K, a large-scale instruction-tuning dataset tailored for executable plotting tasks and runtime-guided revision.
VisCoder addresses a core challenge in data analysis: generating Python code that is not only syntactically correct but also produces visually meaningful plots. Unlike general code generation tasks, visualization requires grounding across natural language instructions, data structures, and rendered visual outputs.
To enable this, VisCode-200K includes:
- ✅ 150K+ executable visualization examples, validated through runtime checks and paired with plot images.
- 🔁 45K multi-turn correction dialogues from the Code-Feedback dataset, providing supervision for fixing faulty code based on execution feedback.
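The runtime checks above can be sketched as a small harness: run each candidate plotting snippet in a fresh interpreter and keep it only if it exits cleanly and writes the expected image. This is a minimal illustration, not the actual VisCode-200K pipeline; `validates` and the `artifact` parameter are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile

def validates(code: str, artifact: str, timeout: int = 30) -> bool:
    """Run a candidate plotting snippet in an isolated interpreter and
    check that it exits cleanly and produced the expected image file."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "snippet.py")
        with open(script, "w") as f:
            f.write(code)
        try:
            result = subprocess.run(
                [sys.executable, script],
                cwd=tmp,
                capture_output=True,
                timeout=timeout,  # guard against hanging snippets
            )
        except subprocess.TimeoutExpired:
            return False
        # Executable AND it rendered an output artifact (e.g. a PNG).
        return result.returncode == 0 and os.path.exists(os.path.join(tmp, artifact))

# Illustration: a snippet that "renders" by writing its output file.
ok = validates("open('plot.png', 'wb').write(b'stub')\n", "plot.png")
```

In a real pipeline the snippet would call `matplotlib.pyplot.savefig` (or the Plotly/Seaborn equivalent), and the saved image would be paired with the instruction as training data.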
We further propose a self-debug evaluation protocol, simulating real-world developer workflows through multi-round error correction. VisCoder is benchmarked on PandasPlotBench against GPT-4o, GPT-4o-mini, Qwen, and LLaMA, demonstrating robust performance and strong recovery from execution failures.
We evaluate VisCoder on PandasPlotBench, a benchmark for executable Python visualization code generation across three libraries: Matplotlib, Seaborn, and Plotly. The figure below summarizes model performance in terms of execution success and GPT-4o-judged alignment scores.
With self-debug, VisCoder-7B achieves over 90% execution pass rate on both Matplotlib and Seaborn, outperforming strong open-source baselines and approaching GPT-4o performance on multiple libraries.
We provide both training and evaluation scripts for VisCoder.
- 📦 Training is performed using the ms-swift framework with full-parameter supervised fine-tuning on VisCode-200K.
- 📊 Evaluation is based on PandasPlotBench. We augment the original evaluation with an additional Execution Pass Rate metric and introduce a new self-debug evaluation mode that allows models to revise failed generations over multiple rounds.
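The self-debug mode can be sketched as a generate-execute-revise loop: on failure, the runtime error is fed back to the model for a revised attempt, up to a fixed number of rounds. This is a simplified sketch, not the actual eval harness; `generate` and `execute` are hypothetical callables standing in for the model and the sandboxed runner.

```python
def self_debug(generate, execute, max_rounds: int = 3):
    """Multi-round self-debug loop.

    generate(feedback) -> code : model call; feedback is None on the first
                                 turn, otherwise the previous error message.
    execute(code) -> (ok, err) : runs the code, returning success plus the
                                 traceback text on failure.
    """
    code = generate(None)  # initial generation from the task prompt alone
    for _ in range(max_rounds):
        ok, error = execute(code)
        if ok:
            return code, True
        code = generate(error)  # revise using the runtime error message
    return code, False  # exhausted the debug budget
```

Execution Pass Rate is then the fraction of tasks for which this loop terminates with success.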
See the following folders for details:
- `train/`: Training scripts and configurations based on ms-swift.
- `eval/`: Evaluation scripts adapted from PandasPlotBench with our self-debug extension.
- Yuansheng Ni: yuansheng.ni@uwaterloo.ca
- Wenhu Chen: wenhuchen@uwaterloo.ca
BibTeX:
@article{ni2025viscoder,
title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
author={Ni, Yuansheng and Nie, Ping and Zou, Kai and Yue, Xiang and Chen, Wenhu},
journal={arXiv preprint arXiv:2506.03930},
year={2025}
}