Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models
📄 ACL 2025 Findings Paper — Math2Visual
📘 Annotated Visual Language and Visual Dataset
🤖 Visual Language Generation Model
In this project, we present Math2Visual, an automatic framework for generating pedagogically meaningful visuals from math word problem text descriptions. Math2Visual leverages a pre-defined visual language and a design space grounded in interviews with math teachers to illustrate the core mathematical relationships in math word problems. Using Math2Visual, we construct an annotated dataset of 1,903 visuals and evaluate Text-to-Image (TTI) models for their ability to generate visuals that align with our design. We further fine-tune several TTI models with our dataset, demonstrating improvements in educational visual generation. Our work establishes a new benchmark for automated generation of pedagogically meaningful visuals and offers insights into key challenges in producing multimodal educational content, such as the misrepresentation of mathematical relationships and the omission of essential visual elements.
We have released the full dataset on Hugging Face, including:
- Annotated visual language with corresponding math word problems
- Generated formal and intuitive visuals in both `.svg` and `.png` formats
👉 Browse the dataset on Hugging Face
You can preview images and download files directly from the Hugging Face web interface.
```shell
git clone https://github.com/eth-lre/math2visual.git
conda create -n math2visual python=3.12.4
conda activate math2visual
cd math2visual
pip install -r requirements_a.txt
pip install -r requirements_b.txt
touch .env
echo "OPENAI_API_KEY=<your_openai_key>" >> .env
```
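The last two commands create a `.env` file that holds the OpenAI API key as a `KEY=VALUE` line. As a rough sketch of how such a file is typically consumed (the `load_env` helper below is hypothetical and not part of this repo; the scripts here may rely on a library such as python-dotenv instead):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into os.environ."""
    loaded = {}
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            # Skip blank lines, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```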
Download our model adapter on Hugging Face
Place the adapter_model.safetensors into model/check-point/
Download the base model meta-llama/Llama-3.1-8B on Hugging Face
Place the downloaded folder into model/base_model/
Replace the 'mwp' and 'formula' fields with your own math word problem content in generate_visual_language_with_our_model.py (around line 102). Then run:
```shell
python3 generate_visual_language_with_our_model.py
```
It will print out the generated visual language and save it in /output_visual_language/visual_langauge.txt
Replace the 'mwp' and 'formula' fields with your own math word problem content in generate_visual_language_with_gpt.py (around line 196). Then run:
```shell
python3 generate_visual_language_with_gpt.py
```
It will print out the generated visual language and save it in /output_visual_language/visual_langauge.txt
Replace the 'visual_language' field with your own generated visual language in generate_visual_formal.py (around line 1406). Then run:
```shell
python3 generate_visual_formal.py
```
It will generate the visual and save it in /output_visual_formal/01.svg
Replace the 'visual_language' field with your own generated visual language in generate_visual_intuitive.py (around line 4263). Then run:
```shell
python3 generate_visual_intuitive.py
```
It will generate the visual and save it in /output_visual_intuitive/01.svg
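Since both generation scripts write SVG output, a quick sanity check can confirm the file is well-formed before it is converted or embedded in teaching materials. The standalone helper below is not part of the repo and uses only the standard library:

```python
import xml.etree.ElementTree as ET

def is_valid_svg(path):
    """Return True if the file parses as XML with an <svg> root element."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    # SVG roots carry a namespace, e.g. '{http://www.w3.org/2000/svg}svg'
    return root.tag.rsplit("}", 1)[-1] == "svg"
```

For example, `is_valid_svg("output_visual_intuitive/01.svg")` should return `True` after a successful run.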
Junling Wang, Anna Rutkiewicz, April Wang, and Mrinmaya Sachan. 2025. Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models. In Findings of the Association for Computational Linguistics: ACL 2025, pages 11229–11257, Vienna, Austria. Association for Computational Linguistics.
@inproceedings{wang-etal-2025-generating-pedagogically,
title = "Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models",
author = "Wang, Junling and
Rutkiewicz, Anna and
Wang, April and
Sachan, Mrinmaya",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.586/",
pages = "11229--11257",
ISBN = "979-8-89176-256-5",
abstract = "Visuals are valuable tools for teaching math word problems (MWPs), helping young learners interpret textual descriptions into mathematical expressions before solving them. However, creating such visuals is labor-intensive and there is a lack of automated methods to support this process. In this paper, we present Math2Visual, an automatic framework for generating pedagogically meaningful visuals from MWP text descriptions. Math2Visual leverages a pre-defined visual language and a design space grounded in interviews with math teachers to illustrate the core mathematical relationships in MWPs. Using Math2Visual, we construct an annotated dataset of 1,903 visuals and evaluate Text-to-Image (TTI) models for their ability to generate visuals that align with our design. We further fine-tune several TTI models with our dataset, demonstrating improvements in educational visual generation. Our work establishes a new benchmark for automated generation of pedagogically meaningful visuals and offers insights into key challenges in producing multimodal educational content, such as the misrepresentation of mathematical relationships and the omission of essential visual elements."
}
This work is licensed under the Apache License 2.0.
For research inquiries, please contact: Junling Wang — wangjun [at] ethz [dot] ch