
👨‍💻 Awesome MLLM for Code

Awesome PRs Welcome Last Commit

This repo collects papers on methods, benchmarks, and evaluation for code generation under multimodal scenarios:

  • UI Code Generation: web front-end code generation, mobile app UI code generation, etc.
  • Scientific Code Generation: plots, charts, formulas, etc.
  • Slide Code Generation.
  • Visually Rich Programming: programming problems with image examples.
  • Logo Generation: image generation through SVG code generation.
  • Program Repair: repair under the above scenarios.
  • UML Code Generation.
  • CAD Code Generation.
  • Poster Code Generation.
  • Multimodal Document Generation.
  • 3D Point Cloud Code Generation.
  • General Benchmarks.



📜 Papers

You can click on a paper title to jump directly to the corresponding PDF link.

1. Web/UI Code Generation

  1. Design2Code: How Far Are We From Automating Front-End Engineering? Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, Diyi Yang. NAACL 2025. GitHub Repo stars

  2. Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset. Hugo Laurençon, Léo Tronchon, Victor Sanh . Arxiv 2024.

  3. VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs. Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang . Arxiv 2024.        

  4. NLDesign: A UI Design Tool for Natural Language Interfaces. Tianhao Zhang, Fu Peiguo, Jie Liu, Yihe Zhang, Xingmei Chen. ACM-TURC’24 (2024.6.30).

  5. Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach. Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, Michael R. Lyu . Arxiv 2024 (FSE 2025).         GitHub Repo stars

  6. Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs. Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen . NeurIPS 2024 Datasets and Benchmarks.         GitHub Repo stars

  7. Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes. Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, Tingting Zhou. Arxiv 2024.

  8. Bridging Design and Development with Automated Declarative UI Code Generation. Ting Zhou, Yanjie Zhao, Xinyi Hou, Xiaoyu Sun, Kai Chen, Haoyu Wang. Arxiv 2024 (FSE 2025).

  9. Sketch2Code: Evaluating Vision-Language Models for Interactive Web Design Prototyping. Ryan Li, Yanzhe Zhang, Diyi Yang. Arxiv 2024. GitHub Repo stars

  10. Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation? Jingyu Xiao, Yuxuan Wan, Yintong Huo, Zhiyao Xu, Michael R. Lyu. Arxiv 2024 (ASE 2025). GitHub Repo stars

  11. UIClip: A Data-driven Model for Assessing User Interface Design. Jason Wu, Yi-Hao Peng, Xin Yue Li, Amanda Swearngin, Jeffrey P. Bigham, Jeffrey Nichols. UIST 2024.

  12. WAFFLE: Multi-Modal Model for Automated Front-End Development. Shanchao Liang, Nan Jiang, Shangshu Qian, Lin Tan. Arxiv 2024 (ACL 2025 Main). GitHub Repo stars

  13. MRWeb: An Exploration of Generating Multi-Page Resource-Aware Web Code from UI Designs. Yuxuan Wan, Yi Dong, Jingyu Xiao, Yintong Huo, Wenxuan Wang, Michael R. Lyu. Arxiv 2024. GitHub Repo stars

  14. UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs. Yi Gui, Yao Wan, Zhen Li, Zhongyi Zhang, Dongping Chen, Hongyu Zhang, Yi Su, Bohua Chen, Xing Zhou, Wenbin Jiang, Xiangliang Zhang. WWW 2025 (Oral).

  15. WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs. Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Bohua Chen, Dongping Chen, Siyuan Wu, Xing Zhou, Wenbin Jiang, Hai Jin, Xiangliang Zhang. WWW 2025 (Oral).

  16. Zero-Shot Prompting Approaches for LLM-based Graphical User Interface Generation. Kristian Kolthoff, Felix Kretzer, Lennart Fiebig, Christian Bartelt, Alexander Maedche, Simone Paolo Ponzetto. Arxiv 2024.12.

  17. Towards Human-AI Synergy in UI Design: Enhancing Multi-Agent Based UI Generation with Intent Clarification and Alignment. Mingyue Yuan, Jieshan Chen, Yongquan Hu, Sidong Feng, Mulong Xie, Gelareh Mohammadi, Zhenchang Xing, Aaron Quigley. Arxiv 2024.12.28.

  18. Frontend Diffusion: Empowering Self-Representation of Junior Researchers and Designers Through Agentic Workflows. Zijian Ding, Qinshi Zhang, Mohan Chi, Ziyi Wang. Arxiv 2025.

  19. UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset. Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li. Arxiv 2024.7.11 (UIST 2024). GitHub Repo stars

  20. Advancing vision-language models in front-end development via data synthesis. Tong Ge, Yashu Liu, Jieping Ye, Tianyi Li, Chao Wang. Arxiv 2025.3.3. GitHub Repo stars

  21. Multimodal graph representation learning for website generation based on visual sketch. Tung D. Vu, Chung Hoang, Truong-Son Hy. Arxiv 2025.4.26. GitHub Repo stars

  22. WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch. Zimu Lu, Yunqiao Yang, Houxing Ren, Haotian Hou, Han Xiao, Ke Wang, Weikang Shi, Aojun Zhou, Mingjie Zhan, Hongsheng Li. Arxiv 2025.5.6. GitHub Repo stars

  23. Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks. Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng. Arxiv 2025.5.12. GitHub Repo stars

  24. FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow. Haoyu Sun, Huichen Will Wang, Jiawei Gu, Linjie Li, Yu Cheng. Arxiv 2025.5.23. GitHub Repo stars

  25. DesignBench: A Comprehensive Benchmark for MLLM-based Front-end Code Generation. Jingyu Xiao, Ming Wang, Man Ho Lam, Yuxuan Wan, Junliang Liu, Yintong Huo, Michael R. Lyu. Arxiv 2025.6.6. GitHub Repo stars

  26. WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code. Zhiyu Lin, Zhengda Zhou, Zhiyuan Zhao, Tianrui Wan, Yilun Ma, Junyu Gao, Xuelong Li. Arxiv 2025.6.9. GitHub Repo stars

  27. MLLM-Based UI2Code Automation Guided by UI Layout Information. Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao. Arxiv 2025.6.12 (ISSTA 2025). GitHub Repo stars

  28. DesignCoder: Hierarchy-Aware and Self-Correcting UI Code Generation with Large Language Models. Fan Wu, Cuiyun Gao, Shuqing Li, Xin-Cheng Wen, Qing Liao. Arxiv 2025.6.16.

  29. FrontendBench: A Benchmark for Evaluating LLMs on Front-End Development via Automatic Evaluation. Hongda Zhu, Yiwen Zhang, Bing Zhao, Jingzhe Ding, Siyao Liu, Tong Liu, Dandan Wang, Yanan Liu, Zhaojian Lio. Arxiv 2025.6.16.

  30. ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents. Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue. Arxiv 2025.7.31. GitHub Repo stars

  31. Generative Interfaces for Language Models. Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, Diyi Yang. Arxiv 2025.8.26. GitHub Repo stars

2. Scientific Plots Code Generation

  1. Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots. Chengyue Wu, Yixiao Ge, Qiushan Guo, Jiahao Wang, Zhixuan Liang, Zeyu Lu, Ying Shan, Ping Luo . Arxiv 2024. (NAACL 2025 Findings)         GitHub Repo stars

  2. MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization. Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun . Arxiv 2024.         GitHub Repo stars

  3. ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation. Chufan Shi, Cheng Yang, Yaxin Liu, Bo Shui, Junjie Wang, Mohan Jing, Linran Xu, Xinyu Zhu, Siheng Li, Yuxiang Zhang, Gongye Liu, Xiaomei Nie, Deng Cai, Yujiu Yang. Arxiv 2024. GitHub Repo stars

  4. From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing. Jingxuan Wei, Cheng Tan, Qi Chen, Gaowei Wu, Siyuan Li, Zhangyang Gao, Linzhuang Sun, Bihui Yu, Ruifeng Guo. Arxiv 2024.

  5. Is GPT-4V (ision) All You Need for Automating Academic Data Visualization? Exploring Vision-Language Models’ Capability in Reproducing Academic Charts. Zhehao Zhang, Weicheng Ma, Soroush Vosoughi. EMNLP 2024 (Findings). GitHub Repo stars

  6. ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding. Zhengzhuo Xu, Bowen Qu, Yiyan Qi, Sinan Du, Chengjin Xu, Chun Yuan, Jian Guo. Arxiv 2024.9 (ICLR 2025 Oral). GitHub Repo stars

  7. ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation. Xuanle Zhao, Xianzhen Luo, Qi Shi, Chi Chen, Shuo Wang, Wanxiang Che, Zhiyuan Liu, Maosong Sun. Arxiv 2025.1 (ACL 2025 Main). GitHub Repo stars

  8. nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow. Geliang Ouyang, Jingyao Chen, Zhihe Nie, Yi Gui, Yao Wan, Hongyu Zhang, Dongping Chen. Arxiv 2025.2.7 (ACL 2025 Main). GitHub Repo stars

  9. METAL: A Multi-Agent Framework for Chart Generation with Test-Time Scaling. Bingxuan Li, Yiwei Wang, Jiuxiang Gu, Kai-Wei Chang, Nanyun Peng. Arxiv 2025.2.24 (ACL 2025 Main). GitHub Repo stars

  10. Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data. Zijian Li, Jingjing Fu, Lei Song, Jiang Bian, Jun Zhang, Rui Wang. Arxiv 2025.3.20.

  11. Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning. Zhihan Zhang, Yixin Cao, and Lizi Liao. Arxiv 2025.4.3. GitHub Repo stars

  12. Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation. Zhiqing Cui, Jiahao Yuan, Hanqing Wang, Yanshu Li, Chenxu Du, Zhenglong Ding. Arxiv 2025.4.13.

  13. ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing. Xuanle Zhao, Xuexin Liu, Haoyue Yang, Xianzhen Luo, Fanhu Zeng, Jianling Li, Qi Shi, Chi Chen. Arxiv 2025.5.17 (ACL 2025 Findings).

  14. ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models. Liyan Tang, Grace Kim, Xinyu Zhao, Thom Lake, Wenxuan Ding, Fangcong Yin, Prasann Singhal, Manya Wadhwa, Zeyu Leo Liu, Zayne Sprague, Ramya Namuduri, Bodun Hu, Juan Diego Rodriguez, Puyuan Peng, Greg Durrett. Arxiv 2025.5.19. GitHub Repo stars

  15. ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding. Yifan Wu, Lutao Yan, Leixian Shen, Yinan Mei, Jiannan Wang, Yuyu Luo. Arxiv 2025.5.21. GitHub Repo stars

  16. Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework. Zhaorui Yang, Bo Pan, Han Wang, Yiyao Wang, Xingyu Liu, Minfeng Zhu, Bo Zhang, Wei Chen. Arxiv 2025.6.3.

  17. Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models. Junling Wang, Anna Rutkiewicz, April Yi Wang, Mrinmaya Sachan. Arxiv 2025.6.4 (ACL 2025 Findings). GitHub Repo stars

  18. VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation. Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen. Arxiv 2025.6.4 (EMNLP 2025 Findings). GitHub Repo stars

  19. Effective Training Data Synthesis for Improving MLLM Chart Understanding. Yuwei Yang, Zeyu Zhang, Yunzhong Hou, Zhuowan Li, Gaowen Liu, Ali Payani, Yuan-Sen Ting, Liang Zheng. Arxiv 2025.8.8. (ICCV 2025)         GitHub Repo stars

  20. Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation. Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Liming Zheng, Yufeng Zhong, Lin Ma. Arxiv 2025.8.19.         GitHub Repo stars

  21. ChartMaster: Advancing Chart-to-Code Generation with Real-World Charts and Chart Similarity Reinforcement Learning. Wentao Tan, Qiong Cao, Chao Xue, Yibing Zhan, Changxing Ding, Xiaodong He. Arxiv 2025.8.25.         GitHub Repo stars

3. Visually Rich Programming and Math

  1. MMCode: Evaluating Multi-Modal Code Large Language Models with Visually Rich Programming Problems. Kaixin Li, Yuchen Tian, Qisheng Hu, Ziyang Luo, Jing Ma. Arxiv 2024 (EMNLP 2024). GitHub Repo stars

  2. HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks. Fengji Zhang, Linquan Wu, Huiyu Bai, Guancheng Lin, Xiao Li, Xiao Yu, Yue Wang, Bei Chen, Jacky Keung . Arxiv 2024.         GitHub Repo stars

  3. DynEx: Dynamic Code Synthesis with Structured Design Exploration for Accelerated Exploratory Programming. Jenny Ma, Karthik Sreedhar, Vivian Liu, Sitong Wang, Pedro Alejandro Perez, Riya Sahni, Lydia B. Chilton . Arxiv 2024.

  4. ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges. Rao Fu, Ziyang Luo, Hongzhan Lin, Zhen Ye, Jing Ma . Arxiv 2024.         GitHub Repo stars

  5. Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities. Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu. Arxiv 2025.02.17. GitHub Repo stars

  6. MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning. Ke Wang, Junting Pan, Linda Wei, Aojun Zhou, Weikang Shi, Zimu Lu, Han Xiao, Yunqiao Yang, Houxing Ren, Mingjie Zhan, Hongsheng Li. Arxiv 2025.05.15. GitHub Repo stars

4. SVG Code Generation and Understanding

  1. StarVector: Generating Scalable Vector Graphics Code from Images and Text. Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, Marco Pedersoli. Arxiv 2023 (CVPR 2025).

  2. LogoMotion: Visually Grounded Code Generation for Content-Aware Animation. Vivian Liu, Rubaiat Habib Kazi, Li-Yi Wei, Matthew Fisher, Timothy Langlois, Seth Walker, Lydia Chilton . Arxiv 2024 (CHI 2025).

  3. SVGEditBench: A Benchmark Dataset for Quantitative Assessment of LLM's SVG Editing Capabilities. Kunato Nishina, Yusuke Matsui. Arxiv 2024.4.21 (CVPR 2024 Workshop).         GitHub Repo stars

  4. Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models. Ronghuan Wu, Wanchao Su, Jing Liao . Arxiv 2024 (CVPR 2025).         GitHub Repo stars

  5. Can Large Language Models Understand Symbolic Graphics Programs? Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf. ICLR 2025 (Spotlight).         GitHub Repo stars

  6. LLM4SVG: Empowering LLMs to Understand and Generate Complex Vector Graphics. Ximing Xing, Juncheng Hu, Guotao Liang, Jing Zhang, Dong Xu, Qian Yu. CVPR 2025. GitHub Repo stars

  7. OmniSVG: A Unified Scalable Vector Graphics Generation Model. Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, Yu-Gang Jiang. Arxiv 2025.4.8.         GitHub Repo stars

  8. Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation. Ximing Xing, Yandong Guan, Jing Zhang, Dong Xu, Qian Yu. Arxiv 2025.5.30.

  9. SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation. Siqi Chen, Xinyu Dong, Haolei Xu, Xingyu Wu, Fei Tang, Hang Zhang, Yuchen Yan, Linjuan Wu, Wenqi Zhang, Guiyang Hou, Yongliang Shen, Weiming Lu, Yueting Zhuang. Arxiv 2025.6.3.        GitHub Repo stars

5. Slide Code Generation

  1. AutoPresent: Designing Structured Visuals from Scratch. Jiaxin Ge, Zora Zhiruo Wang, Xuhui Zhou, Yi-Hao Peng, Sanjay Subramanian, Qinyue Tan, Maarten Sap, Alane Suhr, Daniel Fried, Graham Neubig, Trevor Darrell . Arxiv 2025.1.1 (CVPR 2025). GitHub Repo stars

  2. PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides. Hao Zheng, Xinyan Guan, Hao Kong, Jia Zheng, Hongyu Lin, Yaojie Lu, Ben He, Xianpei Han, Le Sun . Arxiv 2025. GitHub Repo stars

  3. Talk to Your Slides: Language-Driven Agents for Efficient Slide Editing. Kyudan Jung, Hojun Cho, Jooyeol Yun, Soyoung Yang, Jaehyeok Jang, Jaegul Choo. Arxiv 2025.5.16 (https://anonymous.4open.science/r/Talk-to-Your-Slides-0F4C)

  4. PreGenie: An Agentic Framework for High-quality Visual Presentation. Xiaojie Xu, Xinli Xu, Sirui Chen, Haoyu Chen, Fan Zhang, Ying-Cong Chen . Arxiv 2025.5.27.

  5. SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design. Wenxin Tang, Jingyu Xiao, Wenxuan Jiang, Xi Xiao, Yuhang Wang, Xuxin Tang, Qing Li, Yuehe Ma, Junliang Liu, Shisong Tang, Michael R. Lyu . Arxiv 2025.6.9. (EMNLP 2025 Main) GitHub Repo stars

  6. PresentAgent: Multimodal Agent for Presentation Video Generation. Jingwei Shi, Zeyu Zhang, Biao Wu, Yanjie Liang, Meng Fang, Ling Chen, Yang Zhao . Arxiv 2025.7.5. GitHub Repo stars

6. Program Repair

  1. SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? John Yang, Carlos E. Jimenez, Alex L. Zhang, Kilian Lieret, Joyce Yang, Xindi Wu, Ori Press, Niklas Muennighoff, Gabriel Synnaeve, Karthik R. Narasimhan, Diyi Yang, Sida I. Wang, Ofir Press. ICLR 2025.

  2. DesignRepair: Dual-Stream Design Guideline-Aware Frontend Repair with Large Language Models. Mingyue Yuan, Jieshan Chen, Zhenchang Xing, Aaron Quigley, Yuyu Luo, Gelareh Mohammadi, Qinghua Lu, Liming Zhu. ICSE 2025. GitHub Repo stars

  3. CodeV: Issue Resolving with Visual Data. Linhao Zhang, Daoguang Zan, Quanshun Yang, Zhirong Huang, Dong Chen, Bo Shen, Tianyu Liu, Yongshun Gong, Pengjie Huang, Xudong Lu, Guangtai Liang, Lizhen Cui, Qianxiang Wang. Arxiv 2024. GitHub Repo stars

  4. Seeing is Fixing: Cross-Modal Reasoning with Multimodal LLMs for Visual Software Issue Fixing. Kai Huang, Jian Zhang, Xiaofei Xie, Chunyang Chen. Arxiv 2025.6.19.

7. UML and Workflow Code Generation

  1. From Image to UML: First Results of Image-Based UML Diagram Generation using LLMs. Arie van Deursen, Eduard C. Groen. LLM4MDE 2024.
  2. StarFlow: Generating Structured Workflow Outputs From Sketch Images. Patrice Bechard, Chao Wang, Amirhossein Abaskohi, Juan Rodriguez, Christopher Pal, David Vazquez, Spandana Gella, Sai Rajeswar, Perouz Taslakian. Arxiv 2025.03.27.

8. CAD Code Generation

  1. mrCAD: Multimodal Refinement of Computer-aided Designs. William P. McCarthy, Saujas Vaduguru, Karl D. D. Willis, Justin Matejka, Judith E. Fan, Daniel Fried, Yewen Pu. Arxiv 2025.04.28. GitHub Repo stars

  2. CADReview: Automatically Reviewing CAD Programs with Error Detection and Correction. Jiali Chen, Xusen Hei, HongFei Liu, Yuancheng Wei, Zikun Deng, Jiayuan Xie, Yi Cai, Li Qing. Arxiv 2025.05.28 (ACL 2025 Main Oral). GitHub Repo stars

9. Poster Code Generation

  1. Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers. Wei Pang, Kevin Qinghong Lin, Xiangru Jian, Xi He, Philip Torr. Arxiv 2025.05.27. GitHub Repo stars
  2. P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark. Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, Zhoujun Li. Arxiv 2025.05.21. GitHub Repo stars
  3. PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs. Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, Chenyu You. Arxiv 2025.08.24. GitHub Repo stars

10. Multimodal Document Generation

  1. BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks. Juan Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, François Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-André Noël, Mats Leon Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Noah Bolger, Kurt MacDonald, Simon Fauvel, Sathwik Tejaswi, Srinivas Sunkara, Joao Monteiro, Krishnamurthy DJ Dvijotham, Torsten Scholak, Nicolas Chapados, Sepideh Kharagani, Sean Hughes, M. Özsu, Siva Reddy, Marco Pedersoli, Yoshua Bengio, Christopher Pal, Issam Laradji, Spandana Gella, Perouz Taslakian, David Vazquez, Sai Rajeswar. ICLR 2025.

  2. Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework. Zhaorui Yang, Bo Pan, Han Wang, Yiyao Wang, Xingyu Liu, Minfeng Zhu, Bo Zhang, Wei Chen. Arxiv 2025.06.03.

11. 3D Point Cloud Code Generation

  1. Real2Code: Reconstruct Articulated Objects via Code Generation. Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song. Arxiv 2024.06.12. GitHub Repo stars

  2. MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds. Bingquan Dai, Li Ray Luo, Qihong Tang, Jie Wang, Xinyu Lian, Hao Xu, Minghan Qin, Xudong Xu, Bo Dai, Haoqian Wang, Zhaoyang Lyu, Jiangmiao Pang. Arxiv 2025.08.20. GitHub Repo stars

12. General

  1. Image2Struct: Benchmarking Structure Extraction for Vision-Language Models. Josselin Somerville Roberts, Tony Lee, Chi Heem Wong, Michihiro Yasunaga, Yifan Mai, Percy Liang. NeurIPS 2024 Datasets and Benchmarks.

  2. FullStack Bench: Evaluating LLMs as Full Stack Coders. Siyao Liu, He Zhu, Jerry Liu, Shulin Xin, Aoyan Li, Rui Long, Li Chen, Jack Yang, Jinxiang Xia, Z.Y. Peng, Shukai Liu, Zhaoxiang Zhang, Jing Mai, Ge Zhang, Wenhao Huang, Kai Shen, Liang Xiang. Arxiv 2024. GitHub Repo stars

  3. Empowering Agile-Based Generative Software Development through Human-AI Teamwork. Sai Zhang, Zhenchang Xing, Ronghui Guo, Fangzhou Xu, Lei Chen, Zhaoyuan Zhang, Xiaowang Zhang, Zhiyong Feng, Zhiqiang Zhuang. TOSEM 2024. GitHub Repo stars

  4. Automated LaTeX Code Generation from Handwritten Mathematical Expressions. Jayaprakash Sundararaj, Akhil Vyas, Benjamin Gonzalez-Maldonado. Arxiv 2024.

  5. ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation. Chenchen Zhang, Yuhang Li, Can Xu, Jiaheng Liu, Ao Liu, Shihui Hu, Dengpeng Wu, Guanhua Huang, Kejiao Li, Qi Yi, Ruibin Xiong, Haotian Zhu, Yuanxing Zhang, Yuhao Jiang, Yue Zhang, Zenan Xu, Bohui Zhai, Guoxiang He, Hebin Li, Jie Zhao, Le Zhang, Lingyun Tan, Pengyu Guo, Xianshu Pang, Yang Ruan, Zhifeng Zhang, Zhonghu Wang, Ziyan Xu, Zuopu Yin, Wiggin Zhou, Chayse Zhou, Fengzong Lian. Arxiv 2025.07.07. GitHub Repo stars

  6. VisCodex: Unified Multimodal Code Generation via Merging Vision and Coding Models. Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei. Arxiv 2025.08.13. GitHub Repo stars

  7. Multilingual Multimodal Software Developer for Code Generation. Linzheng Chai, Jian Yang, Shukai Liu, Wei Zhang, Liran Wang, Ke Jin, Tao Sun, Congnan Liu, Chenchen Zhang, Hualei Zhu, Jiaheng Liu, Xianjie Wu, Ge Zhang, Tianyu Liu, Zhoujun Li. Arxiv 2025.07.11. GitHub Repo stars

🔥 Contributing

This is an active repository and your contributions are always welcome! Before you add papers or tools to the awesome list, please make sure that:

  • First, think about which category the work should belong to.
  • The paper or tool is related to Multimodal Large Language Models (MLLMs) for code generation.
  • The paper is inserted in the correct position, in chronological order of publication/arXiv release time.
  • The link points to the paper's arXiv abstract page, not the PDF page, if the paper is posted on arXiv.
  • If the paper has been accepted, please use the correct publication venue instead of arXiv.

🌟 Star History

Star History Chart
