👏 Welcome to the Awesome-Reasoning-MLLM repository! This repository is a curated collection of influential papers, code, datasets, benchmarks, and resources on Reasoning in Multi-Modal Large Language Models (MLLMs) and Vision-Language Models (VLMs).
Feel free to ⭐ star and fork this repository to keep up with the latest advancements and contribute to the community.
- [LMM-R1, 2503] LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities [Paper📑] [Code🔧]
- [R1-VL, 2503] R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization [Paper📑] [Code🔧]
- [Vision-R1, 2503] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models [Paper📑] [Code🔧]
- [VisualThinker-R1-Zero, 2503] R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model [Paper📑] [Code🔧]
- [MM-Eureka, 2503] MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning [Paper📑] [Code🔧]
- [Visual-RFT, 2503] Visual-RFT: Visual Reinforcement Fine-Tuning [Paper📑] [Code🔧]
- [VLM-R1] VLM-R1: A stable and generalizable R1-style Large Vision-Language Model [Code🔧]
- [R1-V, 2502] R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3 [Code🔧]
- [AStar, 2502] Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking [Paper📑]
- [MM-Verify, 2502] MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification [Paper📑] [Code🔧]
- [LlamaV-o1, 2501] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [Paper📑] [Code🔧]
- [Mulberry, 2412] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper📑] [Code🔧]
- [Insight-V, 2411] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models [Paper📑] [Code🔧]
- [LLaVA-CoT, 2411] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step [Paper📑] [Code🔧]
- [LLaVA-Reasoner, 2410] Improve Vision Language Model Chain-of-thought Reasoning [Paper📑] [Code🔧]
- [Visual-CoT, 2403] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning [Paper📑] [Code🔧]