Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1

HJYao00/Awesome-Reasoning-MLLM


Awesome-Reasoning-MLLM

👏 Welcome to the Awesome-Reasoning-MLLM repository! This repository is a curated collection of influential papers, code, datasets, benchmarks, and resources on Reasoning in Multi-Modal Large Language Models (MLLMs) and Vision-Language Models (VLMs).

Feel free to ⭐ star and fork this repository to keep up with the latest advancements and contribute to the community.

📒 Table of Contents

  • Post-Training / Reinforcement Learning
  • Tree Search / MCTS
  • SFT / CoT
  • Data

Post-Training / Reinforcement Learning

  • [LMM-R1, 2503] LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities [Paper📑][Code🔧]
  • [R1-VL, 2503] R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization [Paper📑][Code🔧]
  • [Vision-R1, 2503] Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models [Paper📑] [Code🔧]
  • [VisualThinker-R1-Zero, 2503] R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model [Paper📑] [Code🔧]
  • [MM-Eureka, 2503] MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning [Paper📑] [Code🔧]
  • [Visual-RFT, 2503] Visual-RFT: Visual Reinforcement Fine-Tuning [Paper📑] [Code🔧]
  • [VLM-R1] VLM-R1: A stable and generalizable R1-style Large Vision-Language Model [Code🔧]
  • [R1-V, 2502] R1-V: Reinforcing Super Generalization Ability in Vision Language Models with Less Than $3 [Code🔧]
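Several entries above (e.g. R1-VL) build on Group Relative Policy Optimization (GRPO), which replaces a learned value critic with advantages normalized within a group of sampled responses to the same prompt. A minimal sketch of that advantage computation, with illustrative reward values (not taken from any of the listed papers):

```python
# Minimal sketch of GRPO-style group-relative advantages.
# Assumes rule-based scalar rewards for a group of responses
# sampled from the same prompt; values below are illustrative.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std:
    A_i = (r_i - mean(r)) / std(r), avoiding a learned critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled responses, two scored correct (1.0), two incorrect (0.0).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # → [1.0, -1.0, 1.0, -1.0]
```

In practice these advantages weight the policy-gradient update for each response's tokens; see the GRPO / DeepSeek-R1 papers linked above for the full objective (clipping and KL regularization).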

Tree Search / MCTS

  • [AStar, 2502] Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking [Paper📑]
  • [Mulberry, 2412] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper📑] [Code🔧]

SFT / CoT

  • [MM-Verify, 2502] MM-Verify: Enhancing Multimodal Reasoning with Chain-of-Thought Verification [Paper📑] [Code🔧]
  • [LlamaV-o1, 2501] LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs [Paper📑] [Code🔧]
  • [Insight-V, 2411] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models [Paper📑] [Code🔧]
  • [LLaVA-CoT, 2411] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step [Paper📑] [Code🔧]
  • [LLaVA-Reasoner, 2410] Improve Vision Language Model Chain-of-thought Reasoning [Paper📑] [Code🔧]
  • [Visual-CoT, 2403] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning [Paper📑] [Code🔧]

Data

  • [Mulberry 260K SFT, 2412] Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search [Paper📑] [Code🔧]
  • [LLaVA-CoT 100K SFT, 2411] LLaVA-CoT: Let Vision Language Models Reason Step-by-Step [Paper📑] [Code🔧]
