gullalc/multimodal_r1_papers

DeepSeek RL (GRPO)-Inspired Research for Vision & Multimodal Reasoning

This repository is a curated, updated collection of research papers and code repositories on reinforcement learning (RL) methods, such as Group Relative Policy Optimization (GRPO), applied to vision and multimodal reasoning tasks.

Papers

Date Paper Repository
July 2, 2025 Kwai Keye-VL Technical Report Github
July 1, 2025 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Github
June 28, 2025 Listener-Rewarded Thinking in VLMs for Image Preferences -
June 27, 2025 MiCo: Multi-image Contrast for Reinforcement Visual Reasoning -
June 26, 2025 ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models Github
June 25, 2025 MMSearch-R1: Incentivizing LMMs to Search Github
June 20, 2025 Aha Moment Revisited: Are VLMs Truly Capable of Self Verification in Inference-time Scaling? -
June 20, 2025 Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Github
June 20, 2025 Enhancing Step-by-Step and Verifiable Medical Reasoning in MLLMs Github
June 19, 2025 GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning Github
June 16, 2025 Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning Github
June 11, 2025 ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Github
June 10, 2025 VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning Github
June 9, 2025 DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO Github
June 4, 2025 Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Github
June 4, 2025 RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics Github
June 4, 2025 MiMo-VL Technical Report Github
June 2, 2025 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Github
June 2, 2025 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Github
May 29, 2025 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Github
May 28, 2025 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO Github
May 28, 2025 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start Github
May 27, 2025 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs Github
May 26, 2025 Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Github
May 24, 2025 Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models Github
May 22, 2025 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning Github
May 22, 2025 VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought -
May 22, 2025 SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward Github
May 20, 2025 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning Github
May 20, 2025 UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Github
May 20, 2025 VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank Github
May 20, 2025 Visual Agentic Reinforcement Fine-Tuning Github
May 17, 2025 VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Github
May 16, 2025 GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning -
May 12, 2025 DanceGRPO: Unleashing GRPO on Visual Generation Github
May 11, 2025 Seed1.5-VL Technical Report Github
May 6, 2025 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Github
May 6, 2025 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Github
May 5, 2025 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Github
Apr 23, 2025 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Github
Apr 21, 2025 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Github
Apr 21, 2025 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Github
Apr 14, 2025 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Github
Apr 13, 2025 TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Github
Apr 10, 2025 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Github
Apr 10, 2025 SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models -
Apr 10, 2025 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Github
Apr 10, 2025 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Github
Apr 9, 2025 VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning Github
Apr 8, 2025 Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Github
Apr 3, 2025 Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Github
Apr 1, 2025 Improved Visual-Spatial Reasoning via R1-Zero-Like Training Github
Apr 1, 2025 Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning Github
Mar 31, 2025 Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Github
Mar 28, 2025 OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning -
Mar 27, 2025 Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning Github
Mar 27, 2025 Video-R1: Reinforcing Video Reasoning in MLLMs Github
Mar 23, 2025 Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning Github
Mar 21, 2025 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Github
Mar 18, 2025 Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Github
Mar 18, 2025 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Github
Mar 17, 2025 R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization GitHub
Mar 13, 2025 R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization GitHub
Mar 10, 2025 LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL GitHub
Mar 10, 2025 MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning GitHub
Mar 10, 2025 Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning GitHub
Mar 9, 2025 Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models GitHub
Mar 7, 2025 R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model GitHub
Mar 4, 2025 Visual-RFT: Visual Reinforcement Fine-Tuning GitHub
Feb 27, 2025 MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning -
Feb 27, 2025 Med-RLVR: Emerging Medical Reasoning from a 3B base model via Reinforcement Learning -
Feb 20, 2025 AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO GitHub

Repositories Only

Surveys
