- AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy, Paper (June 13, 2025)
- Spurious Rewards: Rethinking Training Signals in RLVR, Paper (June 12, 2025)
- RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought, Paper (June 4, 2025)
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Paper (June 7, 2025)
- Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning, Paper (June 2025)
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models, Paper (June 2025)
- The Illusion of Thinking: Comment on Shojaee et al., Paper (June 10, 2025)
- WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning, Paper (May 22, 2025)
- AdaptThink: Reasoning Models Can Learn When to Think, Paper (May 19, 2025)
- Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL, Paper (May 16, 2025)
- Llama-Nemotron: Efficient Reasoning Models, Paper (May 15, 2025)
- Chain-of-Thought Tokens are Computer Program Variables, Paper (May 2025)
- SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model, Paper (April 13, 2025)
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?, Paper (April 18, 2025)
- Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?, Paper (April 16, 2025)
- Inference-Time Scaling for Generalist Reward Modeling, Paper (April 3, 2025)
- Test-Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards, Paper (March 2025)
- SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild, Paper (March 2025)
- What Makes a Reward Model a Good Teacher? An Optimization Perspective, Paper (March 2025)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching, Paper (March 2025)
- All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning, Paper (March 2025)
- Reward Shaping to Mitigate Reward Hacking in RLHF, Paper (February 2025)
- Reward-Guided Speculative Decoding for Efficient LLM Reasoning, Paper (February 14, 2025)
- Chain of Draft: Thinking Faster by Writing Less, Paper (February 2025)
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, Paper (February 2025)
- Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models, Paper (February 2025)
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training, Paper (January 2025)
- Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization, Paper (January 2025)
- LLMs Can Plan Only If We Tell Them, Paper (January 2025)
- Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought, Paper (January 2025)
- SegLLM: Multi-round Reasoning Segmentation, Paper (October 24, 2024)
- Automatic Curriculum Expert Iteration for Reliable LLM Reasoning, Paper (October 2024)
- Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization, Paper (July 2024)
- RouteLLM: Learning to Route LLMs with Preference Data, Paper (June 26, 2024)
- Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study, Paper (June 2024)
- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking, Paper (March 2024)
- Self-Rewarding Language Models, Paper (January 2024)
- The Impact of Reasoning Step Length on Large Language Models, Paper (January 2024)