SafeRL-Lab/AI-Agent-Reasoning-Baselines

AI-Agent-Reasoning-Papers

Reasoning Paper List

2025

  • AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy, Paper (June 13, 2025)
  • Spurious Rewards: Rethinking Training Signals in RLVR, Paper (June 12, 2025)
  • RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought, Paper (June 4, 2025)
  • The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Paper (June 7, 2025)
  • Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning, Paper (June 2025)
  • Does Thinking More always Help? Understanding Test‑Time Scaling in Reasoning Models, Paper (June 2025)
  • The Illusion of Thinking: Comment on Shojaee et al., Paper (June 10, 2025)
  • WorkForceAgent‑R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning, Paper (May 22, 2025)
  • AdaptThink: Reasoning Models Can Learn When to Think, Paper (May 19, 2025)
  • Learning When to Think: Shaping Adaptive Reasoning in R1‑Style Models via Multi‑Stage RL, Paper (May 16, 2025)
  • Llama-Nemotron: Efficient Reasoning Models, Paper (May 15, 2025)
  • Chain‑of‑Thought Tokens are Computer Program Variables, Paper (May 2025)
  • SegEarth-R1: Geospatial Pixel Reasoning via Large Language Model, Paper (April 13, 2025)
  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?, Paper (April 18, 2025)
  • Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?, Paper (April 16, 2025)
  • Inference‑Time Scaling for Generalist Reward Modeling, Paper (April 3, 2025)
  • Test‑Time Reasoning Through Visual Human Preferences with VLMs and Soft Rewards, Paper (March 2025)
  • SimpleRL‑Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild, Paper (March 2025)
  • What Makes a Reward Model a Good Teacher? An Optimization Perspective, Paper (March 2025)
  • Sketch‑of‑Thought: Efficient LLM Reasoning with Adaptive Cognitive‑Inspired Sketching, Paper (March 2025)
  • All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine‑Tuning, Paper (March 2025)
  • Reward Shaping to Mitigate Reward Hacking in RLHF, Paper (February 2025)
  • Reward-Guided Speculative Decoding for Efficient LLM Reasoning, Paper (February 14, 2025)
  • Chain of Draft: Thinking Faster by Writing Less, Paper (February 2025)
  • ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates, Paper (February 2025)
  • Step Back to Leap Forward: Self‑Backtracking for Boosting Reasoning of Language Models, Paper (February 2025)
  • SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post‑training, Paper (January 2025)
  • Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization, Paper (January 2025)
  • LLMs Can Plan Only If We Tell Them, Paper (January 2025)
  • Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain‑of‑Thought, Paper (January 2025)

2024

  • SegLLM: Multi-round Reasoning Segmentation, Paper (October 24, 2024)
  • Automatic Curriculum Expert Iteration for Reliable LLM Reasoning, Paper (October 2024)
  • Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization, Paper (July 2024)
  • RouteLLM: Learning to Route LLMs with Preference Data, Paper (June 26, 2024)
  • Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study, Paper (June 2024)
  • Quiet‑STaR: Language Models Can Teach Themselves to Think Before Speaking, Paper (March 2024)
  • Self‑Rewarding Language Models, Paper (January 2024)
  • The Impact of Reasoning Step Length on Large Language Models, Paper (January 2024)

2023

  • Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, Paper (August 2023)
  • Tree of Thoughts: Deliberate Problem Solving with Large Language Models, Paper (May 2023)

2022

  • STaR: Bootstrapping Reasoning With Reasoning, Paper (March 2022)
  • Self‑Consistency Improves Chain of Thought Reasoning in Language Models, Paper (March 2022)
