
# Awesome-Long2short-on-LRMs

Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, and exciting long2short (long-to-short) methods for large reasoning models (LRMs). It contains papers, code, datasets, evaluations, and analyses.

## Content

- [Length-Aware Guidance](#length-aware-guidance)
  - [Prompt Guidance](#prompt-guidance)
  - [Reward Guidance](#reward-guidance)
- [Length-Agnostic Optimization](#length-agnostic-optimization)
  - [Latent Space Compression](#latent-space-compression)
  - [Routing Strategy](#routing-strategy)
  - [Model Distillation](#model-distillation)
  - [Model Merge](#model-merge)
- [Survey](#survey)
- [Others](#others)
- [Contributors](#contributors)

## Length-Aware Guidance

### Prompt Guidance

Prompt guidance methods directly make LRMs generate less reasoning text by adding explicit length-constraint instructions to the prompt.

#### Active Prompt Guidance

Active prompt guidance methods add user-specified length-constraint instructions to the LRM's prompt so that it generates shorter reasoning text; a minimal sketch follows the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.02 | Chain of Draft: Thinking Faster by Writing Less | arXiv | link | link |
| 2025.02 | s1: Simple test-time scaling | arXiv | link | link |
| 2024.07 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | arXiv | link | - |
| 2024.01 | The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | arXiv | link | link |
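A minimal sketch of the idea, with a hypothetical `generate` stand-in for any LLM completion call; the budget instruction is illustrative, in the spirit of Chain of Draft and Concise Thoughts, not any paper's exact prompt:

```python
# Active prompt guidance: the user pins an explicit word budget into the
# prompt. `generate` is a hypothetical stand-in for any LLM completion call.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def concise_prompt(question: str, budget_words: int = 50) -> str:
    # Explicit length constraint on the reasoning, not on the final answer.
    return (
        f"{question}\n"
        f"Think step by step, but keep your reasoning under "
        f"{budget_words} words, then state the final answer."
    )

# Usage: generate(concise_prompt("What is 17 * 24?", budget_words=30))
```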

#### Passive Prompt Guidance

Passive prompt guidance methods rely on auxiliary models or algorithms, rather than user-specified instructions, to generate an explicit length constraint tailored to each input; see the sketch after the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |
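A sketch of the passive variant, loosely following the budget-estimation idea in Token-Budget-Aware LLM Reasoning; `generate` and the parsing logic are assumptions, not the paper's implementation:

```python
# Passive prompt guidance: an auxiliary LLM call estimates a per-question
# token budget, which is then injected into the prompt automatically.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def estimate_budget(question: str) -> int:
    # Ask the model (or a cheaper one) how many reasoning tokens it needs;
    # fall back to a default when the reply is not a clean integer.
    reply = generate(
        "Estimate the minimum number of reasoning tokens needed to answer "
        f"the question below. Reply with a single integer.\n{question}"
    )
    try:
        return max(16, int(reply.strip()))
    except ValueError:
        return 128

def budgeted_prompt(question: str) -> str:
    return f"{question}\nUse at most {estimate_budget(question)} tokens of reasoning."
```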

### Reward Guidance

From a reinforcement learning perspective, reward guidance methods design a length-aware reward function so that the model produces accurate answers while consuming as few reasoning tokens as possible; an illustrative reward is sketched after the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.03 | DAPO: an Open-source RL System from ByteDance Seed and Tsinghua AIR | arXiv | link | link |
| 2025.03 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | arXiv | link | link |
| 2025.02 | Training Language Models to Reason Efficiently | arXiv | link | link |
| 2025.02 | Demystifying Long Chain-of-Thought Reasoning in LLMs | arXiv | link | link |
| 2025.01 | O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | arXiv | link | link |
| 2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
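An illustrative length-aware reward; the exact shaping (normalization, penalty weight, applying the penalty only to correct answers) is an assumption rather than any listed paper's formula:

```python
# Length-aware reward for RL fine-tuning: reward correctness, then subtract
# a penalty that grows with response length, so the policy learns to be
# right *and* brief.

def length_aware_reward(
    is_correct: bool,
    num_tokens: int,
    max_tokens: int = 4096,
    alpha: float = 0.2,
) -> float:
    if not is_correct:
        return 0.0                    # never reward wrong-but-short answers
    penalty = alpha * min(num_tokens / max_tokens, 1.0)  # in [0, alpha]
    return 1.0 - penalty

# e.g. a correct 1024-token answer scores 1.0 - 0.2 * 0.25 = 0.95
```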

## Length-Agnostic Optimization

### Latent Space Compression

Latent space compression replaces explicit reasoning tokens with latent representations, allowing LRMs to reason more flexibly and efficiently; a toy sketch follows the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.02 | LightThinker: Thinking Step-by-Step Compression | arXiv | link | link |
| 2025.02 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | arXiv | link | link |
| 2025.02 | Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | arXiv | link | link |
| 2025.02 | Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | arXiv | link | - |
| 2025.01 | Efficient Reasoning with Hidden Thinking | arXiv | link | - |
| 2024.12 | Training Large Language Model to Reason in a Continuous Latent Space | arXiv | link | link |
| 2024.12 | Compressed Chain of Thought: Efficient Reasoning through Dense Representations | arXiv | link | - |
| 2024.09 | Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | arXiv | link | - |
| 2024.05 | From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | arXiv | link | link |
| 2024.03 | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | arXiv | link | link |
| 2023.10 | Think before you speak: Training Language Models With Pause Tokens | arXiv | link | - |
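A toy PyTorch sketch of continuous latent reasoning, in the spirit of Training Large Language Model to Reason in a Continuous Latent Space: the last hidden state is fed back as the next input embedding for a few "thought" steps instead of decoding tokens. A GRU cell stands in for a transformer, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

class TinyLatentReasoner(nn.Module):
    def __init__(self, vocab: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.cell = nn.GRUCell(dim, dim)   # toy stand-in for a transformer
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor, latent_steps: int = 4):
        h = torch.zeros(tokens.size(0), self.embed.embedding_dim)
        for t in range(tokens.size(1)):    # encode the prompt token by token
            h = self.cell(self.embed(tokens[:, t]), h)
        x = h
        for _ in range(latent_steps):      # "think" silently in latent space:
            h = self.cell(x, h)            # the hidden state is fed back in
            x = h                          # place of a decoded thought token
        return self.head(h)                # decode answer logits only

logits = TinyLatentReasoner()(torch.randint(0, 1000, (2, 5)))
```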

### Routing Strategy

Routing strategy methods assign different reasoning strategies to inputs according to their difficulty or task type, thereby simplifying the overall reasoning output.

#### Template Routing

Template routing methods simplify the reasoning output of LRMs by selecting an appropriate reasoning template or paradigm for each task scenario; a minimal router is sketched after the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.03 | How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach | arXiv | link | link |
| 2025.03 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | arXiv | link | link |
| 2025.02 | Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | arXiv | link | - |
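A minimal template router; the task categories and templates are illustrative (cf. the adaptive paradigms in Sketch-of-Thought), not taken from any listed paper:

```python
# Template routing: pick a reasoning paradigm per task type, so simple or
# highly structured tasks skip full chain-of-thought.

TEMPLATES = {
    "arithmetic": "Answer directly with the final number. No explanation.",
    "logic": "Reason in short bullet-point sketches, then conclude.",
    "open_ended": "Think step by step before answering.",
}

def route_template(question: str, task_type: str) -> str:
    instruction = TEMPLATES.get(task_type, TEMPLATES["open_ended"])
    return f"{instruction}\n\nQuestion: {question}"

print(route_template("What is 12 + 30?", "arithmetic"))
```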

#### Solution Routing

Solution routing methods work by selectively pruning redundant solutions from the chain of thought generated by LRMs; a pruning sketch follows the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.02 | When More is Less: Understanding Chain-of-Thought Length in LLMs | arXiv | link | - |
| 2025.02 | Stepwise Informativeness Search for Improving LLM Reasoning | arXiv | link | link |
| 2025.01 | Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization | arXiv | link | - |
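One possible pruning heuristic, written as a post-hoc filter: long traces often contain several redundant solution attempts ("Wait, let me re-check..."), so keep only the first attempt that reaches a parseable answer. The attempt markers and answer pattern are assumptions, not any listed paper's method:

```python
import re

# Split a chain of thought on typical re-attempt markers and keep the first
# self-contained solution.
ATTEMPT_MARKERS = re.compile(r"\n(?=(?:Wait|Alternatively|Let me re-check))", re.I)

def prune_to_first_solution(cot: str) -> str:
    for attempt in ATTEMPT_MARKERS.split(cot):
        if re.search(r"answer is\s+\S+", attempt, re.I):
            return attempt.strip()
    return cot  # no parseable attempt found: keep the full trace

trace = "x = 5, so the answer is 5.\nWait, let me re-check... the answer is 5."
print(prune_to_first_solution(trace))  # keeps only the first attempt
```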

#### Computation Routing

Computation routing methods simplify the reasoning output of LRMs by allocating different amounts of compute to inputs of different difficulty; a scheduling sketch follows the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2024.12 | Efficiently Serving LLM Reasoning Programs with Certaindex | arXiv | link | link |
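A crude compute scheduler: harder inputs get more samples and a larger token budget, and sampling stops early once answers agree (a rough stand-in for the certainty signals used by systems such as Certaindex). `sample_answer` is a hypothetical LLM call:

```python
from collections import Counter

def sample_answer(question: str, max_tokens: int) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer_with_budget(question: str, difficulty: float) -> str:
    # difficulty in [0, 1]: harder questions get more samples and tokens.
    n_samples = 1 + int(difficulty * 7)        # 1..8 samples
    max_tokens = 256 + int(difficulty * 3840)  # 256..4096 tokens
    votes: Counter = Counter()
    for _ in range(n_samples):
        votes[sample_answer(question, max_tokens)] += 1
        top, count = votes.most_common(1)[0]
        if count >= 3:                         # answers already agree: stop
            return top
    return votes.most_common(1)[0][0]
```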

### Model Distillation

Model distillation methods construct a well-designed dataset and train on it with paradigms such as SFT, in-context learning, or preference learning, thereby simplifying the reasoning output of LRMs; a data-construction sketch follows the SFT table below.

#### Model Distillation for Preference Learning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.03 | DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models | arXiv | link | - |
| 2025.02 | Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | arXiv | link | - |
| 2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
| 2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |

#### Model Distillation for In-context Learning

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.02 | Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models | arXiv | link | - |
| 2024.01 | The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | arXiv | link | link |
| 2023.05 | Chain-of-Symbol Prompting Elicits Planning in Large Language Models | arXiv | link | link |

#### Model Distillation for SFT

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.08 | Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal | arXiv | link | link |
| 2025.05 | Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | arXiv | link | - |
| 2025.03 | InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | arXiv | link | - |
| 2025.02 | Self-Training Elicits Concise Reasoning in Large Language Models | arXiv | link | link |
| 2025.02 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs | arXiv | link | link |
| 2025.02 | CoT-Valve: Length-Compressible Chain-of-Thought Tuning | arXiv | link | link |
| 2025.02 | Stepwise Informativeness Search for Improving LLM Reasoning | arXiv | link | link |
| 2025.02 | CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | arXiv | link | - |
| 2024.12 | C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness | arXiv | link | - |
| 2024.12 | Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria | arXiv | link | - |
| 2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |
| 2024.11 | Can Language Models Learn to Skip Steps? | arXiv | link | link |
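A sketch of concise-CoT data construction for SFT distillation, in the spirit of Self-Training Elicits Concise Reasoning: sample several traces per problem and keep the shortest correct one as the fine-tuning target. The sampling and verification helpers are hypothetical placeholders:

```python
def sample_traces(question: str, n: int = 8) -> list[str]:
    raise NotImplementedError  # n reasoning traces sampled from the LRM

def is_correct(trace: str, gold: str) -> bool:
    raise NotImplementedError  # check the trace's final answer against gold

def build_sft_pair(question: str, gold: str) -> tuple[str, str] | None:
    correct = [t for t in sample_traces(question) if is_correct(t, gold)]
    if not correct:
        return None                     # skip problems the model never solves
    return question, min(correct, key=len)  # shortest correct trace as target
```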

### Model Merge

Model merging methods typically merge the parameters of two models with different reasoning styles to obtain a single LRM that combines both styles; a weight-interpolation sketch follows the table below.

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.03 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | arXiv | link | link |
| 2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
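A weight-interpolation sketch of model merging; the 0.5 default is an illustrative choice, not a tuned ratio from any listed paper:

```python
import torch

def merge_state_dicts(sd_long: dict, sd_short: dict, alpha: float = 0.5) -> dict:
    # Linearly interpolate a long-CoT model and a short-CoT model with the
    # same architecture; alpha weights the long-reasoning model.
    return {k: alpha * sd_long[k] + (1.0 - alpha) * sd_short[k] for k in sd_long}

# Usage (hypothetical models sharing one architecture):
#   merged = merge_state_dicts(long_model.state_dict(), short_model.state_dict())
#   model.load_state_dict(merged)
```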

## Survey

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.04 | Efficient Reasoning Models: A Survey | arXiv | link | link |
| 2025.03 | A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond | arXiv | link | link |
| 2025.03 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | arXiv | link | link |

## Others

| Time | Title | Venue | Paper | Code |
| --- | --- | --- | --- | --- |
| 2025.06 | Accelerated Test-Time Scaling with Model-Free Speculative Sampling | arXiv | link | - |
| 2025.03 | EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | arXiv | link | link |
| 2025.02 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | arXiv | link | link |
| 2024.12 | Bag of Tricks for Inference-time Computation of LLM Reasoning | arXiv | link | link |

## Contributors

Hongcheng-Gao, xinlong-yang, yueliu1999, ColorDavid, junming-yang, yili-19
