Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, and exciting long2short methods for large reasoning models (LRMs). It contains papers, code, datasets, evaluations, and analyses.
Content
Prompt guidance methods make LRMs generate less reasoning text by adding explicit length-constraint instructions directly to the prompt.
Active Prompt Guidance methods add user-specified length-constraint instructions to the prompt, so the LRM produces shorter reasoning text.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.02 | Chain of Draft: Thinking Faster by Writing Less | arXiv | link | link |
2025.02 | s1: Simple test-time scaling | arXiv | link | link |
2024.07 | Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | arXiv | link | - |
2024.01 | The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | arXiv | link | link |
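As a flavor of what these instructions look like in practice, here is a minimal sketch of active prompt guidance; `generate` is a hypothetical stand-in for any text-completion call, and the instruction wording is illustrative rather than taken from a specific paper.

```python
# Minimal sketch of active prompt guidance: the user adds an explicit
# length-constraint instruction to the prompt. `generate` is a hypothetical
# stand-in for any text-completion call, not a specific API.

def generate(prompt: str) -> str:
    raise NotImplementedError  # replace with a real LLM call

def concise_prompt(question: str, budget_tokens: int = 100) -> str:
    instruction = (
        f"Think step by step, but keep your whole reasoning within "
        f"{budget_tokens} tokens, writing each step as a short draft."
    )
    return f"{instruction}\n\nQuestion: {question}\nAnswer:"

# Usage: the only change versus vanilla CoT prompting is the instruction.
# answer = generate(concise_prompt("What is 17 * 24?", budget_tokens=50))
```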
Passive Prompt Guidance methods rely on additional models or algorithms to generate explicit length-constraint instructions tailored to each input, rather than having users specify them.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |
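A sketch of the passive variant, loosely in the spirit of Token-Budget-Aware LLM Reasoning: an auxiliary call first estimates a per-question budget, which is then injected into the prompt. `generate`, the prompt wording, and the fallback value are all assumptions for illustration.

```python
# Sketch of passive prompt guidance: an auxiliary model/algorithm estimates
# a per-question token budget, which is then injected into the prompt.
# `generate`, the prompt wording, and the fallback value are illustrative.

def generate(prompt: str) -> str:
    raise NotImplementedError  # replace with a real LLM call

def estimate_budget(question: str) -> int:
    # Ask the model itself (or a smaller model) how many reasoning tokens
    # this specific problem needs; no user input is involved.
    reply = generate(
        "Estimate the minimum number of reasoning tokens needed to solve "
        f"the following problem. Output a single integer.\n\n{question}"
    )
    digits = "".join(ch for ch in reply if ch.isdigit())
    return int(digits) if digits else 100  # fall back to a default budget

def budget_aware_prompt(question: str) -> str:
    budget = estimate_budget(question)
    return (f"Solve the problem and keep your reasoning within {budget} "
            f"tokens.\n\nQuestion: {question}\nAnswer:")
```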
From the perspective of reinforcement learning, reward guidance based methods design length-aware reward functions so that the model generates accurate answers while consuming as few reasoning tokens as possible.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.03 | DAPO: an Open-source RL System from ByteDance Seed and Tsinghua AIR | arXiv | link | link |
2025.03 | L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | arXiv | link | link |
2025.02 | Training Language Models to Reason Efficiently | arXiv | link | link |
2025.02 | Demystifying Long Chain-of-Thought Reasoning in LLMs | arXiv | link | link |
2025.01 | O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | arXiv | link | link |
2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
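To make the idea concrete, here is a toy length-aware reward. The exact shaping differs across the papers above (e.g., Kimi k1.5's length penalty or O1-Pruner's length-harmonizing objective), so the linear penalty and coefficients below are purely illustrative.

```python
# Toy length-aware reward for RL fine-tuning. Real reward designs differ
# per paper; the linear penalty below is only illustrative.

def length_aware_reward(is_correct: bool, num_tokens: int,
                        max_tokens: int = 4096, alpha: float = 0.5) -> float:
    correctness = 1.0 if is_correct else 0.0
    # Normalized length in [0, 1]; longer traces are penalized more.
    length_frac = min(num_tokens / max_tokens, 1.0)
    # Only discount correct answers, so the model is never pushed toward
    # "short but wrong": incorrect answers get 0 regardless of length.
    return correctness * (1.0 - alpha * length_frac)

# A correct 512-token trace outscores a correct 3000-token one:
assert length_aware_reward(True, 512) > length_aware_reward(True, 3000)
```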
Latent space compression methods replace explicit reasoning tokens with latent representations, allowing more flexible and efficient reasoning in LRMs.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.02 | LightThinker: Thinking Step-by-Step Compression | arXiv | link | link |
2025.02 | SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs | arXiv | link | link |
2025.02 | Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | arXiv | link | link |
2025.02 | Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | arXiv | link | - |
2025.01 | Efficient Reasoning with Hidden Thinking | arXiv | link | - |
2024.12 | Training Large Language Model to Reason in a Continuous Latent Space | arXiv | link | link |
2024.12 | Compressed Chain of Thought: Efficient Reasoning through Dense Representations | arXiv | link | - |
2024.09 | Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | arXiv | link | - |
2024.05 | From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | arXiv | link | link |
2024.03 | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | arXiv | link | link |
2023.10 | Think before you speak: Training Language Models With Pause Tokens | arXiv | link | - |
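A schematic of the continuous-thought loop popularized by Coconut (Training Large Language Model to Reason in a Continuous Latent Space): the last hidden state is fed back as the next input embedding instead of being decoded into a discrete token. The Hugging Face-style `model` interface and tensor shapes here are assumptions.

```python
# Schematic of latent-space reasoning in the style of Coconut: feed the
# last hidden state back as the next input embedding instead of decoding
# a discrete token. Assumes a Hugging Face-style `model` that accepts
# `inputs_embeds` and returns `last_hidden_state`.
import torch

def latent_reasoning_steps(model, input_embeds: torch.Tensor,
                           num_latent_steps: int = 4) -> torch.Tensor:
    """input_embeds: (batch, seq_len, hidden); returns extended embeddings."""
    for _ in range(num_latent_steps):
        hidden = model(inputs_embeds=input_embeds).last_hidden_state
        # Take the final position's hidden state as a "continuous thought"
        # and append it as the next input embedding -- no token is sampled,
        # so one latent step can carry more information than one word.
        thought = hidden[:, -1:, :]
        input_embeds = torch.cat([input_embeds, thought], dim=1)
    return input_embeds
```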
Routing strategy methods assign different reasoning strategies to inputs according to their difficulty or task type, thereby simplifying the overall reasoning output.
Template routing methods simplify the reasoning output of LRMs by selecting an appropriate reasoning template or paradigm for each task scenario.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.03 | How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach | arXiv | link | link |
2025.03 | Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | arXiv | link | link |
2025.02 | Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | arXiv | link | - |
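A minimal router sketch, loosely in the spirit of Sketch-of-Thought's adaptive paradigm selection: easy inputs get a direct or shorthand template, hard ones get full CoT. The word-count difficulty heuristic is purely illustrative; real routers use a trained classifier or a small LM.

```python
# Minimal template router: choose a cheap reasoning paradigm for simple
# inputs and full chain-of-thought otherwise. The difficulty heuristic
# below (word count) is purely illustrative.

TEMPLATES = {
    "direct":   "Answer directly with no explanation.\n\nQ: {q}\nA:",
    "sketch":   "Reason in terse notation/shorthand, then answer.\n\nQ: {q}\nA:",
    "full_cot": "Think step by step in detail, then answer.\n\nQ: {q}\nA:",
}

def route_template(question: str) -> str:
    difficulty = len(question.split())  # stand-in difficulty signal
    if difficulty < 10:
        key = "direct"
    elif difficulty < 40:
        key = "sketch"
    else:
        key = "full_cot"
    return TEMPLATES[key].format(q=question)
```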
Solution routing based methods work by selectively pruning solutions in the chain of thought generated by LRMs.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.02 | When More is Less: Understanding Chain-of-Thought Length in LLMs | arXiv | link | - |
2025.02 | Stepwise Informativeness Search for Improving LLM Reasoning | arXiv | link | link |
2025.01 | Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization | arXiv | link | - |
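A sketch of what solution-level pruning can look like at decoding time: since long traces often contain several re-derivations of the same answer, stop emitting once the first complete answer appears. The `\boxed` marker and chunked streaming interface are assumptions, not a standard format.

```python
# Sketch of solution-level pruning at decoding time: stop streaming once
# the first complete answer appears, so the model cannot start a second,
# redundant solution round. The `\boxed` marker and chunked streaming
# interface are assumptions.

def stream_until_first_answer(token_stream, answer_marker: str = "\\boxed"):
    """token_stream: any iterable of decoded text chunks."""
    seen = ""
    for chunk in token_stream:
        seen += chunk
        yield chunk
        if answer_marker in seen:
            break  # first solution is complete; prune the rest

# Usage with a fake stream: emits chunks up to and including the answer.
# list(stream_until_first_answer(["Step 1...", "so \\boxed{42}.", "Wait,"]))
```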
Computation routing based methods usually simplify the reasoning output of LRMs by allocating different amounts of computing resources to inputs of different difficulty.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2024.12 | Efficiently Serving LLM Reasoning Programs with Certaindex | arXiv | link | link |
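A toy version of certainty-guided compute allocation in the spirit of Certaindex: draw self-consistency samples in small batches and stop early once one answer dominates, so easy questions consume fewer rollouts. `sample_answer` is a placeholder for one stochastic LLM rollout.

```python
# Toy certainty-based compute allocation: sample answers in small batches
# and stop as soon as the majority answer is sufficiently dominant, so
# easy questions consume fewer samples. `sample_answer` is a placeholder.
from collections import Counter

def sample_answer(question: str) -> str:
    raise NotImplementedError  # one stochastic LLM rollout

def adaptive_self_consistency(question: str, max_samples: int = 16,
                              batch: int = 4, threshold: float = 0.75) -> str:
    votes: Counter = Counter()
    while sum(votes.values()) < max_samples:
        for _ in range(batch):
            votes[sample_answer(question)] += 1
        answer, count = votes.most_common(1)[0]
        if count / sum(votes.values()) >= threshold:
            return answer  # confident early exit saves compute
    return votes.most_common(1)[0][0]
```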
Model distillation based methods construct well-designed datasets for model learning, such as SFT, in-context learning, and preference learning, thereby simplifying the reasoning output of LRMs.
Preference learning based methods optimize LRMs on preference data that favors short, correct reasoning over longer reasoning.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.03 | DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models | arXiv | link | - |
2025.02 | Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | arXiv | link | - |
2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |
In-context learning based methods elicit concise reasoning through carefully designed prompts and demonstrations.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.02 | Stepwise Perplexity-Guided Refinement for Efficient Chain-of-Thought Reasoning in Large Language Models | arXiv | link | - |
2024.01 | The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models | arXiv | link | link |
2023.05 | Chain-of-Symbol Prompting Elicits Planning in Large Language Models | arXiv | link | link |
SFT based methods fine-tune LRMs on compressed or pruned chain-of-thought data.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.08 | Pruning the Unsurprising: Efficient Code Reasoning via First-Token Surprisal | arXiv | link | link |
2025.05 | Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | arXiv | link | - |
2025.03 | InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | arXiv | link | - |
2025.02 | Self-Training Elicits Concise Reasoning in Large Language Models | arXiv | link | link |
2025.02 | TokenSkip: Controllable Chain-of-Thought Compression in LLMs | arXiv | link | link |
2025.02 | CoT-Valve: Length-Compressible Chain-of-Thought Tuning | arXiv | link | link |
2025.02 | Stepwise Informativeness Search for Improving LLM Reasoning | arXiv | link | link |
2025.02 | CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | arXiv | link | - |
2024.12 | C3oT: Generating Shorter Chain-of-Thought without Compromising Effectiveness | arXiv | link | - |
2024.12 | Verbosity-Aware Rationale Reduction: Effective Reduction of Redundant Rationale via Principled Criteria | arXiv | link | - |
2024.12 | Token-Budget-Aware LLM Reasoning | arXiv | link | link |
2024.11 | Can Language Models Learn to Skip Steps? | arXiv | link | link |
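Several SFT entries above share a simple data recipe, sketched here under assumptions about the record layout and the `is_correct` checker: among sampled reasoning traces for a question, keep the shortest one that still reaches the correct answer, and fine-tune on those pairs.

```python
# Sketch of a common short-CoT SFT data recipe: among sampled traces for a
# question, keep the shortest one that still reaches the correct answer,
# then fine-tune on the (question, short trace) pairs. The record layout
# and the `is_correct` checker are assumptions for illustration.

def build_short_cot_dataset(samples, is_correct):
    """samples: iterable of (question, traces, gold_answer) triples."""
    dataset = []
    for question, traces, gold in samples:
        good = [t for t in traces if is_correct(t, gold)]
        if not good:
            continue  # no usable trace; skip rather than train on errors
        shortest = min(good, key=len)  # character length as a cheap proxy
        dataset.append({"prompt": question, "completion": shortest})
    return dataset
```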
Model merge based methods typically merge the parameters of two models with different reasoning styles to obtain a single LRM that combines both styles.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.03 | Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | arXiv | link | link |
2025.01 | Kimi k1.5: Scaling Reinforcement Learning with LLMs | arXiv | link | - |
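A minimal sketch of the core operation: linear weight interpolation between a long-reasoning and a short-reasoning checkpoint of the same architecture. Published recipes vary (task arithmetic, TIES/DARE-style merging, etc.); plain averaging is shown for illustration.

```python
# Minimal weight-interpolation merge between a long-reasoning model and a
# concise (short-reasoning) model. Real recipes vary; plain linear
# interpolation is shown for illustration.
import torch

def merge_state_dicts(long_sd: dict, short_sd: dict, alpha: float = 0.5) -> dict:
    """alpha=1.0 keeps the long-CoT model; alpha=0.0 keeps the short one."""
    assert long_sd.keys() == short_sd.keys(), "models must share architecture"
    return {
        name: alpha * long_sd[name] + (1.0 - alpha) * short_sd[name]
        for name in long_sd
    }
```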
Surveys review and organize the growing literature on efficient reasoning for LRMs.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.04 | Efficient Reasoning Models: A Survey | arXiv | link | link |
2025.03 | A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond | arXiv | link | link |
2025.03 | Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | arXiv | link | link |
Other related works accelerate the inference of reasoning models without shortening the reasoning text itself, e.g., via speculative decoding.
Time | Title | Venue | Paper | Code |
---|---|---|---|---|
2025.06 | Accelerated Test-Time Scaling with Model-Free Speculative Sampling | arXiv | link | - |
2025.03 | EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | arXiv | link | link |
2025.02 | LongSpec: Long-Context Speculative Decoding with Efficient Drafting and Verification | arXiv | link | link |
2024.12 | Bag of Tricks for Inference-time Computation of LLM Reasoning | arXiv | link | link |