A repository of techniques for improving the reasoning capabilities of large language models (LLMs).
- Let's Verify Step by Step
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- ReFT: Reasoning with Reinforced Fine-Tuning
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning
- Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
- A Theoretical Understanding of Self-Correction through In-context Alignment
- Self-Taught Reasoner (STaR): Bootstrapping Reasoning With Reasoning
- Scalable Online Planning via Reinforcement Learning Fine-Tuning
- Azure OpenAI GPT-4o-mini fine-tuning tutorial
- Customize a model with fine-tuning and DPO (Direct Preference Optimization)
- Azure DPO (Direct Preference Optimization)
- Approaches to Improve Logical Reasoning in LLMs: The Techniques Behind the O3
- Improving Logical Reasoning in LLMs: A Tool of Synthetic Data Generation using Evolutionary Learning and MCTS