
A repository of techniques for improving the reasoning capabilities of LLMs.

Reference

Let's Verify Step by Step
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
ReFT: Reasoning with REinforced Fine-Tuning
SCoRe: Training Language Models to Self-Correct via Reinforcement Learning
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
A Theoretical Understanding of Self-Correction through In-context Alignment
Self-Taught Reasoner (STaR): Bootstrapping Reasoning With Reasoning
Scalable Online Planning via Reinforcement Learning Fine-Tuning
Azure OpenAI GPT-4o-mini fine-tuning tutorial
Customize a model with fine-tuning and DPO (Direct Preference Optimization)
Azure DPO (Direct Preference Optimization)
Approaches to Improve Logical Reasoning in LLMs: The Techniques Behind the O3
Improving Logical Reasoning in LLMs: A Tool of Synthetic Data Generation using Evolutionary Learning and MCTS

Repository contents

data/evolvemcts4rl
docs
infr
train
.gitignore
LICENSE
README.md
requirements.txt