trl
Here are 23 public repositories matching this topic...
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always keeping a data-first approach
Updated Jan 15, 2024 - Python
An implementation of GRPO for training VLMs with Unsloth
Updated Aug 7, 2025 - Python
Various training, inference, and validation code and results related to open LLMs that were pretrained (fully or partially) on the Dutch language.
Updated Apr 9, 2024 - Jupyter Notebook
simpleR1: A Simple Framework for Training R1-like Models
Updated Aug 6, 2025 - Python
Fine-tune models from the Hugging Face Hub using libraries such as trl, peft, and transformers (see the sketch after this entry).
Updated Mar 21, 2025 - Python
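A minimal sketch of what such a setup typically looks like, assuming TRL's SFTTrainer combined with a peft LoraConfig; the model and dataset names are placeholders, not this repository's actual choices:

```python
# LoRA fine-tuning of a Hub model with trl + peft (illustrative; names are assumptions).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder conversational SFT dataset from the Hub.
dataset = load_dataset("trl-lib/Capybara", split="train")

# LoRA adapter configuration applied on top of the base causal LM.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # placeholder base model
    args=SFTConfig(output_dir="lora-sft"),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```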
This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset (see the sketch after this entry).
Updated Apr 7, 2025 - Jupyter Notebook
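A minimal sketch of GRPO training with TRL's GRPOTrainer on GSM8K, assuming a recent TRL release; the reward function below is a deliberately simple placeholder, not the repository's actual answer-checking reward:

```python
# GRPO on GSM8K with TRL (illustrative sketch; reward is a toy placeholder).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})  # GRPOTrainer expects a "prompt" column

def reward_short(completions, **kwargs):
    # Placeholder reward: mildly prefer concise completions.
    # A real GSM8K setup would parse the completion and check the final numeric answer.
    return [-len(c) / 100.0 for c in completions]

training_args = GRPOConfig(
    output_dir="qwen2.5-3b-grpo",
    num_generations=4,
    per_device_train_batch_size=4,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",
    reward_funcs=reward_short,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```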
Supervised fine-tuning using the TRL library
Updated Jan 24, 2024 - Jupyter Notebook
Different post-training techniques for LLMs, including SFT, DPO, and online RL
Updated Jul 9, 2025 - Python
This project demonstrates fine-tuning the Qwen2.5-3B-Instruct model with GRPO (Group Relative Policy Optimization) on the GSM8K dataset.
Updated Jul 18, 2025 - Jupyter Notebook
Notebooks to create an instruction-following version of Microsoft's Phi-2 LLM with supervised fine-tuning and Direct Preference Optimization (DPO)
Updated Nov 27, 2024 - Jupyter Notebook
Direct Preference Optimization of GPT-2 using the TRL library (see the sketch after this entry)
Updated Jan 24, 2024 - Jupyter Notebook
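A minimal sketch of DPO on GPT-2 with TRL's DPOTrainer; argument names follow recent TRL releases (older versions differ), and the preference dataset is a placeholder rather than necessarily what this repository uses:

```python
# DPO on GPT-2 with TRL (illustrative sketch; dataset choice is an assumption).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Placeholder preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

training_args = DPOConfig(output_dir="gpt2-dpo", beta=0.1)
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```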
ODM: automated translation of TRL rules to BAL
Updated Dec 6, 2019 - Java
Reinforcement Fine-Tuning LLMs With GRPO
Updated May 23, 2025 - Python
The overall aim of this project is to create a term rewriting system that could be useful in everyday programming, and to represent data in a way that roughly corresponds to the definition of a term in formal logic. Terms should be familiar to any programmer: they are basically constants, variables, and function symbols (an illustrative sketch follows this entry).
Updated Dec 16, 2020 - C#
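An illustrative Python sketch (the repository itself is C#) of terms built from constants, variables, and function symbols, plus a single rewrite rule; the names and the example rule are hypothetical, not taken from the project:

```python
# Terms as constants, variables, and function symbols, with one rewrite rule (illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Term:
    symbol: str      # function symbol; a constant is a symbol with no arguments
    args: tuple = ()

def substitute(t, env):
    """Replace variables in t using the bindings in env."""
    if isinstance(t, Var):
        return env.get(t.name, t)
    return Term(t.symbol, tuple(substitute(a, env) for a in t.args))

def match(pattern, t, env=None):
    """Return variable bindings making pattern equal to t, or None if there is no match."""
    env = dict(env or {})
    if isinstance(pattern, Var):
        env[pattern.name] = t
        return env
    if isinstance(t, Term) and pattern.symbol == t.symbol and len(pattern.args) == len(t.args):
        for p, a in zip(pattern.args, t.args):
            env = match(p, a, env)
            if env is None:
                return None
        return env
    return None

def rewrite(t, lhs, rhs):
    """Apply the rule lhs -> rhs at the root if it matches, otherwise recurse into subterms."""
    env = match(lhs, t)
    if env is not None:
        return substitute(rhs, env)
    if isinstance(t, Term):
        return Term(t.symbol, tuple(rewrite(a, lhs, rhs) for a in t.args))
    return t

# Rule plus(0, X) -> X applied to plus(0, plus(0, s(0))) yields plus(0, s(0)).
zero = Term("0")
rule_lhs = Term("plus", (zero, Var("X")))
rule_rhs = Var("X")
term = Term("plus", (zero, Term("plus", (zero, Term("s", (zero,))))))
print(rewrite(term, rule_lhs, rule_rhs))
```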
Notebooks to create an instruction-following version of Microsoft's Phi-1.5 LLM with supervised fine-tuning and Direct Preference Optimization (DPO)
Updated Aug 17, 2024
Minimal code to train a reasoning model with reinforcement learning.
Updated Aug 9, 2025 - Python
Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for neutral, grammatically correct news summaries (see the sketch after this entry)
Updated Jul 19, 2025 - Jupyter Notebook
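A heavily hedged sketch of PPO-based RLHF for FLAN-T5, assuming the legacy TRL 0.x PPOTrainer API (newer TRL releases restructured PPO, so names and signatures may differ); the prompts and the reward function are placeholders, not the repository's news data or neutrality/grammar reward model:

```python
# PPO-style RLHF for FLAN-T5 with legacy TRL (illustrative; reward and data are placeholders).
import torch
from datasets import Dataset
from transformers import AutoTokenizer
from trl import AutoModelForSeq2SeqLMWithValueHead, PPOConfig, PPOTrainer

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_name)
ref_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(model_name)

# Toy prompts standing in for news articles.
prompts = [
    "Summarize: The city council approved the new budget after a long debate.",
    "Summarize: Researchers reported a modest improvement in battery lifetime.",
]
dataset = Dataset.from_dict({"query": prompts})
dataset = dataset.map(lambda x: {"input_ids": tokenizer(x["query"]).input_ids})
dataset.set_format("torch", columns=["input_ids"], output_all_columns=True)

config = PPOConfig(model_name=model_name, learning_rate=1.41e-5, batch_size=2, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset)

def reward_fn(summary: str) -> float:
    # Placeholder: a real setup would score neutrality/grammar with a reward model.
    return float(0 < len(summary.split()) <= 60)

for batch in ppo_trainer.dataloader:
    query_tensors = list(batch["input_ids"])
    # Generate one summary per query, then score it and take a PPO optimization step.
    response_tensors = [ppo_trainer.generate(q, max_new_tokens=48).squeeze(0) for q in query_tensors]
    summaries = tokenizer.batch_decode(response_tensors, skip_special_tokens=True)
    rewards = [torch.tensor(reward_fn(s)) for s in summaries]
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```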