Added new Post-training an LLM using GRPO with TRL recipe 🧑‍🍳️ #278

Merged 17 commits on Feb 5, 2025
Changes from 11 commits

4 changes: 3 additions & 1 deletion notebooks/en/_toctree.yml
@@ -72,7 +72,9 @@
      title: Phoenix Observability Dashboard on HF Spaces
    - local: search_and_learn
      title: Scaling Test-Time Compute for Longer Thinking in LLMs

    - local: fine_tuning_llm_grpo_trl
      title: Post-training an LLM using GRPO with TRL

  - title: Computer Vision Recipes
    isExpanded: false
    sections:
3,950 changes: 3,950 additions & 0 deletions notebooks/en/fine_tuning_llm_grpo_trl.ipynb

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion notebooks/en/index.md
@@ -7,11 +7,12 @@ applications and solving various machine learning tasks using open-source tools

Check out the recently added notebooks:

- [Post-training an LLM using GRPO with TRL](fine_tuning_llm_grpo_trl)
- [Vector Search on Hugging Face with the Hub as Backend](vector_search_with_hub_as_backend)
- [Multi-Agent Order Management System with MongoDB](mongodb_smolagents_multi_micro_agents)
- [Scaling Test-Time Compute for Longer Thinking in LLMs](search_and_learn)
- [Signature-Aware Model Serving from MLflow with Ray Serve](mlflow_ray_serve)
- [Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU](fine_tuning_vlm_dpo_smolvlm_instruct)


You can also check out the notebooks in the cookbook's [GitHub repo](https://github.com/huggingface/cookbook).
