**TorchRL** is an open-source Reinforcement Learning (RL) library for PyTorch.

## 🚀 What's New

### LLM API - Complete Framework for Language Model Fine-tuning

TorchRL now includes a comprehensive **LLM API** for post-training and fine-tuning of language models! This new framework provides everything you need for RLHF, supervised fine-tuning, and tool-augmented training:

- 🤖 **Unified LLM Wrappers**: Seamless integration with Hugging Face models and vLLM inference engines
- 💬 **Conversation Management**: Advanced `History` class for multi-turn dialogue with automatic chat template detection
- 🛠️ **Tool Integration**: Built-in support for Python code execution, function calling, and custom tool transforms
- 🎯 **Specialized Objectives**: GRPO (Group Relative Policy Optimization) and SFT loss functions optimized for language models
- ⚡ **High-Performance Collectors**: Async data collection with distributed training support
- 🔄 **Flexible Environments**: Transform-based architecture for reward computation, data loading, and conversation augmentation
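
The conversation format these components share can be illustrated with plain Python. The sketch below is not TorchRL's `History` API; it is a hypothetical stand-in showing the role/content data model such a class manages, with a deliberately naive chat template (real templates are model-specific and detected automatically by the library):

```python
# Illustrative only: a toy stand-in for the role/content conversation
# structure that a multi-turn dialogue manager maintains (not TorchRL's API).
from dataclasses import dataclass, field

@dataclass
class Conversation:
    turns: list = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        # Each turn is a plain role/content record, as in chat-model inputs.
        self.turns.append({"role": role, "content": content})

    def render(self) -> str:
        # Naive chat template; real ones are model-specific.
        return "\n".join(f"<|{t['role']}|>: {t['content']}" for t in self.turns)

chat = Conversation()
chat.append("system", "You are a helpful assistant.")
chat.append("user", "What is 2 + 2?")
chat.append("assistant", "4")
print(chat.render())
```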

The LLM API follows TorchRL's modular design principles, allowing you to mix and match components for your specific use case. Check out the [complete documentation](https://pytorch.org/rl/main/reference/llms.html) and [GRPO implementation example](https://github.com/pytorch/rl/tree/main/sota-implementations/grpo) to get started!
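
For intuition about the GRPO objective mentioned above, the group-relative advantage that gives the algorithm its name can be sketched in a few lines of plain Python. This is an illustration of the idea, not TorchRL's implementation:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std.

    In GRPO, `rewards` holds the scores of a group of completions sampled
    for the same prompt; the group statistics replace a learned value
    baseline.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions for one prompt, scored by some reward function:
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-average completions receive positive advantage, below-average negative.
```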

<details>
<summary>Quick LLM API Example</summary>

```python
from torchrl.envs.llm import ChatEnv
from torchrl.modules.llm import TransformersWrapper
from torchrl.objectives.llm import GRPOLoss
from torchrl.collectors.llm import LLMCollector

# Create environment with Python tool execution
env = ChatEnv(
    tokenizer=tokenizer,
    system_prompt="You are an assistant that can execute Python code.",
)
# (TransformersWrapper, GRPOLoss, and LLMCollector are wired up in the
# GRPO implementation example linked above.)
```

</details>

If you feel a feature is missing from the library, please submit an issue!
If you would like to contribute to new features, check our [call for contributions](https://github.com/pytorch/rl/issues/509) and our [contribution](https://github.com/pytorch/rl/blob/main/CONTRIBUTING.md) page.

A series of [State-of-the-Art implementations](https://github.com/pytorch/rl/blob/main/sota-implementations/) are provided with an illustrative purpose:

<td> NA
</td>
</tr>
<tr>
<td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/grpo">LLM API (GRPO)</a>
</td>
<td> NA
</td>
<td> +
</td>
<td> +
</td>
<td> NA
</td>
</tr>
</table>

** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on hardware and implementation details.

and many more to come!

[Code examples](examples/) displaying toy code snippets and training scripts are also available:

- [LLM API & GRPO](sota-implementations/grpo) - Complete language model fine-tuning pipeline