RL4LMS is a flexible library for fine-tuning large language models (LLMs) with reinforcement learning, centered on the GRPO (Group Relative Policy Optimization) algorithm. It gives researchers and practitioners a robust framework for implementing custom reward functions, environments, and training loops to optimize language models for specific tasks.
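For orientation, GRPO replaces PPO's learned value baseline with a group-relative advantage: for each prompt, a group of G completions is sampled and each completion's reward is normalized against the group's mean and standard deviation. The simplified, sequence-level form below follows the original GRPO formulation and is shown only for reference; the exact loss used by this library lives in `rl4lms/losses/grpo_loss.py`.

```math
\hat{A}_i = \frac{r_i - \mathrm{mean}(r_1,\dots,r_G)}{\mathrm{std}(r_1,\dots,r_G)}, \qquad
\mathcal{J}(\theta) = \mathbb{E}\left[\frac{1}{G}\sum_{i=1}^{G}
\min\big(\rho_i \hat{A}_i,\ \mathrm{clip}(\rho_i,\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_i\big)\right]
- \beta\, D_{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big),
\qquad \rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}
```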
## Table of Contents

- Features
- Installation
- Quick Start
- Project Structure
- Custom Reward Functions
- Documentation
- Contributing
- License
- Contact
- Acknowledgments
## Features

RL4LMS comes packed with features designed to streamline the process of fine-tuning language models:

- 🚀 Flexible Reward Function API: Intuitive interface for defining custom reward functions tailored to your specific task
- 🤗 HuggingFace Integration: Seamless compatibility with HuggingFace Transformers models
- ⚡ Efficient Training: Optimized for both single- and multi-GPU training with minimal setup
- 🧩 Extensible Architecture: Modular design that makes it easy to add new components and environments
- 📊 Built-in Evaluation: Comprehensive tools for monitoring and evaluating model performance
- 🎮 Wordle Environment: Built-in Wordle game environment for RL training and experimentation (see the sketch after this list)
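The Wordle environment lives in `rl4lms/envs/wordle_env.py`. Its interface is not documented in this README, so the snippet below is only a hypothetical sketch that assumes a gym-style `reset()`/`step()` API; check `wordle_env.py` for the actual method names and signatures.

```python
# Hypothetical sketch -- assumes WordleEnv exposes a gym-style reset/step API.
# Verify against rl4lms/envs/wordle_env.py before relying on these names.
from rl4lms.envs.wordle_env import WordleEnv

env = WordleEnv()                # assumed constructor with a default word list
observation = env.reset()        # assumed: returns the initial game state/prompt
print(observation)

# Play one illustrative guess; the step() signature is an assumption.
observation, reward, done, info = env.step("crane")
print(f"reward={reward}, done={done}")
```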
## Installation

RL4LMS can be installed in just a few steps:

1. Clone the repository:

   ```bash
   git clone https://github.com/YanCotta/reinforcement-fine-tuning-llms-with-grpo.git
   cd reinforcement-fine-tuning-llms-with-grpo
   ```

2. Set up a virtual environment (recommended):

   ```bash
   # Create and activate a virtual environment
   python -m venv venv
   # On Windows:
   .\venv\Scripts\activate
   # On macOS/Linux:
   source venv/bin/activate
   ```

3. Install the package in development mode:

   ```bash
   pip install -e .
   ```

4. Install additional dependencies:

   ```bash
   pip install -r requirements.txt
   ```

For contributing to the project or running tests, install the development extras:

```bash
pip install -e ".[dev]"
```
## Quick Start

RL4LMS includes a ready-to-use implementation for fine-tuning language models on the Wordle game. Here's how to get started:

1. Prepare your environment as described in the Installation section.

2. Run the example script:

   ```bash
   python examples/wordle_finetuning.py
   ```
Here's a minimal example showing how to use RL4LMS to fine-tune a model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from rl4lms.trainer import GRPOTrainer
from rl4lms.reward_functions.wordle import WordleRewardFunction
from rl4lms.envs.wordle_env import WordleEnv

# Initialize components
model = AutoModelForCausalLM.from_pretrained("gpt2")
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
reward_fn = WordleRewardFunction()

# Create trainer and start training
# (train_dataset and eval_dataset must be prepared beforehand)
trainer = GRPOTrainer(
    model=model,
    ref_model=ref_model,
    tokenizer=tokenizer,
    reward_fn=reward_fn,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    batch_size=8,
    num_epochs=3,
    learning_rate=1e-5,
    output_dir="./wordle_grpo_output",
)

trainer.train()
```
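The example above references `train_dataset` and `eval_dataset` without constructing them, and the README does not specify the dataset format the trainer expects. Purely as a hypothetical illustration, the sketch below assumes plain lists of prompt strings; consult `rl4lms/trainer/grpo_trainer.py` for the actual expected format.

```python
# Hypothetical sketch -- the dataset format expected by GRPOTrainer is an
# assumption here (plain lists of prompt strings); check grpo_trainer.py.
train_dataset = [
    "You are playing Wordle. Guess a five-letter word.",
    "You are playing Wordle. Your last guess was CRANE; C and E are absent. Guess again.",
]
eval_dataset = [
    "You are playing Wordle. Guess a five-letter word.",
]
```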
## Project Structure

```text
rl4lms/
├── envs/                    # Environment implementations
│   ├── __init__.py
│   └── wordle_env.py        # Wordle game environment
├── losses/
│   ├── __init__.py
│   └── grpo_loss.py         # GRPO loss implementation
├── models/                  # Model architectures
│   └── __init__.py
├── reward_functions/        # Reward function implementations
│   ├── __init__.py
│   ├── base.py              # Base reward function class
│   └── wordle.py            # Wordle-specific reward functions
├── trainer/
│   ├── __init__.py
│   └── grpo_trainer.py      # Training loop implementation
└── utils/                   # Utility functions
    └── __init__.py
examples/                    # Example scripts
└── wordle_finetuning.py     # Wordle fine-tuning example
tests/                       # Unit tests
└── test_reward_functions.py
```
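The package layout above maps directly onto the import paths used throughout this README:

```python
# Import paths corresponding to the modules listed above
# (the same imports used elsewhere in this README).
from rl4lms.trainer import GRPOTrainer                            # trainer/grpo_trainer.py
from rl4lms.reward_functions import RewardFunction                # reward_functions/base.py
from rl4lms.reward_functions.wordle import WordleRewardFunction   # reward_functions/wordle.py
from rl4lms.envs.wordle_env import WordleEnv                      # envs/wordle_env.py
```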
## Custom Reward Functions

To create a custom reward function, inherit from the `RewardFunction` base class and implement the `__call__` method:
```python
from rl4lms.reward_functions import RewardFunction
import torch


class MyRewardFunction(RewardFunction):
    def __init__(self, **kwargs):
        super().__init__()
        # Initialize any parameters

    def __call__(self, prompt_texts, generated_texts, **kwargs):
        """
        Calculate rewards for generated text.

        Args:
            prompt_texts: List of input prompts
            generated_texts: List of generated texts to score
            **kwargs: Additional metadata

        Returns:
            torch.Tensor: Tensor of rewards for each generated text
        """
        # Calculate rewards here
        rewards = torch.ones(len(generated_texts))  # Example: return 1 for each text
        return rewards
```
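A custom reward function can be sanity-checked on its own before plugging it into the trainer, for example:

```python
# Quick check of the reward function defined above, outside of training.
reward_fn = MyRewardFunction()

prompts = ["Guess the five-letter word.", "Guess the five-letter word."]
generations = ["crane", "slate"]

rewards = reward_fn(prompts, generations)
print(rewards)  # tensor([1., 1.]) for this placeholder implementation

# The instance can then be passed to GRPOTrainer via the reward_fn argument,
# exactly like WordleRewardFunction in the Quick Start example.
```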
## Documentation

For detailed documentation, including API references, advanced usage examples, and tutorials, please visit our documentation site.
## Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding new features, or improving documentation, your help is greatly appreciated.

1. Fork the repository on GitHub
2. Clone your fork locally
3. Create a new branch for your changes
4. Commit your changes with clear, descriptive messages
5. Push your changes to your fork
6. Open a Pull Request with a clear description of your changes
### Development Setup

1. Install development dependencies:

   ```bash
   pip install -e ".[dev]"
   ```

2. Run tests:

   ```bash
   pytest tests/
   ```

3. Format your code:

   ```bash
   black .
   isort .
   ```

4. Check for code style issues:

   ```bash
   flake8 src tests
   mypy src
   ```
## Contact

For questions, suggestions, or support, please reach out:
- Email: yanpcotta@gmail.com
- GitHub: @YanCotta
- Issues: Open an issue
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- This project was inspired by the course "Reinforcement Fine-Tuning LLMs With GRPO".
- Built with ❤️ using PyTorch and HuggingFace Transformers.