hanzoai/grpo

# GRPO: Guided Reinforcement Policy Optimization for LLM Fine-tuning

A comprehensive guide and toolkit for fine-tuning language models using reinforcement learning techniques on the Hanzo AI platform.

## Overview

GRPO (Guided Reinforcement Policy Optimization) enables you to transform general language models into domain-specific experts using custom reward signals and reinforcement learning. This repository provides both the original implementation and Hanzo AI cloud platform integration.

## Quick Start

### Using Hanzo AI Cloud Platform

```bash
# Install Hanzo CLI
pip install hanzoai-cli

# Login to Hanzo Cloud
hanzo auth login

# Create a new GRPO project
hanzo ml create grpo-project --type=reinforcement-learning

# Deploy your dataset
hanzo ml dataset upload --file=skippy_knowledge_base.csv --project=grpo-project

# Start fine-tuning
hanzo ml train --config=config/grpo_config.yaml --project=grpo-project
```

### Local Development

```bash
# Clone the repository
git clone https://github.com/hanzoai/grpo
cd grpo

# Install dependencies
pip install -r requirements.txt

# Prepare your dataset
python scripts/prepare_dataset.py --input=data/raw --output=data/processed

# Run training
python src/train.py --config=config/local_config.yaml
```

## Features

- **Custom Reward Functions**: Define domain-specific reward signals
- **Parameter-Efficient Fine-Tuning**: Support for LoRA and QLoRA
- **Multi-Model Support**: Compatible with Qwen, Llama, and other models
- **Hanzo AI Integration**: Seamless deployment on Hanzo cloud infrastructure
- **Kubeflow ML Platform**: Enterprise-grade ML operations support

## Documentation

- [Complete Guide](docs/guide.md) - Detailed walkthrough of GRPO concepts
- [Hanzo Integration](docs/hanzo_integration.md) - Using GRPO with Hanzo AI platform
- [API Reference](docs/api_reference.md) - Complete API documentation
- [Examples](examples/) - Sample implementations and use cases

## Requirements

### Core Libraries

```python
datasets>=2.14.0
transformers>=4.35.0
trl>=0.7.0
torch>=2.0.0
peft>=0.6.0
accelerate>=0.24.0
```

### Hanzo AI Platform Libraries

```python
hanzoai>=0.1.0
hanzoai-ml>=0.1.0
hanzoai-cli>=0.1.0
```

## Repository Structure

```
grpo/
├── README.md                 # This file
├── requirements.txt          # Python dependencies
├── setup.py                  # Package setup
├── config/                   # Configuration files
│   ├── grpo_config.yaml      # Default GRPO configuration
│   └── kubeflow/             # Kubeflow pipeline configs
├── src/                      # Source code
│   ├── grpo/                 # Core GRPO implementation
│   ├── hanzo/                # Hanzo AI integrations
│   └── utils/                # Utility functions
├── scripts/                  # Helper scripts
│   ├── prepare_dataset.py
│   └── deploy_to_hanzo.py
├── examples/                 # Example implementations
│   ├── skippy/               # Skippy platform example
│   └── custom_rewards/       # Custom reward functions
├── docs/                     # Documentation
│   ├── guide.md              # Complete GRPO guide
│   ├── hanzo_integration.md
│   └── api_reference.md
└── tests/                    # Unit tests
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Contributing

Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.

## Support

- [Documentation](https://docs.hanzo.ai/grpo)
- [GitHub Issues](https://github.com/hanzoai/grpo/issues)
- [Hanzo AI Community](https://community.hanzo.ai)
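
## Example: Custom Reward Function (Sketch)

The Features list mentions domain-specific reward signals. The snippet below is a minimal, illustrative sketch of what such a signal could look like in plain Python. The function name `keyword_reward`, its parameters, and the scoring rule are assumptions made for this example; they are not taken from this repository's `src/grpo` package or from any library API. See the [API Reference](docs/api_reference.md) for the actual interface.

```python
from typing import List, Sequence


def keyword_reward(
    completions: Sequence[str],
    keywords: Sequence[str],
    max_words: int = 200,
) -> List[float]:
    """Hypothetical reward: keyword coverage in [0, 1] with a length penalty."""
    lowered_keywords = [k.lower() for k in keywords]
    rewards = []
    for text in completions:
        lowered = text.lower()
        # Fraction of domain keywords that appear in the completion.
        hits = sum(1 for k in lowered_keywords if k in lowered)
        coverage = hits / len(lowered_keywords) if lowered_keywords else 0.0
        # Halve the reward for completions longer than max_words words.
        penalty = 0.5 if len(text.split()) > max_words else 1.0
        rewards.append(coverage * penalty)
    return rewards


if __name__ == "__main__":
    samples = ["LoRA adapters reduce GPU memory use.", "I am not sure."]
    print(keyword_reward(samples, keywords=["lora", "gpu"]))
```

The function returns one score per completion, so it can be adapted to whatever batch-level reward interface the training code expects. Substring matching is a deliberately crude choice here; a real domain reward would more likely rely on a grader model or task-specific checks.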
