🦙 LLaMA3 PPO Training Framework

By RuoChen from ZJU

📋 Contents

Environment Setup
Model Download
Model Testing
Training Pipeline
Utility Scripts
Project Structure
Acknowledgements

🚀 Environment Setup

Create a virtual environment

conda create -n llama3 python=3.10 -y
conda activate llama3
pip install -r requirements.txt

Key dependencies

# Pin the TRL version (this exact version is required!)
pip install trl==0.8.0

# If downloading torch fails, install it from the official PyTorch CUDA 12.4 wheel index instead
pip install torch --index-url https://download.pytorch.org/whl/cu124
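Because the scripts depend on one exact TRL release, it can help to confirm the installed version before training. The following is a minimal sketch that uses only the standard library; the helper name `check_pinned` is hypothetical and is not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned(package: str, required: str) -> bool:
    """Return True only if `package` is installed at exactly version `required`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed (expected {required})")
        return False
    if installed != required:
        print(f"{package}: found {installed}, expected {required}")
        return False
    print(f"{package}: {installed} OK")
    return True


if __name__ == "__main__":
    # The README requires trl==0.8.0.
    check_pinned("trl", "0.8.0")
```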

📥 Model Download

Download the LLaMA3-8B model with the following command (the script lives in tools/):

python tools/download_model.py https://hf-mirror.com/Undi95/Meta-Llama-3-8B-hf --output ./models/llama3-8b-hf
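After the download finishes, a quick look at the output directory can catch an interrupted transfer. This sketch assumes the usual Hugging Face checkpoint layout (a `config.json` plus `*.safetensors` or `*.bin` weight files); `looks_like_hf_checkpoint` is a hypothetical helper, and it does not verify file integrity.

```python
from pathlib import Path


def looks_like_hf_checkpoint(model_dir: str) -> bool:
    """Heuristic check that `model_dir` holds a Hugging Face-style checkpoint."""
    root = Path(model_dir)
    if not root.is_dir():
        return False
    has_config = (root / "config.json").is_file()
    # Weights are usually (sharded) .safetensors files, or .bin files in older checkpoints.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    return has_config and has_weights


if __name__ == "__main__":
    print(looks_like_hf_checkpoint("./models/llama3-8b-hf"))
```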

🧪 Model Testing

Test the original model

python test_origin.py

Test the reward model

Compare the predictions of the original model and the reward model on the test question-answer pairs:

# Test data: ./data/test_data.json
python test_reward.py
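A common way to summarize this kind of comparison is pairwise accuracy: the fraction of pairs in which a model scores the preferred answer above the rejected one. The sketch below assumes each test example has already been reduced to a `(chosen_score, rejected_score)` pair; the actual schema of `./data/test_data.json` and the metric printed by `test_reward.py` are not documented here, so treat `pairwise_accuracy` as an illustrative, hypothetical helper.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(score_pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen answer scores higher.

    Ties count as incorrect.
    """
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("need at least one scored pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    scores = [(1.2, -0.3), (0.8, 0.1), (0.4, 0.9), (2.0, -1.0)]
    print(pairwise_accuracy(scores))
```

Counting ties as errors keeps the metric conservative: a model that assigns the same score to every answer gets 0 rather than 0.5.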

Test the PPO model

Automatic test mode

# Uses preset test prompts; test data: ./data/comparisonPPO_data.json
python PPO_comparison.py

Interactive test mode

# Enter test prompts manually
python PPO_comparison_chat.py

🎯 Training Pipeline

1. Train the reward model

# Training data: ./data/preference.json
python train_reward.py
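Reward models trained on preference pairs are commonly optimized with a pairwise (Bradley-Terry style) loss, -log(sigmoid(r_chosen - r_rejected)), which pushes the chosen response's score above the rejected one's. TRL's `RewardTrainer` computes this loss by default; whether `train_reward.py` relies on it, and that `./data/preference.json` holds chosen/rejected pairs, are assumptions here. The pure-Python sketch below (hypothetical helper `pairwise_reward_loss`) shows the computation for a batch of already-scored pairs.

```python
import math
from typing import Sequence


def pairwise_reward_loss(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of preference pairs."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("chosen and rejected must be non-empty and equally long")
    total = 0.0
    for r_c, r_r in zip(chosen, rejected):
        margin = r_c - r_r
        # -log(sigmoid(x)) == log(1 + exp(-x)); the two branches avoid overflow in exp.
        if margin >= 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(chosen)


if __name__ == "__main__":
    # Made-up reward scores for three preference pairs.
    print(pairwise_reward_loss([2.0, 0.5, -1.0], [0.0, 0.5, 1.0]))
```

The loss approaches 0 as the chosen score pulls ahead, equals log 2 (about 0.693) when the two scores tie, and grows roughly linearly when the ranking is wrong.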

2. Train the model with PPO

# Training data: ./data/PPOtrain_data.json
python train_PPO.py
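In TRL's PPO setup, the reward model's scalar score is added at the last response token, a per-token KL penalty against the frozen reference model is subtracted, and the policy is updated with PPO's clipped surrogate objective. The pure-Python sketch below illustrates those two pieces for a single sequence. The coefficients (`kl_coef=0.2`, `clip_eps=0.2`) are illustrative assumptions rather than the settings used by `train_PPO.py`, and a real implementation works on batched tensors with advantages from GAE.

```python
import math
from typing import List, Sequence


def kl_penalized_rewards(
    score: float,
    logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    kl_coef: float = 0.2,
) -> List[float]:
    """Per-token rewards: -kl_coef * (logp - ref_logp), plus the score on the last token."""
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs must be non-empty and match ref_logprobs in length")
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    rewards[-1] += score
    return rewards


def clipped_surrogate_loss(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """PPO clipped policy loss (to be minimized), averaged over tokens."""
    if not advantages:
        raise ValueError("need at least one token")
    losses = []
    for new, old, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) objective, then negate it to get a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy numbers for a 3-token response; not taken from this project.
    print(kl_penalized_rewards(1.5, [-1.0, -0.5, -2.0], [-1.1, -0.7, -2.0]))
    print(clipped_surrogate_loss([-0.9, -0.6, -2.1], [-1.0, -0.5, -2.0], [0.5, -0.2, 1.0]))
```

Clipping caps how much a single update can profit from moving the probability ratio beyond 1 ± clip_eps, while the KL term discourages the policy from drifting far from the original model.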

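🛠 Utility Scripts

The tools/ folder contains the following utilities:

| Script | Description |
|---|---|
| `check_PPOConfig.py` | Checks the PPO training configuration file |
| `check_rewardConfig.py` | Checks the reward model training configuration file |
| `clear_gpu.py` | Clears the GPU cache |
| `download_model.py` | Downloads the model |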
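As an illustration of the kind of check a config-validation script can perform (the actual contents of `check_PPOConfig.py` are not described here), the sketch below validates a batch-size constraint that TRL's `PPOConfig` also enforces when constructed: `batch_size` must be a multiple of `mini_batch_size * gradient_accumulation_steps`. `PPOSettings` and `validate` are hypothetical stand-ins so the example runs without TRL installed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PPOSettings:
    """Hypothetical subset of PPO hyperparameters, named after TRL's PPOConfig fields."""

    batch_size: int
    mini_batch_size: int
    gradient_accumulation_steps: int = 1


def validate(cfg: PPOSettings) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config passed."""
    problems = []
    for name in ("batch_size", "mini_batch_size", "gradient_accumulation_steps"):
        if getattr(cfg, name) <= 0:
            problems.append(f"{name} must be positive")
    # Only test divisibility when every size is positive (avoids modulo by zero).
    if not problems:
        step = cfg.mini_batch_size * cfg.gradient_accumulation_steps
        if cfg.batch_size % step != 0:
            problems.append(
                f"batch_size ({cfg.batch_size}) must be a multiple of "
                f"mini_batch_size * gradient_accumulation_steps ({step})"
            )
    return problems


if __name__ == "__main__":
    print(validate(PPOSettings(batch_size=16, mini_batch_size=4, gradient_accumulation_steps=2)))
    print(validate(PPOSettings(batch_size=10, mini_batch_size=4)))
```

Returning a list of messages instead of raising on the first error lets one run report every problem in the config at once.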

📁 Project Structure

.
├── data/
│   ├── preference.json          # Reward model training data
│   ├── PPOtrain_data.json       # PPO training data
│   ├── comparisonPPO_data.json  # PPO comparison test data
│   └── test_data.json           # Reward model test data
├── models/
│   └── llama3-8b-hf/            # LLaMA3 model files
├── tools/                       # Utility scripts
├── Utils/
├── requirements.txt             # Project dependencies
└── [training and test scripts]

🙏 Acknowledgements

This project would not be possible without the following codebases:
