Update readme to match the API of the fine-tuning tools #149

Open · wants to merge 3 commits into base: main
22 changes: 13 additions & 9 deletions README.md
@@ -383,11 +383,11 @@ model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", t
Inference on CPU requires roughly 60 GB of memory.

# Fine-tuning the model
Developers can fine-tune Baichuan-13B-Base or Baichuan-13B-Chat for their own use. We have tested [LLaMA Efficient Tuning](https://github.com/hiyouga/LLaMA-Efficient-Tuning), a fine-tuning tool compatible with Baichuan-13B, and provide examples of both `full-parameter fine-tuning` and `LoRA fine-tuning`.

Before getting started, developers need to download the LLaMA Efficient Tuning project and [install its dependencies](https://github.com/hiyouga/LLaMA-Efficient-Tuning#getting-started) as required.
Before getting started, developers need to download the LLaMA Efficient Tuning project and [install its dependencies](https://github.com/hiyouga/LLaMA-Efficient-Tuning/blob/main/README_zh.md#%E5%A6%82%E4%BD%95%E4%BD%BF%E7%94%A8) as required.

The input data is one or more json files placed in the project's `data` directory and specified at training time with the `--dataset` option (see the example scripts below); multiple input files are separated by `,`. An example of the json format and a description of its fields follow:
```
[
{
@@ -398,12 +398,12 @@ model = AutoModelForCausalLM.from_pretrained("baichuan-inc/Baichuan-13B-Chat", t
....
]
```
The json file stores a list, and each element of the list is one sample. `instruction` represents the user input and `input` is optional; if both `instruction` and `input` are provided, they are joined with `\n` to form the user input. `output` is the expected model output.
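
To make the field semantics concrete, here is a minimal sketch that loads such a dataset file and reconstructs the user input as described above; the file name `data/my_dataset.json` is an illustrative assumption, and the joining logic mirrors the description rather than the tool's actual implementation.
```python
import json

# Minimal sketch: read a dataset file in the format described above and rebuild
# the user input the way the description says the tool does.
# "data/my_dataset.json" is an assumed, illustrative path.
with open("data/my_dataset.json", "r", encoding="utf-8") as f:
    samples = json.load(f)

for sample in samples:
    instruction = sample["instruction"]          # the user instruction (required)
    extra_input = sample.get("input", "")        # optional additional input
    # When both are present, they are joined with "\n" to form the user input.
    user_input = f"{instruction}\n{extra_input}" if extra_input else instruction
    expected_output = sample["output"]           # the expected model response
    print(repr(user_input), "->", repr(expected_output))
```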

Below we provide demonstration scripts that have been tested successfully in the two fine-tuning scenarios.

## Full-parameter fine-tuning
We tested full-parameter fine-tuning in an environment with 8 * Nvidia A100 80 GB + deepspeed.
We tested full-parameter fine-tuning in an environment with 8 * Nvidia A100 80 GB + DeepSpeed.

Example training launch script:
```shell
@@ -412,6 +412,7 @@ deepspeed --num_gpus=8 src/train_bash.py \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--template default \
--finetuning_type full \
--output_dir path_to_your_sft_checkpoint \
--overwrite_cache \
@@ -426,18 +427,20 @@ deepspeed --num_gpus=8 src/train_bash.py \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--val_size 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16 \
--deepspeed deepspeed.json
```

Example deep_speed.json configuration:
Example `deepspeed.json` configuration:
```json
{
"train_micro_batch_size_per_gpu": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": "auto",
@@ -454,7 +457,7 @@ Example deep_speed.json configuration:
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients" : true
"contiguous_gradients": true
}
}
```
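
Once the run above finishes, the directory passed to `--output_dir` can typically be loaded back with `transformers` for a quick smoke test. The following is only a sketch: it assumes the checkpoint directory contains both model and tokenizer files and that, as with the original Baichuan-13B weights, `trust_remote_code=True` is required; adjust the path and prompt to your setup.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch: load the full fine-tuned checkpoint written to --output_dir above.
# "path_to_your_sft_checkpoint" is the placeholder used in the script; replace it.
checkpoint_dir = "path_to_your_sft_checkpoint"

# If the tokenizer was not saved alongside the model, load it from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint_dir, use_fast=False, trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    torch_dtype=torch.float16,   # fp16 keeps the 13B weights within GPU memory
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # Baichuan ships custom modeling code
)
model.eval()

inputs = tokenizer("Write a short poem about autumn.", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```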
@@ -470,6 +473,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--template default \
--finetuning_type lora \
--lora_rank 8 \
--lora_target W_pack \
@@ -486,7 +490,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--val_size 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
14 changes: 9 additions & 5 deletions README_EN.md
@@ -380,7 +380,7 @@ The .json file stores a list, each element of which is a sample. `instruction` r
Below, we provide demonstration scripts that have been successfully tested in two fine-tuning scenarios.

## Full Params Fine-tuning
We tested under 8 * Nvidia A100 80 GB + deepspeed for full params fine-tuning.
We tested under 8 * Nvidia A100 80 GB + DeepSpeed for full params fine-tuning.

Example of script to start fine-tuning:
```shell
@@ -389,6 +389,7 @@ deepspeed --num_gpus=8 src/train_bash.py \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--template default \
--finetuning_type full \
--output_dir path_to_your_sft_checkpoint \
--overwrite_cache \
@@ -403,18 +404,20 @@ deepspeed --num_gpus=8 src/train_bash.py \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--val_size 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \
--fp16 \
--deepspeed deepspeed.json
```

Example of deep_speed.json:
Example of `deepspeed.json`:
```json
{
"train_micro_batch_size_per_gpu": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": "auto",
@@ -431,7 +434,7 @@ Example of deep_speed.json:
"overlap_comm": false,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients" : true
"contiguous_gradients": true
}
}
```
@@ -446,6 +449,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--model_name_or_path baichuan-inc/Baichuan-13B-Base \
--do_train \
--dataset alpaca_gpt4_en,alpaca_gpt4_zh \
--template default \
--finetuning_type lora \
--lora_rank 8 \
--lora_target W_pack \
@@ -462,7 +466,7 @@ CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--learning_rate 5e-5 \
--max_grad_norm 0.5 \
--num_train_epochs 2.0 \
--dev_ratio 0.01 \
--val_size 0.01 \
--evaluation_strategy steps \
--load_best_model_at_end \
--plot_loss \