Commit ab4e2e5

Hyeongmin Moon (HyeongminMoon) and yaozhewei authored
Fixed typo with --only_optimize_lora (deepspeedai#331)
Co-authored-by: Hyeongmin Moon <mohomin@mirageworks.co.kr>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
1 parent dfb9491 commit ab4e2e5

5 files changed: +5 −5 lines changed

5 files changed

+5
-5
lines changed

applications/DeepSpeed-Chat/train.py

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ def launch_cmd(args, step_num, cmd):
             f"If you are seeing an OOM error, try modifying {get_script(args, step_num)}:",
             " - Reduce `--per_device_*_batch_size`",
             " - Increase `--zero_stage {0,1,2,3}` on multi-gpu setups",
-            " - Enable `--gradient_checkpointing` or `--only_optimizer_lora`"
+            " - Enable `--gradient_checkpointing` or `--only_optimize_lora`"
         )))

applications/DeepSpeed-Chat/training/step1_supervised_finetuning/README.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ Most of the arguments used in the main.py file have clear explanations and are u
 | --lora_dim | When it is larger than 0, LoRA will be enabled | Usually, LoRA needs a larger learning rate for better convergence |
 | --lora_module_name | The scope to enable LoRA module. | |
 | --only_optimize_lora | Freeze all othre paramters and only optimize LoRA-related prameters | |
-| --gradient_checkpoint, --lora_dim, only_optimizer_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |
+| --gradient_checkpoint, --lora_dim, only_optimize_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |

 One important consideration for users is determining the maximum model size they can train using their current system. Here, we present a method for estimating this limit. Assuming that you do not use the offload feature and enable (i) zero stage 3 (if using multiple GPUs), (ii) gradient checkpoint, and (iii) LoRA, the approximate maximum model size (in billions of parameters) that you can train can be estimated as "Total GPU memory in GB divided by 3." For example, if you have a single A6000-48G GPU, you can probably train models up to 16 billion parameters. It is important to note that this is a rough estimation, and you should verify it by yourselves.
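As a side note on the unchanged README context above, its sizing rule ("total GPU memory in GB divided by 3") is easy to write down as a small helper. The sketch below is purely illustrative, is not part of this commit, and the function name is invented for the example; it only encodes the README's stated assumptions (ZeRO stage 3 on multi-GPU, gradient checkpointing, LoRA, no offload).

    def estimate_max_model_size_billion(total_gpu_memory_gb: float) -> float:
        """Rough upper bound on trainable model size, in billions of parameters,
        per the README's rule of thumb (no offload, ZeRO stage 3, gradient
        checkpointing, and LoRA enabled)."""
        return total_gpu_memory_gb / 3.0

    # README example: a single A6000-48G GPU -> roughly 16B parameters.
    print(estimate_max_model_size_billion(48))  # 16.0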

applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ def parse_args():
     if args.gradient_checkpointing and args.lora_dim > 0:
         assert (
             not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     return args
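For context, the assert in this hunk is a post-parse guard on mutually exclusive options. A minimal, self-contained sketch of the same pattern is shown below; it reuses the flag names from the diff but is a simplified stand-in, not the project's actual parse_args().

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser()
        parser.add_argument("--gradient_checkpointing", action="store_true")
        parser.add_argument("--lora_dim", type=int, default=0)
        parser.add_argument("--only_optimize_lora", action="store_true")
        args = parser.parse_args()

        # Per the README, enabling gradient checkpointing, LoRA, and
        # only_optimize_lora together affects PyTorch's autograd gradient flow,
        # so the combination is rejected right after parsing.
        if args.gradient_checkpointing and args.lora_dim > 0:
            assert not args.only_optimize_lora, (
                "--gradient_checkpointing and --only_optimize_lora "
                "cannot be enabled at the same time.")
        return args

    if __name__ == "__main__":
        print(parse_args())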

applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ def parse_args():
     if args.gradient_checkpointing and args.lora_dim > 0:
         assert (
             not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     return args

applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -296,7 +296,7 @@ def parse_args():
             and args.critic_lora_dim > 0):
         assert (
             not args.only_optimize_lora
-        ), "--{actor,critic}_gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--{actor,critic}_gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     if args.inference_tp_size > 1:
         assert (
