Commit ab4e2e5

Hyeongmin Moon (HyeongminMoon) and yaozhewei authored
Fixed typo with --only_optimize_lora (deepspeedai#331)
Co-authored-by: Hyeongmin Moon <mohomin@mirageworks.co.kr>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
1 parent dfb9491 commit ab4e2e5

5 files changed: +5 −5 lines changed

5 files changed

+5
-5
lines changed

applications/DeepSpeed-Chat/train.py

Lines changed: 1 addition & 1 deletion
@@ -181,7 +181,7 @@ def launch_cmd(args, step_num, cmd):
             f"If you are seeing an OOM error, try modifying {get_script(args, step_num)}:",
             " - Reduce `--per_device_*_batch_size`",
             " - Increase `--zero_stage {0,1,2,3}` on multi-gpu setups",
-            " - Enable `--gradient_checkpointing` or `--only_optimizer_lora`"
+            " - Enable `--gradient_checkpointing` or `--only_optimize_lora`"
         )))

applications/DeepSpeed-Chat/training/step1_supervised_finetuning/README.md

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ Most of the arguments used in the main.py file have clear explanations and are u
 | --lora_dim | When it is larger than 0, LoRA will be enabled | Usually, LoRA needs a larger learning rate for better convergence |
 | --lora_module_name | The scope to enable LoRA module. | |
 | --only_optimize_lora | Freeze all othre paramters and only optimize LoRA-related prameters | |
-| --gradient_checkpoint, --lora_dim, only_optimizer_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |
+| --gradient_checkpoint, --lora_dim, only_optimize_lora | When LoRA and Gradient Checkpointing are enabled. Only Optimize LoRA cannot be enabled | If all three are enabled, it will affect the gradient flow (aka the augo-grad system backend by PyTorch) |

 One important consideration for users is determining the maximum model size they can train using their current system. Here, we present a method for estimating this limit. Assuming that you do not use the offload feature and enable (i) zero stage 3 (if using multiple GPUs), (ii) gradient checkpoint, and (iii) LoRA, the approximate maximum model size (in billions of parameters) that you can train can be estimated as "Total GPU memory in GB divided by 3." For example, if you have a single A6000-48G GPU, you can probably train models up to 16 billion parameters. It is important to note that this is a rough estimation, and you should verify it by yourselves.
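As a side note on the unchanged README context above, its sizing rule ("total GPU memory in GB divided by 3") is easy to write down as a small helper. The sketch below is purely illustrative, is not part of this commit, and the function name is invented for the example; it only encodes the README's stated assumptions (ZeRO stage 3 on multi-GPU, gradient checkpointing, LoRA, no offload).

    def estimate_max_model_size_billion(total_gpu_memory_gb: float) -> float:
        """Rough upper bound on trainable model size, in billions of parameters,
        per the README's rule of thumb (no offload, ZeRO stage 3, gradient
        checkpointing, and LoRA enabled)."""
        return total_gpu_memory_gb / 3.0

    # README example: a single A6000-48G GPU -> roughly 16B parameters.
    print(estimate_max_model_size_billion(48))  # 16.0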

applications/DeepSpeed-Chat/training/step1_supervised_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ def parse_args():
     if args.gradient_checkpointing and args.lora_dim > 0:
         assert (
             not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     return args
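For context, the assert in this hunk is a post-parse guard on mutually exclusive options. A minimal, self-contained sketch of the same pattern is shown below; it reuses the flag names from the diff but is a simplified stand-in, not the project's actual parse_args().

    import argparse

    def parse_args():
        parser = argparse.ArgumentParser()
        parser.add_argument("--gradient_checkpointing", action="store_true")
        parser.add_argument("--lora_dim", type=int, default=0)
        parser.add_argument("--only_optimize_lora", action="store_true")
        args = parser.parse_args()

        # Per the README, enabling gradient checkpointing, LoRA, and
        # only_optimize_lora together affects PyTorch's autograd gradient flow,
        # so the combination is rejected right after parsing.
        if args.gradient_checkpointing and args.lora_dim > 0:
            assert not args.only_optimize_lora, (
                "--gradient_checkpointing and --only_optimize_lora "
                "cannot be enabled at the same time.")
        return args

    if __name__ == "__main__":
        print(parse_args())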

applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -169,7 +169,7 @@ def parse_args():
     if args.gradient_checkpointing and args.lora_dim > 0:
         assert (
             not args.only_optimize_lora
-        ), "--gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     return args

applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py

Lines changed: 1 addition & 1 deletion
@@ -296,7 +296,7 @@ def parse_args():
             and args.critic_lora_dim > 0):
         assert (
             not args.only_optimize_lora
-        ), "--{actor,critic}_gradient_checkpointing and --only_optimizer_lora cannot be enabled at the same time."
+        ), "--{actor,critic}_gradient_checkpointing and --only_optimize_lora cannot be enabled at the same time."

     if args.inference_tp_size > 1:
         assert (
