Commit a6d855f

Update llama2_full.rst and training_llm.rst
1 parent 1bc4b18 commit a6d855f

2 files changed: 25 additions & 42 deletions


docs/source/user_guide/jobs/tabs/llama2_full.rst

Lines changed: 16 additions & 36 deletions
@@ -14,42 +14,32 @@
         .with_compartment_id("<compartment_ocid>")
         .with_project_id("<project_ocid>")
         .with_subnet_id("<subnet_ocid>")
-        .with_shape_name("VM.GPU.A10.1")
+        .with_shape_name("VM.GPU.A10.2")
         .with_block_storage_size(256)
     )
     .with_runtime(
         PyTorchDistributedRuntime()
         # Specify the service conda environment by slug name.
-        .with_service_conda("pytorch20_p39_gpu_v1")
+        .with_service_conda("pytorch20_p39_gpu_v2")
         .with_git(
             url="https://github.com/facebookresearch/llama-recipes.git",
-            commit="03faba661f079ee1ecaeb66deaa6bdec920a7bab"
+            commit="1aecd00924738239f8d86f342b36bacad180d2b3"
         )
         .with_dependency(
             pip_pkg=" ".join([
-                "'accelerate>=0.21.0'",
-                "appdirs",
-                "loralib",
-                "bitsandbytes==0.39.1",
-                "black",
-                "'black[jupyter]'",
-                "datasets",
-                "fire",
-                "'git+https://github.com/huggingface/peft.git'",
-                "'transformers>=4.31.0'",
-                "sentencepiece",
-                "py7zr",
-                "scipy",
-                "optimum"
+                "--extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0",
+                "git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4",
+                "appdirs==1.4.4",
+                "llama-recipes==0.0.1",
+                "py7zr==0.20.6",
             ])
         )
         .with_output("/home/datascience/outputs", "oci://bucket@namespace/outputs/$JOB_RUN_OCID")
         .with_command(" ".join([
-            "torchrun llama_finetuning.py",
+            "torchrun examples/finetuning.py",
             "--enable_fsdp",
             "--pure_bf16",
             "--batch_size_training 1",
-            "--micro_batch_size 1",
             "--model_name $MODEL_NAME",
             "--dist_checkpoint_root_folder /home/datascience/outputs",
             "--dist_checkpoint_folder fine-tuned"
@@ -87,36 +77,26 @@
 spec:
   git:
     url: https://github.com/facebookresearch/llama-recipes.git
-    commit: 03faba661f079ee1ecaeb66deaa6bdec920a7bab
+    commit: 1aecd00924738239f8d86f342b36bacad180d2b3
   command: >-
     torchrun llama_finetuning.py
     --enable_fsdp
     --pure_bf16
     --batch_size_training 1
-    --micro_batch_size 1
     --model_name $MODEL_NAME
     --dist_checkpoint_root_folder /home/datascience/outputs
     --dist_checkpoint_folder fine-tuned
   replicas: 2
   conda:
     type: service
-    slug: pytorch20_p39_gpu_v1
+    slug: pytorch20_p39_gpu_v2
   dependencies:
     pipPackages: >-
-      'accelerate>=0.21.0'
-      appdirs
-      loralib
-      bitsandbytes==0.39.1
-      black
-      'black[jupyter]'
-      datasets
-      fire
-      'git+https://github.com/huggingface/peft.git'
-      'transformers>=4.31.0'
-      sentencepiece
-      py7zr
-      scipy
-      optimum
+      --extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0
+      git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4
+      llama-recipes==0.0.1
+      appdirs==1.4.4
+      py7zr==0.20.6
   outputDir: /home/datascience/outputs
   outputUri: oci://bucket@namespace/outputs/$JOB_RUN_OCID
   env:
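
For context, the diff above shows only the changed region of the job definition in llama2_full.rst. Below is a minimal sketch of how the full Python variant might be assembled and launched with ADS; the job name, the MODEL_NAME value, and the OCID/bucket placeholders are illustrative assumptions rather than part of this commit, while the builder calls mirror the updated lines.

# A minimal, hypothetical end-to-end version of the Python job definition this
# commit edits. Placeholders in angle brackets and the MODEL_NAME value are
# assumptions, not part of the commit.
from ads.jobs import DataScienceJob, Job, PyTorchDistributedRuntime

job = (
    Job(name="llama2-full-finetune")
    .with_infrastructure(
        DataScienceJob()
        .with_compartment_id("<compartment_ocid>")
        .with_project_id("<project_ocid>")
        .with_subnet_id("<subnet_ocid>")
        .with_shape_name("VM.GPU.A10.2")
        .with_block_storage_size(256)
    )
    .with_runtime(
        PyTorchDistributedRuntime()
        .with_service_conda("pytorch20_p39_gpu_v2")
        .with_git(
            url="https://github.com/facebookresearch/llama-recipes.git",
            commit="1aecd00924738239f8d86f342b36bacad180d2b3",
        )
        .with_dependency(
            pip_pkg=" ".join([
                "--extra-index-url https://download.pytorch.org/whl/cu118 torch==2.1.0",
                "git+https://github.com/huggingface/peft.git@15a013af5ff5660b9377af24d3eee358213d72d4",
                "appdirs==1.4.4",
                "llama-recipes==0.0.1",
                "py7zr==0.20.6",
            ])
        )
        .with_environment_variable(MODEL_NAME="meta-llama/Llama-2-7b-hf")
        .with_replica(2)
        .with_output("/home/datascience/outputs",
                     "oci://bucket@namespace/outputs/$JOB_RUN_OCID")
        .with_command(" ".join([
            "torchrun examples/finetuning.py",
            "--enable_fsdp",
            "--pure_bf16",
            "--batch_size_training 1",
            "--model_name $MODEL_NAME",
            "--dist_checkpoint_root_folder /home/datascience/outputs",
            "--dist_checkpoint_folder fine-tuned",
        ]))
    )
)

job.create()     # register the job definition with OCI Data Science
run = job.run()  # launch the 2-node distributed job run
run.watch()      # stream the job run logs

The YAML variant of the same runtime spec can be submitted from the CLI instead, for example with "ads opctl run -f <spec>.yaml" (the spec file name here is assumed).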

docs/source/user_guide/model_training/training_llm.rst

Lines changed: 9 additions & 6 deletions
@@ -16,9 +16,9 @@ This page shows an example of fine-tuning the `Llama 2 <https://ai.meta.com/llam
 In this example, internet access is needed to download the source code and the pre-trained model.
 
 The `llama-recipes <llama-recipes>`_ repository contains example code to fine-tune the llama2 model.
-The example `fine-tuning script <https://github.com/facebookresearch/llama-recipes/blob/main/llama_finetuning.py>`_ support full parameter fine-tuning
+The example `fine-tuning script <https://github.com/facebookresearch/llama-recipes/blob/1aecd00924738239f8d86f342b36bacad180d2b3/examples/finetuning.py>`_ supports both full parameter fine-tuning
 and `Parameter-Efficient Fine-Tuning (PEFT) <https://huggingface.co/blog/peft>`_.
-With ADS, you can start the training job by taking the source code directly from Github.
+With ADS, you can start the training job by taking the source code directly from GitHub with no code change.
 
 Access the Pre-Trained Model
 ============================
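
The "Access the Pre-Trained Model" section that follows this hunk is unchanged by the commit, so its body is not shown in the diff. For readers of this page, one common way to fetch the gated Llama 2 weights is sketched below; the target directory, token handling, and the use of snapshot_download are assumptions, not something this commit prescribes.

# Hypothetical download of the pre-trained weights the docs refer to; requires
# an accepted Llama 2 license on Hugging Face and a valid access token.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-7b-hf",   # gated repository
    local_dir="/home/datascience/llama",  # assumed local path for $MODEL_NAME
    token="<huggingface_token>",
)

As the updated --model_name value in the next hunk suggests, the training command can also point directly at the Hugging Face model id instead of a local copy.
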
@@ -49,13 +49,16 @@ The job run will:
 
 Note that in the training command, there is no need to specify the number of nodes or the number of GPUs. ADS will automatically configure them based on the ``replica`` and ``shape`` you specified.
 
-The fine-tuning runs on the `samsum <https://huggingface.co/datasets/samsum>`_ dataset by default. You can also `add your custom datasets <https://github.com/facebookresearch/llama-recipes/blob/main/docs/Dataset.md#adding-custom-datasets>`_.
+The fine-tuning runs on the `samsum <https://huggingface.co/datasets/samsum>`_ dataset by default. You can also `add your custom datasets <https://github.com/facebookresearch/llama-recipes/blob/1aecd00924738239f8d86f342b36bacad180d2b3/docs/Dataset.md>`_.
 
-The same training script also support Parameter-Efficient Fine-Tuning (PEFT). You can change the ``command`` to the following for PEFT with `LoRA <https://huggingface.co/docs/peft/conceptual_guides/lora>`_
+Once the fine-tuning is finished, the checkpoints will be saved into the OCI object storage bucket as specified.
+You can `load the FSDP checkpoints for inferencing <https://github.com/facebookresearch/llama-recipes/blob/main/docs/inference.md#loading-back-fsdp-checkpoints>`_.
+
+The same training script also supports Parameter-Efficient Fine-Tuning (PEFT). You can change the ``command`` to the following for PEFT with `LoRA <https://huggingface.co/docs/peft/conceptual_guides/lora>`_. Note that for PEFT, the fine-tuned weights are stored in the location specified by ``--output_dir``, while for full parameter fine-tuning, the checkpoints are stored in the location specified by ``--dist_checkpoint_root_folder`` and ``--dist_checkpoint_folder``.
 
 .. code-block:: bash
 
     torchrun llama_finetuning.py --enable_fsdp --use_peft --peft_method lora \
-    --pure_bf16 --batch_size_training 1 --micro_batch_size 1 \
-    --model_name /home/datascience/llama --output_dir /home/datascience/outputs
+    --pure_bf16 --batch_size_training 1 \
+    --model_name meta-llama/Llama-2-7b-hf --output_dir /home/datascience/outputs
 
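To illustrate the distinction the added text draws between the two output locations: after a PEFT run, the directory passed as --output_dir holds a LoRA adapter that can be re-attached to the base model. The following is a minimal sketch; the adapter path, base model id, and prompt are assumptions, not part of this commit.

# Hypothetical loading of the LoRA adapter written to --output_dir by the PEFT
# command above; full-parameter FSDP checkpoints instead follow the linked
# "loading back FSDP checkpoints" instructions.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base, "/home/datascience/outputs")  # --output_dir
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Summarize:\nA: Lunch at noon?\nB: Sure, see you then.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))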