Commit 8631416

Add train LLM docs.

1 parent 566ded9

3 files changed: +188 -0 lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
.. tabs::

  .. code-tab:: python
    :caption: Python

    from ads.jobs import Job, DataScienceJob, PyTorchDistributedRuntime

    job = (
        Job(name="LLAMA2-Fine-Tuning")
        .with_infrastructure(
            DataScienceJob()
            .with_log_group_id("<log_group_ocid>")
            .with_log_id("<log_ocid>")
            .with_compartment_id("<compartment_ocid>")
            .with_project_id("<project_ocid>")
            .with_subnet_id("<subnet_ocid>")
            .with_shape_name("VM.GPU.A10.1")
            .with_block_storage_size(256)
        )
        .with_runtime(
            PyTorchDistributedRuntime()
            # Specify the service conda environment by slug name.
            .with_service_conda("pytorch20_p39_gpu_v1")
            # Fetch the source code from GitHub and check out the specific commit.
            .with_git(
                url="https://github.com/facebookresearch/llama-recipes.git",
                commit="03faba661f079ee1ecaeb66deaa6bdec920a7bab"
            )
            # Additional pip packages to install on top of the conda environment.
            .with_dependency(
                pip_pkg=" ".join([
                    "'accelerate>=0.21.0'",
                    "appdirs",
                    "loralib",
                    "bitsandbytes==0.39.1",
                    "black",
                    "'black[jupyter]'",
                    "datasets",
                    "fire",
                    "'git+https://github.com/huggingface/peft.git'",
                    "'transformers>=4.31.0'",
                    "sentencepiece",
                    "py7zr",
                    "scipy",
                    "optimum"
                ])
            )
            # Save the local output directory to OCI Object Storage once the run finishes.
            .with_output("/home/datascience/outputs", "oci://bucket@namespace/outputs/$JOB_RUN_OCID")
            .with_command(" ".join([
                "torchrun llama_finetuning.py",
                "--enable_fsdp",
                "--pure_bf16",
                "--batch_size_training 1",
                "--micro_batch_size 1",
                "--model_name $MODEL_NAME",
                "--dist_checkpoint_root_folder /home/datascience/outputs",
                "--dist_checkpoint_folder fine-tuned"
            ]))
            # Number of nodes (job run replicas) for the distributed training.
            .with_replica(2)
            .with_environment_variable(
                MODEL_NAME="meta-llama/Llama-2-7b-hf",
                HUGGING_FACE_HUB_TOKEN="<access_token>",
                LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib",
            )
        )
    )

  .. code-tab:: yaml
    :caption: YAML

    kind: job
    apiVersion: v1.0
    spec:
      name: LLAMA2-Fine-Tuning
      infrastructure:
        kind: infrastructure
        spec:
          blockStorageSize: 256
          compartmentId: "<compartment_ocid>"
          logGroupId: "<log_group_id>"
          logId: "<log_id>"
          projectId: "<project_id>"
          subnetId: "<subnet_id>"
          shapeName: VM.GPU.A10.2
        type: dataScienceJob
      runtime:
        kind: runtime
        type: pyTorchDistributed
        spec:
          git:
            url: https://github.com/facebookresearch/llama-recipes.git
            commit: 03faba661f079ee1ecaeb66deaa6bdec920a7bab
          command: >-
            torchrun llama_finetuning.py
            --enable_fsdp
            --pure_bf16
            --batch_size_training 1
            --micro_batch_size 1
            --model_name $MODEL_NAME
            --dist_checkpoint_root_folder /home/datascience/outputs
            --dist_checkpoint_folder fine-tuned
          replicas: 2
          conda:
            type: service
            slug: pytorch20_p39_gpu_v1
          dependencies:
            pipPackages: >-
              'accelerate>=0.21.0'
              appdirs
              loralib
              bitsandbytes==0.39.1
              black
              'black[jupyter]'
              datasets
              fire
              'git+https://github.com/huggingface/peft.git'
              'transformers>=4.31.0'
              sentencepiece
              py7zr
              scipy
              optimum
          outputDir: /home/datascience/outputs
          outputUri: oci://bucket@namespace/outputs/$JOB_RUN_OCID
          env:
            - name: MODEL_NAME
              value: meta-llama/Llama-2-7b-hf
            - name: HUGGING_FACE_HUB_TOKEN
              value: "<access_token>"
            - name: LD_LIBRARY_PATH
              value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib

docs/source/user_guide/model_training/index.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ TensorBoard provides the visualization and the tooling that is needed to watch a

     ads_tuner
     training_with_oci
+    training_llm
     distributed_training/overview
     tensorboard/tensorboard
     model_evaluation/index
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
Training Large Language Model
*****************************

.. versionadded:: 2.8.8

Oracle Cloud Infrastructure (OCI) `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_
provides fully managed infrastructure to enable training large language models at scale.
This page shows an example of fine-tuning the `Llama 2 <https://ai.meta.com/llama/>`_ model. For more details on the APIs, see :doc:`../jobs/run_pytorch_ddp`.

.. admonition:: Distributed Training with OCI Data Science
  :class: note

  You need to configure your `networking <https://docs.oracle.com/en-us/iaas/Content/Network/Concepts/overview.htm>`_
  and `IAM <https://docs.oracle.com/en-us/iaas/Content/Identity/Concepts/overview.htm>`_ policies.
  We recommend running the training on a private subnet.
  In this example, internet access is needed to download the source code and the pre-trained model.

The `llama-recipes <https://github.com/facebookresearch/llama-recipes>`_ repository contains example code to fine-tune the Llama 2 model.
The example `fine-tuning script <https://github.com/facebookresearch/llama-recipes/blob/main/llama_finetuning.py>`_ supports full-parameter fine-tuning
and `Parameter-Efficient Fine-Tuning (PEFT) <https://huggingface.co/blog/peft>`_.
With ADS, you can start the training job by taking the source code directly from GitHub.

Access the Pre-Trained Model
============================

To fine-tune the model, you will first need to access the pre-trained model.
The pre-trained model can be obtained from `Meta <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_
or `HuggingFace <https://huggingface.co/models?sort=trending&search=meta-llama%2Fllama-2>`_.
In this example, we will use the `access token <https://huggingface.co/docs/hub/security-tokens>`_
to download the pre-trained model from HuggingFace (by setting the ``HUGGING_FACE_HUB_TOKEN`` environment variable).
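
As an optional sanity check before launching the job, you can verify locally that the token has access to the gated model. The following is a minimal sketch, not part of the job definition; it assumes the ``huggingface_hub`` package is installed in your local environment:

.. code-block:: python

  import os

  from huggingface_hub import HfApi

  # The training job reads the token from this environment variable.
  os.environ["HUGGING_FACE_HUB_TOKEN"] = "<access_token>"

  # Listing the repository files fails if the token cannot access the gated model.
  api = HfApi(token=os.environ["HUGGING_FACE_HUB_TOKEN"])
  print(api.list_repo_files("meta-llama/Llama-2-7b-hf"))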

Fine-Tuning the Model
=====================

You can define the training job with the ADS Python API or YAML. Here are examples of fine-tuning the full parameters of the `7B model <https://huggingface.co/meta-llama/Llama-2-7b-hf>`_ using `FSDP <https://engineering.fb.com/2021/07/15/open-source/fsdp/>`_.

.. include:: ../jobs/tabs/llama2_full.rst

You can create and start the job run with the Python API or the ADS CLI.

.. include:: ../jobs/tabs/run_job.rst
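
For reference, creating and watching the job run from Python is typically a few lines. This minimal sketch assumes ``job`` is the object defined in the tabs above:

.. code-block:: python

  # Create the job definition in OCI Data Science.
  job.create()

  # Start a job run and stream its logs until it finishes.
  run = job.run()
  run.watch()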

The job run will:

* Set up the PyTorch conda environment and install additional dependencies.
* Fetch the source code from GitHub and check out the specified commit.
* Run the training script with the specified arguments, which includes downloading the model and dataset.
* Save the outputs to OCI Object Storage once the training finishes.

Note that in the training command, there is no need to specify the number of nodes or the number of GPUs. ADS will configure them automatically based on the ``replica`` and ``shape`` you specified.

The fine-tuning runs on the `samsum <https://huggingface.co/datasets/samsum>`_ dataset by default. You can also `add your custom datasets <https://github.com/facebookresearch/llama-recipes/blob/main/docs/Dataset.md#adding-custom-datasets>`_.

The same training script also supports Parameter-Efficient Fine-Tuning (PEFT). You can change the ``command`` to the following for PEFT with `LoRA <https://huggingface.co/docs/peft/conceptual_guides/lora>`_:

.. code-block:: bash

  torchrun llama_finetuning.py --enable_fsdp --use_peft --peft_method lora --pure_bf16 --batch_size_training 1 --micro_batch_size 1 --model_name /home/datascience/llama --output_dir /home/datascience/outputs
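
If you are defining the job with the Python API, the same PEFT arguments can be assembled for the ``with_command()`` call shown earlier. A sketch of the equivalent argument string:

.. code-block:: python

  # Equivalent command string for PyTorchDistributedRuntime.with_command().
  peft_command = " ".join([
      "torchrun llama_finetuning.py",
      "--enable_fsdp",
      "--use_peft",
      "--peft_method lora",
      "--pure_bf16",
      "--batch_size_training 1",
      "--micro_batch_size 1",
      "--model_name /home/datascience/llama",
      "--output_dir /home/datascience/outputs",
  ])

Pass ``peft_command`` to ``.with_command()`` in the runtime definition above.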
