Commit 8631416

Add train LLM docs.

1 parent 566ded9

3 files changed: +188 -0 lines changed

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
.. tabs::

  .. code-tab:: python
    :caption: Python

    from ads.jobs import Job, DataScienceJob, PyTorchDistributedRuntime

    job = (
        Job(name="LLAMA2-Fine-Tuning")
        .with_infrastructure(
            DataScienceJob()
            .with_log_group_id("<log_group_ocid>")
            .with_log_id("<log_ocid>")
            .with_compartment_id("<compartment_ocid>")
            .with_project_id("<project_ocid>")
            .with_subnet_id("<subnet_ocid>")
            .with_shape_name("VM.GPU.A10.1")
            .with_block_storage_size(256)
        )
        .with_runtime(
            PyTorchDistributedRuntime()
            # Specify the service conda environment by slug name.
            .with_service_conda("pytorch20_p39_gpu_v1")
            # Fetch the source code from GitHub and check out the specific commit.
            .with_git(
                url="https://github.com/facebookresearch/llama-recipes.git",
                commit="03faba661f079ee1ecaeb66deaa6bdec920a7bab"
            )
            # Additional pip packages to install on top of the conda environment.
            .with_dependency(
                pip_pkg=" ".join([
                    "'accelerate>=0.21.0'",
                    "appdirs",
                    "loralib",
                    "bitsandbytes==0.39.1",
                    "black",
                    "'black[jupyter]'",
                    "datasets",
                    "fire",
                    "'git+https://github.com/huggingface/peft.git'",
                    "'transformers>=4.31.0'",
                    "sentencepiece",
                    "py7zr",
                    "scipy",
                    "optimum"
                ])
            )
            # Save the local output directory to OCI Object Storage once the run finishes.
            .with_output("/home/datascience/outputs", "oci://bucket@namespace/outputs/$JOB_RUN_OCID")
            .with_command(" ".join([
                "torchrun llama_finetuning.py",
                "--enable_fsdp",
                "--pure_bf16",
                "--batch_size_training 1",
                "--micro_batch_size 1",
                "--model_name $MODEL_NAME",
                "--dist_checkpoint_root_folder /home/datascience/outputs",
                "--dist_checkpoint_folder fine-tuned"
            ]))
            # Number of nodes (job run replicas) for the distributed training.
            .with_replica(2)
            .with_environment_variable(
                MODEL_NAME="meta-llama/Llama-2-7b-hf",
                HUGGING_FACE_HUB_TOKEN="<access_token>",
                LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib",
            )
        )
    )

  .. code-tab:: yaml
    :caption: YAML

    kind: job
    apiVersion: v1.0
    spec:
      name: LLAMA2-Fine-Tuning
      infrastructure:
        kind: infrastructure
        spec:
          blockStorageSize: 256
          compartmentId: "<compartment_ocid>"
          logGroupId: "<log_group_id>"
          logId: "<log_id>"
          projectId: "<project_id>"
          subnetId: "<subnet_id>"
          shapeName: VM.GPU.A10.2
        type: dataScienceJob
      runtime:
        kind: runtime
        type: pyTorchDistributed
        spec:
          git:
            url: https://github.com/facebookresearch/llama-recipes.git
            commit: 03faba661f079ee1ecaeb66deaa6bdec920a7bab
          command: >-
            torchrun llama_finetuning.py
            --enable_fsdp
            --pure_bf16
            --batch_size_training 1
            --micro_batch_size 1
            --model_name $MODEL_NAME
            --dist_checkpoint_root_folder /home/datascience/outputs
            --dist_checkpoint_folder fine-tuned
          replicas: 2
          conda:
            type: service
            slug: pytorch20_p39_gpu_v1
          dependencies:
            pipPackages: >-
              'accelerate>=0.21.0'
              appdirs
              loralib
              bitsandbytes==0.39.1
              black
              'black[jupyter]'
              datasets
              fire
              'git+https://github.com/huggingface/peft.git'
              'transformers>=4.31.0'
              sentencepiece
              py7zr
              scipy
              optimum
          outputDir: /home/datascience/outputs
          outputUri: oci://bucket@namespace/outputs/$JOB_RUN_OCID
          env:
            - name: MODEL_NAME
              value: meta-llama/Llama-2-7b-hf
            - name: HUGGING_FACE_HUB_TOKEN
              value: "<access_token>"
            - name: LD_LIBRARY_PATH
              value: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib

docs/source/user_guide/model_training/index.rst

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ TensorBoard provides the visualization and the tooling that is needed to watch a

     ads_tuner
     training_with_oci
+    training_llm
     distributed_training/overview
     tensorboard/tensorboard
     model_evaluation/index
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
Training Large Language Model
*****************************

.. versionadded:: 2.8.8

Oracle Cloud Infrastructure (OCI) `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_
provides fully managed infrastructure to enable training large language models at scale.
This page shows an example of fine-tuning the `Llama 2 <https://ai.meta.com/llama/>`_ model. For more details on the APIs, see :doc:`../jobs/run_pytorch_ddp`.

.. admonition:: Distributed Training with OCI Data Science
  :class: note

  You need to configure your `networking <https://docs.oracle.com/en-us/iaas/Content/Network/Concepts/overview.htm>`_
  and `IAM <https://docs.oracle.com/en-us/iaas/Content/Identity/Concepts/overview.htm>`_ policies.
  We recommend running the training on a private subnet.
  In this example, internet access is needed to download the source code and the pre-trained model.

The `llama-recipes <https://github.com/facebookresearch/llama-recipes>`_ repository contains example code to fine-tune the Llama 2 model.
The example `fine-tuning script <https://github.com/facebookresearch/llama-recipes/blob/main/llama_finetuning.py>`_ supports full-parameter fine-tuning
and `Parameter-Efficient Fine-Tuning (PEFT) <https://huggingface.co/blog/peft>`_.
With ADS, you can start the training job by taking the source code directly from GitHub.

Access the Pre-Trained Model
============================

To fine-tune the model, you will first need to access the pre-trained model.
The pre-trained model can be obtained from `Meta <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_
or `HuggingFace <https://huggingface.co/models?sort=trending&search=meta-llama%2Fllama-2>`_.
In this example, we will use the `access token <https://huggingface.co/docs/hub/security-tokens>`_
to download the pre-trained model from HuggingFace (by setting the ``HUGGING_FACE_HUB_TOKEN`` environment variable).
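
As an optional sanity check before launching the job, you can verify locally that the token has access to the gated model. The following is a minimal sketch, not part of the job definition; it assumes the ``huggingface_hub`` package is installed in your local environment:

.. code-block:: python

  import os

  from huggingface_hub import HfApi

  # The training job reads the token from this environment variable.
  os.environ["HUGGING_FACE_HUB_TOKEN"] = "<access_token>"

  # Listing the repository files fails if the token cannot access the gated model.
  api = HfApi(token=os.environ["HUGGING_FACE_HUB_TOKEN"])
  print(api.list_repo_files("meta-llama/Llama-2-7b-hf"))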

Fine-Tuning the Model
=====================

You can define the training job with the ADS Python API or YAML. Here are examples of fine-tuning the full parameters of the `7B model <https://huggingface.co/meta-llama/Llama-2-7b-hf>`_ using `FSDP <https://engineering.fb.com/2021/07/15/open-source/fsdp/>`_.

.. include:: ../jobs/tabs/llama2_full.rst

You can create and start the job run with the Python API or the ADS CLI.

.. include:: ../jobs/tabs/run_job.rst
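
For reference, creating and watching the job run from Python is typically a few lines. This minimal sketch assumes ``job`` is the object defined in the tabs above:

.. code-block:: python

  # Create the job definition in OCI Data Science.
  job.create()

  # Start a job run and stream its logs until it finishes.
  run = job.run()
  run.watch()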

The job run will:

* Set up the PyTorch conda environment and install additional dependencies.
* Fetch the source code from GitHub and check out the specified commit.
* Run the training script with the specified arguments, which includes downloading the model and dataset.
* Save the outputs to OCI Object Storage once the training finishes.

Note that in the training command, there is no need to specify the number of nodes or the number of GPUs. ADS will configure them automatically based on the ``replica`` and ``shape`` you specified.

The fine-tuning runs on the `samsum <https://huggingface.co/datasets/samsum>`_ dataset by default. You can also `add your custom datasets <https://github.com/facebookresearch/llama-recipes/blob/main/docs/Dataset.md#adding-custom-datasets>`_.

The same training script also supports Parameter-Efficient Fine-Tuning (PEFT). You can change the ``command`` to the following for PEFT with `LoRA <https://huggingface.co/docs/peft/conceptual_guides/lora>`_:

.. code-block:: bash

  torchrun llama_finetuning.py --enable_fsdp --use_peft --peft_method lora --pure_bf16 --batch_size_training 1 --micro_batch_size 1 --model_name /home/datascience/llama --output_dir /home/datascience/outputs
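
If you are defining the job with the Python API, the same PEFT arguments can be assembled for the ``with_command()`` call shown earlier. A sketch of the equivalent argument string:

.. code-block:: python

  # Equivalent command string for PyTorchDistributedRuntime.with_command().
  peft_command = " ".join([
      "torchrun llama_finetuning.py",
      "--enable_fsdp",
      "--use_peft",
      "--peft_method lora",
      "--pure_bf16",
      "--batch_size_training 1",
      "--micro_batch_size 1",
      "--model_name /home/datascience/llama",
      "--output_dir /home/datascience/outputs",
  ])

Pass ``peft_command`` to ``.with_command()`` in the runtime definition above.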
