====================
Large Language Model
====================

Oracle ADS (Accelerated Data Science) opens the gateway to harnessing the full potential of large language models
within Oracle Cloud Infrastructure (OCI). `Meta <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_'s
latest offering, `Llama 2 <https://ai.meta.com/llama/>`_, introduces a collection of pre-trained and
fine-tuned generative text models, ranging from 7 to 70 billion parameters. These models represent a significant leap
forward, being trained on 40% more tokens than their predecessor and offering an extended context length of 4,096 tokens.

Throughout this documentation, we showcase two essential inference frameworks:

- `Text Generation Inference (TGI) <https://github.com/huggingface/text-generation-inference>`_. A purpose-built solution for deploying and serving LLMs from Hugging Face, which we extend to meet the interface requirements of model deployment resources.

- `vLLM <https://vllm.readthedocs.io/>`_. An open-source, high-throughput, and memory-efficient inference and serving engine for LLMs from UC Berkeley.


While our primary focus is on the Llama 2 family, the methodology presented here can be applied to other LLMs as well.


**Sample Code**

For your convenience, we provide sample code and a complete walkthrough, available in the `Oracle
GitHub samples repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2>`_.

**Prerequisites**

Using the Llama 2 model requires accepting the user agreement on `Meta's website <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_. Downloading the model
from `Hugging Face <https://huggingface.co/meta-llama>`_ requires an account and agreement to the service terms. Ensure that the model's license permits
usage for your intended purposes.
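
As a quick illustration, the weights can be pulled with the ``huggingface_hub`` Python package once your
account has been granted access. This is a minimal sketch; the repository ID, local directory, and token
below are placeholders for your own values:

.. code-block:: python

    from huggingface_hub import snapshot_download

    # Gated repository: requires an approved Hugging Face account and a user access token
    snapshot_download(
        repo_id="meta-llama/Llama-2-7b-chat-hf",
        local_dir="llama-2-7b-chat-hf",
        token="<your_hugging_face_token>",
    )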

**Recommended Hardware**

We recommend specific OCI shapes based on NVIDIA A10 GPUs for deploying models. These shapes
cater to both the 7-billion and 13-billion parameter models, with the latter utilizing quantization techniques to
optimize GPU memory usage. OCI offers `a variety of GPU options <https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm>`_ to suit your needs.

**Deployment Approaches**

You can use the following methods to deploy an LLM with OCI Data Science:

- Online Method. This approach involves downloading the LLM directly from the hosting repository into the `Data Science Model Deployment <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm>`_. It minimizes data copying, making it suitable for large models. However, it lacks governance and may not be ideal for production environments or fine-tuning scenarios.

- Offline Method. In this method, you download the LLM from the host repository and save it to the `Data Science Model Catalog <https://docs.oracle.com/en-us/iaas/data-science/using/models-about.htm>`_. Deployment then occurs directly from the catalog, allowing for better control and governance of the model.

**Inference Container**

We explore two inference options: Hugging Face's Text Generation Inference (TGI) and vLLM from UC Berkeley. These
containers are crucial for effective model deployment and are optimized to align with OCI Data Science model deployment requirements.
You can find both the TGI and vLLM Docker files in `our samples repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2>`_.

**Creating the Model Deployment**

The final step involves deploying the model and the inference container by creating a model deployment. Once deployed,
the model is accessible via a predict URL, allowing HTTP-based model invocation.

**Testing the Model**

To validate your deployed model, a Gradio chat app can be configured to use the predict URL. This app exposes
parameters such as ``max_tokens``, ``temperature``, and ``top_p`` for tuning the model's responses. Check our `blog <https://blogs.oracle.com/ai-and-datascience/post/llama2-oci-data-science-cloud-platform>`_ to
learn more about this.
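
You can also invoke the predict URL directly over HTTP. Below is a minimal sketch using OCI API-key
authentication; the endpoint URL is a placeholder, and the request body assumes the TGI container's schema
(vLLM expects a slightly different payload):

.. code-block:: python

    import oci
    import requests
    from oci.signer import Signer

    # Sign requests with an API key from the local OCI config file (~/.oci/config)
    config = oci.config.from_file()
    auth = Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )

    # Placeholder: copy the real predict URL from your model deployment's details page
    endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<MD_OCID>/predict"

    body = {
        "inputs": "Write a haiku about GPUs.",
        "parameters": {"max_new_tokens": 200, "temperature": 0.7, "top_p": 0.9},
    }

    response = requests.post(endpoint, json=body, auth=auth)
    print(response.json())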


Train Model
-----------

Check `Training Large Language Model <../model_training/training_llm.rst>`_ to see how to train your large language model
with Oracle Cloud Infrastructure (OCI) `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_.


Register Model
--------------

Once you've trained your LLM, we guide you through the process of registering it within OCI, enabling seamless access and management.

Zip all items in the model folder using the zip/tar utility, preferably with the command below, to avoid creating an extra folder hierarchy inside the zipped file:

.. code-block:: bash

    # -0 stores the files without compression, which is faster for large model weights
    zip my_large_model.zip * -0

Upload the zipped artifact to an Object Storage bucket in your tenancy. Tools like `rclone <https://rclone.org/>`_
can help speed up the upload. See `this guide <https://docs.oracle.com/en/solutions/move-data-to-cloud-storage-using-rclone/configure-rclone-object-storage.html#GUID-8471A9B3-F812-4358-945E-8F7EEF115241>`_ for using rclone with OCI Object Storage.

Example of using ``oci-cli``:

.. code-block:: bash

    oci os object put -ns <namespace> -bn <bucket> --name <prefix>/my_large_model.zip --file my_large_model.zip

The next step is to create a Model Catalog item. Use :py:class:`~ads.model.DataScienceModel` to register the large model with the Model Catalog.

.. versionadded:: 2.8.10

.. code-block:: python

    import ads
    from ads.model import DataScienceModel

    # Authenticate via the resource principal of the notebook session or job
    ads.set_auth("resource_principal")

    MODEL_DISPLAY_NAME = "My Large Model"
    ARTIFACT_PATH = "oci://<bucket>@<namespace>/<prefix>/my_large_model.zip"

    # Register the model, pointing at the artifact already uploaded to Object Storage
    model = (DataScienceModel()
             .with_display_name(MODEL_DISPLAY_NAME)
             .with_artifact(ARTIFACT_PATH)
             .create(
                 remove_existing_artifact=False
             ))
    model_id = model.id

Deploy Model
------------

The final step involves deploying your registered LLM for real-world applications. We walk you through deploying it in a
`custom container (Bring Your Own Container) <http://docs.oracle.com/en-us/iaas/data-science/using/mod-dep-byoc.htm>`_ within the OCI Data
Science service, leveraging advanced technologies for optimal performance.

You can define the model deployment with the `ADS Python APIs <../model_registration/model_deploy_byoc.rst>`_ or YAML. In the
examples below, you will need to substitute the OCIDs of the resources required for the deployment, such as the project ID
and compartment ID. Every configuration containing ``<UNIQUE_ID>`` should be replaced with the corresponding ID from
your tenancy and the resources created in the previous steps.
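
For orientation, the snippet below is a minimal sketch of the ADS Python API for a BYOC deployment; the shape,
image path, ports, environment variables, and OCIDs are illustrative placeholders, and the complete,
container-specific configurations follow in the tabs below:

.. code-block:: python

    from ads.model.deployment import (
        ModelDeployment,
        ModelDeploymentContainerRuntime,
        ModelDeploymentInfrastructure,
    )

    # Compute and logging configuration (replace the placeholder OCIDs)
    infrastructure = (
        ModelDeploymentInfrastructure()
        .with_project_id("ocid1.datascienceproject.oc1.<UNIQUE_ID>")
        .with_compartment_id("ocid1.compartment.oc1.<UNIQUE_ID>")
        .with_shape_name("VM.GPU.A10.2")
        .with_access_log(log_group_id="ocid1.loggroup.oc1.<UNIQUE_ID>", log_id="ocid1.log.oc1.<UNIQUE_ID>")
        .with_predict_log(log_group_id="ocid1.loggroup.oc1.<UNIQUE_ID>", log_id="ocid1.log.oc1.<UNIQUE_ID>")
    )

    # BYOC runtime: the inference image pushed to OCIR plus the registered model
    container_runtime = (
        ModelDeploymentContainerRuntime()
        .with_image("<region>.ocir.io/<tenancy_namespace>/llama2:latest")
        .with_server_port(8080)
        .with_health_check_port(8080)
        .with_env({"TOKEN_FILE": "/opt/ds/model/deployed_model/token"})
        .with_deployed_model_id("<model_id from the Register Model step>")
    )

    deployment = (
        ModelDeployment()
        .with_display_name("Llama 2 Model Deployment")
        .with_infrastructure(infrastructure)
        .with_runtime(container_runtime)
    ).deploy()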


Online Deployment
^^^^^^^^^^^^^^^^^

**Prerequisites**

Check the `GitHub sample repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#model-deployment-steps>`_ to see how to complete the prerequisites before the actual deployment:

- Zip your Hugging Face user access token and register it with the Model Catalog by following the instructions under ``Register Model`` on this page.
- Create logs in the `OCI Logging Service <https://cloud.oracle.com/logging/log-groups>`_ for the model deployment (if you have already created them, you can skip this step).
- Create a subnet in a `Virtual Cloud Network <https://cloud.oracle.com/networking/vcns>`_ for the model deployment.
- Build the container image and push it to the `Oracle Cloud Container Registry <https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm>`_, as sketched after this list.
- You can now use Bring Your Own Container Deployment in OCI Data Science to deploy the Llama 2 model.
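
A rough sketch of the build-and-push step follows; the registry region, tenancy namespace, and image tag are
placeholders, and your `API Auth Token <https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm>`_ serves as the ``docker login`` password:

.. code-block:: bash

    # Build the inference image from the TGI or vLLM Dockerfile in the samples repository
    docker build -t <region>.ocir.io/<tenancy_namespace>/llama2:latest .

    # Log in to OCIR (username is <tenancy_namespace>/<username>) and push the image
    docker login <region>.ocir.io
    docker push <region>.ocir.io/<tenancy_namespace>/llama2:latest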

.. include:: ../model_registration/tabs/env-var-online.rst

.. include:: ../model_registration/tabs/ads-md-deploy-online.rst

Offline Deployment
^^^^^^^^^^^^^^^^^^

Check the `GitHub sample repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2-offline#model-deployment-steps>`_ to see how to complete the prerequisites before the actual deployment:

- Register the zipped artifact with the Model Catalog by following the instructions under ``Register Model`` on this page.
- Create logs in the `OCI Logging Service <https://cloud.oracle.com/logging/log-groups>`_ for the model deployment (if you have already created them, you can skip this step).
- Build the container image and push it to the `Oracle Cloud Container Registry <https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm>`_.
- You can now use Bring Your Own Container Deployment in OCI Data Science to deploy the Llama 2 model.

.. include:: ../model_registration/tabs/env-var-offline.rst

.. include:: ../model_registration/tabs/ads-md-deploy-offline.rst

You can deploy the model through an API call or the ADS CLI.

Make sure that you have also created and set up your `API Auth Token <https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm>`_ to execute the commands below.

.. include:: ../model_registration/tabs/run_md.rst


Inference Model
---------------

Once the model deployment is shown as Active, you can run inference against it. You can invoke the deployed
model with ``oci-cli`` from an OCI Data Science notebook session or from your local environment.
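
As one illustration, ``oci raw-request`` signs and sends the HTTP call for you; the target URI below is a
placeholder, and the exact body schema depends on whether you deployed the TGI or vLLM container:

.. code-block:: bash

    # POST a TGI-style payload to the deployment's predict endpoint
    oci raw-request \
      --http-method POST \
      --target-uri https://modeldeployment.<region>.oci.customer-oci.com/<MD_OCID>/predict \
      --request-body '{"inputs": "What is the capital of France?", "parameters": {"max_new_tokens": 50}}'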

.. include:: ../model_registration/tabs/run_predict.rst