Commit 7cc3f8f

[DOCS] Large Language Model section. (#358)

Authored by mingkang111 and mrDzurb
Co-authored-by: Dmitrii Cherkasov <dmitrii.cherkasov@oracle.com>
1 parent f353f8a commit 7cc3f8f

9 files changed: +585 -7 lines

docs/source/user_guide/model_registration/introduction.rst

Lines changed: 10 additions & 4 deletions

@@ -40,6 +40,16 @@ If you make changes to the ``score.py`` file, call the ``.verify()`` method to c

 The ``.save()`` method is then used to store the model in the model catalog. A call to the ``.deploy()`` method creates a load balancer and the instances needed to have an HTTPS access point to perform inference on the model. Using the ``.predict()`` method, you can send data to the model deployment endpoint and it will return the predictions.

+LLMs
+----
+
+.. toctree::
+   :maxdepth: 1
+
+   large_language_model
+
 Register
 --------

@@ -84,7 +94,3 @@ Frameworks

    :maxdepth: 1

 framework_specific_instruction
-
-
-
-
Lines changed: 170 additions & 0 deletions

@@ -0,0 +1,170 @@ (new file)

====================
Large Language Model
====================

Oracle ADS (Accelerated Data Science) opens the gateway to harnessing the full potential of large language models
within Oracle Cloud Infrastructure (OCI). `Meta <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_'s
latest offering, `Llama 2 <https://ai.meta.com/llama/>`_, introduces a collection of pre-trained and
fine-tuned generative text models ranging from 7 to 70 billion parameters. These models represent a significant leap
forward, being trained on 40% more tokens and boasting an extended context length of 4,000 tokens.

Throughout this documentation, we showcase two essential inference frameworks:

- `Text Generation Inference (TGI) <https://github.com/huggingface/text-generation-inference>`_. A purpose-built solution for deploying and serving LLMs from Hugging Face, which we extend to meet the interface requirements of model deployment resources.

- `vLLM <https://vllm.readthedocs.io/>`_. An open-source, high-throughput, and memory-efficient inference and serving engine for LLMs from UC Berkeley.

While our primary focus is on the Llama 2 family, the methodology presented here can be applied to other LLMs as well.

**Sample Code**

For your convenience, we provide sample code and a complete walkthrough in the `Oracle
GitHub samples repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2>`_.

**Prerequisites**

Using the Llama 2 model requires accepting the user agreement on `Meta's website <https://ai.meta.com/resources/models-and-libraries/llama-downloads/>`_. Downloading the model
from `Hugging Face <https://huggingface.co/meta-llama>`_ requires an account and agreement to the service terms. Ensure that the model's license permits
usage for your intended purposes.

**Recommended Hardware**

We recommend specific OCI shapes based on NVIDIA A10 GPUs for deploying models. These shapes
accommodate both the 7-billion and 13-billion parameter models, with the latter using quantization techniques to
optimize GPU memory usage. OCI offers `a variety of GPU options <https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm>`_ to suit your needs.
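As a back-of-the-envelope check (our numbers, not from the original docs): the weights alone of a 13-billion parameter model in fp16 already need roughly 24 GB, right at a single A10's memory limit before activations and KV cache are counted, which is why quantization or multi-GPU shapes come into play. A minimal sketch:

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate GPU memory (GiB) needed just for the model weights."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(round(weight_memory_gb(7, 2), 1))     # fp16 7B   -> 13.0 GiB
print(round(weight_memory_gb(13, 2), 1))    # fp16 13B  -> 24.2 GiB
print(round(weight_memory_gb(13, 0.5), 1))  # 4-bit 13B -> 6.1 GiB
```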
**Deployment Approaches**

You can use the following methods to deploy an LLM with OCI Data Science:

- Online Method. This approach downloads the LLM directly from the hosting repository into the `Data Science Model Deployment <https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm>`_. It minimizes data copying, making it suitable for large models. However, it lacks governance and may not be ideal for production environments or fine-tuning scenarios.

- Offline Method. In this method, you download the LLM from the host repository and save it in the `Data Science Model Catalog <https://docs.oracle.com/en-us/iaas/data-science/using/models-about.htm>`_. Deployment then occurs directly from the catalog, allowing for better control and governance of the model.

**Inference Container**

We explore two inference options: Hugging Face's Text Generation Inference (TGI) and vLLM from UC Berkeley. These
containers are crucial for effective model deployment and are optimized to align with OCI Data Science model deployment requirements.
You can find both the TGI and vLLM Docker files in `our samples repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2>`_.

**Creating the Model Deployment**

The final step involves deploying the model and the inference container by creating a model deployment. Once deployed,
the model is accessible via a predict URL, allowing HTTP-based model invocation.

**Testing the Model**

To validate your deployed model, a Gradio Chat app can be configured to use the predict URL. This app exposes
parameters such as ``max_tokens``, ``temperature``, and ``top_p`` for tuning model responses. Check our `blog <https://blogs.oracle.com/ai-and-datascience/post/llama2-oci-data-science-cloud-platform>`_ to
learn more.
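A minimal sketch of the kind of request body those parameters map to, assuming a TGI-style ``/generate`` endpoint; the exact schema depends on your inference container, and ``build_tgi_payload`` is a hypothetical helper, not an ADS or TGI API:

```python
def build_tgi_payload(prompt, max_tokens=256, temperature=0.7, top_p=0.9):
    """Build a TGI-style /generate request body from the chat-app knobs."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_tokens,
            "temperature": temperature,
            "top_p": top_p,
        },
    }

payload = build_tgi_payload("What is OCI Data Science?", max_tokens=128)
print(payload["parameters"]["max_new_tokens"])  # 128
```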
Train Model
-----------

See `Training Large Language Model <../model_training/training_llm.rst>`_ to learn how to train your large language model
with Oracle Cloud Infrastructure (OCI) `Data Science Jobs (Jobs) <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-about.htm>`_.


Register Model
--------------

Once you've trained your LLM, we guide you through the process of registering it within OCI, enabling seamless access and management.

Zip all items of the folder using a zip/tar utility, preferably with the command below, to avoid creating an extra folder hierarchy inside the zipped file:

.. code-block:: bash

  zip my_large_model.zip * -0

Upload the zipped artifact to an object storage bucket in your tenancy. Tools like `rclone <https://rclone.org/>`_
can help speed up this upload. See `this guide <https://docs.oracle.com/en/solutions/move-data-to-cloud-storage-using-rclone/configure-rclone-object-storage.html#GUID-8471A9B3-F812-4358-945E-8F7EEF115241>`_ for using rclone with OCI.

Example of using ``oci-cli``:

.. code-block:: bash

  oci os object put -ns <namespace> -bn <bucket> --name <prefix>/my_large_model.zip --file my_large_model.zip
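The uploaded object is later referenced through an ``oci://`` URI of the form ``oci://<bucket>@<namespace>/<prefix>/...``. A tiny sketch of composing one (``make_artifact_uri`` is a hypothetical helper, not part of ADS):

```python
def make_artifact_uri(bucket: str, namespace: str, object_name: str) -> str:
    """Compose the oci:// URI that points at an artifact in Object Storage."""
    return f"oci://{bucket}@{namespace}/{object_name}"

print(make_artifact_uri("my-bucket", "my-namespace", "models/my_large_model.zip"))
# oci://my-bucket@my-namespace/models/my_large_model.zip
```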
The next step is to create a model catalog item. Use :py:class:`~ads.model.DataScienceModel` to register the large model in the Model Catalog.

.. versionadded:: 2.8.10

.. code-block:: python

  import ads
  from ads.model import DataScienceModel

  ads.set_auth("resource_principal")

  MODEL_DISPLAY_NAME = "My Large Model"
  ARTIFACT_PATH = "oci://<bucket>@<namespace>/<prefix>/my_large_model.zip"

  model = (DataScienceModel()
           .with_display_name(MODEL_DISPLAY_NAME)
           .with_artifact(ARTIFACT_PATH)
           .create(
               remove_existing_artifact=False
           ))
  model_id = model.id
Deploy Model
------------

The final step involves deploying your registered LLM for real-world applications. We walk you through deploying it in a
`custom container (Bring Your Own Container) <http://docs.oracle.com/en-us/iaas/data-science/using/mod-dep-byoc.htm>`_ within the OCI Data
Science service, leveraging advanced technologies for optimal performance.

You can define the model deployment with the `ADS Python APIs <../model_registration/model_deploy_byoc.rst>`_ or YAML. In the
examples below, you will need to substitute the OCIDs of the resources required for the deployment, such as the project ID and
compartment ID. Every configuration value shown as ``<UNIQUE_ID>`` should be replaced with the corresponding ID from
your tenancy, i.e. the resources created in the previous steps.
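Before submitting a configuration, it can help to verify that no placeholders remain. A small sketch (``find_placeholders`` is a hypothetical helper, not part of ADS):

```python
import re

def find_placeholders(text: str) -> list:
    """Return any unresolved <PLACEHOLDER> tokens left in a config snippet."""
    return re.findall(r"<[A-Za-z_]+>", text)

snippet = "projectId: ocid1.datascienceproject.oc1.<UNIQUE_ID>"
print(find_placeholders(snippet))  # ['<UNIQUE_ID>']
```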
Online Deployment
^^^^^^^^^^^^^^^^^

**Prerequisites**

See the `GitHub sample repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#model-deployment-steps>`_ for how to complete the prerequisites before the actual deployment:

- Zip your Hugging Face user access token and register it in the Model Catalog by following the ``Register Model`` instructions on this page.
- Create logging in the `OCI Logging Service <https://cloud.oracle.com/logging/log-groups>`_ for the model deployment (if you have already created logs, you can skip this step).
- Create a subnet in a `Virtual Cloud Network <https://cloud.oracle.com/networking/vcns>`_ for the model deployment.
- Build the container image and push it to the `Oracle Cloud Container Registry <https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm>`_.
- You can now use Bring Your Own Container Deployment in OCI Data Science to deploy the Llama 2 model.

.. include:: ../model_registration/tabs/env-var-online.rst

.. include:: ../model_registration/tabs/ads-md-deploy-online.rst

Offline Deployment
^^^^^^^^^^^^^^^^^^

See the `GitHub sample repository <https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2-offline#model-deployment-steps>`_ for how to complete the prerequisites before the actual deployment:

- Register the zipped artifact in the Model Catalog by following the ``Register Model`` instructions on this page.
- Create logging in the `OCI Logging Service <https://cloud.oracle.com/logging/log-groups>`_ for the model deployment (if you have already created logs, you can skip this step).
- Build the container image and push it to the `Oracle Cloud Container Registry <https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryoverview.htm>`_.
- You can now use Bring Your Own Container Deployment in OCI Data Science to deploy the Llama 2 model.

.. include:: ../model_registration/tabs/env-var-offline.rst

.. include:: ../model_registration/tabs/ads-md-deploy-offline.rst

You can deploy the model through an API call or the ADS CLI.

Make sure that you have also created and set up your `API Auth Token <https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm>`_ to execute the commands below.

.. include:: ../model_registration/tabs/run_md.rst
162+
163+
164+
Inference Model
165+
---------------
166+
167+
Once the model is deployed and shown as Active you can execute inference against it. You can run inference against
168+
the deployed model with oci-cli from your OCI Data Science Notebook or you local environment.
169+
170+
.. include:: ../model_registration/tabs/run_predict.rst
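As a sketch of what such a request looks like, a body for a vLLM-style container might be assembled as below; the URL pattern, body schema, and parameter names are assumptions to adapt for your own deployment and container:

```python
import json

# Hypothetical predict URL of the model deployment (replace <UNIQUE_ID>).
predict_url = (
    "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/"
    "ocid1.datasciencemodeldeployment.oc1.<UNIQUE_ID>/predict"
)

# Body for a vLLM-style container; a TGI container would instead expect
# {"inputs": ..., "parameters": {...}} on its /generate endpoint.
body = {"prompt": "What is OCI Data Science?", "max_tokens": 250, "temperature": 0.7}

# The actual call must be signed with your OCI credentials, for example:
#   import oci, requests
#   signer = oci.auth.signers.get_resource_principals_signer()
#   requests.post(predict_url, json=body, auth=signer)
print(json.dumps(body))
```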

docs/source/user_guide/model_registration/model_load.rst

Lines changed: 4 additions & 3 deletions

@@ -1,3 +1,4 @@
+=====================
 Load Registered Model
 =====================

@@ -43,7 +44,7 @@ Alternatively the ``.from_id()`` method can be used to load a model. In future r

 Load Deployed Model
-===================
+-------------------

 Load and recreate :doc:`framework specific wrapper <framework_specific_instruction>` objects using the ``ocid`` value of your OCI Model Deployment instance.

@@ -82,7 +83,7 @@ Alternatively the ``.from_id()`` method can be used to load a model from the Mod
 )

 Load Model From Object Storage
-==============================
+------------------------------

 Load and recreate :doc:`framework specific wrapper <framework_specific_instruction>` objects from the existing model artifact archive.

@@ -107,7 +108,7 @@ A model loaded from an artifact archive can be registered and deployed.

 Large Model Artifacts
-=====================
+---------------------

 .. versionadded:: 2.6.4
Lines changed: 130 additions & 0 deletions

@@ -0,0 +1,130 @@ (new file)

Create the model deployment:

.. tabs::

  .. code-tab:: Python3
    :caption: Python

    from ads.model.deployment import ModelDeployment, ModelDeploymentInfrastructure, ModelDeploymentContainerRuntime

    # configure model deployment infrastructure
    infrastructure = (
        ModelDeploymentInfrastructure()
        .with_project_id("ocid1.datascienceproject.oc1.<UNIQUE_ID>")
        .with_compartment_id("ocid1.compartment.oc1..<UNIQUE_ID>")
        .with_shape_name("VM.GPU3.2")
        .with_bandwidth_mbps(10)
        .with_web_concurrency(10)
        .with_access_log(
            log_group_id="ocid1.loggroup.oc1.<UNIQUE_ID>",
            log_id="ocid1.log.oc1.<UNIQUE_ID>"
        )
        .with_predict_log(
            log_group_id="ocid1.loggroup.oc1.<UNIQUE_ID>",
            log_id="ocid1.log.oc1.<UNIQUE_ID>"
        )
    )

    # environment variables consumed by the inference container
    # (example values matching the TGI YAML tab)
    env_var = {
        "MODEL_DEPLOY_PREDICT_ENDPOINT": "/generate",
        "PARAMS": "--model /opt/ds/model/deployed_model --max-batch-prefill-tokens 1024",
    }

    # configure model deployment runtime
    container_runtime = (
        ModelDeploymentContainerRuntime()
        .with_image("iad.ocir.io/<namespace>/<image>:<tag>")
        .with_server_port(5001)
        .with_health_check_port(5001)
        .with_env(env_var)
        .with_deployment_mode("HTTPS_ONLY")
        .with_model_uri("ocid1.datasciencemodel.oc1.<UNIQUE_ID>")
        .with_region("us-ashburn-1")
        .with_overwrite_existing_artifact(True)
        .with_remove_existing_artifact(True)
        .with_timeout(100)
    )

    # configure model deployment
    deployment = (
        ModelDeployment()
        .with_display_name("Model Deployment Demo using ADS")
        .with_description("The model deployment description.")
        .with_freeform_tags({"key1": "value1"})
        .with_infrastructure(infrastructure)
        .with_runtime(container_runtime)
    )

  .. code-tab:: yaml
    :caption: TGI-YAML

    kind: deployment
    spec:
      displayName: LLama2-7b model deployment - tgi
      infrastructure:
        kind: infrastructure
        type: datascienceModelDeployment
        spec:
          compartmentId: ocid1.compartment.oc1..<UNIQUE_ID>
          projectId: ocid1.datascienceproject.oc1.<UNIQUE_ID>
          accessLog:
            logGroupId: ocid1.loggroup.oc1.<UNIQUE_ID>
            logId: ocid1.log.oc1.<UNIQUE_ID>
          predictLog:
            logGroupId: ocid1.loggroup.oc1.<UNIQUE_ID>
            logId: ocid1.log.oc1.<UNIQUE_ID>
          shapeName: VM.GPU.A10.2
          replica: 1
          bandWidthMbps: 10
          webConcurrency: 10
          subnetId: ocid1.subnet.oc1.<UNIQUE_ID>
      runtime:
        kind: runtime
        type: container
        spec:
          modelUri: ocid1.datasciencemodel.oc1.<UNIQUE_ID>
          image: <UNIQUE_ID>
          serverPort: 5001
          healthCheckPort: 5001
          env:
            MODEL_DEPLOY_PREDICT_ENDPOINT: "/generate"
            PARAMS: "--model /opt/ds/model/deployed_model --max-batch-prefill-tokens 1024"
          region: us-ashburn-1
          overwriteExistingArtifact: True
          removeExistingArtifact: True
          timeout: 100
          deploymentMode: HTTPS_ONLY

  .. code-tab:: yaml
    :caption: vllm-YAML

    kind: deployment
    spec:
      displayName: LLama2-7b model deployment - vllm
      infrastructure:
        kind: infrastructure
        type: datascienceModelDeployment
        spec:
          compartmentId: ocid1.compartment.oc1..<UNIQUE_ID>
          projectId: ocid1.datascienceproject.oc1.<UNIQUE_ID>
          accessLog:
            logGroupId: ocid1.loggroup.oc1.<UNIQUE_ID>
            logId: ocid1.log.oc1.<UNIQUE_ID>
          predictLog:
            logGroupId: ocid1.loggroup.oc1.<UNIQUE_ID>
            logId: ocid1.log.oc1.<UNIQUE_ID>
          shapeName: VM.GPU.A10.2
          replica: 1
          bandWidthMbps: 10
          webConcurrency: 10
      runtime:
        kind: runtime
        type: container
        spec:
          modelUri: ocid1.datasciencemodel.oc1.<UNIQUE_ID>
          image: <UNIQUE_ID>
          serverPort: 5001
          healthCheckPort: 5001
          env:
            PARAMS: "--model /opt/ds/model/deployed_model"
            TENSOR_PARALLELISM: 2
          region: us-ashburn-1
          overwriteExistingArtifact: True
          removeExistingArtifact: True
          timeout: 100
          deploymentMode: HTTPS_ONLY
