
Commit 8881316

Merge branch 'develop' into ODSC-29065/md_opctl_docs
2 parents 48332d1 + 9022784 commit 8881316

9 files changed: +297 additions, -18 deletions

ads/dataflow/dataflow.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ class SPARK_VERSION(str):
 class DataFlow:
     @deprecated(
         "2.6.3",
-        details="Use ads.jobs.DataFlow class for creating DataFlow applications and runs. Check https://accelerated-data-science.readthedocs.io/en/latest/user_guide/apachespark/dataflow.html#create-run-data-flow-application-using-ads-python-sdk",
+        details="Use ads.jobs.DataFlow class for creating Data Flow applications and runs. Check https://accelerated-data-science.readthedocs.io/en/latest/user_guide/apachespark/dataflow.html#create-run-data-flow-application-using-ads-python-sdk",
     )
     def __init__(
         self,
Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-* DataFlow requires a bucket to store the logs, and a data warehouse bucket. Refer to the Data Flow documentation for `setting up storage <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_storage>`_.
-* DataFlow requires policies to be set in IAM to access resources to manage and run applications/sessions. Refer to the Data Flow documentation on how to `setup policies <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#policy_set_up>`__.
-* DataFlow natively supports conda packs published to OCI Object Storage. Ensure the Data Flow Resource has read access to the bucket or path of your published conda pack, and that the spark version >= 3 when running your Data Flow Application/Session.
+* Data Flow requires a bucket to store the logs, and a data warehouse bucket. Refer to the Data Flow documentation for `setting up storage <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_storage>`_.
+* Data Flow requires policies to be set in IAM to access resources to manage and run applications/sessions. Refer to the Data Flow documentation on how to `setup policies <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#policy_set_up>`__.
+* Data Flow natively supports conda packs published to OCI Object Storage. Ensure the Data Flow Resource has read access to the bucket or path of your published conda pack, and that the spark version >= 3 when running your Data Flow Application/Session.

docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 Running your Spark Application on OCI Data Flow
 ===============================================
 
-Submit your code to DataFlow for workloads that require larger resources.
+Submit your code to Data Flow for workloads that require larger resources.
 
 Notebook Extension
 ==================
@@ -124,7 +124,7 @@ ADS CLI
 
 Sometimes your code is too complex to run in a single cell, and it's better run as a notebook or file. In that case, use the ADS Opctl CLI.
 
-To submit your notebook to DataFlow using the ``ads`` CLI, run:
+To submit your notebook to Data Flow using the ``ads`` CLI, run:
 
 .. code-block:: shell
 
@@ -205,7 +205,7 @@ You can set them using the ``with_{property}`` functions:
 - ``with_warehouse_bucket_uri``
 - ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
 
-For more details, see `DataFlow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
+For more details, see `Data Flow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
 
 ``DataFlowRuntime`` stores properties related to the script to be run, such as the path to the script and
 CLI arguments. Likewise all properties can be set using ``with_{property}``.
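
For reference, the ``with_{property}`` builders mentioned in this hunk can be chained on ``DataFlow`` and ``DataFlowRuntime``. A minimal sketch, assuming placeholder OCIDs, bucket URIs, and script path (none of these values come from this change):

.. code-block:: python3

    from ads.jobs import DataFlow, DataFlowRuntime, Job

    # Data Flow infrastructure, including the optional warehouse bucket
    # and private endpoint properties listed above (placeholder values).
    infrastructure = (
        DataFlow()
        .with_compartment_id("<compartment_ocid>")
        .with_logs_bucket_uri("oci://<logs_bucket>@<namespace>/")
        .with_warehouse_bucket_uri("oci://<warehouse_bucket>@<namespace>/")
        .with_private_endpoint_id("<private_endpoint_ocid>")
    )

    # Runtime properties: the script to run and its CLI arguments.
    runtime = (
        DataFlowRuntime()
        .with_script_uri("oci://<bucket>@<namespace>/spark_script.py")
        .with_argument("--input", "oci://<bucket>@<namespace>/data/")
    )

    job = Job(name="<application_name>", infrastructure=infrastructure, runtime=runtime)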

docs/source/user_guide/apachespark/setup-installation.rst

Lines changed: 2 additions & 2 deletions
@@ -178,8 +178,8 @@ Once the development environment is setup, you could write your code and run it
 ``core-site.xml`` is setup automatically when you install a pyspark conda pack.
 
 
-Logging From DataFlow
-=====================
+Logging From Data Flow
+======================
 
 If using the ADS Python SDK,
 

docs/source/user_guide/cli/opctl/_template/monitoring.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Monitoring With CLI
 watch
 +++++
 
-You can tail the logs generated by ``OCI Data Science Job Runs`` or ``OCI DataFlow Application Runs`` using the ``watch`` subcommand.
+You can tail the logs generated by ``OCI Data Science Job Runs`` or ``OCI Data Flow Application Runs`` using the ``watch`` subcommand.
 
 .. code-block:: shell
 
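
The body of the ``.. code-block:: shell`` above falls outside this hunk. A minimal sketch of the invocation, assuming the run OCID is passed as the argument (placeholder value, not taken from this commit):

.. code-block:: shell

    # Tail the logs of a Job Run or Data Flow Application Run.
    ads opctl watch <run_ocid>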

docs/source/user_guide/cli/opctl/configure.rst

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ CLI Configuration
 - You have completed :doc:`ADS CLI installation <../quickstart>`
 
 
-Setup default values for different options while running ``OCI Data Sciecne Jobs`` or ``OCI DataFlow``. By setting defaults, you can avoid inputing compartment ocid, project ocid, etc.
+Setup default values for different options while running ``OCI Data Science Jobs`` or ``OCI Data Flow``. By setting defaults, you can avoid inputing compartment ocid, project ocid, etc.
 
 To setup configuration run -
 
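
The command referenced by "To setup configuration run -" sits outside this hunk. Assuming the standard ``ads opctl`` entry point, the invocation is a one-liner:

.. code-block:: shell

    # Interactive prompts record defaults such as compartment OCID and project OCID.
    ads opctl configure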

docs/source/user_guide/jobs/run_script.rst

Lines changed: 274 additions & 2 deletions
@@ -1,13 +1,285 @@
 Run a Script
 ************
 
-This section shows how to create a job to run a script.
+This example shows you how to create a job running "Hello World" Python scripts. Although Python scripts are used here, you could also run Bash or Shell scripts. The Logging service log and log group are defined in the infrastructure. The output of the script appear in the logs.
+
+Python
+======
+
+Suppose you would like to run the following "Hello World" python script named ``job_script.py``.
+
+.. code-block:: python3
+
+    print("Hello World")
+
+First, initiate a job with a job name:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    job = Job(name="Job Name")
+
+Next, you specify the desired infrastructure to run the job. If you are in a notebook session, ADS can automatically fetch the infrastructure configurations and use them for the job. If you aren't in a notebook session or you want to customize the infrastructure, you can specify them using the methods from the ``DataScienceJob`` class:
+
+.. code-block:: python3
+
+    from ads.jobs import DataScienceJob
+
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_group_id("<log_group_ocid>")
+        .with_log_id("<log_ocid>")
+        # The following infrastructure configurations are optional
+        # if you are in an OCI data science notebook session.
+        # The configurations of the notebook session will be used as defaults
+        .with_compartment_id("<compartment_ocid>")
+        .with_project_id("<project_ocid>")
+        .with_subnet_id("<subnet_ocid>")
+        .with_shape_name("VM.Standard.E3.Flex")
+        .with_shape_config_details(memory_in_gbs=16, ocpus=1) # Applicable only for the flexible shapes
+        .with_block_storage_size(50)
+    )
+
+In this example, it is a Python script so the ``ScriptRuntime()`` class is used to define the name of the script using the ``.with_source()`` method:
+
+.. code-block:: python3
+
+    from ads.jobs import ScriptRuntime
+    job.with_runtime(
+        ScriptRuntime().with_source("job_script.py")
+    )
+
+Finally, you create and run the job, which gives you access to the
+``job_run.id``:
+
+.. code-block:: python3
+
+    job.create()
+    job_run = job.run()
+
+Additionally, you can acquire the job run using the OCID:
+
+.. code-block:: python3
+
+    from ads.jobs import DataScienceJobRun
+    job_run = DataScienceJobRun.from_ocid(job_run.id)
+
+The ``.watch()`` method is useful to monitor the progress of the job run:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+After the job has been created and runs successfully, you can find
+the output of the script in the logs if you configured logging.
+
+YAML
+====
+
+You could also initialize a job directly from a YAML string. For example, to create a job identical to the preceding example, you could simply run the following:
+
+.. code-block:: python3
+
+    job = Job.from_string(f"""
+    kind: job
+    spec:
+      infrastructure:
+        kind: infrastructure
+        type: dataScienceJob
+        spec:
+          logGroupId: <log_group_ocid>
+          logId: <log_ocid>
+          compartmentId: <compartment_ocid>
+          projectId: <project_ocid>
+          subnetId: <subnet_ocid>
+          shapeName: VM.Standard.E3.Flex
+          shapeConfigDetails:
+            memoryInGBs: 16
+            ocpus: 1
+          blockStorageSize: 50
+      name: <resource_name>
+      runtime:
+        kind: runtime
+        type: python
+        spec:
+          scriptPathURI: job_script.py
+    """)
+
+
+Command Line Arguments
+======================
+
+If the Python script that you want to run as a job requires CLI arguments,
+use the ``.with_argument()`` method to pass the arguments to the job.
+
+Python
+------
+
+Suppose you want to run the following python script named ``job_script_argument.py``:
+
+.. code-block:: python3
+
+    import sys
+    print("Hello " + str(sys.argv[1]) + " and " + str(sys.argv[2]))
+
+This example runs a job with CLI arguments:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    from ads.jobs import DataScienceJob
+    from ads.jobs import ScriptRuntime
+
+    job = Job()
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_id("<log_id>")
+        .with_log_group_id("<log_group_id>")
+    )
+
+    # The CLI argument can be passed in using `with_argument` when defining the runtime
+    job.with_runtime(
+        ScriptRuntime()
+        .with_source("job_script_argument.py")
+        .with_argument("<first_argument>", "<second_argument>")
+    )
+
+    job.create()
+    job_run = job.run()
+
+After the job run is created and run, you can use the ``.watch()`` method to monitor
+its progress:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+This job run prints out ``Hello <first_argument> and <second_argument>``.
+
+YAML
+----
+
+You could create the preceding example job with the following YAML file:
+
+.. code-block:: yaml
+
+    kind: job
+    spec:
+      infrastructure:
+        kind: infrastructure
+        type: dataScienceJob
+        spec:
+          logGroupId: <log_group_ocid>
+          logId: <log_ocid>
+          compartmentId: <compartment_ocid>
+          projectId: <project_ocid>
+          subnetId: <subnet_ocid>
+          shapeName: VM.Standard.E3.Flex
+          shapeConfigDetails:
+            memoryInGBs: 16
+            ocpus: 1
+          blockStorageSize: 50
+      runtime:
+        kind: runtime
+        type: python
+        spec:
+          args:
+          - <first_argument>
+          - <second_argument>
+          scriptPathURI: job_script_env.py
+
+
+Environment Variables
+=====================
+
+Similarly, if the script you want to run requires environment variables, you also pass them in using the ``.with_environment_variable()`` method. The key-value pair of the environment variable are passed in using the ``.with_environment_variable()`` method, and are accessed in the Python script using the ``os.environ`` dictionary.
+
+Python
+------
+
+Suppose you want to run the following python script named ``job_script_env.py``:
+
+.. code-block:: python3
+
+    import os
+    import sys
+    print("Hello " + os.environ["KEY1"] + " and " + os.environ["KEY2"])
+
+This example runs a job with environment variables:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    from ads.jobs import DataScienceJob
+    from ads.jobs import ScriptRuntime
+
+    job = Job()
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_group_id("<log_group_ocid>")
+        .with_log_id("<log_ocid>")
+        # The following infrastructure configurations are optional
+        # if you are in an OCI data science notebook session.
+        # The configurations of the notebook session will be used as defaults
+        .with_compartment_id("<compartment_ocid>")
+        .with_project_id("<project_ocid>")
+        .with_subnet_id("<subnet_ocid>")
+        .with_shape_name("VM.Standard.E3.Flex")
+        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
+        .with_block_storage_size(50)
+    )
+
+    job.with_runtime(
+        ScriptRuntime()
+        .with_source("job_script_env.py")
+        .with_environment_variable(KEY1="<first_value>", KEY2="<second_value>")
+    )
+    job.create()
+    job_run = job.run()
+
+You can watch the progress of the job run using the ``.watch()`` method:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+This job run prints out ``Hello <first_value> and <second_value>``.
+
+YAML
+----
+
+You could create the preceding example job with the following YAML file:
 
 The :py:class:`~ads.jobs.ScriptRuntime` is designed for you to define job artifacts and configurations supported by OCI
 Data Science Jobs natively. It can be used with any script types that is supported by the OCI Data Science Jobs,
 including shell scripts and python scripts.
 
-The source code can be a single script, files in a folder or a zip/tar file.
+kind: job
+spec:
+  infrastructure:
+    kind: infrastructure
+    type: dataScienceJob
+    spec:
+      logGroupId: <log_group_ocid>
+      logId: <log_ocid>
+      compartmentId: <compartment_ocid>
+      projectId: <project_ocid>
+      subnetId: <subnet_ocid>
+      shapeName: VM.Standard.E3.Flex
+      shapeConfigDetails:
+        memoryInGBs: 16
+        ocpus: 1
+      blockStorageSize: 50
+  runtime:
+    kind: runtime
+    type: python
+    spec:
+      env:
+      - name: KEY1
+        value: <first_value>
+      - name: KEY2
+        value: <second_value>
+      scriptPathURI: job_script_env.py
 
 See also: `Preparing Job Artifacts <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-artifact.htm>`_.
 
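
The context lines above note that :py:class:`~ads.jobs.ScriptRuntime` also accepts shell scripts, and the removed line described folder and zip/tar sources. A minimal sketch of pointing the runtime at an archive, assuming a hypothetical archive name, entrypoint path, and the ``entrypoint`` keyword:

.. code-block:: python3

    from ads.jobs import ScriptRuntime

    # The source may be a folder or a zip/tar archive; entrypoint selects
    # which script inside it to execute (hypothetical paths).
    runtime = ScriptRuntime().with_source(
        "job_artifacts.zip",
        entrypoint="job_artifacts/main.sh",
    )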

docs/source/user_guide/model_registration/introduction.rst

Lines changed: 11 additions & 4 deletions
@@ -1,8 +1,8 @@
 .. _model-catalog-8:
 
-##########################
-Register and Deploy Models
-##########################
+###################################
+Register, Manage, and Deploy Models
+###################################
 
 
 You could register your model with OCI Data Science service through ADS. Alternatively, the Oracle Cloud Infrastructure (OCI) Console can be used by going to the Data Science projects page, selecting a project, then click **Models**. The models page shows the model artifacts that are in the model catalog for a given project.
@@ -51,9 +51,16 @@ Register
   model_schema
   model_metadata
   model_file_customization
+
+Manage Model
+------------
+
+.. toctree::
+  :maxdepth: 1
+
   model_version_set
 
-Deploying model
+Deploying Model
 ---------------
 
 .. toctree::

docs/source/user_guide/quick_start/quick_start.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Quick Start
 * :doc:`Read and Write to Object Storage, Databases and other OCI Resources<../loading_data/connect>`
 * :doc:`OCI serverless Spark - Data Flow <../apachespark/quickstart>`
 * :doc:`Evaluate Trained Models<../model_training/model_evaluation/quick_start>`
-* :doc:`Register and Deploy Models<../model_registration/quick_start>`
+* :doc:`Register, Manage, and Deploy Models<../model_registration/quick_start>`
 * :doc:`Store and Retrieve your data source credentials<../secrets/quick_start>`
 * :doc:`Conect to existing OCI Big Data Service<../big_data_service/quick_start>`
 
