
Commit f839b4c

Preliminary pipeline docs (#67)
2 parents 4469382 + 00ca552 commit f839b4c

23 files changed (+3,587 -43 lines)

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ Oracle Accelerated Data Science SDK (ADS)
   user_guide/big_data_service/index
   user_guide/jobs/index
   user_guide/logs/logs
   user_guide/pipeline/index
   user_guide/secrets/index

.. toctree::

docs/source/user_guide/cli/opctl/configure.rst

Lines changed: 34 additions & 7 deletions
@@ -3,15 +3,15 @@ CLI Configuration

**Prerequisite**

* You have completed :doc:`ADS CLI installation <../quickstart>`


Set default values for different options while running ``OCI Data Science Jobs`` or ``OCI DataFlow``. By setting defaults, you can avoid inputting the compartment OCID, project OCID, and so on.

To set up the configuration, run:

.. code-block:: shell

   ads opctl configure

This will prompt you to set up default ADS CLI configurations for each OCI profile defined in your OCI config. By default, all the files are generated in the ``~/.ads_ops`` folder.
@@ -20,8 +20,8 @@ This will prompt you to setup default ADS CLI configurations for each OCI profil

``~/.ads_ops/config.ini`` will contain OCI profile defaults and conda pack related information. For example:

.. code-block::

   [OCI]
   oci_config = ~/.oci/config
   oci_profile = ANOTHERPROF
@@ -30,7 +30,7 @@ This will prompt you to setup default ADS CLI configurations for each OCI profil
   conda_pack_folder = </local/path/for/saving/condapack>
   conda_pack_os_prefix = oci://my-bucket@mynamespace/conda_environments/

``~/.ads_ops/ml_job_config.ini`` will contain defaults for running ``Data Science Job``. Defaults are set for each profile listed in your OCI config file. Here is a sample:

.. code-block::
@@ -53,7 +53,7 @@ This will prompt you to setup default ADS CLI configurations for each OCI profil
   block_storage_size_in_GBs = 50


``~/.ads_ops/dataflow_config.ini`` will contain defaults for running ``OCI DataFlow``. Defaults are set for each profile listed in your OCI config file. Here is a sample:

.. code-block::
@@ -66,3 +66,30 @@ This will prompt you to setup default ADS CLI configurations for each OCI profil
   num_executors = 3
   spark_version = 3.0.2
   archive_bucket = oci://mybucket@mytenancy/dataflow/archive


``~/.ads_ops/ml_pipeline.ini`` will contain defaults for running ``Data Science Pipeline``. Defaults are set for each profile listed in your OCI config file. Here is a sample:

.. code-block::

   [DEFAULT]
   compartment_id = oci.xxxx.<compartment_ocid>
   project_id = oci.xxxx.<project_ocid>

   [ANOTHERPROF]
   compartment_id = oci.xxxx.<compartment_ocid>
   project_id = oci.xxxx.<project_ocid>


``~/.ads_ops/local_backend.ini`` will contain defaults for running jobs and pipeline steps locally. While local operations do not involve connections to OCI services, default configurations are still set for each profile listed in your OCI config file for consistency. Here is a sample:

.. code-block::

   [DEFAULT]
   max_parallel_containers = 4
   pipeline_status_poll_interval_seconds = 5

   [ANOTHERPROF]
   max_parallel_containers = 4
   pipeline_status_poll_interval_seconds = 5

docs/source/user_guide/cli/opctl/local-development-setup.rst

Lines changed: 6 additions & 6 deletions
@@ -4,18 +4,18 @@ Local Development Environment Setup

**Prerequisite**

* You have completed :doc:`ADS CLI installation <../quickstart>`
* You have completed :doc:`Configuration <configure>`

Set up your workstation for developing and testing your code locally before you submit it as an OCI Data Science Job. This section will guide you through setting up an environment for:

* Building OCI Data Science compatible conda environments on your workstation or CI/CD pipeline and publishing them to object storage
* Developing and testing code with a conda environment that is compatible with OCI Data Science Notebooks and OCI Data Science Jobs
* Developing and testing code for running Bring Your Own Container (BYOC) jobs

**Note**

* In this version you cannot directly access the Service provided conda environments from the ADS CLI, but you can publish a service provided conda pack from an OCI Data Science Notebook session to your object storage bucket and then use the CLI to access the published version.

.. toctree::
   :hidden:
@@ -25,5 +25,5 @@ Setup up your workstation for development and testing your code locally before y
   localdev/vscode
   localdev/condapack
   localdev/jobs
   localdev/local_jobs
   localdev/local_pipelines
docs/source/user_guide/cli/opctl/localdev/local_jobs.rst

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+++++++++++++++++++
Local Job Execution
+++++++++++++++++++

Your job can be executed in a local container to facilitate development and troubleshooting.

-------------
Prerequisites
-------------

1. :doc:`Install ADS CLI<../../quickstart>`
2. Build a container image.

   - :doc:`Build Development Container Image<./jobs_container_image>` and :doc:`install a conda environment<./condapack>`
   - :doc:`Build Your Own Container (BYOC)<./jobs>`

------------
Restrictions
------------

When running locally, your job is subject to the following restrictions:

- The job must use API Key auth. Resource Principal auth is not supported in a local container. See https://docs.oracle.com/iaas/Content/API/Concepts/apisigningkey.htm
- You can only use conda environments published to your own Object Storage bucket. See :doc:`Working with Conda packs<./condapack>`
- Your job files must be present on your local machine.
- Any network calls must be reachable by your local machine. (That is, your job cannot connect to an endpoint that is only reachable within the job's subnet.)
- Your local machine must meet the hardware requirements of your job.

----------------
Running your Job
----------------

Using a conda environment
=========================

The example below demonstrates how to run a local job using an installed conda environment:

.. code-block:: shell

   ads opctl run --backend local --conda-slug myconda_p38_cpu_v1 --source-folder /path/to/my/job/files/ --entrypoint bin/my_script.py --cmd-args "--some-arg" --env-var "MY_VAR=12345"

Parameter explanation:

- ``--backend local``: Run the job locally in a docker container.
- ``--conda-slug myconda_p38_cpu_v1``: Use the ``myconda_p38_cpu_v1`` conda environment. Note that you must install this conda environment locally first (a sketch follows this list). The local conda environment directory will be automatically mounted into the container and activated before the entrypoint is executed.
- ``--source-folder /path/to/my/job/files/``: The local directory containing your job files. This directory is mounted into the container as a volume.
- ``--entrypoint bin/my_script.py``: Set the container entrypoint to ``bin/my_script.py``. Note that this path is relative to the path specified with the ``--source-folder`` parameter.
- ``--cmd-args "--some-arg"``: Pass ``--some-arg`` to the container entrypoint.
- ``--env-var "MY_VAR=12345"``: Define environment variable ``MY_VAR`` with value ``12345``.
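
A conda environment can be installed locally with the ADS CLI before running the job. A minimal sketch, assuming the ``conda install`` subcommand and the example slug used above; see :doc:`Working with Conda packs<./condapack>` for the authoritative steps:

.. code-block:: shell

   # Install the published conda environment locally so it can be mounted via --conda-slug.
   # The subcommand and flag are assumptions here; the conda pack docs referenced above are authoritative.
   ads opctl conda install -s myconda_p38_cpu_v1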

Using a custom image
====================

The example below demonstrates how to run a local job using a custom container image:

.. code-block:: shell

   ads opctl run --backend local --image my_image --entrypoint /path/to/my/binary --command my_cmd --env-var "MY_VAR=12345"

Parameter explanation:

- ``--backend local``: Run the job locally in a docker container.
- ``--image my_image``: Use the custom container image named ``my_image`` (a build sketch follows this list).
- ``--entrypoint /path/to/my/binary``: Set the container entrypoint to ``/path/to/my/binary``. Note that this path is within the container image.
- ``--command my_cmd``: Set the container command to ``my_cmd``.
- ``--env-var "MY_VAR=12345"``: Define environment variable ``MY_VAR`` with value ``12345``.
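
The custom image must already exist on your machine (or be pullable by docker). A minimal sketch using a generic ``docker build``; the tag and build context are placeholders, and the image requirements are described in the BYOC docs linked above:

.. code-block:: shell

   # Build a local image tagged my_image from a directory containing your Dockerfile.
   docker build -t my_image /path/to/my/docker/build/context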

Viewing container output
========================

When the container is running, you can use the ``docker logs`` command to view its output. See https://docs.docker.com/engine/reference/commandline/logs/
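
For example, using the standard Docker CLI (the container ID or name will vary per run):

.. code-block:: shell

   # List running containers to find the one started by `ads opctl run`,
   # then follow its output.
   docker ps
   docker logs -f <container_id_or_name>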

Alternatively, you can use the ``--debug`` parameter to print the container stdout/stderr messages to your shell. Note that Python buffers output by default, so you may see output written to the shell in bursts. If you want to see output displayed in real time, specify ``--env-var PYTHONUNBUFFERED=1``.

.. code-block:: shell

   ads opctl run --backend local --conda-slug myconda_p38_cpu_v1 --source-folder /path/to/my/job/files/ --entrypoint my_script.py --env-var "PYTHONUNBUFFERED=1" --debug
docs/source/user_guide/cli/opctl/localdev/local_pipelines.rst

Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
++++++++++++++++++++++++
Local Pipeline Execution
++++++++++++++++++++++++

Your pipeline can be executed locally to facilitate development and troubleshooting. Each pipeline step is executed in its own local container.

-------------
Prerequisites
-------------

1. :doc:`Install ADS CLI<../../quickstart>`
2. :doc:`Build Development Container Image<./jobs_container_image>` and :doc:`install a conda environment<./condapack>`

------------
Restrictions
------------

Your pipeline steps are subject to the :doc:`same restrictions as local jobs<./local_jobs>`.

They are also subject to these additional restrictions:

- Pipeline steps must be of kind ``customScript``.
- Custom container images are not yet supported. You must use the development container image with a conda environment.

---------------------------------------
Configuring Local Pipeline Orchestrator
---------------------------------------

Use ``ads opctl configure``. Refer to the ``local_backend.ini`` description in the configuration :doc:`instructions<../configure>`.

Most importantly, ``max_parallel_containers`` controls how many pipeline steps may be executed in parallel on your machine. Your pipeline DAG may allow multiple steps to be executed in parallel, but your local machine may not have enough CPU cores or memory to effectively run them all simultaneously.

---------------------
Running your Pipeline
---------------------

Local pipeline execution requires you to define your pipeline in a YAML file. Refer to the YAML examples :doc:`here<../../../pipeline/examples>`.

Then, invoke the following command to run your pipeline:

.. code-block:: shell

   ads opctl run --backend local --file my_pipeline.yaml --source-folder /path/to/my/pipeline/step/files

Parameter explanation:

- ``--backend local``: Run the pipeline locally using docker containers.
- ``--file my_pipeline.yaml``: The YAML file defining your pipeline.
- ``--source-folder /path/to/my/pipeline/step/files``: The local directory containing the files used by your pipeline steps. This directory is mounted into the container as a volume. Defaults to the current working directory if no value is provided (see the example after this list).
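
For example, because ``--source-folder`` defaults to the current working directory, the following is an equivalent run when invoked from the directory that contains your step files (the paths here are placeholders):

.. code-block:: shell

   # Run the pipeline, letting --source-folder default to the current directory.
   cd /path/to/my/pipeline/step/files
   ads opctl run --backend local --file /path/to/my_pipeline.yaml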

Source folder and relative paths
================================

If your pipeline step runtimes are of type ``script`` or ``notebook``, the paths in your YAML file must be relative to the ``--source-folder``.

Pipeline steps using a runtime of type ``python`` are able to define their own working directory that will be mounted into the step's container instead.

For example, suppose your YAML file looked like this:

.. code-block:: yaml

   kind: pipeline
   spec:
     displayName: example
     dag:
     - (step_1, step_2) >> step_3
     stepDetails:
     - kind: customScript
       spec:
         description: A step running a notebook
         name: step_1
         runtime:
           kind: runtime
           spec:
             conda:
               slug: myconda_p38_cpu_v1
               type: service
             notebookEncoding: utf-8
             notebookPathURI: step_1_files/my-notebook.ipynb
           type: notebook
     - kind: customScript
       spec:
         description: A step running a shell script
         name: step_2
         runtime:
           kind: runtime
           spec:
             conda:
               slug: myconda_p38_cpu_v1
               type: service
             scriptPathURI: step_2_files/my-script.sh
           type: script
     - kind: customScript
       spec:
         description: A step running a python script
         name: step_3
         runtime:
           kind: runtime
           spec:
             conda:
               slug: myconda_p38_cpu_v1
               type: service
             workingDir: /step_3/custom/working/dir
             scriptPathURI: my-python.py
           type: python
   type: pipeline

And suppose the pipeline is executed locally with the following command:

.. code-block:: shell

   ads opctl run --backend local --file my_pipeline.yaml --source-folder /my/files

``step_1`` uses a ``notebook`` runtime. The container for ``step_1`` will mount the ``/my/files`` directory into the container. The ``/my/files/step_1_files/my-notebook.ipynb`` notebook file will be converted into a python script and executed in the container.

``step_2`` uses a ``script`` runtime. The container for ``step_2`` will mount the ``/my/files`` directory into the container. The ``/my/files/step_2_files/my-script.sh`` shell script will be executed in the container.

``step_3`` uses a ``python`` runtime. Instead of mounting the ``/my/files`` directory specified by ``--source-folder``, the ``/step_3/custom/working/dir`` directory will be mounted into the container. The ``/step_3/custom/working/dir/my-python.py`` script will be executed in the container.

Viewing container output and orchestration messages
====================================================

When a container is running, you can use the ``docker logs`` command to view its output. See https://docs.docker.com/engine/reference/commandline/logs/

Alternatively, you can use the ``--debug`` parameter to print each container's stdout/stderr messages to your shell. Note that Python buffers output by default, so you may see output written to the shell in bursts. If you want to see output displayed in real time for a particular step, specify a non-zero value for the ``PYTHONUNBUFFERED`` environment variable in your step's runtime specification. For example:

.. code-block:: yaml

   - kind: customScript
     spec:
       description: A step running a shell script
       name: step_1
       runtime:
         kind: runtime
         spec:
           conda:
             slug: myconda_p38_cpu_v1
             type: service
           scriptPathURI: my-script.sh
           env:
             PYTHONUNBUFFERED: 1
         type: script

Pipeline steps can run in parallel. You may want your pipeline steps to prefix their log output to easily distinguish which lines of output are coming from which step.
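
One way to do this, assuming the step is a bash script, is to redirect the script's own output through ``sed`` so every line carries a step label. The label and script content here are hypothetical:

.. code-block:: shell

   #!/bin/bash
   # Prefix all subsequent stdout/stderr lines with a step label so parallel
   # steps are easy to tell apart in the --debug stream.
   exec > >(sed 's/^/[step_2] /') 2>&1

   echo "starting work"
   # ... the real work of the step goes here ...
   echo "finished"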

When the ``--debug`` parameter is specified, the CLI will also output pipeline orchestration messages. These include messages about which steps are being started and a summary of each step's result when the pipeline finishes execution.
docs/source/user_guide/jobs/_template/runtime_types.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
* ``GitPythonRuntime`` allows you to run source code from a Git repository, see :ref:`Run from Git <job_run_git>`.
* ``NotebookRuntime`` allows you to run a JupyterLab Python notebook, see :ref:`Run a Notebook <job_run_a_notebook>`.
* ``PythonRuntime`` allows you to run Python code with additional options, including setting a working directory, adding Python paths, and copying output files, see :ref:`Run a ZIP file or folder <job_run_zip>`.
* ``ScriptRuntime`` allows you to run Python, Bash, and Java scripts from a single source file (``.zip`` or ``.tar.gz``) or code directory, see :ref:`Run a Script <job_run_script>` and :ref:`Run a ZIP file or folder <job_run_zip>`.

docs/source/user_guide/jobs/data_science_job.rst

Lines changed: 1 addition & 4 deletions
@@ -85,10 +85,7 @@ Runtime

A job can have different types of *runtime* depending on the source code you want to run:

.. include:: _template/runtime_types.rst

All of these runtime options allow you to configure a `Data Science Conda Environment <https://docs.oracle.com/en-us/iaas/data-science/using/conda_understand_environments.htm>`__ for running your code. For example, to define a python script as a job runtime with a TensorFlow conda environment you could use:
docs/source/user_guide/jobs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -18,3 +18,4 @@ Oracle Cloud Infrastructure (OCI) Data Science jobs enable you to define and run
   run_zip
   ../cli/opctl/_template/jobs
   ../cli/opctl/_template/monitoring
   ../cli/opctl/localdev/local_jobs

docs/source/user_guide/jobs/run_git.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
.. _job_run_git:

Run a Git Repo
**************
docs/source/user_guide/jobs/run_notebook.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
.. _job_run_a_notebook:

Run a Notebook
**************