
Commit 8881316

Merge branch 'develop' into ODSC-29065/md_opctl_docs
2 parents 48332d1 + 9022784 commit 8881316

9 files changed: +297 additions, -18 deletions

ads/dataflow/dataflow.py

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ class SPARK_VERSION(str):
 class DataFlow:
     @deprecated(
         "2.6.3",
-        details="Use ads.jobs.DataFlow class for creating DataFlow applications and runs. Check https://accelerated-data-science.readthedocs.io/en/latest/user_guide/apachespark/dataflow.html#create-run-data-flow-application-using-ads-python-sdk",
+        details="Use ads.jobs.DataFlow class for creating Data Flow applications and runs. Check https://accelerated-data-science.readthedocs.io/en/latest/user_guide/apachespark/dataflow.html#create-run-data-flow-application-using-ads-python-sdk",
     )
     def __init__(
         self,
Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-* DataFlow requires a bucket to store the logs, and a data warehouse bucket. Refer to the Data Flow documentation for `setting up storage <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_storage>`_.
-* DataFlow requires policies to be set in IAM to access resources to manage and run applications/sessions. Refer to the Data Flow documentation on how to `setup policies <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#policy_set_up>`__.
-* DataFlow natively supports conda packs published to OCI Object Storage. Ensure the Data Flow Resource has read access to the bucket or path of your published conda pack, and that the spark version >= 3 when running your Data Flow Application/Session.
+* Data Flow requires a bucket to store the logs, and a data warehouse bucket. Refer to the Data Flow documentation for `setting up storage <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_storage>`_.
+* Data Flow requires policies to be set in IAM to access resources to manage and run applications/sessions. Refer to the Data Flow documentation on how to `setup policies <https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#policy_set_up>`__.
+* Data Flow natively supports conda packs published to OCI Object Storage. Ensure the Data Flow Resource has read access to the bucket or path of your published conda pack, and that the spark version >= 3 when running your Data Flow Application/Session.

docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 Running your Spark Application on OCI Data Flow
 ===============================================
 
-Submit your code to DataFlow for workloads that require larger resources.
+Submit your code to Data Flow for workloads that require larger resources.
 
 Notebook Extension
 ==================
@@ -124,7 +124,7 @@ ADS CLI
 
 Sometimes your code is too complex to run in a single cell, and it's better run as a notebook or file. In that case, use the ADS Opctl CLI.
 
-To submit your notebook to DataFlow using the ``ads`` CLI, run:
+To submit your notebook to Data Flow using the ``ads`` CLI, run:
 
 .. code-block:: shell
 
@@ -205,7 +205,7 @@ You can set them using the ``with_{property}`` functions:
 - ``with_warehouse_bucket_uri``
 - ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
 
-For more details, see `DataFlow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
+For more details, see `Data Flow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
 
 ``DataFlowRuntime`` stores properties related to the script to be run, such as the path to the script and
 CLI arguments. Likewise all properties can be set using ``with_{property}``.
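
For reference, the ``with_{property}`` builders mentioned in this hunk can be chained on ``DataFlow`` and ``DataFlowRuntime``. A minimal sketch, assuming placeholder OCIDs, bucket URIs, and script path (none of these values come from this change):

.. code-block:: python3

    from ads.jobs import DataFlow, DataFlowRuntime, Job

    # Data Flow infrastructure, including the optional warehouse bucket
    # and private endpoint properties listed above (placeholder values).
    infrastructure = (
        DataFlow()
        .with_compartment_id("<compartment_ocid>")
        .with_logs_bucket_uri("oci://<logs_bucket>@<namespace>/")
        .with_warehouse_bucket_uri("oci://<warehouse_bucket>@<namespace>/")
        .with_private_endpoint_id("<private_endpoint_ocid>")
    )

    # Runtime properties: the script to run and its CLI arguments.
    runtime = (
        DataFlowRuntime()
        .with_script_uri("oci://<bucket>@<namespace>/spark_script.py")
        .with_argument("--input", "oci://<bucket>@<namespace>/data/")
    )

    job = Job(name="<application_name>", infrastructure=infrastructure, runtime=runtime)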

docs/source/user_guide/apachespark/setup-installation.rst

Lines changed: 2 additions & 2 deletions
@@ -178,8 +178,8 @@ Once the development environment is setup, you could write your code and run it
 ``core-site.xml`` is setup automatically when you install a pyspark conda pack.
 
 
-Logging From DataFlow
-=====================
+Logging From Data Flow
+======================
 
 If using the ADS Python SDK,
 

docs/source/user_guide/cli/opctl/_template/monitoring.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ Monitoring With CLI
 watch
 +++++
 
-You can tail the logs generated by ``OCI Data Science Job Runs`` or ``OCI DataFlow Application Runs`` using the ``watch`` subcommand.
+You can tail the logs generated by ``OCI Data Science Job Runs`` or ``OCI Data Flow Application Runs`` using the ``watch`` subcommand.
 
 .. code-block:: shell
 
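
The body of the ``.. code-block:: shell`` above falls outside this hunk. A minimal sketch of the invocation, assuming the run OCID is passed as the argument (placeholder value, not taken from this commit):

.. code-block:: shell

    # Tail the logs of a Job Run or Data Flow Application Run.
    ads opctl watch <run_ocid>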

docs/source/user_guide/cli/opctl/configure.rst

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ CLI Configuration
 - You have completed :doc:`ADS CLI installation <../quickstart>`
 
 
-Setup default values for different options while running ``OCI Data Sciecne Jobs`` or ``OCI DataFlow``. By setting defaults, you can avoid inputing compartment ocid, project ocid, etc.
+Setup default values for different options while running ``OCI Data Science Jobs`` or ``OCI Data Flow``. By setting defaults, you can avoid inputing compartment ocid, project ocid, etc.
 
 To setup configuration run -
 
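
The command referenced by "To setup configuration run -" sits outside this hunk. Assuming the standard ``ads opctl`` entry point, the invocation is a one-liner:

.. code-block:: shell

    # Interactive prompts record defaults such as compartment OCID and project OCID.
    ads opctl configure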

docs/source/user_guide/jobs/run_script.rst

Lines changed: 274 additions & 2 deletions
@@ -1,13 +1,285 @@
 Run a Script
 ************
 
-This section shows how to create a job to run a script.
+This example shows you how to create a job running "Hello World" Python scripts. Although Python scripts are used here, you could also run Bash or Shell scripts. The Logging service log and log group are defined in the infrastructure. The output of the script appear in the logs.
+
+Python
+======
+
+Suppose you would like to run the following "Hello World" python script named ``job_script.py``.
+
+.. code-block:: python3
+
+    print("Hello World")
+
+First, initiate a job with a job name:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    job = Job(name="Job Name")
+
+Next, you specify the desired infrastructure to run the job. If you are in a notebook session, ADS can automatically fetch the infrastructure configurations and use them for the job. If you aren't in a notebook session or you want to customize the infrastructure, you can specify them using the methods from the ``DataScienceJob`` class:
+
+.. code-block:: python3
+
+    from ads.jobs import DataScienceJob
+
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_group_id("<log_group_ocid>")
+        .with_log_id("<log_ocid>")
+        # The following infrastructure configurations are optional
+        # if you are in an OCI data science notebook session.
+        # The configurations of the notebook session will be used as defaults
+        .with_compartment_id("<compartment_ocid>")
+        .with_project_id("<project_ocid>")
+        .with_subnet_id("<subnet_ocid>")
+        .with_shape_name("VM.Standard.E3.Flex")
+        .with_shape_config_details(memory_in_gbs=16, ocpus=1) # Applicable only for the flexible shapes
+        .with_block_storage_size(50)
+    )
+
+In this example, it is a Python script so the ``ScriptRuntime()`` class is used to define the name of the script using the ``.with_source()`` method:
+
+.. code-block:: python3
+
+    from ads.jobs import ScriptRuntime
+    job.with_runtime(
+        ScriptRuntime().with_source("job_script.py")
+    )
+
+Finally, you create and run the job, which gives you access to the
+``job_run.id``:
+
+.. code-block:: python3
+
+    job.create()
+    job_run = job.run()
+
+Additionally, you can acquire the job run using the OCID:
+
+.. code-block:: python3
+
+    from ads.jobs import DataScienceJobRun
+    job_run = DataScienceJobRun.from_ocid(job_run.id)
+
+The ``.watch()`` method is useful to monitor the progress of the job run:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+After the job has been created and runs successfully, you can find
+the output of the script in the logs if you configured logging.
+
+YAML
+====
+
+You could also initialize a job directly from a YAML string. For example, to create a job identical to the preceding example, you could simply run the following:
+
+.. code-block:: python3
+
+    job = Job.from_string(f"""
+    kind: job
+    spec:
+      infrastructure:
+        kind: infrastructure
+        type: dataScienceJob
+        spec:
+          logGroupId: <log_group_ocid>
+          logId: <log_ocid>
+          compartmentId: <compartment_ocid>
+          projectId: <project_ocid>
+          subnetId: <subnet_ocid>
+          shapeName: VM.Standard.E3.Flex
+          shapeConfigDetails:
+            memoryInGBs: 16
+            ocpus: 1
+          blockStorageSize: 50
+      name: <resource_name>
+      runtime:
+        kind: runtime
+        type: python
+        spec:
+          scriptPathURI: job_script.py
+    """)
+
+
+Command Line Arguments
+======================
+
+If the Python script that you want to run as a job requires CLI arguments,
+use the ``.with_argument()`` method to pass the arguments to the job.
+
+Python
+------
+
+Suppose you want to run the following python script named ``job_script_argument.py``:
+
+.. code-block:: python3
+
+    import sys
+    print("Hello " + str(sys.argv[1]) + " and " + str(sys.argv[2]))
+
+This example runs a job with CLI arguments:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    from ads.jobs import DataScienceJob
+    from ads.jobs import ScriptRuntime
+
+    job = Job()
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_id("<log_id>")
+        .with_log_group_id("<log_group_id>")
+    )
+
+    # The CLI argument can be passed in using `with_argument` when defining the runtime
+    job.with_runtime(
+        ScriptRuntime()
+        .with_source("job_script_argument.py")
+        .with_argument("<first_argument>", "<second_argument>")
+    )
+
+    job.create()
+    job_run = job.run()
+
+After the job run is created and run, you can use the ``.watch()`` method to monitor
+its progress:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+This job run prints out ``Hello <first_argument> and <second_argument>``.
+
+YAML
+----
+
+You could create the preceding example job with the following YAML file:
+
+.. code-block:: yaml
+
+    kind: job
+    spec:
+      infrastructure:
+        kind: infrastructure
+        type: dataScienceJob
+        spec:
+          logGroupId: <log_group_ocid>
+          logId: <log_ocid>
+          compartmentId: <compartment_ocid>
+          projectId: <project_ocid>
+          subnetId: <subnet_ocid>
+          shapeName: VM.Standard.E3.Flex
+          shapeConfigDetails:
+            memoryInGBs: 16
+            ocpus: 1
+          blockStorageSize: 50
+      runtime:
+        kind: runtime
+        type: python
+        spec:
+          args:
+          - <first_argument>
+          - <second_argument>
+          scriptPathURI: job_script_env.py
+
+
+Environment Variables
+=====================
+
+Similarly, if the script you want to run requires environment variables, you also pass them in using the ``.with_environment_variable()`` method. The key-value pair of the environment variable are passed in using the ``.with_environment_variable()`` method, and are accessed in the Python script using the ``os.environ`` dictionary.
+
+Python
+------
+
+Suppose you want to run the following python script named ``job_script_env.py``:
+
+.. code-block:: python3
+
+    import os
+    import sys
+    print("Hello " + os.environ["KEY1"] + " and " + os.environ["KEY2"])
+
+This example runs a job with environment variables:
+
+.. code-block:: python3
+
+    from ads.jobs import Job
+    from ads.jobs import DataScienceJob
+    from ads.jobs import ScriptRuntime
+
+    job = Job()
+    job.with_infrastructure(
+        DataScienceJob()
+        .with_log_group_id("<log_group_ocid>")
+        .with_log_id("<log_ocid>")
+        # The following infrastructure configurations are optional
+        # if you are in an OCI data science notebook session.
+        # The configurations of the notebook session will be used as defaults
+        .with_compartment_id("<compartment_ocid>")
+        .with_project_id("<project_ocid>")
+        .with_subnet_id("<subnet_ocid>")
+        .with_shape_name("VM.Standard.E3.Flex")
+        .with_shape_config_details(memory_in_gbs=16, ocpus=1)
+        .with_block_storage_size(50)
+    )
+
+    job.with_runtime(
+        ScriptRuntime()
+        .with_source("job_script_env.py")
+        .with_environment_variable(KEY1="<first_value>", KEY2="<second_value>")
+    )
+    job.create()
+    job_run = job.run()
+
+You can watch the progress of the job run using the ``.watch()`` method:
+
+.. code-block:: python3
+
+    job_run.watch()
+
+This job run prints out ``Hello <first_value> and <second_value>``.
+
+YAML
+----
+
+You could create the preceding example job with the following YAML file:
 
 The :py:class:`~ads.jobs.ScriptRuntime` is designed for you to define job artifacts and configurations supported by OCI
 Data Science Jobs natively. It can be used with any script types that is supported by the OCI Data Science Jobs,
 including shell scripts and python scripts.
 
-The source code can be a single script, files in a folder or a zip/tar file.
+kind: job
+spec:
+  infrastructure:
+    kind: infrastructure
+    type: dataScienceJob
+    spec:
+      logGroupId: <log_group_ocid>
+      logId: <log_ocid>
+      compartmentId: <compartment_ocid>
+      projectId: <project_ocid>
+      subnetId: <subnet_ocid>
+      shapeName: VM.Standard.E3.Flex
+      shapeConfigDetails:
+        memoryInGBs: 16
+        ocpus: 1
+      blockStorageSize: 50
+  runtime:
+    kind: runtime
+    type: python
+    spec:
+      env:
+      - name: KEY1
+        value: <first_value>
+      - name: KEY2
+        value: <second_value>
+      scriptPathURI: job_script_env.py
 
 See also: `Preparing Job Artifacts <https://docs.oracle.com/en-us/iaas/data-science/using/jobs-artifact.htm>`_.
 
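
The context lines above note that :py:class:`~ads.jobs.ScriptRuntime` also accepts shell scripts, and the removed line described folder and zip/tar sources. A minimal sketch of pointing the runtime at an archive, assuming a hypothetical archive name, entrypoint path, and the ``entrypoint`` keyword:

.. code-block:: python3

    from ads.jobs import ScriptRuntime

    # The source may be a folder or a zip/tar archive; entrypoint selects
    # which script inside it to execute (hypothetical paths).
    runtime = ScriptRuntime().with_source(
        "job_artifacts.zip",
        entrypoint="job_artifacts/main.sh",
    )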

docs/source/user_guide/model_registration/introduction.rst

Lines changed: 11 additions & 4 deletions
@@ -1,8 +1,8 @@
 .. _model-catalog-8:
 
-##########################
-Register and Deploy Models
-##########################
+###################################
+Register, Manage, and Deploy Models
+###################################
 
 
 You could register your model with OCI Data Science service through ADS. Alternatively, the Oracle Cloud Infrastructure (OCI) Console can be used by going to the Data Science projects page, selecting a project, then click **Models**. The models page shows the model artifacts that are in the model catalog for a given project.
@@ -51,9 +51,16 @@ Register
   model_schema
   model_metadata
   model_file_customization
+
+Manage Model
+------------
+
+.. toctree::
+  :maxdepth: 1
+
   model_version_set
 
-Deploying model
+Deploying Model
 ---------------
 
 .. toctree::

docs/source/user_guide/quick_start/quick_start.rst

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ Quick Start
 * :doc:`Read and Write to Object Storage, Databases and other OCI Resources<../loading_data/connect>`
 * :doc:`OCI serverless Spark - Data Flow <../apachespark/quickstart>`
 * :doc:`Evaluate Trained Models<../model_training/model_evaluation/quick_start>`
-* :doc:`Register and Deploy Models<../model_registration/quick_start>`
+* :doc:`Register, Manage, and Deploy Models<../model_registration/quick_start>`
 * :doc:`Store and Retrieve your data source credentials<../secrets/quick_start>`
 * :doc:`Conect to existing OCI Big Data Service<../big_data_service/quick_start>`
 
