Commit cfb7bef

Improves Data Flow user guide with the new features. (#80)

2 parents 20b2baa + 9b2b451

1 file changed: +30 -6 lines changed

docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 30 additions & 6 deletions
```diff
@@ -36,6 +36,7 @@ Define config. If you have not yet configured your dataflow setting, or would li
     dataflow_config.logs_bucket_uri = "oci://<my-bucket>@<my-tenancy>/"
     dataflow_config.spark_version = "3.2.1"
     dataflow_config.configuration = {"spark.driver.memory": "512m"}
+    dataflow_config.private_endpoint_id = "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
 
 Use the config defined above to submit the cell.
 
```
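In context, the new `private_endpoint_id` field slots into the spark magic config cell like this. A minimal sketch, assuming the `DataFlowConfig` helper from `ads.jobs.utils` that the guide sets up earlier; all OCIDs and bucket URIs are placeholders:

```python
import ads
from ads.jobs.utils import DataFlowConfig

ads.set_auth("resource_principal")

dataflow_config = DataFlowConfig()
dataflow_config.compartment_id = "ocid1.compartment.oc1..<unique_id>"  # placeholder
dataflow_config.logs_bucket_uri = "oci://<my-bucket>@<my-tenancy>/"
dataflow_config.spark_version = "3.2.1"
dataflow_config.configuration = {"spark.driver.memory": "512m"}
# New in this change: attach a Data Flow private endpoint so the session
# can reach resources in a private network.
dataflow_config.private_endpoint_id = (
    "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
)
```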
```diff
@@ -159,6 +160,11 @@ You could submit a notebook using ADS SDK APIs. Here is an example to submit a n
         .with_executor_shape("VM.Standard.E4.Flex")
         .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
         .with_logs_bucket_uri("oci://mybucket@mytenancy/")
+        .with_private_endpoint_id("ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     rt = (
         DataFlowNotebookRuntime()
```
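The `spark.driverEnv.*` and `spark.executorEnv.*` keys set process environment variables on the driver and the executors respectively (`spark.executorEnv.*` is standard Spark; the `driverEnv` prefix is the Data Flow counterpart). A hypothetical check from inside the submitted notebook, assuming the usual `spark` session object is available:

```python
import os

# On the driver, spark.driverEnv.myEnvVariable surfaces as a plain env var.
print(os.environ.get("myEnvVariable"))  # expected: "value1"

def read_executor_env(_):
    # Runs on an executor, where spark.executorEnv.myEnvVariable applies.
    return os.environ.get("myEnvVariable")  # expected: "value2"

# One-element RDD forces a single executor task for the check.
print(spark.sparkContext.parallelize([0], numSlices=1).map(read_executor_env).collect())
```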
```diff
@@ -197,6 +203,7 @@ You can set them using the ``with_{property}`` functions:
 - ``with_num_executors``
 - ``with_spark_version``
 - ``with_warehouse_bucket_uri``
+- ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
 
 For more details, see `DataFlow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
 
```
```diff
@@ -209,6 +216,7 @@ The ``DataFlowRuntime`` properties are:
 - ``with_archive_uri`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#third-party-libraries>`__)
 - ``with_archive_bucket``
 - ``with_custom_conda``
+- ``with_configuration``
 
 For more details, see the `runtime class documentation <../../ads.jobs.html#module-ads.jobs.builders.runtimes.python_runtime>`__.
 
```
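For instance, the new runtime-level setter composes with the existing ones. A sketch, with placeholder bucket names and conda pack path:

```python
from ads.jobs import DataFlowRuntime

runtime = (
    DataFlowRuntime()
    .with_script_uri("oci://mybucket@namespace/prefix/script.py")
    .with_custom_conda("oci://<mybucket>@<mynamespace>/<path/to/conda_pack>")
    # Spark properties, including driver/executor environment variables.
    .with_configuration({
        "spark.driverEnv.myEnvVariable": "value1",
        "spark.executorEnv.myEnvVariable": "value2",
    })
)
```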
```diff
@@ -217,7 +225,7 @@ object can be reused and combined with various ``DataFlowRuntime`` parameters to
 create applications.
 
 In the following "hello-world" example, ``DataFlow`` is populated with ``compartment_id``,
-``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
+``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
 and ``spark_version``. ``DataFlowRuntime`` is populated with ``script_uri`` and
 ``script_bucket``. The ``script_uri`` specifies the path to the script. It can be
 local or remote (an Object Storage path). If the path is local, then
```
```diff
@@ -267,6 +275,10 @@ accepted. In the next example, the prefix is given for ``script_bucket``.
         .with_script_uri(os.path.join(td, "script.py"))
         .with_script_bucket("oci://mybucket@namespace/prefix")
         .with_custom_conda("oci://<mybucket>@<mynamespace>/<path/to/conda_pack>")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     df = Job(name=name, infrastructure=dataflow_configs, runtime=runtime_config)
     df.create()
```
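After `df.create()`, the job can be run and monitored with the same ADS `Job` methods the guide uses elsewhere:

```python
run = df.run()   # launches a Data Flow run for the created application
run.watch()      # streams the run's logs until it reaches a terminal state
```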
```diff
@@ -374,6 +386,10 @@ In the next example, ``archive_uri`` is given as an Object Storage location.
         .with_executor_shape("VM.Standard.E4.Flex")
         .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
         .with_spark_version("3.0.2")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     runtime_config = (
         DataFlowRuntime()
```
```diff
@@ -545,12 +561,16 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
       language: PYTHON
       logsBucketUri: <logs_bucket_uri>
       numExecutors: 1
-      sparkVersion: 2.4.4
+      sparkVersion: 3.2.1
+      privateEndpointId: <private_endpoint_ocid>
     type: dataFlow
   name: dataflow_app_name
   runtime:
     kind: runtime
     spec:
+      configuration:
+        spark.driverEnv.myEnvVariable: value1
+        spark.executorEnv.myEnvVariable: value2
       scriptBucket: bucket_name
       scriptPathURI: oci://<bucket_name>@<namespace>/<prefix>
     type: dataFlow
```
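A sketch of feeding that YAML into `Job.from_yaml()`, assuming the spec above has been saved to a local file; the file name `dataflow_job.yaml` is hypothetical:

```python
from ads.jobs import Job

# Build the job from the YAML spec, then create and run it on Data Flow.
job = Job.from_yaml(uri="dataflow_job.yaml")
job.create()
run = job.run()
```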
```diff
@@ -618,6 +638,12 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
       sparkVersion:
         required: false
         type: string
+      privateEndpointId:
+        required: false
+        type: string
+      configuration:
+        required: false
+        type: dict
       type:
         allowed:
         - dataFlow
```
```diff
@@ -662,11 +688,9 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
         - service
         required: true
         type: string
-      env:
-        type: list
+      configuration:
         required: false
-        schema:
-          type: dict
+        type: dict
       freeform_tag:
         required: false
         type: dict
```
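In Python terms, this schema change swaps the old `env` list for a flat `configuration` dict. The shape of the old `env` entries shown here is hypothetical, inferred only from the `list`-of-`dict` schema being removed:

```python
# Old runtime spec: `env` was a list whose items were dicts.
old_runtime_spec = {
    "env": [{"name": "myEnvVariable", "value": "value1"}],  # hypothetical item shape
}

# New runtime spec: a single `configuration` dict of Spark properties.
new_runtime_spec = {
    "configuration": {
        "spark.driverEnv.myEnvVariable": "value1",
        "spark.executorEnv.myEnvVariable": "value2",
    },
}
```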
