Commit 70f10b0 (parent: 071979a)

Improves DataFlow user guide with the new features.

1 file changed: 15 additions, 2 deletions


docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 15 additions & 2 deletions
@@ -36,6 +36,7 @@ Define config. If you have not yet configured your dataflow setting, or would li
     dataflow_config.logs_bucket_uri = "oci://<my-bucket>@<my-tenancy>/"
     dataflow_config.spark_version = "3.2.1"
     dataflow_config.configuration = {"spark.driver.memory": "512m"}
+    dataflow_config.private_endpoint_id = "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
 
 Use the config defined above to submit the cell.
 

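The value added above is a Data Flow private endpoint OCID. As a quick sanity check before assigning it, here is a plain-Python sketch (the helper is mine, not part of ADS, and the OCID shape is assumed from the examples in this guide) that verifies the string follows the `ocid1.dataflowprivateendpoint.<realm>.<region>.<unique-id>` pattern:

```python
def looks_like_dataflow_pe_ocid(ocid: str) -> bool:
    """Heuristic check (not an ADS API): Data Flow private endpoint
    OCIDs look like ocid1.dataflowprivateendpoint.<realm>.<region>.<id>."""
    parts = ocid.split(".")
    return (
        len(parts) >= 5
        and parts[0] == "ocid1"
        and parts[1] == "dataflowprivateendpoint"
        and all(parts)  # no empty segments
    )

print(looks_like_dataflow_pe_ocid(
    "ocid1.dataflowprivateendpoint.oc1.iad.aaaaexample"))  # True
print(looks_like_dataflow_pe_ocid("ocid1.datacatalog.oc1.iad.x"))  # False
```

A check like this only catches copy-paste mistakes early; the service itself validates the OCID at job creation.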
@@ -159,6 +160,7 @@ You could submit a notebook using ADS SDK APIs. Here is an example to submit a n
         .with_executor_shape("VM.Standard.E4.Flex")
         .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
         .with_logs_bucket_uri("oci://mybucket@mytenancy/")
+        .with_private_endpoint_id("ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>")
     )
     rt = (
         DataFlowNotebookRuntime()
@@ -167,6 +169,7 @@ You could submit a notebook using ADS SDK APIs. Here is an example to submit a n
         ) # This could be local path or http path to notebook ipynb file
         .with_script_bucket("<my-bucket>")
         .with_exclude_tag(["ignore", "remove"]) # Cells to Ignore
+        .with_environment_variable(env1="test", env2="test2") # will be propagated to both driver and executor
     )
     job = Job(infrastructure=df, runtime=rt).create(overwrite=True)
     df_run = job.run(wait=True)
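`with_exclude_tag` drops notebook cells carrying the listed metadata tags before the notebook is executed. A minimal stand-alone sketch of that filtering over nbformat-style cell dicts (my own helper, not the ADS implementation):

```python
def exclude_tagged_cells(cells, exclude_tags):
    """Keep only cells whose metadata tags do not intersect exclude_tags."""
    exclude = set(exclude_tags)
    return [
        cell for cell in cells
        if not exclude & set(cell.get("metadata", {}).get("tags", []))
    ]

cells = [
    {"source": "df = spark.read.csv(path)", "metadata": {"tags": []}},
    {"source": "df.show()  # scratch work", "metadata": {"tags": ["ignore"]}},
]
print(len(exclude_tagged_cells(cells, ["ignore", "remove"])))  # 1
```

Cells with no tags, or with tags outside the exclude list, survive unchanged.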
@@ -197,6 +200,7 @@ You can set them using the ``with_{property}`` functions:
 - ``with_num_executors``
 - ``with_spark_version``
 - ``with_warehouse_bucket_uri``
+- ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
 
 For more details, see `DataFlow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
 

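These `with_{property}` methods follow a fluent builder pattern: each records a value and returns the object so calls chain. A toy mock (deliberately not the real `DataFlow` class) showing how chaining with the new `with_private_endpoint_id` works:

```python
class MockDataFlow:
    """Toy stand-in for the builder pattern; not the ADS DataFlow class."""

    def __init__(self):
        self.spec = {}

    def with_spark_version(self, version):
        self.spec["sparkVersion"] = version
        return self  # returning self is what makes the calls chainable

    def with_private_endpoint_id(self, ocid):
        self.spec["privateEndpointId"] = ocid
        return self

df = (
    MockDataFlow()
    .with_spark_version("3.2.1")
    .with_private_endpoint_id("ocid1.dataflowprivateendpoint.oc1.iad.example")
)
print(df.spec["sparkVersion"])  # 3.2.1
```

The real class exposes many more `with_{property}` setters, but they all share this shape.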
@@ -209,6 +213,7 @@ The ``DataFlowRuntime`` properties are:
 - ``with_archive_uri`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#third-party-libraries>`__)
 - ``with_archive_bucket``
 - ``with_custom_conda``
+- ``with_environment_variable``
 
 For more details, see the `runtime class documentation <../../ads.jobs.html#module-ads.jobs.builders.runtimes.python_runtime>`__.
 

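`with_environment_variable` takes keyword arguments; in the YAML serialization shown later in this commit they surface as a list of name/value mappings under `env`. A plain-Python sketch of that correspondence (the function is illustrative, not part of ADS):

```python
def env_kwargs_to_spec(**env_vars):
    """Map keyword args (env1="test") to the YAML-style env list."""
    return [{"name": name, "value": value} for name, value in env_vars.items()]

print(env_kwargs_to_spec(env1="test", env2="test2"))
# [{'name': 'env1', 'value': 'test'}, {'name': 'env2', 'value': 'test2'}]
```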
@@ -217,7 +222,7 @@ object can be reused and combined with various ``DataFlowRuntime`` parameters to
 create applications.
 
 In the following "hello-world" example, ``DataFlow`` is populated with ``compartment_id``,
-``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
+``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
 and ``spark_version``. ``DataFlowRuntime`` is populated with ``script_uri`` and
 ``script_bucket``. The ``script_uri`` specifies the path to the script. It can be
 local or remote (an Object Storage path). If the path is local, then
@@ -267,6 +272,7 @@ accepted. In the next example, the prefix is given for ``script_bucket``.
         .with_script_uri(os.path.join(td, "script.py"))
         .with_script_bucket("oci://mybucket@namespace/prefix")
         .with_custom_conda("oci://<mybucket>@<mynamespace>/<path/to/conda_pack>")
+        .with_environment_variable(env1="test", env2="test2") # will be propagated to both driver and executor
     )
     df = Job(name=name, infrastructure=dataflow_configs, runtime=runtime_config)
     df.create()
@@ -545,14 +551,18 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
       language: PYTHON
       logsBucketUri: <logs_bucket_uri>
       numExecutors: 1
-      sparkVersion: 2.4.4
+      sparkVersion: 3.2.1
+      privateEndpointId: <private_endpoint_ocid>
     type: dataFlow
   name: dataflow_app_name
   runtime:
     kind: runtime
     spec:
       scriptBucket: bucket_name
       scriptPathURI: oci://<bucket_name>@<namespace>/<prefix>
+      env:
+      - name: env1
+        value: test1
     type: dataFlow
 
 **Data Flow Infrastructure YAML Schema**
@@ -618,6 +628,9 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
     sparkVersion:
       required: false
       type: string
+    privateEndpointId:
+      required: false
+      type: string
     type:
       allowed:
       - dataFlow
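The YAML passed to `Job.from_yaml()` is just nested mappings; building the runtime portion as a plain Python dict makes the shape of the new `env` entries explicit (values are the placeholders from the example above, no ADS or YAML library required):

```python
runtime_spec = {
    "kind": "runtime",
    "spec": {
        "scriptBucket": "bucket_name",
        "scriptPathURI": "oci://<bucket_name>@<namespace>/<prefix>",
        # environment variables are a list of name/value mappings
        "env": [{"name": "env1", "value": "test1"}],
    },
    "type": "dataFlow",
}
print(runtime_spec["spec"]["env"][0])  # {'name': 'env1', 'value': 'test1'}
```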
