Commit cfb7bef

Improves Data Flow user guide with the new features. (#80)

2 parents 20b2baa + 9b2b451

1 file changed: +30 -6 lines changed

docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 30 additions & 6 deletions
```diff
@@ -36,6 +36,7 @@ Define config. If you have not yet configured your dataflow setting, or would li
     dataflow_config.logs_bucket_uri = "oci://<my-bucket>@<my-tenancy>/"
     dataflow_config.spark_version = "3.2.1"
     dataflow_config.configuration = {"spark.driver.memory": "512m"}
+    dataflow_config.private_endpoint_id = "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
 
 Use the config defined above to submit the cell.
 
```
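In context, the new `private_endpoint_id` field slots into the spark magic config cell like this. A minimal sketch, assuming the `DataFlowConfig` helper from `ads.jobs.utils` that the guide sets up earlier; all OCIDs and bucket URIs are placeholders:

```python
import ads
from ads.jobs.utils import DataFlowConfig

ads.set_auth("resource_principal")

dataflow_config = DataFlowConfig()
dataflow_config.compartment_id = "ocid1.compartment.oc1..<unique_id>"  # placeholder
dataflow_config.logs_bucket_uri = "oci://<my-bucket>@<my-tenancy>/"
dataflow_config.spark_version = "3.2.1"
dataflow_config.configuration = {"spark.driver.memory": "512m"}
# New in this change: attach a Data Flow private endpoint so the session
# can reach resources in a private network.
dataflow_config.private_endpoint_id = (
    "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
)
```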
```diff
@@ -159,6 +160,11 @@ You could submit a notebook using ADS SDK APIs. Here is an example to submit a n
         .with_executor_shape("VM.Standard.E4.Flex")
         .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
         .with_logs_bucket_uri("oci://mybucket@mytenancy/")
+        .with_private_endpoint_id("ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     rt = (
         DataFlowNotebookRuntime()
```
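The `spark.driverEnv.*` and `spark.executorEnv.*` keys set process environment variables on the driver and the executors respectively (`spark.executorEnv.*` is standard Spark; the `driverEnv` prefix is the Data Flow counterpart). A hypothetical check from inside the submitted notebook, assuming the usual `spark` session object is available:

```python
import os

# On the driver, spark.driverEnv.myEnvVariable surfaces as a plain env var.
print(os.environ.get("myEnvVariable"))  # expected: "value1"

def read_executor_env(_):
    # Runs on an executor, where spark.executorEnv.myEnvVariable applies.
    return os.environ.get("myEnvVariable")  # expected: "value2"

# One-element RDD forces a single executor task for the check.
print(spark.sparkContext.parallelize([0], numSlices=1).map(read_executor_env).collect())
```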
```diff
@@ -197,6 +203,7 @@ You can set them using the ``with_{property}`` functions:
 - ``with_num_executors``
 - ``with_spark_version``
 - ``with_warehouse_bucket_uri``
+- ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
 
 For more details, see `DataFlow class documentation <https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/ads.jobs.html#module-ads.jobs.builders.infrastructure.dataflow>`__.
 
```
```diff
@@ -209,6 +216,7 @@ The ``DataFlowRuntime`` properties are:
 - ``with_archive_uri`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#third-party-libraries>`__)
 - ``with_archive_bucket``
 - ``with_custom_conda``
+- ``with_configuration``
 
 For more details, see the `runtime class documentation <../../ads.jobs.html#module-ads.jobs.builders.runtimes.python_runtime>`__.
 
```
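For instance, the new runtime-level setter composes with the existing ones. A sketch, with placeholder bucket names and conda pack path:

```python
from ads.jobs import DataFlowRuntime

runtime = (
    DataFlowRuntime()
    .with_script_uri("oci://mybucket@namespace/prefix/script.py")
    .with_custom_conda("oci://<mybucket>@<mynamespace>/<path/to/conda_pack>")
    # Spark properties, including driver/executor environment variables.
    .with_configuration({
        "spark.driverEnv.myEnvVariable": "value1",
        "spark.executorEnv.myEnvVariable": "value2",
    })
)
```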
```diff
@@ -217,7 +225,7 @@ object can be reused and combined with various ``DataFlowRuntime`` parameters to
 create applications.
 
 In the following "hello-world" example, ``DataFlow`` is populated with ``compartment_id``,
-``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
+``driver_shape``, ``driver_shape_config``, ``executor_shape``, ``executor_shape_config``
 and ``spark_version``. ``DataFlowRuntime`` is populated with ``script_uri`` and
 ``script_bucket``. The ``script_uri`` specifies the path to the script. It can be
 local or remote (an Object Storage path). If the path is local, then
```
```diff
@@ -267,6 +275,10 @@ accepted. In the next example, the prefix is given for ``script_bucket``.
         .with_script_uri(os.path.join(td, "script.py"))
         .with_script_bucket("oci://mybucket@namespace/prefix")
         .with_custom_conda("oci://<mybucket>@<mynamespace>/<path/to/conda_pack>")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     df = Job(name=name, infrastructure=dataflow_configs, runtime=runtime_config)
     df.create()
```
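After `df.create()`, the job can be run and monitored with the same ADS `Job` methods the guide uses elsewhere:

```python
run = df.run()   # launches a Data Flow run for the created application
run.watch()      # streams the run's logs until it reaches a terminal state
```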
```diff
@@ -374,6 +386,10 @@ In the next example, ``archive_uri`` is given as an Object Storage location.
         .with_executor_shape("VM.Standard.E4.Flex")
         .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
         .with_spark_version("3.0.2")
+        .with_configuration({
+            "spark.driverEnv.myEnvVariable": "value1",
+            "spark.executorEnv.myEnvVariable": "value2",
+        })
     )
     runtime_config = (
         DataFlowRuntime()
```
```diff
@@ -545,12 +561,16 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
       language: PYTHON
       logsBucketUri: <logs_bucket_uri>
       numExecutors: 1
-      sparkVersion: 2.4.4
+      sparkVersion: 3.2.1
+      privateEndpointId: <private_endpoint_ocid>
     type: dataFlow
   name: dataflow_app_name
   runtime:
     kind: runtime
     spec:
+      configuration:
+        spark.driverEnv.myEnvVariable: value1
+        spark.executorEnv.myEnvVariable: value2
       scriptBucket: bucket_name
       scriptPathURI: oci://<bucket_name>@<namespace>/<prefix>
     type: dataFlow
```
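A sketch of feeding that YAML into `Job.from_yaml()`, assuming the spec above has been saved to a local file; the file name `dataflow_job.yaml` is hypothetical:

```python
from ads.jobs import Job

# Build the job from the YAML spec, then create and run it on Data Flow.
job = Job.from_yaml(uri="dataflow_job.yaml")
job.create()
run = job.run()
```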
```diff
@@ -618,6 +638,12 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
       sparkVersion:
         required: false
         type: string
+      privateEndpointId:
+        required: false
+        type: string
+      configuration:
+        required: false
+        type: dict
       type:
         allowed:
         - dataFlow
```
```diff
@@ -662,11 +688,9 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
         - service
         required: true
         type: string
-      env:
-        type: list
+      configuration:
         required: false
-        schema:
-          type: dict
+        type: dict
       freeform_tag:
         required: false
         type: dict
```
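In Python terms, this schema change swaps the old `env` list for a flat `configuration` dict. The shape of the old `env` entries shown here is hypothetical, inferred only from the `list`-of-`dict` schema being removed:

```python
# Old runtime spec: `env` was a list whose items were dicts.
old_runtime_spec = {
    "env": [{"name": "myEnvVariable", "value": "value1"}],  # hypothetical item shape
}

# New runtime spec: a single `configuration` dict of Spark properties.
new_runtime_spec = {
    "configuration": {
        "spark.driverEnv.myEnvVariable": "value1",
        "spark.executorEnv.myEnvVariable": "value2",
    },
}
```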
