Commit fb3ed9e

docs/odsc 41635/update doc for dataflow pool support (#224)
1 parent 439ac05 commit fb3ed9e

2 files changed (+30 additions, -1 deletion)

docs/source/user_guide/apachespark/dataflow-spark-magic.rst

Lines changed: 21 additions & 1 deletion
@@ -191,6 +191,26 @@ Example path : ``oci://<your-bucket>@<your-tenancy-namespace>/conda_environments
    "configuration":{\
    "spark.archives": "oci://<your-bucket>@<your-tenancy-namespace>/conda_environments/cpu/PySpark 3.2 and Data Flow/2.0/pyspark32_p38_cpu_v2#conda"}}'
**Example command with Data Flow Pools**

.. versionadded:: 2.8.7

`Data Flow Pools <https://docs.oracle.com/en-us/iaas/data-flow/using/pools.htm>`__ provide fast job startup, resource isolation, budget controls, and prioritization for your Spark workloads. Set ``poolId`` to the OCID of an existing pool to run the session on its resources.

.. code-block:: python

    %create_session -l python -c '{\
    "compartmentId":"<compartment_id>",\
    "displayName":"TestDataFlowSession",\
    "sparkVersion":"3.2.1",\
    "driverShape":"VM.Standard.E4.Flex",\
    "executorShape":"VM.Standard.E4.Flex",\
    "numExecutors":1,\
    "driverShapeConfig":{"ocpus":1,"memoryInGBs":16},\
    "executorShapeConfig":{"ocpus":1,"memoryInGBs":16},\
    "poolId": "<ocid1.dataflowpool...>",\
    "logsBucketUri" : "oci://<bucket_name>@<namespace>/"}'
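Once the session is created against the pool, subsequent cells run on the pool's resources like any other session. A minimal sketch of verifying this, assuming the ``%%spark`` cell magic used elsewhere on this page, with ``sc`` bound to the remote Spark context:

.. code-block:: python

    %%spark
    # Executes in the remote Data Flow session created above; the
    # executors are drawn from the pool referenced by poolId.
    print(sc.version)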
Update Session
**************
@@ -296,4 +316,4 @@ Check the result:
.. code-block:: python

    print(type(df_nyc_tlc))
    df_nyc_tlc.head()

docs/source/user_guide/apachespark/dataflow.rst

Lines changed: 9 additions & 0 deletions
@@ -37,6 +37,8 @@ Define config. If you have not yet configured your dataflow setting, or would li
    dataflow_config.spark_version = "3.2.1"
    dataflow_config.configuration = {"spark.driver.memory": "512m"}
    dataflow_config.private_endpoint_id = "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
    # For using Data Flow Pools
    # dataflow_config.poolId = "ocid1.dataflowpool.oc1..<unique_ocid>"

Use the config defined above to submit the cell.
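For reference, a self-contained sketch of the config object this hunk extends. The import path is an assumption (adjust to your ADS version); the attribute names are taken verbatim from the diff:

.. code-block:: python

    # Assumed import path for DataFlowConfig; not shown in this hunk.
    from ads.jobs.utils import DataFlowConfig

    dataflow_config = DataFlowConfig()
    dataflow_config.spark_version = "3.2.1"
    dataflow_config.configuration = {"spark.driver.memory": "512m"}
    dataflow_config.private_endpoint_id = (
        "ocid1.dataflowprivateendpoint.oc1.iad.<your private endpoint ocid>"
    )
    # For using Data Flow Pools: point the config at an existing pool OCID.
    dataflow_config.poolId = "ocid1.dataflowpool.oc1..<unique_ocid>"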

@@ -207,6 +209,7 @@ You can set them using the ``with_{property}`` functions:
- ``with_spark_version``
- ``with_warehouse_bucket_uri``
- ``with_private_endpoint_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pe-allowing.htm#pe-allowing>`__)
- ``with_pool_id`` (`doc <https://docs.oracle.com/en-us/iaas/data-flow/using/pools.htm>`__)
- ``with_defined_tags``
- ``with_freeform_tags``

@@ -274,6 +277,8 @@ accepted. In the next example, the prefix is given for ``script_bucket``.
    .with_executor_shape("VM.Standard.E4.Flex")
    .with_executor_shape_config(ocpus=4, memory_in_gbs=64)
    .with_spark_version("3.0.2")
    # For using Data Flow Pool
    # .with_pool_id("ocid1.dataflowpool.oc1..<unique_ocid>")
    .with_defined_tag(
        **{"Oracle-Tags": {"CreatedBy": "test_name@oracle.com"}}
    )
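The builder above only defines the infrastructure; a minimal sketch of wrapping it into a job and running it. Here ``dataflow_configs`` and ``runtime`` are illustrative names for the ``DataFlow`` infrastructure and ``DataFlowRuntime`` objects built on this page, not part of the diff:

.. code-block:: python

    from ads.jobs import Job

    # Illustrative names: `dataflow_configs` is the DataFlow infrastructure
    # built above; `runtime` is a DataFlowRuntime pointing at the script.
    job = (
        Job(name="dataflow-pool-example")
        .with_infrastructure(dataflow_configs)
        .with_runtime(runtime)
    )
    job.create()     # creates the Data Flow application
    run = job.run()  # submits a run; with a poolId set, it uses the pool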
@@ -576,6 +581,7 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
    numExecutors: 1
    sparkVersion: 3.2.1
    privateEndpointId: <private_endpoint_ocid>
    poolId: <dataflow_pool_ocid>
    definedTags:
      Oracle-Tags:
        CreatedBy: test_name@oracle.com
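Loading the spec is then a one-liner; a minimal sketch, assuming the YAML above is saved locally as ``job.yaml`` (the file name is illustrative):

.. code-block:: python

    from ads.jobs import Job

    # Build the job from the YAML spec above and submit a run.
    job = Job.from_yaml(uri="job.yaml")
    job.create()
    run = job.run()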
@@ -659,6 +665,9 @@ into the ``Job.from_yaml()`` function to build a Data Flow job:
    privateEndpointId:
      required: false
      type: string
    poolId:
      required: false
      type: string
    configuration:
      required: false
      type: dict
