Before submitting the workload to jobs, you can run it locally to test your code, dependencies, configurations etc.
-With ``-b local`` flag, it uses a local backend. Further when you need to run this workload on odsc jobs, simply use ``-b job`` flag instead.
+With ``-b local`` flag, it uses a local backend. Further when you need to run this workload on OCI data science jobs, simply use ``-b job`` flag instead.

.. code-block:: bash

@@ -13,9 +13,10 @@ If your code requires to use any oci services (like object bucket), you need to
oci_key_mnt = ~/.oci:/home/oci_dist_training/.oci

-**Submit the workload:**
-
+Note that the local backend requires the source code for your workload is available locally in the source folder specified in the ``config.ini`` file.
+If you specified Git repository or OCI object storage location as source code location in your workflow YAML, please make sure you have a local copy available for local testing.

+**Submit the workload:**

.. code-block:: bash

@@ -24,22 +25,23 @@ If your code requires to use any oci services (like object bucket), you need to
**Note:**: This will automatically push the docker image to the
docs/source/user_guide/model_training/distributed_training/dask/creating.rst
Lines changed: 1 addition & 1 deletion
@@ -230,7 +230,7 @@ To view the logs from a job run, you could run -
ads opctl watch oci.xxxx.<job_run_ocid>

-You could stream the logs from any of the job run ocid using ``ads opctl watch`` command. Your could run this comand from mutliple terminal to watch all of the job runs. Typically, watching ``mainJobRunId`` should yeild most informative log.
+You could stream the logs from any of the job run ocid using ``ads opctl watch`` command. You could run this command from multiple terminal to watch all of the job runs. Typically, watching ``mainJobRunId`` should yield most informative log.

To find the IP address of the scheduler dashboard, you could check the configuration file generated by the Main job by running -
For this example, the code to run was inspired from an example
-`found here <https://github.com/Azure/azureml-examples/blob/main/python-sdk/workflows/train/pytorch/cifar-distributed/src/train.py>`_
+`found here <https://github.com/Azure/azureml-examples/blob/32eeda9e9f394bd6c3b687b55e2740abc50b116c/sdk/python/jobs/single-step/pytorch/distributed-training/src/train.py>`_

Note that ``MASTER_ADDR``, ``MASTER_PORT``, ``WORLD_SIZE``, ``RANK``, and ``LOCAL_RANK`` are environment variables
If you are behind proxy, ads opctl will automatically use your proxy settings (defined via ``no_proxy``, ``http_proxy`` and ``https_proxy``).
@@ -397,7 +397,15 @@ the output from the dry run will show all the actions and infrastructure configu
.. include:: ../_test_and_submit.rst

-.. _hvd_saving_artifacts:
+**Monitoring the workload logs**
+
+To view the logs from a job run, you could run -
+
+.. code-block:: bash
+
+ads opctl watch oci.xxxx.<job_run_ocid>
+
+You could stream the logs from any of the job run ocid using ``ads opctl watch`` command. You could run this command from multiple terminal to watch all of the job runs. Typically, watching ``mainJobRunId`` should yield most informative log.