
Commit f03320d

Update pytorch creating.rst
1 parent 7620895 commit f03320d

File tree

1 file changed

+13
-5
lines changed
  • docs/source/user_guide/model_training/distributed_training/pytorch

docs/source/user_guide/model_training/distributed_training/pytorch/creating.rst

Lines changed: 13 additions & 5 deletions
@@ -7,7 +7,7 @@ Creating PyTorch Distributed Workloads
77
**Write your training code:**
88

99
For this example, the code to run was inspired by an example
10-
`found here <https://github.com/Azure/azureml-examples/blob/main/python-sdk/workflows/train/pytorch/cifar-distributed/src/train.py>`_
10+
`found here <https://github.com/Azure/azureml-examples/blob/32eeda9e9f394bd6c3b687b55e2740abc50b116c/sdk/python/jobs/single-step/pytorch/distributed-training/src/train.py>`_
1111

1212
Note that ``MASTER_ADDR``, ``MASTER_PORT``, ``WORLD_SIZE``, ``RANK``, and ``LOCAL_RANK`` are environment variables
1313
that will automatically be set.
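As a rough illustration of how a training script might consume these variables, the sketch below reads them into a small config object. The ``DistEnv`` class and ``read_dist_env`` helper are hypothetical names introduced here and are not part of the referenced ``train.py``; only the five environment variable names come from the text above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistEnv:
    """Hypothetical container for the rendezvous settings named in the docs."""
    master_addr: str
    master_port: int
    world_size: int
    rank: int
    local_rank: int


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the five variables that are expected to be set automatically.

    Raises KeyError if one of them is missing, which makes a misconfigured
    run fail early instead of hanging at rendezvous.
    """
    return DistEnv(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        rank=int(env["RANK"]),
        local_rank=int(env["LOCAL_RANK"]),
    )


# Illustrative values only; in a real job run these come from the environment.
example_env = {
    "MASTER_ADDR": "10.0.0.1",
    "MASTER_PORT": "29400",
    "WORLD_SIZE": "4",
    "RANK": "1",
    "LOCAL_RANK": "0",
}
cfg = read_dist_env(example_env)
print(cfg)
```

In a PyTorch script these values are typically not parsed by hand: ``torch.distributed.init_process_group`` with ``init_method="env://"`` reads ``MASTER_ADDR``, ``MASTER_PORT``, ``WORLD_SIZE``, and ``RANK`` from the environment, and ``LOCAL_RANK`` is commonly used to select the GPU device.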
@@ -20,7 +20,7 @@ that will automatically be set.
2020
# BSD 3-Clause License
2121
#
2222
# Script adapted from:
23-
# https://github.com/Azure/azureml-examples/blob/main/python-sdk/workflows/train/pytorch/cifar-distributed/src/train.py
23+
# https://github.com/Azure/azureml-examples/blob/32eeda9e9f394bd6c3b687b55e2740abc50b116c/sdk/python/jobs/single-step/pytorch/distributed-training/src/train.py
2424
# ==============================================================================
2525
2626
@@ -302,7 +302,7 @@ Specify image name and tag
302302
export TAG=latest
303303
304304
305-
Build the container image.
305+
Build the container image
306306

307307
.. code-block:: bash
308308
@@ -318,7 +318,7 @@ The code is assumed to be in the current working directory. To override the sour
318318
ads opctl distributed-training build-image \
319319
-t $TAG \
320320
-reg $IMAGE_NAME \
321-
-df oci_dist_training_artifacts/horovod/v1/oci_dist_training_artifacts/pytorch/v1/Dockerfile
321+
-df oci_dist_training_artifacts/pytorch/v1/Dockerfile \
322322
-s <code_dir>
323323
324324
If you are behind a proxy, ``ads opctl`` will automatically use your proxy settings (defined via ``no_proxy``, ``http_proxy`` and ``https_proxy``).
@@ -397,7 +397,15 @@ the output from the dry run will show all the actions and infrastructure configu
397397

398398
.. include:: ../_test_and_submit.rst
399399

400-
.. _hvd_saving_artifacts:
400+
**Monitoring the workload logs**
401+
402+
To view the logs from a job run, run:
403+
404+
.. code-block:: bash
405+
406+
ads opctl watch oci.xxxx.<job_run_ocid>
407+
408+
You can stream the logs of any job run by passing its OCID to the ``ads opctl watch`` command. To watch all of the job runs at once, run the command from multiple terminals. Typically, watching ``mainJobRunId`` yields the most informative log.
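If opening one terminal per job run is inconvenient, the per-run commands can be generated programmatically. The sketch below only builds the ``ads opctl watch`` argument lists; the ``build_watch_commands`` helper and the placeholder OCIDs are hypothetical and introduced here for illustration.

```python
import shlex
from typing import Iterable, List


def build_watch_commands(job_run_ocids: Iterable[str]) -> List[List[str]]:
    """Return one `ads opctl watch <ocid>` argument list per job run OCID."""
    return [["ads", "opctl", "watch", ocid] for ocid in job_run_ocids]


# Placeholder OCIDs; substitute the job run OCIDs reported for your workload.
ocids = ["oci.xxxx.<main_job_run_ocid>", "oci.xxxx.<worker_job_run_ocid>"]

for cmd in build_watch_commands(ocids):
    # Print a shell-safe form of each command, one per terminal.
    print(shlex.join(cmd))
```

Each argument list could also be passed to ``subprocess.Popen`` to stream several job runs from a single script, at the cost of interleaved output.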
401409

402410
.. include:: ../_save_artifacts.rst
403411
