For this example, the code to run was inspired from an example
-`found here <https://github.com/Azure/azureml-examples/blob/main/python-sdk/workflows/train/pytorch/cifar-distributed/src/train.py>`_
+`found here <https://github.com/Azure/azureml-examples/blob/32eeda9e9f394bd6c3b687b55e2740abc50b116c/sdk/python/jobs/single-step/pytorch/distributed-training/src/train.py>`_
Note that ``MASTER_ADDR``, ``MASTER_PORT``, ``WORLD_SIZE``, ``RANK``, and ``LOCAL_RANK`` are environment variables
If you are behind proxy, ads opctl will automatically use your proxy settings (defined via ``no_proxy``, ``http_proxy`` and ``https_proxy``).
@@ -397,7 +397,15 @@ the output from the dry run will show all the actions and infrastructure configu
.. include:: ../_test_and_submit.rst

-.. _hvd_saving_artifacts:
+**Monitoring the workload logs**
+
+To view the logs from a job run, you could run -
+
+.. code-block:: bash
+
+  ads opctl watch oci.xxxx.<job_run_ocid>
+
+You could stream the logs from any of the job run ocid using ``ads opctl watch`` command. You could run this command from multiple terminal to watch all of the job runs. Typically, watching ``mainJobRunId`` should yield most informative log.