Commit b04cc80

Update docs for distributed training
1 parent 3226228 commit b04cc80

File tree

28 files changed

+907
-240
lines changed

docs/source/index.rst

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -12,16 +12,16 @@ Oracle Accelerated Data Science SDK (ADS)
1212
.. toctree::
1313
:hidden:
1414
:maxdepth: 5
15-
:caption: History:
15+
:caption: Getting Started:
1616

1717
release_notes
18+
user_guide/quick_start/quick_start
1819

1920
.. toctree::
2021
:hidden:
2122
:maxdepth: 5
2223
:caption: Installation and Configuration:
2324

24-
user_guide/quick_start/quick_start
2525
user_guide/cli/quickstart
2626
user_guide/cli/authentication
2727
user_guide/cli/opctl/configure
@@ -33,19 +33,18 @@ Oracle Accelerated Data Science SDK (ADS)
3333
:caption: Tasks:
3434

3535
user_guide/loading_data/connect
36-
user_guide/apachespark/spark
3736
user_guide/data_labeling/index
3837
user_guide/data_transformation/data_transformation
3938
user_guide/data_visualization/visualization
4039
user_guide/model_training/index
4140
user_guide/model_registration/introduction
42-
user_guide/ADSString/index
4341

4442
.. toctree::
4543
:hidden:
4644
:maxdepth: 5
4745
:caption: Integrations:
4846

47+
user_guide/apachespark/spark
4948
user_guide/big_data_service/index
5049
user_guide/jobs/index
5150
user_guide/logs/logs
Lines changed: 0 additions & 11 deletions
@@ -1,10 +1,5 @@
11
.. _ADSString:
22

3-
######################
4-
Manipulating Text Data
5-
######################
6-
7-
83
TextStrings
94
-----------
105

@@ -18,10 +13,4 @@ TextStrings
1813
regex_match
1914
still_a_string
2015

21-
Text Extraction
22-
---------------
23-
24-
.. toctree::
25-
:maxdepth: 1
2616

27-
../text_extraction/text_dataset

docs/source/user_guide/apachespark/spark.rst

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
1-
=========================
2-
Working with Apache Spark
3-
=========================
1+
============
2+
Apache Spark
3+
============
44

55

66
.. admonition:: DataFlow

docs/source/user_guide/cli/opctl/localdev/vscode.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Setting up Visual Studio Code
66

77
**Prerequisites**
88

9-
1. ADS CLI is :doc:`configured<configure>`
9+
1. ADS CLI is :doc:`configured<../configure>`
1010
2. Install Visual Studio Code
1111
3. :doc:`Build Development Container Image<jobs_container_image>`
1212
4. Install Visual Studio Code extension for `Remote Development <https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack>`_

docs/source/user_guide/data_labeling/index.rst

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
11
.. _data-labeling-8:
22

3-
#############
4-
Labeling Data
5-
#############
3+
##########
4+
Label Data
5+
##########
66

77
The Oracle Cloud Infrastructure (OCI) Data Labeling service allows you to create and browse datasets, view data records (text, images) and apply labels for the purposes of building AI/machine learning (ML) models. The service also provides interactive user interfaces that enable the labeling process. After you label records, you can export the dataset as line-delimited JSON Lines (JSONL) for use in model development.
88

docs/source/user_guide/data_transformation/data_transformation.rst

Lines changed: 10 additions & 2 deletions
@@ -1,7 +1,7 @@
11
.. _data-transformations-8:
22

3-
Data Transformations
4-
####################
3+
Transform Data
4+
##############
55

66
When datasets are loaded with ``DatasetFactory``, they can be transformed and manipulated easily with the built-in functions. Under the hood, an ``ADSDataset`` object is a Pandas dataframe, so any operation that can be performed on a `Pandas dataframe <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`_ can also be applied to an ADS Dataset.
77

@@ -520,3 +520,11 @@ You can split the dataset right after the ``DatasetFactory.open()`` statement:
520520
ds = DatasetFactory.open("path/data.csv").set_target('target')
521521
train, test = ds.train_test_split(test_size=0.25)
522522
523+
Text Data
524+
*********
525+
526+
.. toctree::
527+
:maxdepth: 3
528+
529+
../ADSString/index
530+
../text_extraction/text_dataset

docs/source/user_guide/data_visualization/visualization.rst

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
11
.. _data-visualization-8:
22

3-
##################
4-
Data Visualization
5-
##################
3+
##############
4+
Visualize Data
5+
##############
66

77
Data visualization is an important aspect of data exploration, analysis, and communication. Generally, visualization of the data is one of the first steps in any analysis. It allows the analysts to efficiently gain an understanding of the data and guides the exploratory data analysis (EDA) and the modeling process.
88

docs/source/user_guide/loading_data/connect.rst

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
1-
############
2-
Loading Data
3-
############
1+
#########
2+
Load Data
3+
#########
44

55

66
Connecting to Data Sources

docs/source/user_guide/model_registration/introduction.rst

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
11
.. _model-catalog-8:
22

3-
#################################
4-
Model Registration and Deployment
5-
#################################
3+
##########################
4+
Register and Deploy Models
5+
##########################
66

77

88
You can register your model with the OCI Data Science service through ADS. Alternatively, you can use the Oracle Cloud Infrastructure (OCI) Console: go to the Data Science projects page, select a project, then click **Models**. The models page shows the model artifacts that are in the model catalog for a given project.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
2+
**Profiling using Nvidia Nsight Systems**
3+
4+
5+
`Nvidia Nsight Systems <https://developer.nvidia.com/nsight-systems>`__ is a system-wide profiling tool from Nvidia that can be used to profile deep learning workloads.
6+
7+
Nsight Systems requires no changes to your training code because it works at the process level. You can enable this experimental feature in your training setup by adding the following configuration to the runtime YAML file.
8+
9+
10+
.. code-block:: yaml
11+
12+
- name: PROFILE
13+
value: 1
14+
- name: PROFILE_CMD
15+
value: "nsys profile -w true -t cuda,nvtx,osrt,cudnn,cublas -s none -o /opt/ml/nsight_report -x true"
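If you generate the runtime YAML from a script, the ``PROFILE_CMD`` value shown in the configuration above can be assembled programmatically. The following is a minimal sketch, not part of ADS: the helper names ``build_profile_cmd`` and ``profiling_env`` are hypothetical, and only the ``nsys`` switches already present in the configuration are used.

```python
import shlex


def build_profile_cmd(output_path="/opt/ml/nsight_report",
                      traces=("cuda", "nvtx", "osrt", "cudnn", "cublas")):
    """Hypothetical helper: build the nsys command string for PROFILE_CMD."""
    args = [
        "nsys", "profile",
        "-w", "true",
        "-t", ",".join(traces),   # APIs to trace, comma-separated
        "-s", "none",
        "-o", output_path,        # where the report is written on the node
        "-x", "true",
    ]
    # shlex.join quotes arguments only when the shell would need it
    return shlex.join(args)


def profiling_env(cmd):
    """Hypothetical helper: the two environment entries from the runtime YAML."""
    return [
        {"name": "PROFILE", "value": 1},
        {"name": "PROFILE_CMD", "value": cmd},
    ]


env = profiling_env(build_profile_cmd())
for entry in env:
    print(entry["name"], "=", entry["value"])
```

Keeping the switches in a list makes it easier to change one option, for example the traced APIs, without retyping the whole command string.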
16+
17+
18+
Refer to the `nsys profile command options <https://docs.nvidia.com/nsight-systems/UserGuide/index.html#cli-profile-command-switch-options>`__ for the available switches. You can modify the command in ``PROFILE_CMD``, but keep in mind that this feature is experimental. The profiling reports are generated per node. Download the reports to your computer manually or with the ``oci`` command:
19+
20+
.. code-block:: bash
21+
22+
oci os object bulk-download \
23+
-ns <namespace> \
24+
-bn <bucket_name> \
25+
--download-dir /path/on/your/computer \
26+
--prefix path/on/bucket/<job_id>
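The same download can be scripted. The sketch below only assembles the argument list for the ``oci`` command shown above. The helper name ``bulk_download_args`` and all argument values are hypothetical placeholders, and actually running the command assumes the OCI CLI is installed and configured.

```python
import shlex


def bulk_download_args(namespace, bucket_name, download_dir, prefix):
    """Hypothetical helper: argument list for `oci os object bulk-download`."""
    return [
        "oci", "os", "object", "bulk-download",
        "-ns", namespace,
        "-bn", bucket_name,
        "--download-dir", download_dir,
        "--prefix", prefix,
    ]


# Placeholder values for illustration only; substitute your own.
args = bulk_download_args(
    namespace="my-namespace",
    bucket_name="my-bucket",
    download_dir="/tmp/nsight_reports",
    prefix="path/on/bucket/my-job-id",
)
print(shlex.join(args))

# To execute (requires the OCI CLI on PATH):
#   import subprocess
#   subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path or prefix contains spaces.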
27+
28+
To view the reports, install the `Nsight Systems app <https://developer.nvidia.com/nsight-systems>`_, then open the downloaded reports in it.
