The Maxim Integrated AI project comprises five repositories:

### Prerequisites
This software requires PyTorch. *For TensorFlow / Keras, please use the `develop-tf` branch.*

PyTorch operating system and hardware support are constantly evolving. This document does not cover all possible combinations of operating system and hardware, and there is only one officially supported platform.

#### Platform Recommendation and Full Support

Full support and documentation are provided for the following platform:

* CPU: 64-bit amd64/x86_64 “PC” with [Ubuntu Linux 20.04 LTS](https://ubuntu.com/download/server)
* GPU for hardware acceleration (optional): Nvidia with [CUDA 11](https://developer.nvidia.com/cuda-toolkit-archive)
* [PyTorch 1.8.1 (LTS)](https://pytorch.org/get-started/locally/) on Python 3.8.11

Limited support and advice for using other hardware and software combinations are available as follows.

#### Operating System Support

##### Linux

**The only officially supported platform for model training** is Ubuntu Linux 20.04 LTS on amd64/x86_64, either the desktop or the [server version](https://ubuntu.com/download/server).

*Note that hardware acceleration/CUDA is <u>not available</u> in PyTorch for Raspberry Pi 4 and other <u>aarch64/arm64</u> devices, even those running Ubuntu Linux 20.04. See also [Development on Raspberry Pi 4 and 400](docs/RaspberryPi.md) (unsupported).*

This document also provides instructions for installing on RedHat Enterprise Linux / CentOS 8 with limited support.

##### Windows

Ubuntu Linux 20.04 can be used inside the Windows Subsystem for Linux (WSL2) by following https://docs.nvidia.com/cuda/wsl-user-guide/.

*Please note that WSL2 with CUDA is a pre-release, and unexpected behavior may occur, for example unwanted upgrades to a pre-release of the operating system.*

##### macOS

The software works on macOS, but model training suffers from the lack of hardware acceleration.

##### Virtual Machines (Unsupported)

This software works inside a virtual machine running Ubuntu Linux 20.04. However, GPU passthrough is typically <u>not available</u> for Linux VMs, so there will be no CUDA hardware acceleration. Certain Nvidia cards support [vGPU software](https://www.nvidia.com/en-us/data-center/graphics-cards-for-virtualization/); see also [vGPUs and CUDA](https://docs.nvidia.com/cuda/vGPU/), but vGPU features may come at substantial additional cost and vGPU software is not covered by this document.

##### Docker Containers (Unsupported)

This software also works inside Docker containers. However, CUDA support inside containers requires Nvidia Docker ([see blog entry](https://developer.nvidia.com/blog/nvidia-docker-gpu-server-application-deployment-made-easy/)) and is not covered by this document.

#### PyTorch and Python

The officially supported version of [PyTorch is 1.8.1 (LTS)](https://pytorch.org/get-started/locally/) running on Python 3.8.11. Newer versions will typically work, but are not covered by support, documentation, and installation scripts.
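
The installed versions can be checked from within Python; a quick sanity check along these lines (not part of the project's scripts) verifies the environment before training:

```python
import sys

import torch

print(sys.version.split()[0])     # expect 3.8.x, e.g., 3.8.11
print(torch.__version__)          # expect 1.8.1 for the supported setup
print(torch.cuda.is_available())  # True when CUDA acceleration is usable
```
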
#### Hardware Acceleration

When going beyond simple models, model training does not work well without CUDA hardware acceleration. The network loader (“izer”) does <u>not</u> require CUDA, and very simple models can also be trained on systems without CUDA.

* CUDA requires Nvidia GPUs.
* There is a PyTorch pre-release with ROCm acceleration for certain AMD GPUs on Linux ([see blog entry](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/)), but this is not currently covered by the installation instructions in this document, and it is not supported.
* There is neither CUDA nor ROCm support on macOS, and therefore no hardware acceleration.
* PyTorch does not include CUDA support for aarch64/arm64 systems. *Rebuilding PyTorch from source is not covered by this document.*

##### Using Multiple GPUs

When using multiple GPUs (graphics cards), the software will automatically use all available GPUs and distribute the workload. To prevent this (for example, when the GPUs are not balanced), set the `CUDA_VISIBLE_DEVICES` environment variable. Use the `--gpus` command line argument to set the default GPU.
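
For example, the environment variable can be set from the shell, or from Python before CUDA is first initialized; a minimal sketch:

```python
import os

# Hide all but the first GPU from PyTorch; this must run before the first
# CUDA call initializes the driver.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

print(torch.cuda.device_count())  # reports 1 even on a multi-GPU system
```
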
#### Shared (Multi-User) and Remote Systems

On a shared (multi-user) system that has previously been set up, only local installation is needed. CUDA and any `apt-get` or `brew` tasks are not necessary, with the exception of the CUDA [Environment Setup](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup).

The `screen` command (or alternatively, the more powerful `tmux`) can be used inside a remote terminal to disconnect a session from the controlling terminal, so that a long-running training session doesn’t abort due to network issues or local power saving. In addition, `screen` can log all console output to a text file. Use Ctrl+A,D to disconnect a running `screen` session.

`man screen` and `man tmux` describe the software in more detail.

#### Additional Software

The following software is optional, and can be replaced with other similar software of the user’s choosing.

1. Code Editor

   Visual Studio Code (free), https://code.visualstudio.com, or the VSCodium version, https://vscodium.com, with the “Remote - SSH” plugin; *to use Visual Studio Code on Windows as a full development environment (including debug), see https://github.com/MaximIntegratedTechSupport/VSCode-Maxim*

   Sublime Text ($100), https://www.sublimetext.com

2. Markdown Editor

   Typora (free during beta), http://typora.io

3. Serial Terminal

   CoolTerm (free), http://freeware.the-meiers.org

   Serial ($30), https://apps.apple.com/us/app/serial/id877615577?mt=12

   Tera Term (free), https://osdn.net/projects/ttssh2/releases/

4. Graphical Git Client

   GitHub Desktop (free), https://desktop.github.com

   Git Fork ($50), https://git-fork.com

5. Diff and Merge Tool

   Beyond Compare ($60), https://scootersoftware.com

### Project Installation

Nirvana Distiller is a package for neural network compression and quantization.

Manifold is a model-agnostic visual debugging tool for machine learning. The [Manifold guide](https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/Guides/Manifold.md) shows how to integrate this optional package into the training software.
### Upstream Code
Change to the project root and run the following commands. Use your GitHub credentials if prompted.
## Model Training and Quantization
#### Hardware Acceleration
If hardware acceleration is not available, skip the following two steps and continue with [Training Script](#Training Script).

1. Before the first training session, check that CUDA hardware acceleration is available using `nvidia-smi -q`.

The main training software is `train.py`. It drives the training aspects, including model creation, checkpointing, model save, and status display (see `--help` for the many supported options, and the `scripts/train_*.sh` scripts for example usage).
The `ai84net.py` and `ai85net.py` files contain models that fit into AI84’s weight memory. These models rely on the MAX78000/MAX78002 hardware operators that are defined in `ai8x.py`.

The `ai8x.py` file contains customized PyTorch classes (subclasses of `torch.nn.Module`):

2. Rounding and clipping that match the hardware.
3. Support for quantized operation (when using the `-8` command line argument).

##### set_device()

`ai8x.py` defines the `set_device()` function, which configures the training system:

```python
def set_device(
    device,
    simulate,
    round_avg,
    verbose=True,
):
```

where *device* is `85` (the MAX78000 device code), *simulate* is `True` when clipping and rounding are set to simulate hardware behavior, and *round_avg* picks one of the two hardware rounding modes for AvgPool.
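
For illustration, a call configuring training for MAX78000 might look as follows (a sketch based on the signature above; actual call sites in the training code may differ):

```python
import ai8x

# Floating-point training for MAX78000: no hardware simulation,
# default AvgPool rounding mode.
ai8x.set_device(85, simulate=False, round_avg=False)
```
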
##### update_model()
`ai8x.py` defines `update_model()`. This function is called after loading a checkpoint file, and recursively applies output shift, weight scaling, and quantization clamping to the model.
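
A hedged sketch of the intended call order (assuming `update_model()` takes the model instance; the model constructor and file name below are placeholders):

```python
import torch

import ai8x

model = create_model()  # hypothetical helper; model construction omitted
checkpoint = torch.load("checkpoint.pth.tar")  # placeholder file name
model.load_state_dict(checkpoint["state_dict"])
# Re-apply output shift, weight scaling, and quantization clamping:
ai8x.update_model(model)
```
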
#### List of Predefined Modules
The following modules are predefined:

Dropout modules such as `torch.nn.Dropout()` and `torch.nn.Dropout2d()` are automatically disabled during inference, and can therefore be used for training without affecting inference.
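
This is standard PyTorch behavior, toggled by the module’s training mode; a small demonstration:

```python
import torch

drop = torch.nn.Dropout(p=0.5)

drop.train()  # training mode: ~half the elements are zeroed, the rest scaled by 2
print(drop(torch.ones(6)))

drop.eval()   # inference mode: dropout is disabled, input passes through unchanged
print(drop(torch.ones(6)))
```
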
*Note: Using [batch normalization](#Batch Normalization) in conjunction with dropout can sometimes degrade training results.*
1120
1231
1121
1232
There are two supported cases for `view()` or `reshape()`.

#### Batch Normalization

After fusing/folding, the network will no longer contain any batchnorm layers.

* When using [Quantization-Aware Training (QAT)](#Quantization-Aware Training (QAT)), batchnorm layers <u>are automatically folded</u> during training and no further action is needed.
* When using [Post-Training Quantization](#Post-Training Quantization), the `batchnormfuser.py` script (see [BatchNorm Fusing](#BatchNorm-Fusing)) must be called before `quantize.py` to explicitly fuse the batchnorm layers.

*Note: Using batch normalization in conjunction with [dropout](#Dropout) can sometimes degrade training results.*

### Model Comparison and Feature Attribution
Both TensorBoard and [Manifold](#Manifold) can be used for model comparison and feature attribution.

The loader returns a tuple of two PyTorch Datasets for training and test data.

##### Normalizing Input Data
For training, input data is expected to be in the range $[-\frac{128}{128}, +\frac{127}{128}]$. When evaluating quantized weights, or when running on hardware, input data is instead expected to be in the native MAX7800X range of $[-128, +127]$. Conversely, the majority of PyTorch datasets are PIL images of range $[0, 1]$. The respective data loaders therefore call the `ai8x.normalize()` function, which expects an input of 0 to 1 and normalizes the data, automatically switching between the two supported data ranges.
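
The following is a minimal sketch of the two output ranges (illustrative only, not the actual `ai8x.normalize()` implementation; the `quantized` flag name is an assumption):

```python
import torch

def normalize_sketch(x, quantized=False):
    """Map a tensor in [0, 1] to one of the two supported data ranges."""
    x = x.mul(256.).round().clamp(min=0., max=255.).sub(128.)  # [-128, +127]
    if quantized:
        return x        # native MAX7800X range for quantized evaluation
    return x.div(128.)  # training range [-128/128, +127/128]
```
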
When running inference on MAX7800X hardware, it is important to take the native data format into account, and it is desirable to perform as little preprocessing as possible during inference. For example, an image sensor may return “signed” data in the range $[-128, +127]$ for each color. No additional preprocessing or mapping is needed for this sensor since the model was trained with this data range.
In many cases, image data is delivered as fewer than 8 bits per channel (for example, RGB565). In these cases, retraining the model with this limited range (0 to 31 for 5-bit color and 0 to 63 for 6-bit color, respectively) can potentially eliminate the need for inference-time preprocessing.
On the other hand, a different sensor may produce unsigned data values in the full 8-bit range $[0, 255]$. This range must be mapped to $[-128, +127]$ to match hardware and the trained model. The mapping can be performed during inference by subtracting 128 from each input byte, but this requires extra (pre-)processing time during inference.
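
For example (an illustrative NumPy sketch, not project code):

```python
import numpy as np

frame = np.array([0, 1, 128, 254, 255], dtype=np.uint8)  # raw sensor bytes
signed = (frame.astype(np.int16) - 128).astype(np.int8)  # map to [-128, +127]
print(signed)  # [-128 -127    0  126  127]
```
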
##### `datasets` Data Structure
##### `dataset` (Mandatory)
`dataset` configures the data set for the network. Data sets are for example `mnist`, `fashionmnist`, and `cifar-10`. This key is descriptive only; it does not configure input or output dimensions or channel count.