Commit 02337ec

Author: Robert Muchsel

Update checker; documentation improvements (#154)

1 parent 6f66454 commit 02337ec

File tree

12 files changed: +298, -30 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -20,3 +20,4 @@
 **/__pycache__/
 /super-linter.log
 /super-linter.report/
+/.version-check

README.md

Lines changed: 136 additions & 25 deletions
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_July 20, 2021_
+_August 9, 2021_
 
 The Maxim Integrated AI project is comprised of five repositories:
@@ -52,18 +52,71 @@ where “....” is the project root, for example `~/Documents/Source/AI`.
 
 ### Prerequisites
 
-This software currently supports Ubuntu Linux 20.04 LTS. The server version is sufficient, see https://ubuntu.com/download/server. *Alternatively, Ubuntu Linux can also be used inside the Windows Subsystem for Linux (WSL2) by following
-https://docs.nvidia.com/cuda/wsl-user-guide/. However, please note that WSL2 with CUDA is a pre-release and unexpected behavior may occur.*
+This software requires PyTorch. *For TensorFlow / Keras, please use the `develop-tf` branch.*
 
-When going beyond simple models, model training does not work well without CUDA hardware acceleration. The network loader (“izer”) does not require CUDA, and very simple models can also be trained on systems without CUDA.
+PyTorch operating system and hardware support are constantly evolving. This document does not cover all possible combinations of operating system and hardware, and there is only one officially supported platform.
 
-*Recommendation:* Install the latest version of CUDA 11 on Ubuntu 20.04 LTS. See https://developer.nvidia.com/cuda-toolkit-archive.
+#### Platform Recommendation and Full Support
 
-*Note: When using multiple GPUs, the software will automatically use all available GPUs and distribute the workload. To prevent this, set the `CUDA_VISIBLE_DEVICES` environment variable. Use the `--gpus` command line argument to set the default GPU.*
+Full support and documentation are provided for the following platform:
+
+* CPU: 64-bit amd64/x86_64 “PC” with [Ubuntu Linux 20.04 LTS](https://ubuntu.com/download/server)
+* GPU for hardware acceleration (optional): Nvidia with [CUDA 11](https://developer.nvidia.com/cuda-toolkit-archive)
+* [PyTorch 1.8.1 (LTS)](https://pytorch.org/get-started/locally/) on Python 3.8.11
+
+Limited support and advice for using other hardware and software combinations are available as follows.
+
+#### Operating System Support
+
+##### Linux
+
+**The only officially supported platform for model training** is Ubuntu Linux 20.04 LTS on amd64/x86_64, either the desktop or the [server version](https://ubuntu.com/download/server).
+
+*Note that hardware acceleration/CUDA is <u>not available</u> in PyTorch for Raspberry Pi 4 and other <u>aarch64/arm64</u> devices, even those running Ubuntu Linux 20.04. See also [Development on Raspberry Pi 4 and 400](docs/RaspberryPi.md) (unsupported).*
+
+This document also provides instructions for installing on RedHat Enterprise Linux / CentOS 8 with limited support.
+
+##### Windows
+
+Ubuntu Linux 20.04 can be used inside the Windows Subsystem for Linux (WSL2) by following https://docs.nvidia.com/cuda/wsl-user-guide/.
+*Please note that WSL2 with CUDA is a pre-release, and unexpected behavior may occur, for example unwanted upgrades to a pre-release of the operating system.*
+
+##### macOS
+
+The software works on macOS, but model training suffers from the lack of hardware acceleration.
+
+##### Virtual Machines (Unsupported)
+
+This software works inside a virtual machine running Ubuntu Linux 20.04. However, GPU passthrough is typically <u>not available</u> for Linux VMs, so there will be no CUDA hardware acceleration. Certain Nvidia cards support [vGPU software](https://www.nvidia.com/en-us/data-center/graphics-cards-for-virtualization/); see also [vGPUs and CUDA](https://docs.nvidia.com/cuda/vGPU/). However, vGPU features may come at substantial additional cost, and vGPU software is not covered by this document.
+
+##### Docker Containers (Unsupported)
+
+This software also works inside Docker containers. However, CUDA support inside containers requires Nvidia Docker ([see blog entry](https://developer.nvidia.com/blog/nvidia-docker-gpu-server-application-deployment-made-easy/)) and is not covered by this document.
+
+#### PyTorch and Python
+
+The officially supported version of [PyTorch is 1.8.1 (LTS)](https://pytorch.org/get-started/locally/) running on Python 3.8.11. Newer versions will typically work, but are not covered by support, documentation, and installation scripts.
+
+#### Hardware Acceleration
+
+When going beyond simple models, model training does not work well without CUDA hardware acceleration. The network loader (“izer”) does <u>not</u> require CUDA, and very simple models can also be trained on systems without CUDA.
+
+* CUDA requires Nvidia GPUs.
+
+* There is a PyTorch pre-release with ROCm acceleration for certain AMD GPUs on Linux ([see blog entry](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/)), but this is not currently covered by the installation instructions in this document, and it is not supported.
+
+* There is neither CUDA nor ROCm support on macOS, and therefore no hardware acceleration.
+
+* PyTorch does not include CUDA support for aarch64/arm64 systems. *Rebuilding PyTorch from source is not covered by this document.*
+
+##### Using Multiple GPUs
+
+When using multiple GPUs (graphics cards), the software will automatically use all available GPUs and distribute the workload. To prevent this (for example, when the GPUs are not balanced), set the `CUDA_VISIBLE_DEVICES` environment variable (see the sketch below). Use the `--gpus` command line argument to set the default GPU.
 
 #### Shared (Multi-User) and Remote Systems
 
-On a shared (multi-user) system that has previously been set up, only local installation is needed. CUDA and any `apt-get` or `brew` tasks are not necessary.
+On a shared (multi-user) system that has previously been set up, only local installation is needed. CUDA and any `apt-get` or `brew` tasks are not necessary, with the exception of the CUDA [Environment Setup](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup).
 
 The `screen` command (or alternatively, the more powerful `tmux`) can be used inside a remote terminal to disconnect a session from the controlling terminal, so that a long-running training session doesn’t abort due to network issues, or local power saving. In addition, screen can log all console output to a text file.
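As a brief aside, a minimal sketch of the `CUDA_VISIBLE_DEVICES` mechanism referenced in the multi-GPU note above; this is standard CUDA behavior, and the variable must be set before CUDA is initialized:

```python
# Restrict PyTorch to the first GPU only (illustrative sketch).
import os

os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # hide all GPUs except GPU 0

import torch  # import after setting the variable, before CUDA initializes

print(torch.cuda.device_count())  # reports 1 on a multi-GPU system
```

Setting the variable in the shell before launching the training script achieves the same effect.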
@@ -78,17 +131,25 @@ Ctrl+A,D to disconnect
 
 `man screen` and `man tmux` describe the software in more detail.
 
-#### Recommended Software
+#### Additional Software
 
 The following software is optional, and can be replaced with other similar software of the user’s choosing.
 
-1. Visual Studio Code (Editor, Free), https://code.visualstudio.com, with the “Remote - SSH” plugin
-2. Typora (Markdown Editor, Free during beta), http://typora.io
-3. CoolTerm (Serial Terminal, Free), http://freeware.the-meiers.org
-   or Serial ($30), https://apps.apple.com/us/app/serial/id877615577?mt=12
-4. Git Fork (Graphical Git Client, $50), https://git-fork.com
-   or GitHub Desktop (Graphical Git Client, Free), https://desktop.github.com
-5. Beyond Compare (Diff and Merge Tool, $60), https://scootersoftware.com
+1. Code Editor
+   Visual Studio Code (free), https://code.visualstudio.com, or the VSCodium version, https://vscodium.com, with the “Remote - SSH” plugin; *to use Visual Studio Code on Windows as a full development environment (including debug), see https://github.com/MaximIntegratedTechSupport/VSCode-Maxim*
+   Sublime Text ($100), https://www.sublimetext.com
+2. Markdown Editor
+   Typora (free during beta), http://typora.io
+3. Serial Terminal
+   CoolTerm (free), http://freeware.the-meiers.org
+   Serial ($30), https://apps.apple.com/us/app/serial/id877615577?mt=12
+   PuTTY (free), https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
+   Tera Term (free), https://osdn.net/projects/ttssh2/releases/
+4. Graphical Git Client
+   GitHub Desktop (free), https://desktop.github.com
+   Git Fork ($50), https://git-fork.com
+5. Diff and Merge Tool
+   Beyond Compare ($60), https://scootersoftware.com
 
 ### Project Installation
@@ -215,10 +276,6 @@ Nirvana Distiller is a package for neural network compression and quantization. Ne
 
 Manifold is a model-agnostic visual debugging tool for machine learning. The [Manifold guide](https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/Guides/Manifold.md) shows how to integrate this optional package into the training software.
 
-#### Windows Systems
-
-Windows/MS-DOS is not supported for training networks at this time. *This includes the Windows Subsystem for Linux (WSL) since it currently lacks CUDA support.*
-
 ### Upstream Code
 
 Change to the project root and run the following commands. Use your GitHub credentials if prompted.
@@ -870,6 +927,37 @@ The example shows a fractionally-strided convolution with a stride of 2, a pad o
 
 ## Model Training and Quantization
 
+#### Hardware Acceleration
+
+If hardware acceleration is not available, skip the following two steps and continue with [Training Script](#Training Script).
+
+1. Before the first training session, check that CUDA hardware acceleration is available using `nvidia-smi -q`:
+
+   ```shell
+   (ai8x-training) $ nvidia-smi -q
+   ...
+   Driver Version        : 470.57.02
+   CUDA Version          : 11.4
+
+   Attached GPUs         : 1
+   GPU 00000000:01:00.0
+       Product Name      : NVIDIA TITAN RTX
+       Product Brand     : Titan
+   ...
+   ```
+
+2. Verify that PyTorch recognizes CUDA:
+
+   ```shell
+   (ai8x-training) $ ./check_cuda.py
+   System:            linux
+   Python version:    3.8.11 (default, Jul 14 2021, 12:46:05) [GCC 9.3.0]
+   PyTorch version:   1.8.1+cu111
+   CUDA acceleration: available in PyTorch
+   ```
+
+#### Training Script
+
 The main training software is `train.py`. It drives the training aspects, including model creation, checkpointing, model save, and status display (see `--help` for the many supported options, and the `scripts/train_*.sh` scripts for example usage).
 
 The `ai84net.py` and `ai85net.py` files contain models that fit into AI84’s weight memory. These models rely on the MAX78000/MAX78002 hardware operators that are defined in `ai8x.py`.
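For reference, a minimal stand-in that prints the same kind of information as the `check_cuda.py` transcript above; this is a hedged sketch, not the repository's actual script:

```python
#!/usr/bin/env python3
# Hedged stand-in for a CUDA availability check (not the repository's
# check_cuda.py; the output format is approximated from the transcript).
import sys

import torch

print('System:', sys.platform)
print('Python version:', sys.version)
print('PyTorch version:', torch.__version__)
print('CUDA acceleration:',
      'available in PyTorch' if torch.cuda.is_available() else 'NOT available')
```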
@@ -1066,6 +1154,27 @@ The `ai8x.py` file contains customized PyTorch classes (subclasses of `torch.nn.
 2. Rounding and clipping that matches the hardware.
 3. Support for quantized operation (when using the `-8` command line argument).
+
+##### set_device()
+
+`ai8x.py` defines the `set_device()` function, which configures the training system:
+
+```python
+def set_device(
+        device,
+        simulate,
+        round_avg,
+        verbose=True,
+):
+```
+
+where *device* is `85` (the MAX78000 device code), *simulate* is `True` when clipping and rounding are set to simulate hardware behavior, and *round_avg* picks one of the two hardware rounding modes for AvgPool.
+
+##### update_model()
+
+`ai8x.py` defines `update_model()`. This function is called after loading a checkpoint file, and recursively applies output shift, weight scaling, and quantization clamping to the model.
+
 #### List of Predefined Modules
 
 The following modules are predefined:
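As a hedged illustration of how these two helpers might be used together (the checkpoint file name, its `state_dict` layout, and the single-argument `update_model()` call are assumptions for this sketch, not confirmed by the excerpt above):

```python
import torch

import ai8x

# Configure training for MAX78000; simulate=True would instead apply
# hardware-matching clipping and rounding, as described above.
ai8x.set_device(device=85, simulate=False, round_avg=False)

# Placeholder model; in practice this is a network built from the
# predefined ai8x modules listed below.
model = torch.nn.Linear(16, 16)

# Hypothetical checkpoint restore followed by the update step that
# applies output shift, weight scaling, and quantization clamping.
checkpoint = torch.load('best.pth.tar', map_location='cpu')
model.load_state_dict(checkpoint['state_dict'])
ai8x.update_model(model)
```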
@@ -1116,7 +1225,9 @@ The following modules are predefined:
 
 Dropout modules such as `torch.nn.Dropout()` and `torch.nn.Dropout2d()` are automatically disabled during inference, and can therefore be used for training without affecting inference.
 
-#### view and reshape
+*Note: Using [batch normalization](#Batch Normalization) in conjunction with dropout can sometimes degrade training results.*
+
+#### view() and reshape()
 
 There are two supported cases for `view()` or `reshape()`.
@@ -1163,6 +1274,8 @@ After fusing/folding, the network will no longer contain any batchnorm layers. T
 * When using [Quantization-Aware Training (QAT)](#Quantization-Aware Training (QAT)), batchnorm layers <u>are automatically folded</u> during training and no further action is needed.
 * When using [Post-Training Quantization](#Post-Training Quantization), the `batchnormfuser.py` script (see [BatchNorm Fusing](#BatchNorm-Fusing)) must be called before `quantize.py` to explicitly fuse the batchnorm layers.
 
+*Note: Using batch normalization in conjunction with [dropout](#Dropout) can sometimes degrade training results.*
+
 ### Model Comparison and Feature Attribution
 
 Both TensorBoard and [Manifold](#Manifold) can be used for model comparison and feature attribution.
@@ -1426,13 +1539,13 @@ The loader returns a tuple of two PyTorch Datasets for training and test data.
 
 ##### Normalizing Input Data
 
-For training, input data is expected to be in the range $[-\frac{128}{128}, +\frac{127}{128}]$. When evaluating quantized weights, or when running on hardware, input data is instead expected to be in the native MAX7800X range of $[-128, +127]$. Conversely, the majority of PyTorch datasets are PIL images of range $[0, 1]$. The respective data loaders therefore call the `ai8x.normalize()` function, which expects an input of 0 to 1 and normalizes the data to either of these output ranges.
+For training, input data is expected to be in the range $[-\frac{128}{128}, +\frac{127}{128}]$. When evaluating quantized weights, or when running on hardware, input data is instead expected to be in the native MAX7800X range of $[-128, +127]$. Conversely, the majority of PyTorch datasets are PIL images of range $[0, 1]$. The respective data loaders therefore call the `ai8x.normalize()` function, which expects an input of 0 to 1 and normalizes the data, automatically switching between the two supported data ranges.
 
 When running inference on MAX7800X hardware, it is important to take the native data format into account, and it is desirable to perform as little preprocessing as possible during inference. For example, an image sensor may return “signed” data in the range $[-128, +127]$ for each color. No additional preprocessing or mapping is needed for this sensor since the model was trained with this data range.
 
 In many cases, image data is delivered as fewer than 8 bits per channel (for example, RGB565). In these cases, retraining the model with this limited range (0 to 31 for 5-bit color and 0 to 63 for 6-bit color, respectively) can potentially eliminate the need for inference-time preprocessing.
 
-On the other hand, a different sensor may produce unsigned data values in the full 8-bit range $[0, 255]$. This range must be mapped to $[-128, +127]$ to match hardware and the trained model. The mapping can be performed during inference by subtracting 128 from each input byte, but this requires extra processing time during inference.
+On the other hand, a different sensor may produce unsigned data values in the full 8-bit range $[0, 255]$. This range must be mapped to $[-128, +127]$ to match hardware and the trained model. The mapping can be performed during inference by subtracting 128 from each input byte, but this requires extra (pre-)processing time during inference.
 
 ##### `datasets` Data Structure
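A hedged sketch of the arithmetic described above, illustrating the two output ranges (this is not the repository's actual `ai8x.normalize()` implementation, and the `act_mode_8bit` switch name is an assumption):

```python
import torch

def normalize(data: torch.Tensor, act_mode_8bit: bool = False) -> torch.Tensor:
    """Map input from [0, 1] to the training range [-128/128, +127/128],
    or to the native MAX7800X range [-128, +127] (illustrative sketch)."""
    data = (data - 0.5) * 256.0            # shift and scale to [-128, +128]
    data = data.round().clamp_(-128, 127)  # quantize to the hardware grid
    return data if act_mode_8bit else data / 128.0
```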
@@ -1756,9 +1869,7 @@ The `bias` configuration is only used for test data. *To use bias with trained n
 
 ##### `dataset` (Mandatory)
 
-`dataset` configures the data set for the network. This determines the input data size and dimensions as well as the number of input channels.
-
-Data sets are for example `mnist`, `fashionmnist`, and `cifar-10`.
+`dataset` configures the data set for the network. Data sets are, for example, `mnist`, `fashionmnist`, and `cifar-10`. This key is descriptive only; it does not configure input or output dimensions or channel count.
 
 ##### `output_map` (Optional)
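To illustrate, a hypothetical fragment of a network description file and a check of the `dataset` key (the `arch` key and file layout are assumptions shown for context only; parsing uses PyYAML):

```python
# Hypothetical illustration: `dataset` names the data set, while input
# dimensions and channel count come from the data loader, not this key.
import yaml

description = yaml.safe_load("""
arch: ai85net5
dataset: mnist
""")
print(description['dataset'])  # 'mnist'
```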

README.pdf

41.2 KB
Binary file not shown.

assets/embedded-ai85/templateMakefile

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@ ifeq "$(TARGET)" ""
 TARGET=MAX78000
 endif
 
+MAKE=make
+
 # Create Target name variables
 TARGET_UC:=$(shell echo $(TARGET) | tr a-z A-Z)
 TARGET_LC:=$(shell echo $(TARGET) | tr A-Z a-z)

assets/embedded-riscv-ai85/templateMakefile.ARM

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ ifeq "$(TARGET)" ""
 TARGET=MAX78000
 endif
 
+MAKE=make
+
 # Create Target name variables
 TARGET_UC:=$(shell echo $(TARGET) | tr a-z A-Z)
 TARGET_LC:=$(shell echo $(TARGET) | tr A-Z a-z)

assets/embedded-riscv-ai85/templateMakefile.RISCV

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,8 @@ ifeq "$(TARGET)" ""
 TARGET=MAX78000
 endif
 
+MAKE=make
+
 # Create Target name variables
 TARGET_UC:=$(shell echo $(TARGET) | tr a-z A-Z)
 TARGET_LC:=$(shell echo $(TARGET) | tr A-Z a-z)

izer/assets.py

Lines changed: 19 additions & 4 deletions
@@ -50,15 +50,29 @@ def from_template(
     else:
         elf_file = f'{tc.dev.partnum.lower()}.elf'
 
-    for _, _, files in sorted(os.walk(os.path.join(base, source))):
+    basepath = os.path.join(base, source)
+    for folderpath, _, files in sorted(os.walk(basepath)):
+        folder = os.path.relpath(folderpath, basepath)
+        if folder != '.':
+            test_path = os.path.join(test_name, folder)
+            os.makedirs(os.path.join(target, test_path), exist_ok=True)
+        else:
+            test_path = test_name
+
         for name in sorted(files):
+            if folder != '.':
+                source_path = os.path.join(folder, name)
+            else:
+                source_path = name
             if name.startswith(template):
                 dst = os.path.join(
                     target,
-                    test_name,
+                    test_path,
                     name[len(template):].replace('##__PROJ_NAME__##', test_name),
                 )
-                with open(os.path.join(base, source, name)) as infile, open(dst, 'w+') as outfile:
+                with open(
+                    os.path.join(base, source, source_path)
+                ) as infile, open(dst, 'w+') as outfile:
                     for line in infile:
                         outfile.write(
                             line.replace('##__PROJ_NAME__##', test_name).
@@ -67,4 +81,5 @@ def from_template(
                             replace('##__FILE_INSERT__##', insert)
                         )
             else:
-                shutil.copy(os.path.join(base, source, name), os.path.join(target, test_name))
+                shutil.copy(os.path.join(base, source, source_path),
+                            os.path.join(target, test_path))
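For context, a hedged standalone sketch of the `os.walk()`/`os.path.relpath()` pattern this change introduces, mirroring a template tree's sub-folders at the target (the paths are hypothetical):

```python
import os

basepath = 'assets/eclipse'  # hypothetical template source tree
target = 'demo/project'      # hypothetical output root

for folderpath, _, files in sorted(os.walk(basepath)):
    folder = os.path.relpath(folderpath, basepath)  # '.' at the top level
    dest = target if folder == '.' else os.path.join(target, folder)
    os.makedirs(dest, exist_ok=True)
    for name in sorted(files):
        # Copy or template-expand each file into the mirrored sub-folder
        print(os.path.join(folderpath, name), '->', os.path.join(dest, name))
```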

izer/backend/max7800x.py

Lines changed: 2 additions & 0 deletions
@@ -2808,6 +2808,8 @@ def run_eltwise(
                          test_name, board_name, '')
     assets.from_template('assets', 'eclipse', base_directory,
                          test_name, board_name, '')
+    assets.from_template('assets', 'vscode', base_directory,
+                         test_name, board_name, '')
     assets.from_template('assets', 'device-all', base_directory,
                          test_name, board_name, insert)
     assets.from_template('assets', 'device-ai' + str(device), base_directory,

izer/commandline.py

Lines changed: 6 additions & 0 deletions
@@ -356,6 +356,12 @@ def get_parser() -> argparse.Namespace:
     group.add_argument('--max-verify-length', '--max-checklines',
                        type=int, metavar='N', default=None, dest='max_count',
                        help="output only N output check lines (default: all)")
+    group.add_argument('--no-version-check', action='store_true', default=False,
+                       help='do not check GitHub for newer versions of the repository')
+    group.add_argument('--version-check-interval', type=int, metavar='HOURS', default=24,
+                       help='version check update interval (hours), default = 24')
+    group.add_argument('--upstream', metavar='REPO', default="MaximIntegratedAI/ai8x-synthesis",
+                       help='GitHub repository name for update checking')
 
     args = parser.parse_args()
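The new options pair with the `/.version-check` stamp file added to `.gitignore` in this commit. A hedged sketch of how an interval-gated check could use such a stamp (the commit does not show the actual checker logic):

```python
import os
import time

def version_check_due(stamp='.version-check', interval_hours=24):
    """Return True when the last recorded check is older than the interval."""
    try:
        age = time.time() - os.path.getmtime(stamp)
    except OSError:
        return True  # no stamp file yet: check immediately
    return age >= interval_hours * 3600

if version_check_due():
    # ...query GitHub for the configured --upstream repository here...
    with open('.version-check', 'w'):
        pass  # touch the stamp to record the check time
```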