
Commit 6f66454

Author: Robert Muchsel
README: Additional pyenv detail; add sample output for quantize/evaluate (#153)
1 parent: 2f2f589

File tree: 3 files changed, +64 −9 lines

README.md

Lines changed: 64 additions & 9 deletions
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_July 19, 2021_
+_July 20, 2021_
 
 The Maxim Integrated AI project is comprised of five repositories:
 
@@ -52,7 +52,7 @@ where “....” is the project root, for example `~/Documents/Source/AI`.
 
 ### Prerequisites
 
-This software currently supports Ubuntu Linux 18.04 LTS and 20.04 LTS. The server version is sufficient, see https://ubuntu.com/download/server. *Alternatively, Ubuntu Linux can also be used inside the Windows Subsystem for Linux (WSL2) by following
+This software currently supports Ubuntu Linux 20.04 LTS. The server version is sufficient, see https://ubuntu.com/download/server. *Alternatively, Ubuntu Linux can also be used inside the Windows Subsystem for Linux (WSL2) by following
 https://docs.nvidia.com/cuda/wsl-user-guide/. However, please note that WSL2 with CUDA is a pre-release and unexpected behavior may occur.*
 
 When going beyond simple models, model training does not work well without CUDA hardware acceleration. The network loader (“izer”) does not require CUDA, and very simple models can also be trained on systems without CUDA.
@@ -134,7 +134,7 @@ $ sudo dnf install openssl-devel zlib-devel \
   libsndfile libsndfile-devel portaudio-devel
 ```
 
-#### Python 3.8
+#### Python 3.8 / pyenv
 
 *The software in this project uses Python 3.8.11 or a later 3.8.x version.*
 
@@ -242,10 +242,24 @@ If you want to use the “develop” branch, switch to “develop” using this
 $ git checkout develop    # optional
 ```
 
-Then continue with the following:
+Next, set the local directory to use Python 3.8.11.
 
 ```shell
 $ pyenv local 3.8.11
+```
+
+And verify that the correct Python version is used:
+
+```shell
+$ python3 --version
+Python 3.8.11
+```
+
+If this does <u>*not*</u> return the correct version, please install and initialize [pyenv](#Python 3.8 / pyenv).
+
+Then continue with the following:
+
+```shell
 $ python3 -m venv .
 $ source bin/activate
 (ai8x-training) $ pip3 install -U pip wheel setuptools
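
*If pyenv itself is not yet installed, the bootstrap is roughly the following, a condensed sketch of the standard pyenv-installer flow (the [Python 3.8 / pyenv](#Python 3.8 / pyenv) section of the README has the authoritative steps):*

```shell
$ curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash
$ # add to ~/.bashrc or ~/.profile, then restart the shell:
$ export PATH="$HOME/.pyenv/bin:$PATH"
$ eval "$(pyenv init -)"
$ eval "$(pyenv virtualenv-init -)"
$ # install the interpreter version the project expects:
$ pyenv install 3.8.11
```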
@@ -458,6 +472,8 @@ Any given processor has visibility of:
 
 #### Weight Memory
 
+*Note: Depending on context, weights may also be referred to as “kernels” or “masks”. Additionally, weights are also part of a network’s “parameters”.*
+
 For each of the four 16-processor quadrants, weight memory and processors can be visualized as follows. Assuming one input channel processed by processor 0, and 8 output channels, the 8 shaded kernels will be used:
 
 ![Weight Memory Map](docs/KernelMemory.png)
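
*As a worked example of the kernel count in the figure: a 3×3 convolution with 1 input channel and 8 output channels requires 1 × 8 = 8 kernels of 9 weights each, i.e. 72 bytes of weight memory at 8-bit weight precision.*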
@@ -1031,7 +1047,7 @@ The ONNX model export (via `--summary onnx` or `--summary onnx_simplified`) is p
 ```
 $ nvidia-smi
 +-----------------------------------------------------------------------------+
-| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
+| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
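
*Independent of the driver version reported above, the following confirms that the PyTorch installation actually sees CUDA (standard PyTorch API, not specific to this project):*

```shell
(ai8x-training) $ python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```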
@@ -1157,11 +1173,13 @@ Both TensorBoard and [Manifold](#Manifold) can be used for model comparison and
 
 For classification models, TensorBoard supports the optional `--param-hist` and `--embedding` command line arguments. `--embedding` randomly selects up to 100 data points from the last batch of each verification epoch. These can be viewed in the “projector” tab in TensorBoard.
 
+`--pr-curves` adds support for displaying precision-recall curves.
+
 To start the TensorBoard server, use a second terminal window:
 
 ```shell
 (ai8x-training) $ tensorboard --logdir='./logs'
-TensorBoard 2.2.2 at http://127.0.0.1:6006/ (Press CTRL+C to quit)
+TensorBoard 2.4.1 at http://127.0.0.1:6006/ (Press CTRL+C to quit)
 ```
 
 On a shared system, add the `--port 0` command line option.
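
*For reference, a hypothetical training invocation with the new flag enabled; every argument other than `--pr-curves` is a placeholder for the project's usual training options:*

```shell
(ai8x-training) $ ./train.py --model ai85net5 --dataset MNIST --device MAX78000 --pr-curves ...
```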
@@ -1170,9 +1188,9 @@ The training progress can be observed by starting TensorBoard and pointing a web
 
 ##### Examples
 
-TensorBoard produces graphs and displays metrics that may help optimize the training process, and can compare the performance of multiple training sessions and their settings. Additionally, TensorBoard can show a graphical representation of the model and its parameters. For more information, please see the [TensorBoard web site](https://www.tensorflow.org/tensorboard/).
+TensorBoard produces graphs and displays metrics that may help optimize the training process, and can compare the performance of multiple training sessions and their settings. Additionally, TensorBoard can show a graphical representation of the model and its parameters, and help discover labeling errors. For more information, please see the [TensorBoard web site](https://www.tensorflow.org/tensorboard/).
 
-<img src="docs/lr.png" alt="learning rate" style="zoom: 50%;" /><img src="docs/top1.png" alt="top-1" style="zoom:50%;" /><img src="docs/objectiveloss.png" alt="objective loss" style="zoom:42%;" /><img src="docs/histogram.png" alt="histogram" style="zoom:50%;" /><img src="docs/model.png" alt="model" style="zoom:50%;" />
+<img src="docs/lr.png" alt="learning rate" style="zoom: 50%;" /><img src="docs/top1.png" alt="top-1" style="zoom:50%;" /><img src="docs/objectiveloss.png" alt="objective loss" style="zoom:42%;" /><img src="docs/histogram.png" alt="histogram" style="zoom:50%;" /><img src="docs/model.png" alt="model" style="zoom:50%;" /><img src="docs/projector.png" alt="projector" style="zoom:50%;" />
 
 ##### Remote Access to TensorBoard
 
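
*For remote access, one common approach (an assumption here, not necessarily the method the README section describes) is an SSH tunnel from the local machine to the training server; `user@training-server` is a placeholder:*

```shell
$ ssh -L 6006:127.0.0.1:6006 user@training-server
```

*The TensorBoard instance on the server can then be reached locally at http://127.0.0.1:6006/.*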
@@ -1290,14 +1308,51 @@ Example for MNIST:
 
 ```shell
 (ai8x-synthesis) $ scripts/quantize_mnist.sh
+Configuring device: MAX78000
+Converting checkpoint file trained/ai85-mnist-qat8.pth.tar to trained/ai85-mnist-qat8-q.pth.tar
+
+Model keys (state_dict):
+conv1.output_shift, conv1.weight_bits, conv1.bias_bits, conv1.quantize_activation, conv1.adjust_output_shift, conv1.op.weight, conv2.output_shift, conv2.weight_bits, conv2.bias_bits, conv2.quantize_activation, conv2.adjust_output_shift, conv2.op.weight, conv3.output_shift, conv3.weight_bits, conv3.bias_bits, conv3.quantize_activation, conv3.adjust_output_shift, conv3.op.weight, conv4.output_shift, conv4.weight_bits, conv4.bias_bits, conv4.quantize_activation, conv4.adjust_output_shift, conv4.op.weight, fc.output_shift, fc.weight_bits, fc.bias_bits, fc.quantize_activation, fc.adjust_output_shift, fc.op.weight, fc.op.bias, conv1.shift_quantile, conv2.shift_quantile, conv3.shift_quantile, conv4.shift_quantile, fc.shift_quantile
+conv1.op.weight avg_max: 0.34562021 max: 0.51949096 mean: 0.02374955 factor: [128.] bits: 8
+conv2.op.weight avg_max: 0.2302317 max: 0.269847 mean: -0.021919029 factor: [256.] bits: 8
+conv3.op.weight avg_max: 0.42106587 max: 0.49686784 mean: -0.021314206 factor: [256.] bits: 8
+conv4.op.weight avg_max: 0.49237916 max: 0.5019533 mean: 0.010923488 factor: [128.] bits: 8
+fc.op.weight avg_max: 0.9884483 max: 1.0039074 mean: -0.0033990005 factor: [64.] bits: 8
+fc.op.bias avg_max: 0.00029080958 max: 0.26957372 mean: -0.00029080958 factor: [64.] bits: 8
 ```
 
-To evaluate the quantized network for MAX78000 (run from the training project):
+To evaluate the quantized network for MAX78000 (**run from the training project**):
 
 ```shell
 (ai8x-training) $ scripts/evaluate_mnist.sh
+...
+--- test ---------------------
+10000 samples (256 per mini-batch)
+Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
+
+Test: [   10/   40]    Loss 0.007288    Top1 99.531250    Top5 100.000000
+Test: [   20/   40]    Loss 0.010161    Top1 99.414062    Top5 100.000000
+Test: [   30/   40]    Loss 0.007681    Top1 99.492188    Top5 100.000000
+Test: [   40/   40]    Loss 0.009589    Top1 99.440000    Top5 100.000000
+==> Top1: 99.440    Top5: 100.000    Loss: 0.010
+
+==> Confusion:
+[[ 978    0    1    0    0    0    0    0    1    0]
+ [   0 1132    1    1    0    0    1    0    0    0]
+ [   0    0 1028    0    0    0    0    4    0    0]
+ [   0    1    0 1007    0    1    0    1    0    0]
+ [   0    0    1    0  977    0    1    0    1    2]
+ [   1    0    0    3    0  884    3    0    0    1]
+ [   3    0    1    0    1    3  949    0    1    0]
+ [   0    2    1    0    0    0    0 1024    0    1]
+ [   0    0    2    1    1    1    0    0  968    1]
+ [   0    0    0    0    7    1    0    4    0  997]]
+
+Log file for this run: 2021.07.20-123302/2021.07.20-123302.log
 ```
 
+*Note that the “Loss” output is not always directly comparable to the unquantized network, depending on the loss function itself.*
+
 #### Alternative Quantization Approaches
 
 If quantization-aware training is not desired, post-training quantization can be improved using more sophisticated methods. For example, see
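
*As the sample output above shows, `scripts/quantize_mnist.sh` converts `trained/ai85-mnist-qat8.pth.tar` to `trained/ai85-mnist-qat8-q.pth.tar` for device MAX78000. A sketch of an equivalent direct call to the quantizer follows; the exact argument spelling is an assumption, so check the script itself:*

```shell
(ai8x-synthesis) $ python quantize.py trained/ai85-mnist-qat8.pth.tar \
    trained/ai85-mnist-qat8-q.pth.tar --device MAX78000
```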

README.pdf (165 KB): Binary file not shown.

docs/projector.png (118 KB): Binary image file.
