Commit 754b438

Author: Robert Muchsel

Upgrade to PyTorch 1.8.1; fix streaming buffer overwrite detection (#120)

* Upgrade to PyTorch 1.8.1
* README updates
* Make 'arch' / 'extras' optional in checkpoint file
* Fix streaming buffer overlap detection
* Handle checkpoint files with all-zero weights and print warning

1 parent dd90fd8 commit 754b438

File tree

9 files changed (+66, -37 lines)

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.8.6
+3.8.9

README.md

Lines changed: 21 additions & 16 deletions
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis

-_March 31, 2021_
+_April 8, 2021_

 The Maxim Integrated AI project is comprised of four repositories:


@@ -90,9 +90,9 @@ The following software is optional, and can be replaced with other similar softw

 ### Project Installation

-*The software in this project uses Python 3.8.6 or a later 3.8.x version.*
+*The software in this project uses Python 3.8.9 or a later 3.8.x version.*

-It is not necessary to install Python 3.8.6 system-wide, or to rely on the system-provided Python. To manage Python versions, use `pyenv` (https://github.com/pyenv/pyenv).
+It is not necessary to install Python 3.8.9 system-wide, or to rely on the system-provided Python. To manage Python versions, use `pyenv` (https://github.com/pyenv/pyenv).

 On macOS (no CUDA support available):

@@ -107,7 +107,7 @@ $ sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
   libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
   libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev \
   libsndfile-dev portaudio19-dev
-$ curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash
+$ curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash # NOTE: Verify contents of the script before running it!!
 ```

 Then, add to either `~/.bash_profile`, `~/.bashrc`, or `~/.profile` (as shown by the terminal output of the previous step):
@@ -119,7 +119,7 @@ eval "$(pyenv virtualenv-init -)"

 If you use zsh as the shell (default on macOS), add these same commands to `~/.zprofile` or `~/.zshrc` in addition to adding them to the bash startup scripts.

-Next, close the Terminal, open a new Terminal and install Python 3.8.6.
+Next, close the Terminal, open a new Terminal and install Python 3.8.9.

 On macOS:

@@ -131,13 +131,13 @@ $ env \
   PKG_CONFIG_PATH="$(brew --prefix tcl-tk)/lib/pkgconfig" \
   CFLAGS="-I$(brew --prefix tcl-tk)/include" \
   PYTHON_CONFIGURE_OPTS="--with-tcltk-includes='-I$(brew --prefix tcl-tk)/include' --with-tcltk-libs='-L$(brew --prefix tcl-tk)/lib -ltcl8.6 -ltk8.6'" \
-  pyenv install 3.8.6
+  pyenv install 3.8.9
 ```

 On Linux:

 ```shell
-$ pyenv install 3.8.6
+$ pyenv install 3.8.9
 ```

 #### git Environment
@@ -229,7 +229,7 @@ Then continue with the following:

 ```shell
 $ git submodule update --init
-$ pyenv local 3.8.6
+$ pyenv local 3.8.9
 $ python3 -m venv .
 $ source bin/activate
 (ai8x-training) $ pip3 install -U pip wheel setuptools
@@ -240,7 +240,7 @@ The next step differs depending on whether the system uses Linux with CUDA 11.x,
 For CUDA 11.x on Linux:

 ```shell
-(ai8x-training) $ pip3 install -r requirements-cu111.txt
+(ai8x-training) $ pip3 install -r requirements-cu11.txt
 ```

 For all other systems, including CUDA 10.2 on Linux:
@@ -275,7 +275,7 @@ For minor updates, pull the latest code and install the updated wheels:
 (ai8x-training) $ git pull
 (ai8x-training) $ git submodule update --init
 (ai8x-training) $ pip3 install -U pip setuptools
-(ai8x-training) $ pip3 install -U -r requirements.txt # or requirements-cu111.txt with CUDA 11.x
+(ai8x-training) $ pip3 install -U -r requirements.txt # or requirements-cu11.txt with CUDA 11.x
 ```

 Updating Python frequently requires updating `pyenv` first. Should `pyenv install x.y.z`
@@ -307,7 +307,7 @@ Then continue:

 ```shell
 $ git submodule update --init
-$ pyenv local 3.8.6
+$ pyenv local 3.8.9
 $ python3 -m venv .
 $ source bin/activate
 (ai8x-synthesis) $ pip3 install -U pip setuptools
@@ -646,9 +646,10 @@ Because of the fact that a processor has its own dedicated weight memory, this w

 For each layer, a set of active processors must be specified. The number of input channels for the layer must be equal to or a multiple of the active processors, and the input data for that layer must be located in data memory instances accessible to the selected processors.

-It is possible to specify a relative offset into the data memory instance that applies to all processors. _Example:_ Assuming HWC data format, specifying the offset as 8192 bytes will cause processors 0-3 to read their input from the second half of data memory 0, processors 4-7 will read from the second half of data memory instance 1, etc.
+It is possible to specify a relative offset into the data memory instance that applies to all processors.
+_Example:_ Assuming HWC data format, specifying the offset as 16384 bytes (or 0x4000) will cause processors 0-3 to read their input from the second half of data memory 0, processors 4-7 will read from the second half of data memory instance 1, etc.

-For most simple networks with limited data sizes, it is easiest to ping-pong between the first and second halves of the data memories: specify the data offset as 0 for the first layer, 0x2000 for the second layer, 0 for the third layer, etc. This strategy avoids overlapping inputs and outputs when a given processor is used in two consecutive layers.
+For most simple networks with limited data sizes, it is easiest to ping-pong between the first and second halves of the data memories: specify the data offset as 0 for the first layer, 0x4000 for the second layer, 0 for the third layer, etc. This strategy avoids overlapping inputs and outputs when a given processor is used in two consecutive layers.

 Even though it is supported by the accelerator, the Network Generator will not be able to check for inadvertent overwriting of unprocessed input data by newly generated output data when overlapping data or streaming data. Use the `--overlap-data` command line switch to disable these checks, and to allow overlapped data.

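To make the ping-pong scheme in this hunk concrete, here is a minimal Python sketch. It is illustrative only (the function name is made up); it assumes the data memory instance size implied by the corrected offset, i.e. 0x8000 bytes per instance with the second half starting at 0x4000 (16384):

```python
# Illustrative sketch: alternate each layer's data offset between the first
# and second half of a data memory instance (assumed 0x8000 bytes total,
# so the second half starts at offset 0x4000 = 16384).
SECOND_HALF = 0x4000

def pingpong_offsets(num_layers):
    """Offset 0 for even layers, 0x4000 for odd layers."""
    return [0 if layer % 2 == 0 else SECOND_HALF for layer in range(num_layers)]

print([hex(offs) for offs in pingpong_offsets(4)])
# ['0x0', '0x4000', '0x0', '0x4000']
```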
@@ -823,11 +824,15 @@ The following table describes the most important command line arguments for `tra
 | `--8-bit-mode`, `-8` | Simulate quantized operation for hardware device (8-bit data) | |
 | `--exp-load-weights-from` | Load weights from file | |
 | *Export* | | |
-| `--summary onnx` | Export trained model to ONNX (default name: model.onnx) | |
+| `--summary onnx` | Export trained model to ONNX (default name: model.onnx) *see description below* | |
 | `--summary onnx_simplified` | Export trained model to simplified ONNX file (default name: model.onnx) | |
 | `--summary-filename` | Change the file name for the exported model | `--summary-filename mnist.onnx` |
 | `--save-sample` | Save data[index] from the test set to a NumPy pickle for use as sample data | `--save-sample 10` |

+#### ONNX Model Export
+
+The ONNX model export (via `--summary onnx` or `--summary onnx_simplified`) is primarily intended for visualization of the model. ONNX does not support all of the operators that `ai8x.py` uses, and these operators are therefore removed from the export (see function `onnx_export_prep()` in `ai8x.py`). The ONNX file does contain the trained weights and *may* therefore be usable for inference under certain circumstances. However, it is important to note that the ONNX file **will not** be usable for training (for example, the ONNX `floor` operator has a gradient of zero, which is incompatible with quantization-aware training as implemented in `ai8x.py`).
+
 ### Observing GPU Resources

 `nvidia-smi` can be used in a different terminal during training to examine the GPU resource usage of the training process. In the following example, the GPU is using 100% of its compute capabilities, but not all of the available memory. In this particular case, the batch size could be increased to use more memory.
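As a complement to the new ONNX export note above, an exported file can be inspected with the `onnx` package (already pinned in requirements.txt). A minimal sketch; `model.onnx` is the documented default name, everything else is illustrative:

```python
# Load and inspect an exported model (for visualization/debugging, not training).
import onnx

model = onnx.load('model.onnx')          # default export name, see table above
onnx.checker.check_model(model)          # structural sanity check
print(onnx.helper.printable_graph(model.graph))  # human-readable graph dump
```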
@@ -1910,7 +1915,7 @@ Perform minimum accelerator initialization so it can be configured or restarted.
 Configure the accelerator for the given network.

 `int cnn_load_weights(void);`
-Load the accelerator weights.
+Load the accelerator weights. Note that `cnn_init()` must be called before loading weights after reset or wake from sleep.

 `int cnn_verify_weights(void);`
 Verify the accelerator weights (used for debug only).
@@ -2172,4 +2177,4 @@ https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/CONTRIBUT

 ---

-o
+o

README.pdf

Binary file (5.35 KB) not shown.

distiller (submodule; diff not shown)

izer/checkpoint.py

Lines changed: 24 additions & 10 deletions
@@ -14,7 +14,7 @@

 from . import op as opn
 from . import tornadocnn as tc
-from .eprint import eprint
+from .eprint import eprint, wprint
 from .utils import fls


@@ -58,12 +58,16 @@ def load(
     checkpoint = torch.load(checkpoint_file, map_location='cpu')
     print(f'Reading {checkpoint_file} to configure network weights...')

-    if 'state_dict' not in checkpoint or 'arch' not in checkpoint:
-        raise RuntimeError("\nNo `state_dict` or `arch` in checkpoint file.")
-
-    if arch and checkpoint['arch'].lower() != arch.lower():
-        eprint(f"Network architecture of configuration file ({arch}) does not match "
-               f"network architecture of checkpoint file ({checkpoint['arch']}).")
+    if 'state_dict' not in checkpoint:
+        eprint("No `state_dict` in checkpoint file.")
+    if 'arch' not in checkpoint:
+        wprint("No `arch` in checkpoint file.")
+        checkpoint_arch = ''
+    else:
+        checkpoint_arch = checkpoint['arch']
+        if arch and checkpoint_arch.lower() != arch.lower():
+            eprint(f"Network architecture of configuration file ({arch}) does not match "
+                   f"network architecture of checkpoint file ({checkpoint_arch}).")

     checkpoint_state = checkpoint['state_dict']
     layers = 0
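The practical effect of this hunk (together with the `extras` default added in izer/quantize.py below) is that minimal checkpoints now load with warnings instead of a hard `RuntimeError`. A hypothetical example of such a file, for illustration only:

```python
# Hypothetical minimal checkpoint: contains 'state_dict' but neither 'arch'
# nor 'extras'. After this commit, loading it warns instead of failing.
import torch
from torch import nn

model = nn.Conv2d(3, 8, kernel_size=3)  # stand-in model, purely illustrative
torch.save(
    {'epoch': 0, 'state_dict': model.state_dict()},
    'minimal_checkpoint.pth.tar',
)
```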
@@ -90,6 +94,9 @@ def load(
                 w = checkpoint_state[k].numpy().astype(np.int64)
                 w_min, w_max, w_abs = w.min(), w.max(), np.abs(w)

+                if np.all(w == 0):
+                    wprint(f'All weights for `{k}` are zero.')
+
                 # Determine quantization or make sure that what was given fits
                 if quantization[seq] is not None:
                     if quantization[seq] == -1:
@@ -98,7 +105,8 @@
                         assert w_min >= -(2**(quantization[seq]-1))
                         assert w_max < 2**(quantization[seq]-1)
                 else:
-                    if tc.dev.SUPPORT_BINARY_WEIGHTS and w_abs.min() == w_abs.max() == 1:
+                    if tc.dev.SUPPORT_BINARY_WEIGHTS and w_abs.min() == w_abs.max() == 1 \
+                            and not np.any(w_abs == 0):
                         quantization[seq] = -1
                     else:
                         if w_max > 0:
@@ -109,7 +117,10 @@
                             w_min_m = int(w_min)
                         else:
                             w_min_m = int(abs(w_min)) - 1
-                        quantization[seq] = 1 << (fls(max(fls(w_max_m), fls(w_min_m)) + 1) + 1)
+                        if w_max_m > 0 or w_min_m > 0:
+                            quantization[seq] = 1 << (fls(max(fls(w_max_m), fls(w_min_m)) + 1) + 1)
+                        else:
+                            quantization[seq] = 1  # all weights zero
                 assert quantization[seq] <= 8
                 quant.append(quantization[seq])

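To see what the `fls()` expression above computes: it selects the smallest power-of-two bit width (2, 4, or 8) whose signed range covers the weight magnitudes, and the new branch falls back to 1 when every weight is zero. A standalone sketch, assuming `fls(x)` returns the index of the highest set bit (as it appears to in izer/utils.py):

```python
# Standalone sketch of the width selection above. Assumption: fls(x) returns
# the index of the highest set bit, e.g. fls(1) == 0, fls(127) == 6.
def fls(x):
    return x.bit_length() - 1

def weight_bits(w_max_m, w_min_m):
    """Smallest power-of-two weight width covering the given magnitudes."""
    if w_max_m > 0 or w_min_m > 0:
        return 1 << (fls(max(fls(w_max_m), fls(w_min_m)) + 1) + 1)
    return 1  # new fallback: all weights zero

print(weight_bits(127, 127))  # 8 -> full 8-bit weights (range -128..127)
print(weight_bits(1, 1))      # 2 -> 2-bit weights (range -2..1)
print(weight_bits(0, 0))      # 1 -> the all-zero fallback added by this commit
```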
@@ -166,6 +177,9 @@ def load(
                     w = checkpoint_state[bias_name].numpy(). \
                         astype(np.int64) // tc.dev.BIAS_DIV

+                    if np.all(w == 0):
+                        wprint(f'All bias values for `{bias_name}` are zero.')
+
                     w_min, w_max = w.min(), w.max()
                     assert w_min >= -(2**(bias_quantization[seq]-1))
                     assert w_max < 2**(bias_quantization[seq]-1)
@@ -210,7 +224,7 @@ def load(
         seq += 1

     if verbose:
-        print(f'Checkpoint for epoch {checkpoint["epoch"]}, model {checkpoint["arch"]} - '
+        print(f'Checkpoint for epoch {checkpoint["epoch"]}, model {checkpoint_arch} - '
               'weight and bias data:')
         print(' InCh OutCh Weights Quant Shift Min Max Size '
               'Key Bias Quant Min Max Size Key')

izer/max7800x.py

Lines changed: 1 addition & 1 deletion
@@ -2741,7 +2741,7 @@ def run_eltwise(
            memfile.close()

        data_buf.append(out_buf.reshape(out_size))
-        if streaming[ll]:
+        if next_sequence[ll] != -1 and streaming[next_sequence[ll]]:
            # When streaming, the output should not overwrite the input of prior layers since
            # these layers are still needed.
            in_map = [a if a is not None else b for a, b, in zip(in_map, out_map)]
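The one-line fix changes which layer's streaming flag guards the overwrite check: the input map of earlier layers only needs protection while the *next* layer in the processing sequence still streams. A toy illustration (names mirror the diff above; the data is made up):

```python
# Toy illustration of the corrected condition. streaming[ll] asks whether the
# *current* layer streams; the buffer actually needs protection only when the
# *next* layer in the processing sequence (-1 = none) is a streaming layer.
streaming = {0: True, 1: True, 2: False}  # per-layer streaming flags (example)
next_sequence = {0: 1, 1: 2, 2: -1}       # successor of each layer, -1 = last

for ll in (0, 1, 2):
    protect = next_sequence[ll] != -1 and streaming[next_sequence[ll]]
    print(f'layer {ll}: preserve input map -> {protect}')
# layer 0 -> True, layer 1 -> False, layer 2 -> False
```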

izer/quantize.py

Lines changed: 5 additions & 2 deletions
@@ -17,7 +17,7 @@
 from . import tornadocnn as tc
 from . import yamlcfg
 from .devices import device
-from .eprint import wprint
+from .eprint import eprint, wprint

 CONV_SCALE_BITS = 8
 CONV_DEFAULT_WEIGHT_BITS = 8
@@ -50,7 +50,7 @@ def convert_checkpoint(input_file, output_file, arguments):
     print(get_contents_table(checkpoint))

     if 'state_dict' not in checkpoint:
-        raise RuntimeError("\nNo state_dict in checkpoint file.")
+        eprint("No `state_dict` in checkpoint file.")

     checkpoint_state = checkpoint['state_dict']
     compression_sched = checkpoint['compression_sched'] \
@@ -96,6 +96,9 @@ def get_max_bit_shift(t, return_bit_shift=False):
     # If not using quantization-aware training (QAT),
     # scale to our fixed point representation using any of four methods
     # The 'magic constant' seems to work best for SCALE
+    if 'extras' not in checkpoint:
+        wprint("No `extras` in checkpoint file.")
+        checkpoint['extras'] = {}
     if arguments.clip_mode is not None:
         if arguments.clip_mode == 'STDDEV':
             sat_fn = partial(mean_n_stds_max_abs, n_stds=arguments.stddev)
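Defaulting `extras` to an empty dict means later reads can use plain lookups without per-key existence checks. A sketch of the pattern; `clamp_bits` is a hypothetical key, purely for illustration:

```python
# Hypothetical checkpoint that never went through QAT: no 'extras' key.
checkpoint = {'state_dict': {}}

if 'extras' not in checkpoint:
    print('WARNING: No `extras` in checkpoint file.')  # stands in for wprint()
    checkpoint['extras'] = {}

# Downstream code can now index 'extras' without a KeyError:
clamp_bits = checkpoint['extras'].get('clamp_bits')  # hypothetical key -> None
print(clamp_bits)
```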

izer/rtlsim.py

Lines changed: 10 additions & 3 deletions
@@ -91,8 +91,13 @@ def create_runtest_sv(
         )
         if tc.dev.MODERN_SIM:
             runfile.write(
-                '\n`define CNN_ENA `DIGITAL_TOP.xuut1.x16proc[0].xproc.xuut.cnnena\n'
-                '`define CNN_CLK `DIGITAL_TOP.xuut1.x16proc[0].xproc.xuut.clk\n\n'
+                '\n`ifdef gate_sims\n'
+                ' `define CNN_ENA `DIGITAL_TOP.xuut1.x16proc_0__xproc_xuut.xcnn_fsm2.cnnena'
+                '\n `define CNN_CLK `DIGITAL_TOP.xuut1.x16proc_0__xproc_xuut.clk\n'
+                '`else\n'
+                ' `define CNN_ENA `DIGITAL_TOP.xuut1.x16proc[0].xproc.xuut.cnnena\n'
+                ' `define CNN_CLK `DIGITAL_TOP.xuut1.x16proc[0].xproc.xuut.clk\n'
+                '`endif\n\n'
             )
         else:
             runfile.write(
@@ -110,6 +115,7 @@ def create_runtest_sv(
         if result_output:
             runfile.write('int chk_stat;\n')
         runfile.write(
+            'logic chk_clk;\n'
             '\ninitial begin\n'
         )
         if result_output:
@@ -142,7 +148,8 @@ def create_runtest_sv(
             ' $display("CNN enabled");\n'
             ' end\n'
             'end\n\n'
-            'always @(negedge `CNN_ENA) begin\n'
+            'assign #10 chk_clk = `CNN_ENA;\n\n'
+            'always @(negedge chk_clk) begin\n'
             ' if (start_ena) begin\n'
             ' end_time = $realtime;\n'
             ' clkena1 = 1;\n'

requirements.txt

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
-numpy>=1.19,<1.20
+numpy>=1.20.2,<1.21
 PyYAML>=5.1.1
 tabulate==0.8.3
 future>=0.17.1
 six>=1.12.0
 scipy>=1.3.0
-torch==1.7.1
+torch==1.8.1
 pytest~=4.6.4
 onnx>=1.7.0
-tensorboard==2.4.0
+tensorboard==2.4.1
 colorama>=0.4.4
 -e file:distiller
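After reinstalling from the updated requirements, a quick sanity check can confirm the new pins took effect. An illustrative snippet, run inside the activated virtual environment:

```python
# Verify that the upgraded pins from requirements.txt are actually in place.
import numpy
import torch

assert torch.__version__.startswith('1.8.1'), torch.__version__
assert numpy.__version__.startswith('1.20'), numpy.__version__
print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())
```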
