
Commit 491cf8d

Author: Robert Muchsel
README updates; AI87 updates (#114)
2 parents 58c08b1 + c65b2d9; commit 491cf8d

15 files changed: 210 additions, 85 deletions

README.md

Lines changed: 17 additions & 6 deletions
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_March 2, 2021_
+_March 15, 2021_
 
 The Maxim Integrated AI project is comprised of four repositories:
 
@@ -609,13 +609,17 @@ $$ w_0 * w_1 = 128/128 → saturation → 01111111 (= 127/128) $$
 
 #### HWC
 
-All internal data are stored in HWC format, 4 channels per 32-bit word. Assuming 3-color (or 3-channel) input, one byte will be unused. Example:
+All internal data are stored in HWC format, 4 channels per 32-bit word. Assuming 3-color (or 3-channel) input, one byte will be unused. The highest frequency in this data format is the channel, so the channels are interleaved.
+
+Example:
 
 ![0BGR 0BGR 0 BGR 0BGR...](docs/HWC.png)
 
 #### CHW
 
-The input layer can alternatively also use the CHW format (sequence of channels), for example:
+The input layer can alternatively also use the CHW format (a sequence of channels). The highest frequency in this data format is the width or X-axis (W), and the lowest frequency is the channel. Assuming an RGB input, all red pixels are followed by all green pixels, followed by all blue pixels.
+
+Example:
 
 ![RRRRRR...GGGGGG...BBBBBB...](docs/CHW.png)
 
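As an illustration of the two layouts (not part of the diff), a minimal numpy sketch for a 3-channel 2×2 input:

```python
import numpy as np

# A 3-channel (RGB) 2x2 image; values 0-3 are R, 4-7 are G, 8-11 are B.
x = np.arange(12).reshape(3, 2, 2)

chw = x.flatten()                     # R R R R  G G G G  B B B B
hwc = x.transpose(1, 2, 0).flatten()  # R G B  R G B  R G B  R G B

print(chw)  # channel changes slowest: the "lowest frequency" index
print(hwc)  # channel changes fastest: the "highest frequency" index
```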
@@ -780,6 +784,8 @@ The `ai84net.py` and `ai85net.py` files contain models that fit into AI84’s we
 
 To train the FP32 model for MNIST on MAX78000, run `scripts/train_mnist.sh` from the `ai8x-training` project. This script will place checkpoint files into the log directory. Training makes use of the Distiller framework, but the `train.py` software has been modified slightly to improve it and add some MAX78000/MAX78002 specifics.
 
+Since training can take hours or days, the training script does not overwrite any weights previously produced. Results are placed in sub-directories under `logs/` named with date and time when training began. The latest results are always soft-linked to by `latest-log_dir` and `latest_log_file`.
+
 ### Command Line Arguments
 
 The following table describes the most important command line arguments for `train.py`. Use `--help` for a complete list.
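The added paragraph describes a timestamped-log scheme. A rough sketch of the idea (illustration only, not the actual `train.py` code; the link name follows the README text):

```python
import os
from datetime import datetime

# One timestamped directory per run, plus a stable soft link to the latest.
log_dir = os.path.join('logs', datetime.now().strftime('%Y.%m.%d-%H%M%S'))
os.makedirs(log_dir)
if os.path.islink('latest-log_dir'):
    os.remove('latest-log_dir')
os.symlink(log_dir, 'latest-log_dir')
```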
@@ -790,7 +796,7 @@ The following table describes the most important command line arguments for `tra
 | *Device selection* | | |
 | `--device` | Set device (default: AI84) | `--device MAX78000` |
 | *Model and dataset* | | |
-| `-a`, `--arch` | Set model (collected from models folder) | `--model ai85net5` |
+| `-a`, `--arch`, `--model` | Set model (collected from models folder) | `--model ai85net5` |
 | `--dataset` | Set dataset (collected from datasets folder) | `--dataset MNIST` |
 | `--data` | Path to dataset (default: data) | `--data /data/ml` |
 | *Training* | | |
@@ -802,6 +808,7 @@ The following table describes the most important command line arguments for `tra
 | `--resume-from` | Resume from previous checkpoint | `--resume-from chk.pth.tar` |
 | `--qat-policy` | Define QAT policy in YAML file (default: qat_policy.yaml). Use ‘’None” to disable QAT. | `--qat-policy qat_policy.yaml` |
 | *Display and statistics* | | |
+| `--enable-tensorboard` | Enable logging to TensorBoard (default: disabled) | |
 | `--confusion` | Display the confusion matrix | |
 | `--param-hist` | Collect parameter statistics | |
 | `--pr-curves` | Generate precision-recall curves | |
@@ -941,7 +948,7 @@ Both TensorBoard and Manifold can be used for model comparison and feature attri
 
 #### TensorBoard
 
-TensorBoard is built into `train.py`. It provides a local web server that can be started before, during, or after training and it picks up all data that is written to the `logs/` directory.
+TensorBoard is built into `train.py`. When enabled using `--enable-tensorboard`, it provides a local web server that can be started before, during, or after training and it picks up all data that is written to the `logs/` directory.
 
 For classification models, TensorBoard supports the optional `--param-hist` and `--embedding` command line arguments. `--embedding` randomly selects up to 100 data points from the last batch of each verification epoch. These can be viewed in the “projector” tab in TensorBoard.
 
@@ -1169,6 +1176,7 @@ The following table describes the most important command line arguments for `ai8
 | `--prefix` | Set test name prefix | `--prefix mnist` |
 | `--board-name` | Set the target board (default: `EvKit_V1`) | `--board-name FTHR_RevA` |
 | *Code generation* | | |
+| `--overwrite` | Produce output even when the target directory exists (default: abort) | |
 | `--compact-data` | Use *memcpy* to load input data in order to save code space | |
 | `--compact-weights` | Use *memcpy* to load weights in order to save code space | |
 | `--mexpress` | Use faster kernel loading | |
@@ -1227,6 +1235,8 @@ The following table describes the most important command line arguments for `ai8
 
 ### YAML Network Description
 
+The [quick-start guide](https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/Guides/YAML%20Quickstart.md) provides a short overview of the purpose and structure of the YAML network description file.
+
 An example network description for the ai85net5 architecture and MNIST is shown below:
 
 ```yaml
@@ -1422,7 +1432,7 @@ Example:
 
 ##### `activate` (Optional)
 
-This key describes whether to activate the layer output (the default is to not activate). When specified, this key must be `ReLU`, `Abs` or `None` (the default).
+This key describes whether to activate the layer output (the default is to not activate). When specified, this key must be `ReLU`, `Abs` or `None` (the default). *Please note that there is always an implicit non-linearity when outputting 8-bit data since outputs are clamped to $[–1, +127/128]$ during training.*
 
 Note that the output values are clipped (saturated) to $[0, +127]$. Because of this, `ReLU` behaves more similar to PyTorch’s `nn.Hardtanh(min_value=0, max_value=127)` than to PyTorch’s `nn.ReLU()`.
 
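The clamping described in this hunk is easy to verify. A small PyTorch sketch (illustration only; note that PyTorch's actual keyword arguments are `min_val`/`max_val`):

```python
import torch
from torch import nn

# With 8-bit output the hardware saturates at +127, so ReLU behaves like
# Hardtanh(0, 127) rather than an unbounded ReLU.
x = torch.tensor([-50., 50., 200.])
print(nn.Hardtanh(min_val=0., max_val=127.)(x))  # tensor([  0.,  50., 127.])
print(nn.ReLU()(x))                              # tensor([  0.,  50., 200.])
```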
@@ -2123,3 +2133,4 @@ https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/CONTRIBUT
 
 ---
 
+o

README.pdf

16 KB, binary file not shown.

izer/apbaccess.py

Lines changed: 18 additions & 3 deletions
@@ -9,7 +9,7 @@
 """
 import os
 
-from . import kernels, toplevel
+from . import toplevel
 from . import tornadocnn as tc
 from . import unload
 from .eprint import eprint, wprint
@@ -456,7 +456,9 @@ def write_kern(
             k,
             size=9,
             verify_only=False,
-            calcx4=False,
+            calcx4=None,
+            kern_offs=None,
+            count=None,
     ):
         """
         Write single kernel `k` of length `size` for layer `ll`, processor `p` to index `idx` in
@@ -465,7 +467,20 @@ def write_kern(
         assert p < tc.dev.MAX_PROC
         assert idx < tc.dev.mask_width(p)
 
-        idx_x4 = idx if not calcx4 else kernels.calcx4_index(idx)
+        if calcx4[ll]:
+            start = kern_offs[ll]
+            mem, rem = divmod((idx - start), (count + 3) // 4)
+            start //= 4
+            if idx < tc.dev.MASK_WIDTH_SMALL:
+                assert 0 <= mem < 4
+                idx_x4 = mem * (tc.dev.MASK_WIDTH_SMALL // 4) + rem + start
+            else:
+                idx_x4 = idx - tc.dev.MASK_WIDTH_SMALL
+                idx_x4 = mem * ((tc.dev.MASK_WIDTH_LARGE - tc.dev.MASK_WIDTH_SMALL) // 4) + rem \
+                    + tc.dev.MASK_WIDTH_SMALL + start
+        else:
+            idx_x4 = idx
+
         addr = tc.dev.C_GROUP_OFFS * (p // tc.dev.P_NUMPRO) \
             + tc.dev.C_MRAM_BASE \
             + (p % tc.dev.P_NUMPRO) * tc.dev.MASK_OFFS * 16 + idx_x4 * 16
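To see what the new mapping does, here is a self-contained sketch (illustration only; the `MASK_WIDTH` constants are invented, while the real code reads them from `tc.dev`):

```python
MASK_WIDTH_SMALL = 768   # assumed small kernel-memory width
MASK_WIDTH_LARGE = 5120  # assumed large kernel-memory width

def calcx4_index(idx, start, count):
    """Map linear kernel index `idx` of a layer starting at `start` with
    `count` kernels to its location when kernels are split across the four
    quarters of the mask memory (calcx4)."""
    mem, rem = divmod(idx - start, (count + 3) // 4)
    start //= 4
    if idx < MASK_WIDTH_SMALL:
        assert 0 <= mem < 4
        return mem * (MASK_WIDTH_SMALL // 4) + rem + start
    return mem * ((MASK_WIDTH_LARGE - MASK_WIDTH_SMALL) // 4) + rem \
        + MASK_WIDTH_SMALL + start

# A layer with 16 kernels at offset 0 lands in four quarter-banks of
# MASK_WIDTH_SMALL // 4 = 192 entries each:
print([calcx4_index(i, 0, 16) for i in range(16)])
# [0, 1, 2, 3, 192, 193, 194, 195, 384, 385, 386, 387, 576, 577, 578, 579]
```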

izer/commandline.py

Lines changed: 3 additions & 1 deletion
@@ -70,6 +70,8 @@ def get_parser():
 
     # Code generation
     group = parser.add_argument_group('Code generation')
+    group.add_argument('--overwrite', action='store_true', default=False,
+                       help="overwrite destination if it exists (default: abort)")
     group.add_argument('--compact-data', action='store_true', default=False,
                        help="use memcpy() to load input data in order to save code space")
     group.add_argument('--compact-weights', action='store_true', default=False,
@@ -253,7 +255,7 @@ def get_parser():
 
     # Streaming
     group = parser.add_argument_group('Streaming tweaks')
-    group.add_argument('--overlap-data', '--overwrite-ok', '--allow-overwrite',
+    group.add_argument('--overlap-data',
                        dest='overwrite_ok', action='store_true', default=False,
                        help="allow output to overwrite input (default: warn/stop)")
     group.add_argument('--override-start', type=lambda x: int(x, 0), metavar='N',
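The second hunk removes the `--overwrite-ok` and `--allow-overwrite` aliases from `--overlap-data`, presumably so they cannot be confused with the new, unrelated `--overwrite` switch. A self-contained sketch of the resulting behavior (illustration only, not the izer parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--overwrite', action='store_true', default=False,
                    help="overwrite destination if it exists (default: abort)")
parser.add_argument('--overlap-data', dest='overwrite_ok', action='store_true',
                    default=False,
                    help="allow output to overwrite input (default: warn/stop)")

args = parser.parse_args(['--overwrite'])
assert args.overwrite and not args.overwrite_ok  # two independent switches
```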

izer/izer.py

Lines changed: 1 addition & 0 deletions
@@ -610,6 +610,7 @@ def main():
         snoop_sequence=snoop_sequence,
         simulated_sequence=simulated_sequence,
         debug_snoop=args.debug_snoop,
+        overwrite=args.overwrite,
     )
     if not args.embedded_code and args.autogen.lower() != 'none':
         rtlsim.append_regression(

izer/kernels.py

Lines changed: 32 additions & 8 deletions
@@ -119,6 +119,8 @@ def load(  # pylint: disable=too-many-branches,too-many-statements
             assert kern_offs[ll] == start_offs
             continue
 
+        qfactor = 8 // abs(quantization[ll])
+
         if flatten[ll]:
             kernel_reshaped = kernel[ll].reshape(
                 output_chan[ll],
@@ -135,15 +137,30 @@ def load(  # pylint: disable=too-many-branches,too-many-statements
             in_chan = in_expand_thresh[ll]
         elif calcx4[ll]:
             # FIXME for output channels % 4 != 0
+            assert output_chan[ll] % 4 == 0
             kernel_reshaped = kernel[ll].reshape(
                 output_chan[ll] // 4,
                 4,
-                in_expand[ll],
-                in_expand_thresh[ll],
+                -1,
+            ).transpose(1, 0, 2).reshape(
+                kernel[ll].shape
+            )
+
+            in_exp = in_expand[ll]
+            in_chan = input_chan[ll]
+        elif ll == 0 and quad and qfactor != 1:
+            # FIXME for output channels % (4 * qfactor) != 0
+            assert output_chan[ll] % (4 * qfactor) == 0
+            kernel_reshaped = kernel[ll].reshape(
+                output_chan[ll] // (4 * qfactor),
+                qfactor,
+                4,
+                input_chan[ll],
                 kernel_size[ll][0] * kernel_size[ll][1],
-            ).transpose(0, 2, 1, 3, -1).reshape(
+            ).transpose(0, 2, 1, 3, 4).reshape(
                 kernel[ll].shape
             )
+
             in_exp = in_expand[ll]
             in_chan = input_chan[ll]
         else:
186203
kern_offs[ll] *= 4
187204

188205
ksize = kernel_size[ll][0] * kernel_size[ll][1]
189-
qfactor = 8 // abs(quantization[ll])
190206
next_layer_map = output_processor_map[ll]
191207
first_output_proc = ffs(next_layer_map)
192208
start_col = first_output_proc % tc.dev.P_SHARED # First target column out of 4 shared
@@ -411,8 +427,12 @@ def add_kernel_data(ll, p, col_target, b):
411427
for col in range(0, tc.dev.mask_width(p)):
412428
ll = kernel_map[p][col]
413429
if ll != _INVALID_VALUE:
414-
k = kernel_data[p][col]
415-
apb.write_kern(ll, p, col, k, verify_only=verify, calcx4=calcx4[ll])
430+
apb.write_kern(ll, p, col, kernel_data[p][col],
431+
verify_only=verify, calcx4=calcx4,
432+
kern_offs=kern_offs,
433+
count=in_expand[ll] * output_chan[ll] * 9
434+
* abs(quantization[ll])
435+
// (kernel_size[ll][0] * kernel_size[ll][1] * 8))
416436
apb.function_footer() # verify_weights()
417437

418438
if not (embedded_code or mexpress):
@@ -424,10 +444,14 @@ def add_kernel_data(ll, p, col_target, b):
424444
if ll != _INVALID_VALUE:
425445
k = kernel_data[p][col]
426446
if not zero_sram or np.any(k != 0):
427-
apb.write_kern(ll, p, col, k, calcx4=calcx4[ll])
447+
apb.write_kern(ll, p, col, k, calcx4=calcx4,
448+
kern_offs=kern_offs,
449+
count=in_expand[ll] * output_chan[ll] * 9
450+
* abs(quantization[ll])
451+
// (kernel_size[ll][0] * kernel_size[ll][1] * 8))
428452
apb.function_footer() # load_weights()
429453

430-
if embedded_code or mexpress:
454+
else: # embedded_code or mexpress
431455
# Write kernels, combining layers and processors where possible to reduce the number
432456
# of constants and calls to memcpy.
433457
apb.output('// Kernels:\n', api)

izer/max7800x.py

Lines changed: 5 additions & 1 deletion
@@ -168,6 +168,7 @@ def create_net(  # pylint: disable=too-many-arguments,too-many-locals,too-many-b
         snoop_sequence=None,
         simulated_sequence=None,
         debug_snoop=False,
+        overwrite=False,
 ):
     """
     Chain multiple CNN layers, create and save input and output
@@ -530,7 +531,10 @@ def create_net(  # pylint: disable=too-many-arguments,too-many-locals,too-many-b
         target_dir = os.path.join(base_directory, test_name)
         os.makedirs(target_dir, exist_ok=False)
     except OSError:
-        wprint(target_dir, 'exists')
+        if not overwrite:
+            eprint('The target folder', target_dir, 'exists. Use --overwrite to proceed.')
+        else:
+            wprint('--overwrite specified, writing to ', target_dir, ' even though it exists.')
 
     # Redirect stdout?
     if log:
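The net effect: an existing target directory is now a hard error unless `--overwrite` is given. A self-contained sketch of the guard (illustration only; izer's `eprint()` helper is assumed to print the error and terminate):

```python
import os
import sys

def make_target_dir(target_dir, overwrite=False):
    """Sketch of the new behavior, not the izer code itself."""
    try:
        os.makedirs(target_dir, exist_ok=False)
    except OSError:
        if not overwrite:
            print(f'ERROR: The target folder {target_dir} exists. '
                  'Use --overwrite to proceed.', file=sys.stderr)
            sys.exit(1)
        print(f'WARNING: --overwrite specified, writing to {target_dir} '
              'even though it exists.', file=sys.stderr)

make_target_dir('demo')                  # creates 'demo'
make_target_dir('demo', overwrite=True)  # exists: proceed with a warning
```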
