Commit 06ee5a0

Author: Robert Muchsel
Change defaults for --mexpress, --compact-data, --new-kernel-loader; balance energy on MAX78002; README (#243)
* Balance energy use on MAX78002 for loading data and weights (default), add --max-speed for fastest load/unload
* Correct APB/IPO clock display on MAX78002
* Make --mexpress, --compact-data and --new-kernel-loader default; add --no-mexpress and --no-compact-data
* Check whether checkpoint is missing 'epoch'
* README: Update Development Flow graphics, correctly display .md in dark mode
Parent: bfaeb1f

38 files changed (+76 -39 lines)

README.md

Lines changed: 16 additions & 11 deletions
@@ -1,6 +1,6 @@
 # ADI MAX78000/MAX78002 Model Training and Synthesis
 
-June 2, 2022
+June 14, 2022
 
 ADI’s MAX78000/MAX78002 project is comprised of five repositories:
 
@@ -106,7 +106,7 @@ When going beyond simple models, model training does not work well without CUDA
 
 * There is a PyTorch pre-release with ROCm acceleration for certain AMD GPUs on Linux ([see blog entry](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/)), but this is not currently covered by the installation instructions in this document, and it is not supported.
 
-* At this time, there is neither CUDA nor ROCm nor Neural Engine support on macOS, and therefore no hardware acceleration (there is a pre-release version of PyTorch with M1 acceleration, and M1 acceleration will be supported in a future release of these tools).
+* At this time, there is neither CUDA nor ROCm nor Neural Engine support on macOS, and therefore no hardware acceleration (there is a pre-release version of PyTorch with M1 acceleration on macOS 12.3 or later, and M1 acceleration will be supported in a future release of these tools).
 
 * PyTorch does not include CUDA support for aarch64/arm64 systems. *Rebuilding PyTorch from source is not covered by this document.*

@@ -2147,15 +2147,18 @@ The following table describes the most important command line arguments for `ai8
 | `--board-name` | Set the target board (default: `EvKit_V1`) | `--board-name FTHR_RevA` |
 | *Code generation* | | |
 | `--overwrite` | Produce output even when the target directory exists (default: abort) | |
-| `--compact-data` | Use *memcpy* to load input data in order to save code space | |
 | `--compact-weights` | Use *memcpy* to load weights in order to save code space | |
-| `--mexpress` | Use faster kernel loading | |
+| `--mexpress` | Use faster kernel loading (default) | |
+| `--no-mexpress` | Use alternate kernel loading (slower) | |
 | `--mlator` | Use hardware to swap output bytes (useful for large multi-channel outputs) | |
 | `--softmax` | Add software Softmax functions to generated code | |
 | `--boost` | Turn on a port pin to boost the CNN supply | `--boost 2.5` |
 | `--timer` | Insert code to time the inference using a timer | `--timer 0` |
 | `--no-wfi` | Do not use WFI (wait for interrupt) instructions when waiting for CNN completion | |
 | `--define` | Additional preprocessor defines | `--define "FAST GOOD"` |
+| *MAX78002* | | |
+| `--no-pipeline` | **MAX78002 only:** Disable the pipeline and run the CNN on the slower APB clock. This reduces power consumption, but increases inference time and in most cases overall energy usage. | |
+| `--max-speed` | **MAX78002 only:** In pipeline mode, load weights and input data on the PLL clock divided by 1 instead of divided by 4. This is approximately 50% faster, but uses 200% of the energy compared to the default settings. | |
 | *File names* | | |
 | `--c-filename` | Main C file name base (default: main.c) | `--c-filename main.c` |
 | `--api-filename` | API C file name (default: cnn.c) | `--api-filename cnn.c` |
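The two new MAX78002 switches trade speed against energy in opposite directions: `--no-pipeline` saves power but usually increases total energy, while `--max-speed` roughly halves load/unload time for about twice the energy. A minimal sketch of passing `--max-speed` through a generation command; the prefix, checkpoint, and YAML file names below are placeholders, and only the switches themselves come from the table above:

```shell
# Placeholder file names; only --device MAX78002 and --max-speed are
# taken from the table above.
(ai8x-synthesize) $ python ai8xize.py --verbose --test-dir demos \
  --prefix my-net --checkpoint-file trained/my-net.pth.tar \
  --config-file networks/my-net.yaml \
  --device MAX78002 --softmax --max-speed
```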
@@ -2266,7 +2269,7 @@ layers:
 To generate an embedded MAX78000 demo in the `demos/ai85-mnist/` folder, use the following command line:
 
 ```shell
-(ai8x-synthesize) $ python ai8xize.py --verbose --test-dir demos --prefix ai85-mnist --checkpoint-file trained/ai85-mnist.pth.tar --config-file networks/mnist-chw-ai85.yaml --device MAX78000 --compact-data --mexpress --softmax
+(ai8x-synthesize) $ python ai8xize.py --verbose --test-dir demos --prefix ai85-mnist --checkpoint-file trained/ai85-mnist.pth.tar --config-file networks/mnist-chw-ai85.yaml --device MAX78000 --compact-data --softmax
 ```
 
 *Note: For MAX78002, substitute MAX78002 as the `--device`.*
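Applying that note to the command above, the MAX78002 variant would presumably read as follows (identical except for `--device`; not separately verified against the repository):

```shell
(ai8x-synthesize) $ python ai8xize.py --verbose --test-dir demos --prefix ai85-mnist --checkpoint-file trained/ai85-mnist.pth.tar --config-file networks/mnist-chw-ai85.yaml --device MAX78002 --compact-data --softmax
```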
@@ -2725,22 +2728,23 @@ In order to achieve this, a layer must be inserted that does nothing else but re
 ...
 layers:
   ...
-  # Layer 1
+  # Layer I
   - out_offset: 0x0000
     processors: 0x0ffff00000000000
     operation: conv2d
     kernel_size: 3x3
     pad: 1
     activate: ReLU
 
-  # Layer 2 - re-format data with gap
+  # Layer II - re-format data with gap
   - out_offset: 0x2000
     processors: 0x00000000000fffff
     output_processors: 0x00000000000fffff
     operation: passthrough
     write_gap: 1
+    name: layerII
 
-  # Layer 3
+  # Layer III
   - in_offset: 0x0000
     out_offset: 0x2004
     processors: 0x00000000000fffff
@@ -2749,9 +2753,10 @@ layers:
     pad: 1
     activate: ReLU
     write_gap: 1
+    name: layerIII
 
-  # Layer 4 - Residual
-  - in_sequences: [2, 3]
+  # Layer IV - Residual
+  - in_sequences: [layerII, layerIII]
     in_offset: 0x2000
     out_offset: 0x0000
     processors: 0x00000000000fffff
@@ -2964,7 +2969,7 @@ Perform minimum accelerator initialization so it can be configured or restarted.
 Configure the accelerator for the given network.
 
 `int cnn_load_weights(void);`
-Load the accelerator weights. `cnn_init()` must be called before loading weights after reset or wake from sleep. *Note that the physical weight memories are 72-bit wide. When `--mexpress` mode is enabled, the weight data is written in a sequence of 32-bit writes, containing the “packed” weight values. When `--mexpress` is disabled, each weight memory is written in four 32-bit memory writes, with zero-padded data.*
+Load the accelerator weights. `cnn_init()` must be called before loading weights after reset or wake from sleep. *Note that the physical weight memories are 72-bit wide. When `--mexpress` mode is enabled (default), the weight data is written in a sequence of 32-bit writes, containing the “packed” weight values. When using `--no-mexpress`, each weight memory is written in four 32-bit memory writes, with zero-padded data.*
 
 `int cnn_verify_weights(void);`
 Verify the accelerator weights (used for debug only).
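To illustrate the ordering constraint described above, here is a minimal C sketch. Only the `cnn_load_weights()` and `cnn_verify_weights()` signatures appear in this diff; `cnn_init()` and `cnn_configure()` are inferred from the surrounding text, and the wrapper function is hypothetical:

```c
/* Prototypes as documented above; normally provided by the generated API
   (cnn.c). The wrapper below is illustrative only. */
int cnn_init(void);           /* minimum accelerator initialization      */
int cnn_configure(void);      /* configure accelerator for the network   */
int cnn_load_weights(void);   /* load weights; requires prior cnn_init() */
int cnn_verify_weights(void); /* verify loaded weights (debug only)      */

void cnn_setup_after_reset(void)
{
    /* cnn_init() must be called before loading weights after reset or
       wake from sleep. */
    cnn_init();
    /* Packed 32-bit writes with --mexpress (default); four zero-padded
       32-bit writes per 72-bit word with --no-mexpress. */
    cnn_load_weights();
    cnn_verify_weights(); /* optional, debug only */
    cnn_configure();
}
```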

README.pdf: binary file not shown (-7.96 KB)

docs/CHW.png: 12.1 KB
docs/CNNInFlight.png: 6.76 KB
docs/CNNOverview.png: 21.5 KB
docs/Conv2Dk1x1.png: 23.9 KB
docs/DataMemory.png: 2.15 KB
docs/DevelopmentFlow.png: -110 KB
docs/HWC.png: 9.21 KB
docs/KernelMemory.png: 2.36 KB
