Change defaults for --mexpress, --compact-data, --new-kernel-loader; balance energy on MAX78002; README (#243)
* Balance energy use on MAX78002 when loading data and weights (default); add --max-speed for the fastest load/unload
* Correct APB/IPO clock display on MAX78002
* Make --mexpress, --compact-data, and --new-kernel-loader the default; add --no-mexpress and --no-compact-data
* Check whether checkpoint is missing 'epoch'
* README: Update Development Flow graphics, correctly display .md in dark mode
README.md (+16 −11)
@@ -1,6 +1,6 @@
 # ADI MAX78000/MAX78002 Model Training and Synthesis
 
-June 2, 2022
+June 14, 2022
 
 ADI’s MAX78000/MAX78002 project is comprised of five repositories:
 
@@ -106,7 +106,7 @@ When going beyond simple models, model training does not work well without CUDA
 
 * There is a PyTorch pre-release with ROCm acceleration for certain AMD GPUs on Linux ([see blog entry](https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/)), but this is not currently covered by the installation instructions in this document, and it is not supported.
 
-* At this time, there is neither CUDA nor ROCm nor Neural Engine support on macOS, and therefore no hardware acceleration (there is a pre-release version of PyTorch with M1 acceleration, and M1 acceleration will be supported in a future release of these tools).
+* At this time, there is neither CUDA nor ROCm nor Neural Engine support on macOS, and therefore no hardware acceleration (there is a pre-release version of PyTorch with M1 acceleration on macOS 12.3 or later, and M1 acceleration will be supported in a future release of these tools).
 
 * PyTorch does not include CUDA support for aarch64/arm64 systems. *Rebuilding PyTorch from source is not covered by this document.*
 
@@ -2147,15 +2147,18 @@ The following table describes the most important command line arguments for `ai8
 |`--board-name`| Set the target board (default: `EvKit_V1`) |`--board-name FTHR_RevA`|
 |*Code generation*|||
 |`--overwrite`| Produce output even when the target directory exists (default: abort) ||
-|`--compact-data`| Use *memcpy* to load input data in order to save code space ||
 |`--compact-weights`| Use *memcpy* to load weights in order to save code space ||
-|`--mexpress`| Use faster kernel loading ||
+|`--mexpress`| Use faster kernel loading (default) ||
+|`--no-mexpress`| Use alternate kernel loading (slower) ||
 |`--mlator`| Use hardware to swap output bytes (useful for large multi-channel outputs) ||
 |`--softmax`| Add software Softmax functions to generated code ||
 |`--boost`| Turn on a port pin to boost the CNN supply |`--boost 2.5`|
 |`--timer`| Insert code to time the inference using a timer |`--timer 0`|
 |`--no-wfi`| Do not use WFI (wait for interrupt) instructions when waiting for CNN completion ||
 |`--no-pipeline`|**MAX78002 only**: Disable the pipeline and run the CNN on the slower APB clock. This reduces power consumption, but increases inference time and in most cases overall energy usage. ||
+|`--max-speed`|**MAX78002 only:** In pipeline mode, load weights and input data on the PLL clock divided by 1 instead of divided by 4. This is approximately 50% faster, but uses 200% of the energy compared to the default settings. ||
 |*File names*|||
 |`--c-filename`| Main C file name base (default: main.c) |`--c-filename main.c`|
 |`--api-filename`| API C file name (default: cnn.c) |`--api-filename cnn.c`|
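For context on the `--timer` and `--no-wfi` rows above, the sketch below shows the kind of wait loop the generated `main.c` uses while an inference runs. This is an illustration only, not the verbatim generated code; the `cnn_time` flag, the interrupt wiring, and the availability of the CMSIS `__WFI()` intrinsic via the MSDK header are assumptions:

```c
#include <stdint.h>
#include "mxc.h" // MSDK umbrella header; provides the CMSIS __WFI() intrinsic (assumed)

volatile uint32_t cnn_time; // set non-zero by the CNN interrupt handler on completion

// Wait for the accelerator to finish an inference.
static void wait_for_cnn(void)
{
  while (cnn_time == 0)
    __WFI(); // sleep until an interrupt fires; with --no-wfi this becomes a plain busy-wait
}
```

With `--timer`, the generated code additionally records the elapsed inference time in `cnn_time` using the specified hardware timer instead of merely flagging completion.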
@@ -2266,7 +2269,7 @@ layers:
 To generate an embedded MAX78000 demo in the `demos/ai85-mnist/` folder, use the following command line:
 *Note: For MAX78002, substitute MAX78002 as the `--device`.*
@@ -2725,22 +2728,23 @@ In order to achieve this, a layer must be inserted that does nothing else but re
 ...
 layers:
   ...
-  # Layer 1
+  # Layer I
   - out_offset: 0x0000
     processors: 0x0ffff00000000000
     operation: conv2d
     kernel_size: 3x3
     pad: 1
     activate: ReLU
 
-  # Layer 2 - re-format data with gap
+  # Layer II - re-format data with gap
   - out_offset: 0x2000
     processors: 0x00000000000fffff
     output_processors: 0x00000000000fffff
     operation: passthrough
     write_gap: 1
+    name: layerII
 
-  # Layer 3
+  # Layer III
   - in_offset: 0x0000
     out_offset: 0x2004
     processors: 0x00000000000fffff
@@ -2749,9 +2753,10 @@ layers:
     pad: 1
     activate: ReLU
     write_gap: 1
+    name: layerIII
 
-  # Layer 4 - Residual
-  - in_sequences: [2, 3]
+  # Layer IV - Residual
+  - in_sequences: [layerII, layerIII]
     in_offset: 0x2000
     out_offset: 0x0000
     processors: 0x00000000000fffff
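With the `name` keys added above, the residual layer's `in_sequences` can refer to its inputs as `[layerII, layerIII]` instead of the positional `[2, 3]`, so the references stay correct even if layers are later inserted or renumbered.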
@@ -2964,7 +2969,7 @@ Perform minimum accelerator initialization so it can be configured or restarted.
 Configure the accelerator for the given network.
 
 `int cnn_load_weights(void);`
-Load the accelerator weights. `cnn_init()` must be called before loading weights after reset or wake from sleep. *Note that the physical weight memories are 72-bit wide. When `--mexpress` mode is enabled, the weight data is written in a sequence of 32-bit writes, containing the “packed” weight values. When `--mexpress` is disabled, each weight memory is written in four 32-bit memory writes, with zero-padded data.*
+Load the accelerator weights. `cnn_init()` must be called before loading weights after reset or wake from sleep. *Note that the physical weight memories are 72-bit wide. When `--mexpress` mode is enabled (default), the weight data is written in a sequence of 32-bit writes, containing the “packed” weight values. When using `--no-mexpress`, each weight memory is written in four 32-bit memory writes, with zero-padded data.*
 
 `int cnn_verify_weights(void);`
 Verify the accelerator weights (used for debug only).
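Taken together, the notes above imply the following restart sequence after reset or wake from sleep. This is a minimal sketch, assuming the generated `cnn.h` header and its `CNN_OK`/`CNN_FAIL` return convention; it is not the verbatim generated code:

```c
#include "cnn.h" // generated API header (assumed name); declares the functions used below

// Re-initialize the accelerator and reload its weights after reset or wake
// from sleep; per the API notes, cnn_init() must precede cnn_load_weights().
static int cnn_restore(void)
{
  if (cnn_init() != CNN_OK)         // minimum accelerator initialization
    return CNN_FAIL;
  if (cnn_load_weights() != CNN_OK) // packed 32-bit writes with --mexpress (default)
    return CNN_FAIL;
  return cnn_verify_weights();      // optional read-back check (debug only)
}
```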