Commit 5bfc060

Author: Robert Muchsel

README: Debugging techniques, preventing Flash overflow; fix --mlator; default optimization -O2; checkpoint reader bug fixes (#131)

* Improve --synthesize-input, add --synthesize-words
* Fix --mlator code generation; ensure verify_output does not use mlator for 32-bit output
* Change default optimization level to -O2
* README: Debugging techniques; handling memory overflows
* Adjust stream_start when using pooling in the first layer for MAX78000

1 parent 48f92d9 commit 5bfc060

19 files changed, +212 -56 lines changed

.github/workflows/linter.yml

Lines changed: 1 addition & 0 deletions
@@ -28,4 +28,5 @@ jobs:
 VALIDATE_MARKDOWN: false
 VALIDATE_PYTHON_BLACK: false
 VALIDATE_JSCPD: false
+VALIDATE_CPP: false
 FILTER_REGEX_EXCLUDE: attic/.*

README.md

Lines changed: 56 additions & 1 deletion
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_May 4, 2021_
+_May 7, 2021_
 
 The Maxim Integrated AI project is comprised of four repositories:
 
@@ -818,6 +818,7 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 * When using data greater than 90×91, `streaming` mode must be used.
 * When using `streaming` mode, the product of any layer’s input width, input height, and input channels divided by 64 rounded up must not exceed 2^21: $width * height * ⌈\frac{channels}{64}⌉ < 2^{21}$. _width_ and _height_ must not exceed 1023.
 * Streaming is limited to 8 layers or less, and is limited to four FIFOs (up to 4 input channels in CHW and up to 16 channels in HWC format), see [FIFOs](#FIFOs).
+* For streaming layers, bias values may not be added correctly in all cases.
 
 * The weight memory supports up to 768 * 64 3×3 Q7 kernels (see [Number Format](#Number-Format)).
   When using 1-, 2- or 4 bit weights, the capacity increases accordingly.
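
As an illustration of the streaming size limits quoted in this hunk, the following is a minimal Python sketch. The function and constant names are hypothetical and are not part of the repository. The README prose says the product "must not exceed" 2^21 while its formula uses a strict `<`; the sketch follows the formula, which is the more conservative reading.

```python
from math import ceil

MAX_STREAMING_DIM = 1023         # README: width and height must not exceed 1023
MAX_STREAMING_PRODUCT = 2 ** 21  # README formula: product must stay below 2^21


def fits_streaming_limits(width: int, height: int, channels: int) -> bool:
    """Return True if a layer input satisfies the streaming size limits.

    Hypothetical helper: it only encodes the two limits quoted in the README
    and ignores all other hardware restrictions.
    """
    if width > MAX_STREAMING_DIM or height > MAX_STREAMING_DIM:
        return False
    # width * height * ceil(channels / 64) < 2^21
    return width * height * ceil(channels / 64) < MAX_STREAMING_PRODUCT
```

For example, a 3×128×128 input passes this check, while any input wider than 1023 pixels is rejected regardless of its channel count.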
@@ -1325,6 +1326,10 @@ The following table describes the most important command line arguments for `ai8
 | `--ready-sel` | Specify memory waitstates | |
 | `--ready-sel-fifo` | Specify FIFO waitstates | |
 | `--ready-sel-aon` | Specify AON waitstates | |
+| Various | | |
+| `--synthesize-input` | Instead of using large sample input data, use only the first `--synthesize-words` words of the sample input, and add N to each subsequent set of `--synthesize-words` 32-bit words | `--synthesize-input 0x112233` |
+| `--synthesize-words` | When using `--synthesize-input`, specifies how many words to use from the input. The default is 8. This number must be a divisor of the total number of pixels per channel. | `--synthesize-words 64` |
+| `--max-checklines` | Instead of checking all of the expected output data, verify only the first N words | `--max-checklines 1024` |
 
 ### YAML Network Description
 
@@ -2042,7 +2047,57 @@ The generator also adds all files from the `assets/eclipse`, `assets/device-all`
 * For MAX78000/MAX78002, the software Softmax is implemented in `softmax.c`.
 * A template for the `cnn.h` header file in `templatecnn.h`. The template is customized during code generation using model statistics and timer, but uses common function signatures for all projects.
 
+#### Determining the Compiled Flash Image Size
 
+The generated `.elf` file (either `max78000.elf` or `max78000-combined.elf`) contains debug and other meta information. To determine the true Flash image size, either examine the `.map` file, or convert the `.elf` to a binary image and examine the resulting image.
+
+```shell
+% arm-none-eabi-objcopy -I elf32-littlearm build/max78000.elf -O binary temp.bin
+% ls -la temp.bin
+-rwxr-xr-x 1 user staff 321968 Jan 1 11:11 temp.bin
+```
+
+#### Handling Linker Flash Section Overflows
+
+When linking the generated C code, the code space might overflow:
+
+```shell
+$ make
+CC main.c
+CC cnn.c
+...
+LD build/max78000.elf
+arm-none-eabi/bin/ld: build/max78000.elf section `.text' will not fit in region `FLASH'
+arm-none-eabi/bin/ld: region `FLASH' overflowed by 600176 bytes
+collect2: error: ld returned 1 exit status
+```
+
+The most likely reason is that the input is too large (from `sampledata.h`), or that the expected output is too large. It is important to note that this only affects the generated code with the built-in known-answer test (KAT) that will not be part of the user application since normal input and output data are not predefined in Flash memory.
+
+To deal with this issue, there are several options:
+
+* The sample input data can be stored in external memory. This requires modifications to the generated code. Please see the SDK examples to learn how to access external memory.
+* The sample input data can be programmatically generated. Typically, this requires manual modification of the generated code, and a corresponding modification of the sample input file.
+  The generator also contains a built-in generator (supported *only* when using `--fifo`, and only for HWC inputs); the command line option `--synthesize-input` uses only the first few words of the sample input data, and then adds the specified value N (for example, 0x112233 if three input channels are used) to each subsequent set of M 32-bit words. M can be specified using `--synthesize-words` and defaults to 8. Note that M must be a divisor of the number of pixels per channel.
+* The output check can be truncated. The command line option `--max-checklines` checks only the first N words of output data (for example, 1024).
+* For 8-bit output values, `--mlator` typically generates more compact code.
+* Change the compiler optimization level in `Makefile`. To change the default optimization levels, modify `MXC_OPTIMIZE_CFLAGS` in `assets/embedded-ai85/templateMakefile` for Arm code and `assets/embedded-riscv-ai85/templateMakefile.RISCV` for RISC-V code. Both `-O1` and `-Os` may result in smaller code compared to `-O2`.
+* If the last layer has large-dimension, large-channel output, the `cnn_unload()` code in `cnn.c` may cause memory segment overflows not only in Flash, but also in the target buffer in SRAM (`ml_data32[]` or `ml_data[]` in `main.c`). In this case, manual code edits are required to perform multiple partial unloads in sequence.
+
+#### Debugging Techniques
+
+There can be many reasons why the known-answer test (KAT) fails for a given network. The following techniques may help in narrowing down where in the network or the YAML description of the network the error occurs:
+
+* The default compiler optimization level is `-O2`, and incorrect code may be generated under rare circumstances. Lower the optimization level in the generated `Makefile` to `-O1`, clean (`make distclean && make clean`) and rebuild the project (`make`). If this solves the problem, one of the possible reasons is that code is missing the `volatile` keyword for certain variables.
+  To permanently adjust the default compiler optimization level, modify `MXC_OPTIMIZE_CFLAGS` in `assets/embedded-ai85/templateMakefile` for Arm code and `assets/embedded-riscv-ai85/templateMakefile.RISCV` for RISC-V code.
+
+* `--stop-after N` where `N` is a layer number may help finding the problematic layer by terminating the network early without having to retrain and without having to change the weight input file. Note that this may also require `--max-checklines` as [described above](#Handling-Linker-Flash-Section-Overflows) since intermediate outputs tend to be large.
+
+* `--no-bias LIST` where `LIST` is a comma-separated list of layers (e.g., `0,1,2,3`) can rule out problems due to the bias. This option zeros out the bias for the given layers without having to remove bias values from the weight input file.
+
+* `--ignore-streaming` ignores all `streaming` statements in the YAML file. Note that this typically only works when the sample input is replaced with a different, lower-dimension sample input (for example, use 3×32×32 instead of 3×128×128).
+
+
 
 #### Energy Measurement
 
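
The `--stop-after N` technique from the "Debugging Techniques" section added above lends itself to a binary search over layer numbers. The sketch below is only an illustration under stated assumptions: `kat_passes` is a hypothetical callback, supplied by the user, that regenerates the project with `--stop-after n`, builds and runs it, and reports whether the known-answer test passed. The search also assumes that once a layer produces wrong output, every later stopping point fails as well.

```python
from typing import Callable, Optional


def first_failing_layer(num_layers: int,
                        kat_passes: Callable[[int], bool]) -> Optional[int]:
    """Return the lowest layer number n for which the KAT fails, or None.

    Hypothetical helper: `kat_passes(n)` stands in for a manual or scripted
    run with `--stop-after n`; it is not part of the repository.
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    low, high = 0, num_layers - 1
    if kat_passes(high):
        return None  # the full network passes, nothing to locate
    # Invariant: stopping after `high` fails; the first failure is in [low, high]
    while low < high:
        mid = (low + high) // 2
        if kat_passes(mid):
            low = mid + 1
        else:
            high = mid
    return low
```

Compared with stepping through the layers one at a time, this needs roughly log2(num_layers) rebuilds, which matters because each check involves regenerating and flashing the project.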
README.pdf

70.6 KB
Binary file not shown.

assets/embedded-ai85/templateMakefile

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ PROJ_CFLAGS+=-Wall -Wcast-align
 #STARTUPFILE=start.S
 
 # Override the default optimization level using this variable
-MXC_OPTIMIZE_CFLAGS=-O1
+MXC_OPTIMIZE_CFLAGS=-O2
 
 ################################################################################
 # Include external library makefiles here

assets/embedded-ai87/templateMakefile

Lines changed: 1 addition & 1 deletion
@@ -100,7 +100,7 @@ PROJ_CFLAGS+=-Wall -Wcast-align
 #STARTUPFILE=start.S
 
 # Override the default optimization level using this variable
-MXC_OPTIMIZE_CFLAGS=-O1
+MXC_OPTIMIZE_CFLAGS=-O2
 
 ################################################################################
 # Include external library makefiles here

assets/embedded-riscv-ai85/templateMakefile.ARM

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ PROJ_CFLAGS+=-Wall -Wcast-align
 #STARTUPFILE=startup_max78000.S
 
 # Override the default optimization level using this variable
-MXC_OPTIMIZE_CFLAGS=-O1
+MXC_OPTIMIZE_CFLAGS=-O2
 
 # Point this variable to a linker file to override the default file
 LINKERFILE=$(CMSIS_ROOT)/Device/Maxim/$(TARGET_UC)/Source/GCC/$(TARGET_LC)_arm.ld

assets/embedded-riscv-ai85/templateMakefile.RISCV

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ PROJ_CFLAGS+=-Wall -Wcast-align
 #STARTUPFILE=startup_riscv_max78000.S
 
 # Override the default optimization level using this variable
-MXC_OPTIMIZE_CFLAGS=-O0
+MXC_OPTIMIZE_CFLAGS=-O2
 
 # Point this variable to a linker file to override the default file
 LINKERFILE=$(CMSIS_ROOT)/Device/Maxim/$(TARGET_UC)/Source/GCC/$(TARGET_LC)_riscv.ld

assets/embedded-riscv-ai87/Makefile.ARM renamed to assets/embedded-riscv-ai87/templateMakefile.ARM

Lines changed: 2 additions & 4 deletions
@@ -50,9 +50,7 @@ COMPILER=GCC
 
 # Specify the board used
 ifeq "$(BOARD)" ""
-#BOARD=BCB
-BOARD=EvKit_V1
-#BOARD=Emulator
+BOARD=##__BOARD__##
 endif
 
 # This is the path to the CMSIS root directory
@@ -96,7 +94,7 @@ PROJ_CFLAGS+=-Wall -Wcast-align
 #STARTUPFILE=startup_max78002.S
 
 # Override the default optimization level using this variable
-MXC_OPTIMIZE_CFLAGS=-O1
+MXC_OPTIMIZE_CFLAGS=-O2
 
 # Point this variable to a linker file to override the default file
 LINKERFILE=$(CMSIS_ROOT)/Device/Maxim/$(TARGET_UC)/Source/GCC/$(TARGET_LC)_arm.ld

assets/embedded-riscv-ai87/Makefile.RISCV renamed to assets/embedded-riscv-ai87/templateMakefile.RISCV

Lines changed: 1 addition & 3 deletions
@@ -50,9 +50,7 @@ COMPILER=GCC
 
 # Specify the board used
 ifeq "$(BOARD)" ""
-#BOARD=BCB
-BOARD=EvKit_V1
-#BOARD=Emulator
+BOARD=##__BOARD__##
 endif
 
 RISCV_CORE=RV32

izer/apbaccess.py

Lines changed: 14 additions & 3 deletions
@@ -728,6 +728,7 @@ def verify_unload(
             max_count=max_count,
             write_gap=write_gap,
             final_layer=final_layer,
+            embedded=self.embedded_code,
         )
 
     def output_define(  # pylint: disable=no-self-use
@@ -1245,9 +1246,19 @@ def unload(
         Write the unload function. The layer to unload has the shape `input_shape`,
         and the optional `output_offset` argument can shift the output.
         """
-        unload.unload(self.apifile or self.memfile, self.apb_base, processor_map, input_shape,
-                      output_offset, out_expand, out_expand_thresh, output_width,
-                      mlator=mlator, blocklevel=self.blocklevel)
+        unload.unload(
+            self.apifile or self.memfile,
+            self.apb_base,
+            processor_map,
+            input_shape,
+            output_offset,
+            out_expand,
+            out_expand_thresh,
+            output_width,
+            mlator=mlator,
+            blocklevel=self.blocklevel,
+            embedded=self.embedded_code,
+        )
 
     def output_define(
             self,
