Commit 2ad7ad6

Authored by Robert Muchsel

Support flattening > 64 channels; detect overlapping streaming buffers; additional messages (#105)
* Implement device limit for pooling without convolution; no pooling when flattening
* Clarify messages, print error when ‘assets’ is missing from current directory
* flatten: Support more than 64 channels
* Detect overlap in streaming buffers
* AI87: Updated features and RTL sims
1 parent d9189ff commit 2ad7ad6

File tree

61 files changed (+1107 additions, -105 deletions)


README.md

Lines changed: 6 additions & 4 deletions
```diff
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_February 1, 2021_
+_February 6, 2021_
 
 The Maxim Integrated AI project is comprised of four repositories:
 
@@ -696,6 +698,8 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 
 * Pooling does not support padding.
 
+* Pooling more than 64 channels requires use of a “fused” convolution in the same layer, unless the pooled dimensions are 1×1.
+
 * Pooling strides can be 1 through 16. For 2D pooling, the stride is the same for both dimensions.
 
 * For 2D pooling, supported pooling kernel sizes are 1×1 through 16×16, including non-square kernels. 1D pooling supports kernel sizes from 1 through 16. *Note: Pooling kernel size values do not have to be the same as the pooling stride.*
```
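The pooling rules in this hunk are straightforward to pre-check in plain Python. The following is a minimal standalone sketch; the function name and argument layout are made up for illustration and are not part of the izer.

```python
def check_pooling(pool_kernel, pool_stride, channels, fused_conv, pooled_dim):
    """Hypothetical validator for the MAX78000 pooling rules quoted above."""
    errors = []
    # Pooling strides can be 1 through 16.
    if not all(1 <= s <= 16 for s in pool_stride):
        errors.append(f'pool stride {pool_stride} outside 1..16')
    # 2D pooling kernels can be 1x1 through 16x16 (non-square allowed).
    if not all(1 <= k <= 16 for k in pool_kernel):
        errors.append(f'pool kernel {pool_kernel} outside 1x1..16x16')
    # Pooling more than 64 channels needs a fused convolution in the same
    # layer, unless the pooled output dimensions are 1x1.
    if channels > 64 and not fused_conv and pooled_dim != (1, 1):
        errors.append('pooling >64 channels requires a fused convolution '
                      'unless the pooled dimensions are 1x1')
    return errors


# Example: a 128-channel pool-only layer whose pooled output is not 1x1
print(check_pooling((2, 2), (2, 2), channels=128, fused_conv=False,
                    pooled_dim=(16, 16)))
```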
```diff
@@ -727,12 +729,12 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 * There are 16 instances of 32 KiB data memory. When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 181×181 pixels per channel in the CHW format. However, when using more than one input channel, the HWC format may be preferred, and all layer output are in HWC format as well. In those cases, it is required that four channels fit into a single memory instance -- or 91×90 pixels per channel.
   Note that the first layer commonly creates a wide expansion (i.e., large number of output channels) that needs to fit into data memory, so the input size limit is mostly theoretical.
 
-* The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`) (implemented using 1×1 convolutions on 1×1 data):
+* The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`), which are implemented using 1×1 convolutions on 1×1 data:
   * The maximum number of input neurons is 1024, and the maximum number of output neurons is 1024 (16 each per processor used).
 
 * `Flatten` functionality is available to convert 2D input data for use by fully connected layers, see [Fully Connected Layers](#Fully Connected \(Linear\) Layers).
 
-* When “flattening” two-dimensional data, the input dimensions (C×H×W) must satisfy H×W ≤ 256 and C ≤ 64. Pooling cannot be used at the same time as flattening.
+* When “flattening” two-dimensional data, the input dimensions (C×H×W) must satisfy H×W ≤ 16384. Pooling cannot be used at the same time as flattening.
 
 * Element-wise operators support from 2 up to 16 inputs.
 
```
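The 181×181 (CHW) and 91×90 (HWC) figures quoted above follow from the 32 KiB instance size, assuming 8-bit data items and four HWC channels packed per 32-bit word. A quick standalone arithmetic check (not izer code):

```python
# Back-of-the-envelope check of the data-memory limits quoted above.
# Assumes 8-bit (one-byte) data items; standalone sketch, not izer code.
INSTANCE_BYTES = 32 * 1024          # one data memory instance: 32 KiB

# CHW: a single channel occupies one byte per pixel in one instance.
print(181 * 181, '<=', INSTANCE_BYTES, 181 * 181 <= INSTANCE_BYTES)      # 32761 <= 32768 True

# HWC: four channels share each 32-bit word, so four channels must fit together.
print(4 * 91 * 90, '<=', INSTANCE_BYTES, 4 * 91 * 90 <= INSTANCE_BYTES)  # 32760 <= 32768 True

# Flattening limit from this commit: H*W may now be as large as 16384 (e.g. 128×128).
print(128 * 128 <= 16384)                                                 # True
```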
```diff
@@ -748,7 +750,7 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 
 m×n fully connected layers can be realized in hardware by “flattening” 2D input data of dimensions C×H×W into m=C×H×W channels of 1×1 input data. The hardware will produce n channels of 1×1 output data. When chaining multiple fully connected layers, the flattening step is omitted. The following picture shows 2D data, the equivalent flattened 1D data, and the output.
 
-For MAX78000/MAX78002, the product H×W must not exceed 256, and C must not exceed 64.
+For MAX78000/MAX78002, the product H×W must not exceed 16384.
 
 ![MLP](docs/MLP.png)
 
```
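To see why flattening C×H×W data into m=C×H×W channels of 1×1 data realizes an m×n fully connected layer, the following standalone numpy sketch (toy dimensions, not izer code) compares the two computations:

```python
import numpy as np

# Toy dimensions, chosen only for illustration.
C, H, W, n = 4, 3, 3, 5
m = C * H * W                       # flattened input channels (must satisfy H*W <= 16384)

rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))  # one 2D input sample
weight = rng.standard_normal((n, m))

# Fully connected layer: y = weight @ flatten(x)
y_fc = weight @ x.reshape(m)

# Equivalent 1x1 "convolution" on m channels of 1x1 data:
# each output channel is a dot product over the m input channels.
x_1x1 = x.reshape(m, 1, 1)
y_conv = np.einsum('om,mhw->ohw', weight, x_1x1).reshape(n)

print(np.allclose(y_fc, y_conv))    # True
```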
README.pdf

1.55 KB
Binary file not shown.

izer/commandline.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -119,6 +119,8 @@ def get_parser():
     group.add_argument('--fast-fifo-quad', action='store_true', default=False,
                        help="use fast FIFO in quad fanout mode (implies --fast-fifo; "
                             "default: false)")
+    group.add_argument('--fifo-go', action='store_true', default=False,
+                       help="start processing before first FIFO push (default: false)")
     group.add_argument('--slow-load', type=int, metavar='N', default=0,
                        help="slow down FIFO loads (default: 0)")
 
@@ -295,6 +297,8 @@ def get_parser():
                        help="initialize TRAM to 0 (default: false)")
     group.add_argument('--zero-sram', action='store_true', default=False,
                        help="zero memories (default: false)")
+    group.add_argument('--pretend-zero-sram', action='store_true', default=False,
+                       help="simulate --zero-sram, but block BIST (default: false)")
     group.add_argument('--zero-unused', action='store_true', default=False,
                        help="zero unused registers (default: do not touch)")
     group.add_argument('--apb-base', type=lambda x: int(x, 0), metavar='N',
```
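Both new options are plain boolean switches with a False default. A stripped-down, standalone argparse sketch (only the option names and help strings come from the diff; the parser itself is illustrative):

```python
import argparse

# Standalone illustration of the two new boolean switches added above.
parser = argparse.ArgumentParser(description='toy stand-in for izer/commandline.py')
group = parser.add_argument_group('hardware simulation')
group.add_argument('--fifo-go', action='store_true', default=False,
                   help="start processing before first FIFO push (default: false)")
group.add_argument('--pretend-zero-sram', action='store_true', default=False,
                   help="simulate --zero-sram, but block BIST (default: false)")

args = parser.parse_args(['--fifo-go'])
print(args.fifo_go, args.pretend_zero_sram)   # True False
```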

izer/izer.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -383,8 +383,8 @@ def main():
 
             pooled_dim[ll] = pooled_size
             if any(dim == 0 for dim in pooled_dim[ll]):
-                eprint(f'Pooling in layer {ll} results in a zero data dimension '
-                       f'(input {input_dim[ll]}, pooled {pooled_dim[ll]}).')
+                eprint(f'Pooling or zero-padding in layer {ll} results in a zero data dimension '
+                       f'(input {input_dim[ll]}, result {pooled_dim[ll]}).')
 
             if operator[ll] != op.CONV1D:
                 if stride[ll][0] != stride[ll][1]:
@@ -591,6 +591,8 @@ def main():
             bias_group_map=bias_group_map,
             pool_dilation=pool_dilation,
             input_pix_clk=args.input_pix_clk,
+            fifo_go=args.fifo_go,
+            pretend_zero_sram=args.pretend_zero_sram,
         )
         if not args.embedded_code and args.autogen.lower() != 'none':
             rtlsim.append_regression(
```
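The reworded error fires when the computed pooled (or padded) output size collapses to zero. A minimal sketch of the standard floor-mode size arithmetic that can produce such a zero dimension (the izer's exact formula, including dilation and padding handling, may differ):

```python
def pooled_size(dim, pool, stride, dilation=1):
    """Standard floor-mode pooled output size for one dimension."""
    return (dim - dilation * (pool - 1) - 1) // stride + 1

# A 3-wide input pooled with a 4-wide kernel yields a zero dimension,
# which is the condition the new message now reports explicitly.
print(pooled_size(3, 4, 1))   # 0  -> would trigger the eprint() above
print(pooled_size(8, 2, 2))   # 4
```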

izer/kernels.py

Lines changed: 22 additions & 15 deletions
```diff
@@ -120,13 +120,22 @@ def load( # pylint: disable=too-many-branches,too-many-statements
 
         if flatten[ll]:
             kernel_reshaped = kernel[ll].reshape(
-                output_chan[ll] * input_chan[ll],
+                output_chan[ll],
+                in_expand[ll],
+                -1,
+            ).swapaxes(1, 2).reshape(
+                output_chan[ll] * in_expand_thresh[ll],
                 -1,
                 kernel_size[ll][0],
                 kernel_size[ll][1],
             )
+
+            in_exp = 1
+            in_chan = in_expand_thresh[ll]
         else:
             kernel_reshaped = kernel[ll]
+            in_exp = in_expand[ll]
+            in_chan = input_chan[ll]
 
         if quantization[ll] == -1:
             kernel_reshaped = kernel_reshaped.copy().clip(-1, 0)
@@ -186,13 +195,13 @@ def load( # pylint: disable=too-many-branches,too-many-statements
         # equal to output channels.
         if conv_groups[ll] == 1:
             kc = (1 + fls(next_layer_map) - first_output_proc) \
-                * out_expand[ll] * in_expand[ll]
-            kern_ochan[ll] = kern_count[ll] = kc + start_col * out_expand[ll] * in_expand[ll]
+                * out_expand[ll] * in_exp
+            kern_ochan[ll] = kern_count[ll] = kc + start_col * out_expand[ll] * in_exp
         else:
-            kc = in_expand[ll]
-            kern_count[ll] = kc + start_col * in_expand[ll]
+            kc = in_exp
+            kern_count[ll] = kc + start_col * in_exp
             kern_ochan[ll] = (1 + fls(next_layer_map) - first_output_proc) \
-                * in_expand[ll] + start_col * in_expand[ll]
+                * in_exp + start_col * in_exp
 
         if not legacy_kernels and flatten[ll]:
             kc *= kernel_reshaped.shape[1]
@@ -266,7 +275,7 @@ def add_kernel_data(ll, p, col_target, b):
                 continue
             # Skip start_col processors. Each takes up ksize bytes, or ksize // 9 full
             # kernel words. There are col_bytes leftover bytes.
-            col_target, col_bytes = divmod(start_col * ksize * in_expand[ll], 9)
+            col_target, col_bytes = divmod(start_col * ksize * in_exp, 9)
             # Pad out the leftovers
             for _ in range(col_bytes // qfactor):  # FIXME for quantization
                 col_target = add_kernel_data(ll, p, col_target, 0)
@@ -292,31 +301,29 @@ def add_kernel_data(ll, p, col_target, b):
                 this_mask = this_map & proc_mask
                 this_map >>= qfactor
 
-                in_ch = input_chan[ll]
+                in_ch = in_chan
                 if flatten[ll]:
                     in_ch *= qfactor
                 src_offs = ch + m * in_ch
 
-                for ie in range(in_expand[ll]):
+                for ie in range(in_exp):
                     mask = this_mask
 
                     n = 0
                     if ie * in_expand_thresh[ll] + ch < in_ch \
                        and src_offs < len(kernel_reshaped):
                         if not flatten[ll]:
-                            k = np.zeros_like(kernel_reshaped[src_offs].flatten())
+                            k = np.zeros_like(kernel_reshaped[src_offs].reshape(-1))
                         else:
                             k = np.empty((0), dtype=np.int64)
                         for i in range(qfactor):
                             if m < output_chan[ll]:
                                 # Cycle through phases
                                 idx = n + ie * qfactor
-                                koffs = src_offs + (idx % in_expand[ll]) \
-                                    * in_expand_thresh[ll] \
-                                    + (idx // in_expand[ll]) \
-                                    * input_chan[ll]
+                                koffs = src_offs + (idx % in_exp) * in_expand_thresh[ll] \
+                                    + (idx // in_exp) * in_chan
                                 if koffs < len(kernel_reshaped):
-                                    this_kern = kernel_reshaped[koffs].flatten() \
+                                    this_kern = kernel_reshaped[koffs].reshape(-1) \
                                         & (2**abs(quantization[ll])-1)
                                     if not flatten[ll]:
                                         k |= this_kern << (i * abs(quantization[ll]))
```
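The heart of the flatten change is the reshape/swapaxes in the first hunk. The toy numpy sketch below (made-up dimensions, not izer code) shows the resulting weight ordering: for each output channel, each of the in_expand_thresh channel slots ends up holding its in_expand per-pass values contiguously, which is consistent with the loader then continuing with in_exp = 1 and in_chan = in_expand_thresh[ll].

```python
import numpy as np

# Toy stand-ins for the layer parameters used above (not real izer values):
output_chan, in_expand, in_expand_thresh = 2, 3, 2
input_chan = in_expand * in_expand_thresh          # 6 flattened input channels
kernel_size = (1, 1)                               # flattening uses 1x1 kernels

# Weight value o*input_chan + c marks (output channel o, flattened channel c).
kernel = np.arange(output_chan * input_chan).reshape(
    output_chan * input_chan, 1, *kernel_size)

reshaped = kernel.reshape(
    output_chan,
    in_expand,
    -1,
).swapaxes(1, 2).reshape(
    output_chan * in_expand_thresh,
    -1,
    *kernel_size,
)

# Each row now holds one channel slot's weights across all in_expand passes:
# [[0 2 4], [1 3 5], [6 8 10], [7 9 11]]
print(reshaped.reshape(output_chan * in_expand_thresh, -1))
```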
