Commit 2ad7ad6

Authored by Robert Muchsel

Support flattening > 64 channels; detect overlapping streaming buffers; additional messages (#105)
* Implement device limit for pooling without convolution; no pooling when flattening
* Clarify messages, print error when ‘assets’ is missing from current directory
* flatten: Support more than 64 channels
* Detect overlap in streaming buffers
* AI87: Updated features and RTL sims
1 parent d9189ff commit 2ad7ad6

File tree

61 files changed (+1107 additions, -105 deletions)


README.md

Lines changed: 6 additions & 4 deletions
```diff
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_February 1, 2021_
+_February 6, 2021_
 
 The Maxim Integrated AI project is comprised of four repositories:
 
@@ -696,6 +698,8 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 
 * Pooling does not support padding.
 
+* Pooling more than 64 channels requires use of a “fused” convolution in the same layer, unless the pooled dimensions are 1×1.
+
 * Pooling strides can be 1 through 16. For 2D pooling, the stride is the same for both dimensions.
 
 * For 2D pooling, supported pooling kernel sizes are 1×1 through 16×16, including non-square kernels. 1D pooling supports kernel sizes from 1 through 16. *Note: Pooling kernel size values do not have to be the same as the pooling stride.*
```
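The pooling rules in this hunk are straightforward to pre-check in plain Python. The following is a minimal standalone sketch; the function name and argument layout are made up for illustration and are not part of the izer.

```python
def check_pooling(pool_kernel, pool_stride, channels, fused_conv, pooled_dim):
    """Hypothetical validator for the MAX78000 pooling rules quoted above."""
    errors = []
    # Pooling strides can be 1 through 16.
    if not all(1 <= s <= 16 for s in pool_stride):
        errors.append(f'pool stride {pool_stride} outside 1..16')
    # 2D pooling kernels can be 1x1 through 16x16 (non-square allowed).
    if not all(1 <= k <= 16 for k in pool_kernel):
        errors.append(f'pool kernel {pool_kernel} outside 1x1..16x16')
    # Pooling more than 64 channels needs a fused convolution in the same
    # layer, unless the pooled output dimensions are 1x1.
    if channels > 64 and not fused_conv and pooled_dim != (1, 1):
        errors.append('pooling >64 channels requires a fused convolution '
                      'unless the pooled dimensions are 1x1')
    return errors


# Example: a 128-channel pool-only layer whose pooled output is not 1x1
print(check_pooling((2, 2), (2, 2), channels=128, fused_conv=False,
                    pooled_dim=(16, 16)))
```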
```diff
@@ -727,12 +729,12 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 * There are 16 instances of 32 KiB data memory. When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 181×181 pixels per channel in the CHW format. However, when using more than one input channel, the HWC format may be preferred, and all layer output are in HWC format as well. In those cases, it is required that four channels fit into a single memory instance -- or 91×90 pixels per channel.
   Note that the first layer commonly creates a wide expansion (i.e., large number of output channels) that needs to fit into data memory, so the input size limit is mostly theoretical.
 
-* The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`) (implemented using 1×1 convolutions on 1×1 data):
+* The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`), which are implemented using 1×1 convolutions on 1×1 data:
   * The maximum number of input neurons is 1024, and the maximum number of output neurons is 1024 (16 each per processor used).
 
 * `Flatten` functionality is available to convert 2D input data for use by fully connected layers, see [Fully Connected Layers](#Fully Connected \(Linear\) Layers).
 
-* When “flattening” two-dimensional data, the input dimensions (C×H×W) must satisfy H×W ≤ 256 and C ≤ 64. Pooling cannot be used at the same time as flattening.
+* When “flattening” two-dimensional data, the input dimensions (C×H×W) must satisfy H×W ≤ 16384. Pooling cannot be used at the same time as flattening.
 
 * Element-wise operators support from 2 up to 16 inputs.
 
```
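The 181×181 (CHW) and 91×90 (HWC) figures quoted above follow from the 32 KiB instance size, assuming 8-bit data items and four HWC channels packed per 32-bit word. A quick standalone arithmetic check (not izer code):

```python
# Back-of-the-envelope check of the data-memory limits quoted above.
# Assumes 8-bit (one-byte) data items; standalone sketch, not izer code.
INSTANCE_BYTES = 32 * 1024          # one data memory instance: 32 KiB

# CHW: a single channel occupies one byte per pixel in one instance.
print(181 * 181, '<=', INSTANCE_BYTES, 181 * 181 <= INSTANCE_BYTES)      # 32761 <= 32768 True

# HWC: four channels share each 32-bit word, so four channels must fit together.
print(4 * 91 * 90, '<=', INSTANCE_BYTES, 4 * 91 * 90 <= INSTANCE_BYTES)  # 32760 <= 32768 True

# Flattening limit from this commit: H*W may now be as large as 16384 (e.g. 128×128).
print(128 * 128 <= 16384)                                                 # True
```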
```diff
@@ -748,7 +750,7 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 
 m×n fully connected layers can be realized in hardware by “flattening” 2D input data of dimensions C×H×W into m=C×H×W channels of 1×1 input data. The hardware will produce n channels of 1×1 output data. When chaining multiple fully connected layers, the flattening step is omitted. The following picture shows 2D data, the equivalent flattened 1D data, and the output.
 
-For MAX78000/MAX78002, the product H×W must not exceed 256, and C must not exceed 64.
+For MAX78000/MAX78002, the product H×W must not exceed 16384.
 
 ![MLP](docs/MLP.png)
 
```
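To see why flattening C×H×W data into m=C×H×W channels of 1×1 data realizes an m×n fully connected layer, the following standalone numpy sketch (toy dimensions, not izer code) compares the two computations:

```python
import numpy as np

# Toy dimensions, chosen only for illustration.
C, H, W, n = 4, 3, 3, 5
m = C * H * W                       # flattened input channels (must satisfy H*W <= 16384)

rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))  # one 2D input sample
weight = rng.standard_normal((n, m))

# Fully connected layer: y = weight @ flatten(x)
y_fc = weight @ x.reshape(m)

# Equivalent 1x1 "convolution" on m channels of 1x1 data:
# each output channel is a dot product over the m input channels.
x_1x1 = x.reshape(m, 1, 1)
y_conv = np.einsum('om,mhw->ohw', weight, x_1x1).reshape(n)

print(np.allclose(y_fc, y_conv))    # True
```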
README.pdf

1.55 KB
Binary file not shown.

izer/commandline.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -119,6 +119,8 @@ def get_parser():
     group.add_argument('--fast-fifo-quad', action='store_true', default=False,
                        help="use fast FIFO in quad fanout mode (implies --fast-fifo; "
                             "default: false)")
+    group.add_argument('--fifo-go', action='store_true', default=False,
+                       help="start processing before first FIFO push (default: false)")
     group.add_argument('--slow-load', type=int, metavar='N', default=0,
                        help="slow down FIFO loads (default: 0)")
 
@@ -295,6 +297,8 @@ def get_parser():
                        help="initialize TRAM to 0 (default: false)")
     group.add_argument('--zero-sram', action='store_true', default=False,
                        help="zero memories (default: false)")
+    group.add_argument('--pretend-zero-sram', action='store_true', default=False,
+                       help="simulate --zero-sram, but block BIST (default: false)")
     group.add_argument('--zero-unused', action='store_true', default=False,
                        help="zero unused registers (default: do not touch)")
     group.add_argument('--apb-base', type=lambda x: int(x, 0), metavar='N',
```
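Both new options are plain boolean switches with a False default. A stripped-down, standalone argparse sketch (only the option names and help strings come from the diff; the parser itself is illustrative):

```python
import argparse

# Standalone illustration of the two new boolean switches added above.
parser = argparse.ArgumentParser(description='toy stand-in for izer/commandline.py')
group = parser.add_argument_group('hardware simulation')
group.add_argument('--fifo-go', action='store_true', default=False,
                   help="start processing before first FIFO push (default: false)")
group.add_argument('--pretend-zero-sram', action='store_true', default=False,
                   help="simulate --zero-sram, but block BIST (default: false)")

args = parser.parse_args(['--fifo-go'])
print(args.fifo_go, args.pretend_zero_sram)   # True False
```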

izer/izer.py

Lines changed: 4 additions & 2 deletions
```diff
@@ -383,8 +383,8 @@ def main():
 
             pooled_dim[ll] = pooled_size
             if any(dim == 0 for dim in pooled_dim[ll]):
-                eprint(f'Pooling in layer {ll} results in a zero data dimension '
-                       f'(input {input_dim[ll]}, pooled {pooled_dim[ll]}).')
+                eprint(f'Pooling or zero-padding in layer {ll} results in a zero data dimension '
+                       f'(input {input_dim[ll]}, result {pooled_dim[ll]}).')
 
             if operator[ll] != op.CONV1D:
                 if stride[ll][0] != stride[ll][1]:
@@ -591,6 +591,8 @@ def main():
             bias_group_map=bias_group_map,
             pool_dilation=pool_dilation,
             input_pix_clk=args.input_pix_clk,
+            fifo_go=args.fifo_go,
+            pretend_zero_sram=args.pretend_zero_sram,
         )
         if not args.embedded_code and args.autogen.lower() != 'none':
             rtlsim.append_regression(
```
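The reworded error fires when the computed pooled (or padded) output size collapses to zero. A minimal sketch of the standard floor-mode size arithmetic that can produce such a zero dimension (the izer's exact formula, including dilation and padding handling, may differ):

```python
def pooled_size(dim, pool, stride, dilation=1):
    """Standard floor-mode pooled output size for one dimension."""
    return (dim - dilation * (pool - 1) - 1) // stride + 1

# A 3-wide input pooled with a 4-wide kernel yields a zero dimension,
# which is the condition the new message now reports explicitly.
print(pooled_size(3, 4, 1))   # 0  -> would trigger the eprint() above
print(pooled_size(8, 2, 2))   # 4
```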

izer/kernels.py

Lines changed: 22 additions & 15 deletions
```diff
@@ -120,13 +120,22 @@ def load( # pylint: disable=too-many-branches,too-many-statements
 
         if flatten[ll]:
             kernel_reshaped = kernel[ll].reshape(
-                output_chan[ll] * input_chan[ll],
+                output_chan[ll],
+                in_expand[ll],
+                -1,
+            ).swapaxes(1, 2).reshape(
+                output_chan[ll] * in_expand_thresh[ll],
                 -1,
                 kernel_size[ll][0],
                 kernel_size[ll][1],
             )
+
+            in_exp = 1
+            in_chan = in_expand_thresh[ll]
         else:
             kernel_reshaped = kernel[ll]
+            in_exp = in_expand[ll]
+            in_chan = input_chan[ll]
 
         if quantization[ll] == -1:
             kernel_reshaped = kernel_reshaped.copy().clip(-1, 0)
@@ -186,13 +195,13 @@ def load( # pylint: disable=too-many-branches,too-many-statements
         # equal to output channels.
         if conv_groups[ll] == 1:
             kc = (1 + fls(next_layer_map) - first_output_proc) \
-                * out_expand[ll] * in_expand[ll]
-            kern_ochan[ll] = kern_count[ll] = kc + start_col * out_expand[ll] * in_expand[ll]
+                * out_expand[ll] * in_exp
+            kern_ochan[ll] = kern_count[ll] = kc + start_col * out_expand[ll] * in_exp
         else:
-            kc = in_expand[ll]
-            kern_count[ll] = kc + start_col * in_expand[ll]
+            kc = in_exp
+            kern_count[ll] = kc + start_col * in_exp
             kern_ochan[ll] = (1 + fls(next_layer_map) - first_output_proc) \
-                * in_expand[ll] + start_col * in_expand[ll]
+                * in_exp + start_col * in_exp
 
         if not legacy_kernels and flatten[ll]:
             kc *= kernel_reshaped.shape[1]
@@ -266,7 +275,7 @@ def add_kernel_data(ll, p, col_target, b):
                 continue
             # Skip start_col processors. Each takes up ksize bytes, or ksize // 9 full
             # kernel words. There are col_bytes leftover bytes.
-            col_target, col_bytes = divmod(start_col * ksize * in_expand[ll], 9)
+            col_target, col_bytes = divmod(start_col * ksize * in_exp, 9)
             # Pad out the leftovers
             for _ in range(col_bytes // qfactor):  # FIXME for quantization
                 col_target = add_kernel_data(ll, p, col_target, 0)
@@ -292,31 +301,29 @@ def add_kernel_data(ll, p, col_target, b):
                 this_mask = this_map & proc_mask
                 this_map >>= qfactor
 
-                in_ch = input_chan[ll]
+                in_ch = in_chan
                 if flatten[ll]:
                     in_ch *= qfactor
                 src_offs = ch + m * in_ch
 
-                for ie in range(in_expand[ll]):
+                for ie in range(in_exp):
                     mask = this_mask
 
                     n = 0
                     if ie * in_expand_thresh[ll] + ch < in_ch \
                        and src_offs < len(kernel_reshaped):
                         if not flatten[ll]:
-                            k = np.zeros_like(kernel_reshaped[src_offs].flatten())
+                            k = np.zeros_like(kernel_reshaped[src_offs].reshape(-1))
                         else:
                             k = np.empty((0), dtype=np.int64)
                         for i in range(qfactor):
                             if m < output_chan[ll]:
                                 # Cycle through phases
                                 idx = n + ie * qfactor
-                                koffs = src_offs + (idx % in_expand[ll]) \
-                                    * in_expand_thresh[ll] \
-                                    + (idx // in_expand[ll]) \
-                                    * input_chan[ll]
+                                koffs = src_offs + (idx % in_exp) * in_expand_thresh[ll] \
+                                    + (idx // in_exp) * in_chan
                                 if koffs < len(kernel_reshaped):
-                                    this_kern = kernel_reshaped[koffs].flatten() \
+                                    this_kern = kernel_reshaped[koffs].reshape(-1) \
                                         & (2**abs(quantization[ll])-1)
                                     if not flatten[ll]:
                                         k |= this_kern << (i * abs(quantization[ll]))
```
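The heart of the flatten change is the reshape/swapaxes in the first hunk. The toy numpy sketch below (made-up dimensions, not izer code) shows the resulting weight ordering: for each output channel, each of the in_expand_thresh channel slots ends up holding its in_expand per-pass values contiguously, which is consistent with the loader then continuing with in_exp = 1 and in_chan = in_expand_thresh[ll].

```python
import numpy as np

# Toy stand-ins for the layer parameters used above (not real izer values):
output_chan, in_expand, in_expand_thresh = 2, 3, 2
input_chan = in_expand * in_expand_thresh          # 6 flattened input channels
kernel_size = (1, 1)                               # flattening uses 1x1 kernels

# Weight value o*input_chan + c marks (output channel o, flattened channel c).
kernel = np.arange(output_chan * input_chan).reshape(
    output_chan * input_chan, 1, *kernel_size)

reshaped = kernel.reshape(
    output_chan,
    in_expand,
    -1,
).swapaxes(1, 2).reshape(
    output_chan * in_expand_thresh,
    -1,
    *kernel_size,
)

# Each row now holds one channel slot's weights across all in_expand passes:
# [[0 2 4], [1 3 5], [6 8 10], [7 9 11]]
print(reshaped.reshape(output_chan * in_expand_thresh, -1))
```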
