
Commit 491cf8d

Author: Robert Muchsel
README updates; AI87 updates (#114)
2 parents 58c08b1 + c65b2d9; commit 491cf8d

15 files changed: 210 additions, 85 deletions

README.md

Lines changed: 17 additions & 6 deletions
@@ -1,6 +1,6 @@
 # MAX78000 Model Training and Synthesis
 
-_March 2, 2021_
+_March 15, 2021_
 
 The Maxim Integrated AI project is comprised of four repositories:
 
@@ -609,13 +609,17 @@ $$ w_0 * w_1 = 128/128 → saturation → 01111111 (= 127/128) $$
 
 #### HWC
 
-All internal data are stored in HWC format, 4 channels per 32-bit word. Assuming 3-color (or 3-channel) input, one byte will be unused. Example:
+All internal data are stored in HWC format, 4 channels per 32-bit word. Assuming 3-color (or 3-channel) input, one byte will be unused. The highest frequency in this data format is the channel, so the channels are interleaved.
+
+Example:
 
 ![0BGR 0BGR 0 BGR 0BGR...](docs/HWC.png)
 
 #### CHW
 
-The input layer can alternatively also use the CHW format (sequence of channels), for example:
+The input layer can alternatively also use the CHW format (a sequence of channels). The highest frequency in this data format is the width or X-axis (W), and the lowest frequency is the channel. Assuming an RGB input, all red pixels are followed by all green pixels, followed by all blue pixels.
+
+Example:
 
 ![RRRRRR...GGGGGG...BBBBBB...](docs/CHW.png)
 
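As an illustration of the two layouts (not part of the diff), a minimal numpy sketch for a 3-channel 2×2 input:

```python
import numpy as np

# A 3-channel (RGB) 2x2 image; values 0-3 are R, 4-7 are G, 8-11 are B.
x = np.arange(12).reshape(3, 2, 2)

chw = x.flatten()                     # R R R R  G G G G  B B B B
hwc = x.transpose(1, 2, 0).flatten()  # R G B  R G B  R G B  R G B

print(chw)  # channel changes slowest: the "lowest frequency" index
print(hwc)  # channel changes fastest: the "highest frequency" index
```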
@@ -780,6 +784,8 @@ The `ai84net.py` and `ai85net.py` files contain models that fit into AI84’s we
 
 To train the FP32 model for MNIST on MAX78000, run `scripts/train_mnist.sh` from the `ai8x-training` project. This script will place checkpoint files into the log directory. Training makes use of the Distiller framework, but the `train.py` software has been modified slightly to improve it and add some MAX78000/MAX78002 specifics.
 
+Since training can take hours or days, the training script does not overwrite any weights previously produced. Results are placed in sub-directories under `logs/` named with date and time when training began. The latest results are always soft-linked to by `latest-log_dir` and `latest_log_file`.
+
 ### Command Line Arguments
 
 The following table describes the most important command line arguments for `train.py`. Use `--help` for a complete list.
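The added paragraph describes a timestamped-log scheme. A rough sketch of the idea (illustration only, not the actual `train.py` code; the link name follows the README text):

```python
import os
from datetime import datetime

# One timestamped directory per run, plus a stable soft link to the latest.
log_dir = os.path.join('logs', datetime.now().strftime('%Y.%m.%d-%H%M%S'))
os.makedirs(log_dir)
if os.path.islink('latest-log_dir'):
    os.remove('latest-log_dir')
os.symlink(log_dir, 'latest-log_dir')
```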
@@ -790,7 +796,7 @@ The following table describes the most important command line arguments for `tra
 | *Device selection* | | |
 | `--device` | Set device (default: AI84) | `--device MAX78000` |
 | *Model and dataset* | | |
-| `-a`, `--arch` | Set model (collected from models folder) | `--model ai85net5` |
+| `-a`, `--arch`, `--model` | Set model (collected from models folder) | `--model ai85net5` |
 | `--dataset` | Set dataset (collected from datasets folder) | `--dataset MNIST` |
 | `--data` | Path to dataset (default: data) | `--data /data/ml` |
 | *Training* | | |
@@ -802,6 +808,7 @@ The following table describes the most important command line arguments for `tra
 | `--resume-from` | Resume from previous checkpoint | `--resume-from chk.pth.tar` |
 | `--qat-policy` | Define QAT policy in YAML file (default: qat_policy.yaml). Use ‘’None” to disable QAT. | `--qat-policy qat_policy.yaml` |
 | *Display and statistics* | | |
+| `--enable-tensorboard` | Enable logging to TensorBoard (default: disabled) | |
 | `--confusion` | Display the confusion matrix | |
 | `--param-hist` | Collect parameter statistics | |
 | `--pr-curves` | Generate precision-recall curves | |
@@ -941,7 +948,7 @@ Both TensorBoard and Manifold can be used for model comparison and feature attri
 
 #### TensorBoard
 
-TensorBoard is built into `train.py`. It provides a local web server that can be started before, during, or after training and it picks up all data that is written to the `logs/` directory.
+TensorBoard is built into `train.py`. When enabled using `--enable-tensorboard`, it provides a local web server that can be started before, during, or after training and it picks up all data that is written to the `logs/` directory.
 
 For classification models, TensorBoard supports the optional `--param-hist` and `--embedding` command line arguments. `--embedding` randomly selects up to 100 data points from the last batch of each verification epoch. These can be viewed in the “projector” tab in TensorBoard.
 
@@ -1169,6 +1176,7 @@ The following table describes the most important command line arguments for `ai8
 | `--prefix` | Set test name prefix | `--prefix mnist` |
 | `--board-name` | Set the target board (default: `EvKit_V1`) | `--board-name FTHR_RevA` |
 | *Code generation* | | |
+| `--overwrite` | Produce output even when the target directory exists (default: abort) | |
 | `--compact-data` | Use *memcpy* to load input data in order to save code space | |
 | `--compact-weights` | Use *memcpy* to load weights in order to save code space | |
 | `--mexpress` | Use faster kernel loading | |
@@ -1227,6 +1235,8 @@ The following table describes the most important command line arguments for `ai8
 
 ### YAML Network Description
 
+The [quick-start guide](https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/Guides/YAML%20Quickstart.md) provides a short overview of the purpose and structure of the YAML network description file.
+
 An example network description for the ai85net5 architecture and MNIST is shown below:
 
 ```yaml
@@ -1422,7 +1432,7 @@ Example:
 
 ##### `activate` (Optional)
 
-This key describes whether to activate the layer output (the default is to not activate). When specified, this key must be `ReLU`, `Abs` or `None` (the default).
+This key describes whether to activate the layer output (the default is to not activate). When specified, this key must be `ReLU`, `Abs` or `None` (the default). *Please note that there is always an implicit non-linearity when outputting 8-bit data since outputs are clamped to $[–1, +127/128]$ during training.*
 
 Note that the output values are clipped (saturated) to $[0, +127]$. Because of this, `ReLU` behaves more similar to PyTorch’s `nn.Hardtanh(min_value=0, max_value=127)` than to PyTorch’s `nn.ReLU()`.
 
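The clamping described in this hunk is easy to verify. A small PyTorch sketch (illustration only; note that PyTorch's actual keyword arguments are `min_val`/`max_val`):

```python
import torch
from torch import nn

# With 8-bit output the hardware saturates at +127, so ReLU behaves like
# Hardtanh(0, 127) rather than an unbounded ReLU.
x = torch.tensor([-50., 50., 200.])
print(nn.Hardtanh(min_val=0., max_val=127.)(x))  # tensor([  0.,  50., 127.])
print(nn.ReLU()(x))                              # tensor([  0.,  50., 200.])
```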
@@ -2123,3 +2133,4 @@ https://github.com/MaximIntegratedAI/MaximAI_Documentation/blob/master/CONTRIBUT
 
 ---
 
+o

README.pdf

16 KB, binary file not shown.

izer/apbaccess.py

Lines changed: 18 additions & 3 deletions
@@ -9,7 +9,7 @@
 """
 import os
 
-from . import kernels, toplevel
+from . import toplevel
 from . import tornadocnn as tc
 from . import unload
 from .eprint import eprint, wprint
@@ -456,7 +456,9 @@ def write_kern(
             k,
             size=9,
             verify_only=False,
-            calcx4=False,
+            calcx4=None,
+            kern_offs=None,
+            count=None,
     ):
         """
         Write single kernel `k` of length `size` for layer `ll`, processor `p` to index `idx` in
@@ -465,7 +467,20 @@ def write_kern(
         assert p < tc.dev.MAX_PROC
         assert idx < tc.dev.mask_width(p)
 
-        idx_x4 = idx if not calcx4 else kernels.calcx4_index(idx)
+        if calcx4[ll]:
+            start = kern_offs[ll]
+            mem, rem = divmod((idx - start), (count + 3) // 4)
+            start //= 4
+            if idx < tc.dev.MASK_WIDTH_SMALL:
+                assert 0 <= mem < 4
+                idx_x4 = mem * (tc.dev.MASK_WIDTH_SMALL // 4) + rem + start
+            else:
+                idx_x4 = idx - tc.dev.MASK_WIDTH_SMALL
+                idx_x4 = mem * ((tc.dev.MASK_WIDTH_LARGE - tc.dev.MASK_WIDTH_SMALL) // 4) + rem \
+                    + tc.dev.MASK_WIDTH_SMALL + start
+        else:
+            idx_x4 = idx
+
         addr = tc.dev.C_GROUP_OFFS * (p // tc.dev.P_NUMPRO) \
             + tc.dev.C_MRAM_BASE \
             + (p % tc.dev.P_NUMPRO) * tc.dev.MASK_OFFS * 16 + idx_x4 * 16
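To see what the new mapping does, here is a self-contained sketch (illustration only; the `MASK_WIDTH` constants are invented, while the real code reads them from `tc.dev`):

```python
MASK_WIDTH_SMALL = 768   # assumed small kernel-memory width
MASK_WIDTH_LARGE = 5120  # assumed large kernel-memory width

def calcx4_index(idx, start, count):
    """Map linear kernel index `idx` of a layer starting at `start` with
    `count` kernels to its location when kernels are split across the four
    quarters of the mask memory (calcx4)."""
    mem, rem = divmod(idx - start, (count + 3) // 4)
    start //= 4
    if idx < MASK_WIDTH_SMALL:
        assert 0 <= mem < 4
        return mem * (MASK_WIDTH_SMALL // 4) + rem + start
    return mem * ((MASK_WIDTH_LARGE - MASK_WIDTH_SMALL) // 4) + rem \
        + MASK_WIDTH_SMALL + start

# A layer with 16 kernels at offset 0 lands in four quarter-banks of
# MASK_WIDTH_SMALL // 4 = 192 entries each:
print([calcx4_index(i, 0, 16) for i in range(16)])
# [0, 1, 2, 3, 192, 193, 194, 195, 384, 385, 386, 387, 576, 577, 578, 579]
```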

izer/commandline.py

Lines changed: 3 additions & 1 deletion
@@ -70,6 +70,8 @@ def get_parser():
 
     # Code generation
     group = parser.add_argument_group('Code generation')
+    group.add_argument('--overwrite', action='store_true', default=False,
+                       help="overwrite destination if it exists (default: abort)")
     group.add_argument('--compact-data', action='store_true', default=False,
                        help="use memcpy() to load input data in order to save code space")
     group.add_argument('--compact-weights', action='store_true', default=False,
@@ -253,7 +255,7 @@ def get_parser():
 
     # Streaming
     group = parser.add_argument_group('Streaming tweaks')
-    group.add_argument('--overlap-data', '--overwrite-ok', '--allow-overwrite',
+    group.add_argument('--overlap-data',
                        dest='overwrite_ok', action='store_true', default=False,
                        help="allow output to overwrite input (default: warn/stop)")
     group.add_argument('--override-start', type=lambda x: int(x, 0), metavar='N',
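The second hunk removes the `--overwrite-ok` and `--allow-overwrite` aliases from `--overlap-data`, presumably so they cannot be confused with the new, unrelated `--overwrite` switch. A self-contained sketch of the resulting behavior (illustration only, not the izer parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--overwrite', action='store_true', default=False,
                    help="overwrite destination if it exists (default: abort)")
parser.add_argument('--overlap-data', dest='overwrite_ok', action='store_true',
                    default=False,
                    help="allow output to overwrite input (default: warn/stop)")

args = parser.parse_args(['--overwrite'])
assert args.overwrite and not args.overwrite_ok  # two independent switches
```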

izer/izer.py

Lines changed: 1 addition & 0 deletions
@@ -610,6 +610,7 @@ def main():
         snoop_sequence=snoop_sequence,
         simulated_sequence=simulated_sequence,
         debug_snoop=args.debug_snoop,
+        overwrite=args.overwrite,
     )
     if not args.embedded_code and args.autogen.lower() != 'none':
         rtlsim.append_regression(

izer/kernels.py

Lines changed: 32 additions & 8 deletions
@@ -119,6 +119,8 @@ def load(  # pylint: disable=too-many-branches,too-many-statements
             assert kern_offs[ll] == start_offs
             continue
 
+        qfactor = 8 // abs(quantization[ll])
+
         if flatten[ll]:
             kernel_reshaped = kernel[ll].reshape(
                 output_chan[ll],
@@ -135,15 +137,30 @@ def load(  # pylint: disable=too-many-branches,too-many-statements
             in_chan = in_expand_thresh[ll]
         elif calcx4[ll]:
             # FIXME for output channels % 4 != 0
+            assert output_chan[ll] % 4 == 0
             kernel_reshaped = kernel[ll].reshape(
                 output_chan[ll] // 4,
                 4,
-                in_expand[ll],
-                in_expand_thresh[ll],
+                -1,
+            ).transpose(1, 0, 2).reshape(
+                kernel[ll].shape
+            )
+
+            in_exp = in_expand[ll]
+            in_chan = input_chan[ll]
+        elif ll == 0 and quad and qfactor != 1:
+            # FIXME for output channels % (4 * qfactor) != 0
+            assert output_chan[ll] % (4 * qfactor) == 0
+            kernel_reshaped = kernel[ll].reshape(
+                output_chan[ll] // (4 * qfactor),
+                qfactor,
+                4,
+                input_chan[ll],
                 kernel_size[ll][0] * kernel_size[ll][1],
-            ).transpose(0, 2, 1, 3, -1).reshape(
+            ).transpose(0, 2, 1, 3, 4).reshape(
                 kernel[ll].shape
             )
+
             in_exp = in_expand[ll]
             in_chan = input_chan[ll]
         else:
186203
kern_offs[ll] *= 4
187204

188205
ksize = kernel_size[ll][0] * kernel_size[ll][1]
189-
qfactor = 8 // abs(quantization[ll])
190206
next_layer_map = output_processor_map[ll]
191207
first_output_proc = ffs(next_layer_map)
192208
start_col = first_output_proc % tc.dev.P_SHARED # First target column out of 4 shared
@@ -411,8 +427,12 @@ def add_kernel_data(ll, p, col_target, b):
411427
for col in range(0, tc.dev.mask_width(p)):
412428
ll = kernel_map[p][col]
413429
if ll != _INVALID_VALUE:
414-
k = kernel_data[p][col]
415-
apb.write_kern(ll, p, col, k, verify_only=verify, calcx4=calcx4[ll])
430+
apb.write_kern(ll, p, col, kernel_data[p][col],
431+
verify_only=verify, calcx4=calcx4,
432+
kern_offs=kern_offs,
433+
count=in_expand[ll] * output_chan[ll] * 9
434+
* abs(quantization[ll])
435+
// (kernel_size[ll][0] * kernel_size[ll][1] * 8))
416436
apb.function_footer() # verify_weights()
417437

418438
if not (embedded_code or mexpress):
@@ -424,10 +444,14 @@ def add_kernel_data(ll, p, col_target, b):
424444
if ll != _INVALID_VALUE:
425445
k = kernel_data[p][col]
426446
if not zero_sram or np.any(k != 0):
427-
apb.write_kern(ll, p, col, k, calcx4=calcx4[ll])
447+
apb.write_kern(ll, p, col, k, calcx4=calcx4,
448+
kern_offs=kern_offs,
449+
count=in_expand[ll] * output_chan[ll] * 9
450+
* abs(quantization[ll])
451+
// (kernel_size[ll][0] * kernel_size[ll][1] * 8))
428452
apb.function_footer() # load_weights()
429453

430-
if embedded_code or mexpress:
454+
else: # embedded_code or mexpress
431455
# Write kernels, combining layers and processors where possible to reduce the number
432456
# of constants and calls to memcpy.
433457
apb.output('// Kernels:\n', api)

izer/max7800x.py

Lines changed: 5 additions & 1 deletion
@@ -168,6 +168,7 @@ def create_net(  # pylint: disable=too-many-arguments,too-many-locals,too-many-b
         snoop_sequence=None,
         simulated_sequence=None,
         debug_snoop=False,
+        overwrite=False,
 ):
     """
     Chain multiple CNN layers, create and save input and output
@@ -530,7 +531,10 @@ def create_net(  # pylint: disable=too-many-arguments,too-many-locals,too-many-b
         target_dir = os.path.join(base_directory, test_name)
         os.makedirs(target_dir, exist_ok=False)
     except OSError:
-        wprint(target_dir, 'exists')
+        if not overwrite:
+            eprint('The target folder', target_dir, 'exists. Use --overwrite to proceed.')
+        else:
+            wprint('--overwrite specified, writing to ', target_dir, ' even though it exists.')
 
     # Redirect stdout?
     if log:
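The net effect: an existing target directory is now a hard error unless `--overwrite` is given. A self-contained sketch of the guard (illustration only; izer's `eprint()` helper is assumed to print the error and terminate):

```python
import os
import sys

def make_target_dir(target_dir, overwrite=False):
    """Sketch of the new behavior, not the izer code itself."""
    try:
        os.makedirs(target_dir, exist_ok=False)
    except OSError:
        if not overwrite:
            print(f'ERROR: The target folder {target_dir} exists. '
                  'Use --overwrite to proceed.', file=sys.stderr)
            sys.exit(1)
        print(f'WARNING: --overwrite specified, writing to {target_dir} '
              'even though it exists.', file=sys.stderr)

make_target_dir('demo')                  # creates 'demo'
make_target_dir('demo', overwrite=True)  # exists: proceed with a warning
```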
