Description
Hi all,
I am currently working on an application that requires a specific dataflow design to reduce the instruction size (insts.bin, or more specifically the runtime_sequence), because the data size of a, b, and c is very large. The dataflow design is shown below:

The details about this dataflow are:
- ST00 launches 3 npu_dma transfers simultaneously to send a, b, and c to MT01 via channel 0 and channel 1. a and b have to share ST channel 0 and use different packet ids: packet id 0 (to channel 0 on MT01) and packet id 1 (to channel 4 on MT01).
- a and b have to be merged from input channel 0 and channel 1 of the MT into output channel 0 of the MT, and the merge has to be interleaved, e.g., (a0, b0), (a1, b1), (a2, b2), etc. This is because CT02 only has 2 input channels.
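To make the expected order concrete, here is a small host-side reference model in plain Python. It is only a sketch of the ordering I want on output channel 0 of the MT, not AIE or mlir-aie code. The function name `interleave_tiles` and the `tile` parameter (number of elements in each a_i / b_i) are names I made up for illustration.

```python
from typing import List, Sequence


def interleave_tiles(a: Sequence[int], b: Sequence[int], tile: int) -> List[int]:
    """Return a0, b0, a1, b1, ... where each a_i / b_i is `tile` elements long."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    if tile <= 0 or len(a) % tile != 0:
        raise ValueError("tile must be positive and divide the buffer length")
    out: List[int] = []
    for start in range(0, len(a), tile):
        out.extend(a[start:start + tile])  # tile a_i
        out.extend(b[start:start + tile])  # tile b_i
    return out


# Example: each buffer holds two tiles of two elements.
expected = interleave_tiles([0, 1, 2, 3], [100, 101, 102, 103], tile=2)
print(expected)  # [0, 1, 100, 101, 2, 3, 102, 103]
```

Filling a and b with distinct value ranges, as in the example, makes it easy to see which stream each element of a received buffer came from.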
One approach that I tried is shown below, but it can block the dataflow, which causes a runtime error on the host.

I tried several methods, but none of them can ensure that the a and b data CT02 receives follow the expected order (a0, b0, a1, b1, a2, b2, ...). The order of a and b on input channel 0 of CT02 appears random, with the two streams overwriting each other.
I would like to ask whether anyone has encountered similar issues. Any suggestions would be highly appreciated.
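For anyone trying to reproduce this, a check along the following lines can locate where the order first breaks. It is a hypothetical host-side helper in plain Python, assuming the data seen on input channel 0 of CT02 can be copied back to the host unchanged and that a and b were filled with distinguishable values. `first_bad_tile` is an illustrative name, not part of any library.

```python
from typing import Optional, Sequence


def first_bad_tile(received: Sequence[int], expected: Sequence[int], tile: int) -> Optional[int]:
    """Return the index of the first tile that differs from `expected`, or None."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    if len(received) != len(expected):
        raise ValueError("received and expected must have the same length")
    for start in range(0, len(expected), tile):
        if list(received[start:start + tile]) != list(expected[start:start + tile]):
            return start // tile  # tile position in the stream, counting from 0
    return None


# Expected order is a0, b0, a1, b1 with tiles of two elements.
expected = [0, 1, 100, 101, 2, 3, 102, 103]
# A stream where a1 arrived before b0.
received = [0, 1, 2, 3, 100, 101, 102, 103]
print(first_bad_tile(received, expected, tile=2))  # 1
```

A result of `None` means every tile arrived in the expected position. Any other value is the first stream position where a and b were reordered or overwritten.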