Problem Implementing Broadcast with skip objectfifo for GRU network #2586

@iuwizeyimana

I am currently implementing a GRU network on the NPU using the IRON API.

The hardware mapping used in this setup is shown in the picture here.

For runtime data movement, I am following the pattern from the single-core matrix-multiplication example, which serves as the baseline for my matrix multiplication compute tiles (A–F in the picture). No memory transformations are applied to the objectfifos transferring data between compute tiles.

The issue arises with compute tile J. As shown in the diagram, tile J participates in a broadcast-with-skip-connection pattern as described here. J consumes vector H from the mem tile a few cycles later than the other consumers of H, because J depends on those consumers for its second input (Z).

An objectfifo of depth 2 between the mem tile and the compute tiles works correctly in a non-tiled configuration (where N=n, M=m, K=k for inputs of size M×K and weights of size K×N), but fails in the tiled setup. Adjusting the depth so that each consumer tile of the objectfifo that transfers H from the mem tile to the compute tiles has its own FIFO depth (as suggested in the broadcast-with-skip example) does not resolve the problem.
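To make the depth question concrete, here is a minimal toy model in plain Python. It is not IRON code and not a diagnosis of this design. It assumes that a broadcast push needs a free slot in every consumer's buffer, and that the late consumer (J) cannot start until the other consumers have released a given number of objects. The function name `simulate_broadcast` and the parameters `depths` and `start_after` are hypothetical and exist only for this sketch.

```python
def simulate_broadcast(num_objects, depths, start_after, max_steps=10_000):
    """Toy model of a broadcast objectfifo with one buffer per consumer.

    num_objects    -- objects the producer (mem tile) wants to broadcast
    depths[c]      -- buffer slots available to consumer c
    start_after[c] -- consumer c only starts once every consumer with
                      start_after == 0 has released this many objects
    Returns True if every consumer releases every object, False on a stall.
    """
    n = len(depths)
    sent = 0                # objects pushed by the producer so far
    consumed = [0] * n      # objects released by each consumer so far
    for _ in range(max_steps):
        progressed = False
        # Assumption: a broadcast push needs a free slot in EVERY buffer.
        if sent < num_objects and all(sent - consumed[c] < depths[c] for c in range(n)):
            sent += 1
            progressed = True
        # Progress of the consumers that have no dependency.
        eager = [consumed[c] for c in range(n) if start_after[c] == 0]
        eager_done = min(eager) if eager else num_objects
        for c in range(n):
            ready = eager_done >= start_after[c]
            if ready and consumed[c] < sent:
                consumed[c] += 1
                progressed = True
        if all(x == num_objects for x in consumed):
            return True
        if not progressed:
            return False    # nobody can move: the model has stalled
    return False


# Consumer 0 stands in for the early consumers of H, consumer 1 for tile J.
print(simulate_broadcast(1, [2, 2], [0, 1]))   # non-tiled, depth 2: True
print(simulate_broadcast(4, [2, 2], [0, 4]))   # 4 tiles, depth 2: False
print(simulate_broadcast(4, [2, 4], [0, 4]))   # 4 tiles, J depth 4: True
```

Under these assumptions, the model stalls whenever J's depth is smaller than the number of objects the other consumers must release before J can start, which is 1 in the non-tiled case and larger once H is split into tiles. A consumer that is registered but never releases behaves like a `start_after` larger than `num_objects`: the producer stalls as soon as that consumer's buffer is full. Whether the real hardware follows this model in the tiled configuration is exactly what is unclear here.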

Interestingly, the issue persists as long as tile J is included as a consumer of the objectfifo, even if J does not actually consume the vector in its core computation.

Any help with resolving this issue would be appreciated.
