Skip to content
yyan7223 edited this page Jul 24, 2023 · 27 revisions

From .cpp to DFG

Finite Impulse Response kernel.cpp

LLVM Intermediate Representation

Data Flow Graph (DFG)

Mapping DFG to CGRA

Mapper.heuristicMap() is responsible for mapping DFG to CGRA, and its pseudocode is as follows:

II = Max(RecMII,ResMII)
While(1){
    constructMRRG(DFG, CGRA, II) // construct Modulo Routing Resource Graph (MRRG) according to DFG, CGRA, and current II 

    for each node in DFGNodes do
        for each row in CGRA.rows do
            for each column in CGRA.cols do
                Tile = CGRA[row][col] // locate the Tile
                pathWithTimeCost = caculateCost(node, Tile) // find the shortest path (time cost minimum) that mapping current DFGNode to current Tile
                pathList.append(timeCost)
            end for
        end for
        optimalPath = getPathWithMinCostAndConstraints(pathList) // find the path with minimum time cost and other constraints
        flag = schedule(optimalPath) // occupy the CGRALink among the optimalPath, allocate the register for the last CGRANode of the optimalPath
    end for

    if flag 
        break // mapping success
    else
        II += 1 // update II, repeat steps above
}

constructMRRG()

Mapper.constructMRRG() is responsible for constructing the Modulo Routing Resource Graph (MRRG) according to DFG, CGRA, and current II Assume that a 2x2 CGRA with II=4, MRRG can be interpreted as the duplicated representation of Tile and Link resources in the CGRA along the time axis, so that the mapper can obtain te occupy status of each Tile and Link at each time cycle, the conduct the mapping process accordingly.

  • The Tile resource includes Tile0, Tile1, Tile2, Tile3.
  • The Link resource, or, the routing fabric, includes the bi-directional interconnect between Tile0 and Tile1, Tile0 and Tile2, Tile3 and Tile1, Tile3 and Tile2.

calculateCost()

Assume that the DFGNode add has been mapped to MRRG Tile1 at Cycle0, now the mapper is trying to determine the placement of its successor DFGNode cmp, which is each node in the pesudo code above

As CGRA2x2 only has 4 Tile, so there are generally 4 placement options for DFGNode cmp, including Tile0, Tile1, Tile2, Tile3, whose exact cycle in MRRG is confirmed by dijkstra_search() function:

(a) MRRG Tile1 at Cycle0: use the same Tile as its predecessor, the result of DFGNode 'add' is directly registered in Tile 1, no routing fabric is required, with timeCost = 0

(b) MRRG Tile0 at Cycle1: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0 to MRRG Tile0 at Cycle1, with timeCost = 1

(c) MRRG Tile3 at Cycle1: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0 to MRRG Tile3 at Cycle1, with timeCost = 1

(d) MRRG Tile2 at Cycle2: the result of DFGNode add is transfered from MRRG Tile1 at Cycle0, through MRRG Tile0/Tile3 at Cycle1, to MRRG Tile2 at Cycle2, with timeCost = 2 (there is no routing fabric between Tile1 and Tile2, so an addition Tile0/Tile3 must be adopted to bridge the result of DFGNode add)

More details about the functions called internally:

constructMRRG()

caculateCost()

getPathWithMinCostAndConstraints()

schedule()

CGRA Architecture & Timing

The details of the CGRA architecture is firstly introduced, followed by the timing analysis of a concrete operation 'multiplication'

CGRA Architecture

CGRA Timing

Clone this wiki locally