Skip to content

Physical IR #456

@UnsignedByte

Description

@UnsignedByte

As part of the HLS Filament #454 project, we want a nice way to implement some of the SDC scheduling logic using Filament. Paramount to this is an ability to represent the dependency graph of nodes. Say we have a simple linear graph:

+ -> + -> *

The information we need in order to perform scheduling is as follows:

  1. Some estimate of resource usage per component.
  2. Some estimate of intra-cycle latency per component.

Filament already has capabilities to measure inter-cycle latency; that is the whole timeline type system! It seems natural therefore to extend this somehow to allow intra-cycle measurement as well.

In combinational logic, the necessity of pipeline registers arises when the latency of sequential combinational blocks adds up to over a single clock period.

As an example, say that adders take 0.3 of a cycle and multipliers 0.6.

We can imagine a 2-cycle pipeline of +, +, * in two ways:

Cycle 1 Cycle 2
+, + *
+, +, *

We could represent this with Filament as follows:

comp Add[W]<'G: 1>(
    in0: ['G, 'G+1] W,
    in1: ['G, 'G+1] W
) -> (
    out: ['G+0.3, 'G+1.3] W
) {...}

comp Mult[W]<'G: 1>(
    in0: ['G, 'G+1] W,
    in1: ['G, 'G+1] W
) -> (
    out: ['G+0.6, 'G+1.6] W
) {...}

In this case, a pipeline like this:

add1: Add[32](in0, in1);
add2: Add[32](add1.out, in1);
mul: Mul[32](add2.out, in1);

would become

add1: Add[32]<'G>(in0, in1);
add2: Add[32]<'G+0.3>(add1.out, in1);
mul: Mul[32]<'G+0.6>(add2.out, in1);

in physical IR, which would then allow the compiler to naturally see that each invocation occurs as follows:

Inst Input timing Output Timing
add1 ['G, 'G+1] ['G+0.3, 'G+1.3]
add2 ['G+0.3, 'G+1.3] ['G+0.6, 'G+1.6]
mul ['G+0.6, 'G+1.6] ['G+1.2, 'G+2.2]

And thus we can see that the multiplier crosses the cycle boundary, and thus can be retimed using a register, which would extend the signal in this case from ['G+0.6, 'G+1.6] -> ['G+1, 'G+2] in an ASAP schedule, which would then allow the multiplier to run from ['G+1, 'G+2] -> 'G+1.6, 'G+2.6].

To do so, I propose a new IR with this simple change, occurring after monomorphization to eliminate parameter issues. Then, a set of IR passes could insert registers in order to make sure a program reaches timing.

This also naturally allows for possible extensions in the future, such as:

A main ambiguity at the time is the idea of ['G, 'G+1] for combinational units - This assumes that a valid is always held high for "exactly" one cycle, which is not really true, but at the same time it seems very infeasible to change everything to some real latency parameter ['G, 'G+L]. While this seems to model the information quite well currently, I think it should go through some further discussion before finalizing.

Metadata

Metadata

Assignees

No one assigned

    Labels

    C: InternalsComponent: Compiler internalsC: languageComponent: the Filament languageS: needs triageStatus: Needs some more thinking before being actionable

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions