[rom_ctrl,rtl] Add a flop between rom_ctrl and kmac #27658

Open: wants to merge 3 commits into base: master

rswarbrick
Contributor

This injects a clock edge between data coming out of the ROM and data going into KMAC to be hashed. Inserting the delay is pretty trivial: we just use a prim_fifo_sync.

There is an area vs. efficiency question. Because there is no pass-through, a prim_fifo_sync with depth 1 has half throughput: it has to alternate between taking an item and passing it on. Although there's the bandwidth to do both on the same cycle, breaking the combinatorial path means that we can't pass the KMAC block's "ready" signal all the way back to the ROM.

This works perfectly well, but takes twice as long as you might hope.

A lazy workaround is to use Depth = 2. This is a little silly, because it uses twice as much area as necessary. But it's very easy from a coding perspective!
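
The throughput difference between the two depths can be illustrated with a small cycle-level model. This is only a sketch in Python, not the prim_fifo_sync RTL: it assumes a FIFO whose "ready" and "valid" outputs depend only on the registered fill level (no pass-through), a producer that always has data, and a consumer that is always ready. The function name `cycles_to_transfer` is made up for this illustration.

```python
from collections import deque


def cycles_to_transfer(depth: int, n_words: int) -> int:
    """Count clock cycles for n_words to pass through a registered FIFO.

    Simplified model (an assumption, not the real RTL): the handshake
    signals are derived from the fill level at the start of the cycle,
    so a push in this cycle cannot be popped in the same cycle.
    """
    fifo = deque()
    sent = 0       # words pushed by the producer (the ROM side)
    received = 0   # words popped by the consumer (the KMAC side)
    cycles = 0
    while received < n_words:
        # Both signals come from registered state only.
        wready = len(fifo) < depth   # FIFO can accept a word
        rvalid = len(fifo) > 0       # FIFO has a word to offer
        push = wready and sent < n_words
        pop = rvalid                 # consumer is always ready
        if pop:
            fifo.popleft()
            received += 1
        if push:
            fifo.append(sent)
            sent += 1
        cycles += 1
    return cycles
```

Under these assumptions, moving 8 words takes 16 cycles with depth 1 (push and pop alternate) and 9 cycles with depth 2 (one cycle of fill latency, then one word per cycle).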

@rswarbrick rswarbrick marked this pull request as ready for review July 18, 2025 13:38
@rswarbrick rswarbrick force-pushed the rom-ctrl-kmac-flop branch from 129145e to 86d892c Compare July 18, 2025 14:06
Contributor

@vogelpi vogelpi left a comment

Thanks @rswarbrick, this looks great! Let's wait for @davidschrammel's feedback.

I think since the flop stage is optional, it's fine to use 2 stages. In some cases ROMs are very big and one may want the initial boot phase to run fast.

Signed-off-by: Rupert Swarbrick <rswarbrick@lowrisc.org>
@rswarbrick rswarbrick force-pushed the rom-ctrl-kmac-flop branch from 86d892c to a537191 Compare July 18, 2025 14:34
sent to KMAC. This may break long paths in a target chip, at the cost of
adding chip area.
'''
local: "false",
Member

I think it would be better to make it local: "true" so that it becomes a localparam and the top-level HJSON is the single source of truth.

4 participants