Skip to content

Further optimize intermediates_to_table_indices #1457

@andyleiserson

Description

@andyleiserson

intermediates_to_table_indices works as follows:

  • It calls bits_to_table_indices, which takes three u128s each containing the value of one of three intermediates for 128 multiplications, and returns four u128s containing a table index in each nibble.
  • It then reorders those nibbles into bytes as its output. (Originally, the table lookup was done here, but additional optimization moved the table lookup elsewhere.)

It appears that bits_to_table_indices compiles to <200 instructions (fully unrolled with no loops or branches), while the rearranging of nibbles compiles to >1000 instructions (again, fully unrolled with no loops or branches). Implementing a single transpose-like operation covering both steps would probably be more efficient.

Metadata

Metadata

Assignees

No one assigned

    Labels

    performanceThis affects protocol performance

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions