Open
Labels
performance: This affects protocol performance
Description
`intermediates_to_table_indices` works as follows:

- It calls `bits_to_table_indices`, which takes three `u128`s, each containing the value of one of three intermediates for 128 multiplications, and returns four `u128`s containing a table index in each nibble.
- It then reorders those nibbles into bytes as its output. (Originally, the table lookup was done here, but additional optimization moved the table lookup elsewhere.)
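A minimal sketch of the two steps, to make the data flow concrete. The bit layout is an assumption, not taken from the crate: bit `i` of each input word belongs to multiplication `i`, the index is `a | b << 1 | c << 2`, and multiplication `4*j + k` lands in nibble `j` of output word `k`. The names `bits_to_table_indices_sketch`, `nibbles_to_bytes_sketch` and `reference_indices` are hypothetical.

```rust
/// Bit 0 of every nibble in a u128.
const NIBBLE_LSB: u128 = 0x1111_1111_1111_1111_1111_1111_1111_1111;

/// Hypothetical stand-in for `bits_to_table_indices` (assumed layout):
/// bit i of `a`, `b`, `c` holds intermediates 0, 1, 2 of multiplication i.
/// Nibble j of word k receives `a | b << 1 | c << 2` for multiplication 4*j + k.
fn bits_to_table_indices_sketch(a: u128, b: u128, c: u128) -> [u128; 4] {
    let mut words = [0u128; 4];
    for (k, word) in words.iter_mut().enumerate() {
        // Shifting right by k aligns multiplications k, k+4, k+8, ... with bit 0 of each nibble.
        *word = ((a >> k) & NIBBLE_LSB)
            | (((b >> k) & NIBBLE_LSB) << 1)
            | (((c >> k) & NIBBLE_LSB) << 2);
    }
    words
}

/// Second step: move nibble j of word k to byte 4*j + k, one index per byte.
fn nibbles_to_bytes_sketch(words: &[u128; 4]) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, byte) in out.iter_mut().enumerate() {
        let (j, k) = (i / 4, i % 4);
        *byte = ((words[k] >> (4 * j)) & 0xF) as u8;
    }
    out
}

/// Straightforward per-multiplication reference, used to check the sketch.
fn reference_indices(a: u128, b: u128, c: u128) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, byte) in out.iter_mut().enumerate() {
        let bit = |x: u128| ((x >> i) & 1) as u8;
        *byte = bit(a) | (bit(b) << 1) | (bit(c) << 2);
    }
    out
}

fn main() {
    let a: u128 = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210;
    let b = a.rotate_left(17);
    let c = a ^ b.rotate_right(5);
    let two_step = nibbles_to_bytes_sketch(&bits_to_table_indices_sketch(a, b, c));
    assert_eq!(two_step, reference_indices(a, b, c));
    println!("first indices: {:?}", &two_step[..8]);
}
```

Under this assumed layout the first step is a few shifts and masks per word, while the second step has to move each of the 128 nibbles individually, which matches the imbalance in instruction counts reported in this issue.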
It appears that `bits_to_table_indices` compiles to fewer than 200 instructions (fully unrolled, with no loops or branches), while the rearrangement of nibbles compiles to more than 1000 instructions (again fully unrolled, with no loops or branches). Implementing a single transpose-like operation that covers both steps would probably be more efficient.
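One possible shape for such a fused operation, as a sketch under the same assumed bit layout (bit `i` of each input is multiplication `i`, index `a | b << 1 | c << 2`) and with hypothetical names: instead of building nibbles and reordering them, spread 8 bits of each input directly into bit 0 of 8 bytes with three shift-and-mask steps, then OR the three spread words together. It is branch-free and table-free, but whether it actually beats the current code would have to be measured.

```rust
/// Spread the low 8 bits of `x` so that bit k ends up at bit 8*k (bit 0 of byte k).
fn spread_bits_to_bytes(x: u64) -> u64 {
    let x = x & 0xFF;
    // Bits 0..=3 stay put; bits 4..=7 move to 32..=35.
    let x = (x | (x << 28)) & 0x0000_000F_0000_000F;
    // Within each 32-bit half, keep 2 bits in each 16-bit lane.
    let x = (x | (x << 14)) & 0x0003_0003_0003_0003;
    // Within each 16-bit lane, keep 1 bit in each byte.
    (x | (x << 7)) & 0x0101_0101_0101_0101
}

/// Hypothetical fused replacement: goes straight from the three bit-sliced words
/// to one index byte per multiplication, 8 multiplications per iteration.
fn bits_to_index_bytes_fused(a: u128, b: u128, c: u128) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (chunk, dst) in out.chunks_exact_mut(8).enumerate() {
        let shift = 8 * chunk;
        // `as u64` truncates; spread_bits_to_bytes then keeps only the low 8 bits.
        let sa = spread_bits_to_bytes((a >> shift) as u64);
        let sb = spread_bits_to_bytes((b >> shift) as u64);
        let sc = spread_bits_to_bytes((c >> shift) as u64);
        // Byte m of `word` is the index for multiplication 8*chunk + m.
        let word = sa | (sb << 1) | (sc << 2);
        dst.copy_from_slice(&word.to_le_bytes());
    }
    out
}

/// Straightforward per-multiplication reference, used to check the fused version.
fn reference_indices(a: u128, b: u128, c: u128) -> [u8; 128] {
    let mut out = [0u8; 128];
    for (i, byte) in out.iter_mut().enumerate() {
        let bit = |x: u128| ((x >> i) & 1) as u8;
        *byte = bit(a) | (bit(b) << 1) | (bit(c) << 2);
    }
    out
}

fn main() {
    let a: u128 = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210;
    let b = a.rotate_left(17);
    let c = a ^ b.rotate_right(5);
    let fused = bits_to_index_bytes_fused(a, b, c);
    assert_eq!(fused, reference_indices(a, b, c));
    println!("first indices: {:?}", &fused[..8]);
}
```

On x86-64 with BMI2, `core::arch::x86_64::_pdep_u64(x, 0x0101_0101_0101_0101)` performs the same 8-bit spread in a single instruction, at the cost of portability.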