20 H/s on release is okay, only 5 times slower than the reference. Squeezing out 25 H/s (most likely) or 30 H/s (unlikely) would be amazing. I have not done a huge amount of testing; that comes later. The improvements I can see right now are listed here:
Meaningful optimisations:
- API: RandomX batching API, where it hashes the scratchpad and generates the program at the same time.
- JIT: Re-enable the use of fused multiply-add instructions. Figure out why they aren't as deterministic as they seem, even after doing runtime feature detection.
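
For context on the fused multiply-add item, here is a minimal sketch of why a fused and an unfused multiply-add are not interchangeable: the fused form rounds once, the unfused form rounds twice. It is an illustration only and may be unrelated to the nondeterminism mentioned above. JavaScript has no fused multiply-add for doubles, so the sketch emulates binary32 arithmetic with `Math.fround`. The function names are hypothetical.

```typescript
// Unfused: the product is rounded to binary32, then the sum is rounded again.
function mulAddUnfused(a: number, b: number, c: number): number {
  return Math.fround(Math.fround(a * b) + c);
}

// Fused (emulated): the product of two binary32 values is exact in binary64
// (24 + 24 significand bits fit in 53), so the product is not rounded before
// the addition. For the inputs below the binary64 sum is also exact, so the
// only rounding is the final Math.fround.
function mulAddFused(a: number, b: number, c: number): number {
  return Math.fround(a * b + c);
}

// Inputs chosen so that the exact product 1 + 2^-11 + 2^-24 falls exactly
// halfway between two binary32 values.
const a = Math.fround(1 + 2 ** -12);
const b = Math.fround(1 + 2 ** -12);
const c = -1;

const unfused = mulAddUnfused(a, b, c); // the 2^-24 term is lost in the product
const fused = mulAddFused(a, b, c);     // the 2^-24 term survives

console.log(unfused === fused); // false
```

Any code path that may execute either form depending on the host therefore cannot be assumed to produce bit-identical results.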
Possibly meaningful optimisations:
- JIT: Adjust how semifloat is dispatched based on `fprc`; currently it is just a simple table implementation.
- JIT: Precompute scratchpad addresses as much as possible in JIT code.
- CRYPTO: Optimise Blake2b to use vector instructions. (Small improvement, as Blake2b is only called once in the hot path, to hash the register file, a fixed 256 bytes.)
- JIT: Improvements to scheduling of virtual machine instructions. It is possible to reorder instructions to place them close to each other, without using any `local.*` instructions. This will improve a baseline JIT's register allocation (RA), and may improve a subpar JIT's RA.
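
As one possible reading of the scratchpad address item above, here is a minimal sketch of deciding at JIT time how much of a memory operand's address is already known. It assumes the RandomX rule for integer instructions with a memory source: when `src` and `dst` are the same register, the address is `imm32` masked to the L3 scratchpad; otherwise it is `(r[src] + imm32)` masked to L1 or L2 depending on `mod.mem`. The mask values assume the RandomX default scratchpad sizes (16 KiB, 256 KiB, 2 MiB), and every name here is hypothetical rather than taken from this project's JIT.

```typescript
// Assumed RandomX default scratchpad masks (8-byte aligned offsets).
const L1_MASK = 16 * 1024 - 8;
const L2_MASK = 256 * 1024 - 8;
const L3_MASK = 2 * 1024 * 1024 - 8;

// What the JIT knows about a memory operand before any code runs.
type MemOperand =
  | { kind: "constant"; address: number }
  | { kind: "dynamic"; src: number; imm: number; mask: number };

function planMemOperand(src: number, dst: number, imm32: number, modMem: number): MemOperand {
  if (src === dst) {
    // The address does not depend on any register: fold it into a constant.
    return { kind: "constant", address: (imm32 & L3_MASK) >>> 0 };
  }
  // The register value is only known at run time, but the immediate and the
  // mask can still be selected once while generating code.
  return { kind: "dynamic", src, imm: imm32 | 0, mask: modMem !== 0 ? L1_MASK : L2_MASK };
}

const folded = planMemOperand(3, 3, -1, 0);
const runtime = planMemOperand(1, 2, 16, 1);
console.log(folded, runtime);
```

With a plan like this, the constant case could be emitted as a load with an immediate offset, and the dynamic case as a single add and mask with both constants inlined.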
Holy grail optimisations:
- CRYPTO: Fast AES. It is unlikely that improvements will be found by utilising the host JS runtime, as `crypto.subtle` doesn't give us what we need. Improvements could possibly be found in the software implementation. It is probably more effective to lobby for `v128.aes.*` instructions in the WASM spec, no matter how long it takes.
Any improvement to the WASM code that lets the host JIT reach quality code quicker is very important, as we incur a cost going from WASM to native code that RandomX never has to pay, since it goes straight to native code. Inspecting the JS runtimes' JITs is needed to understand what they are actually doing.
This document will track my progress with further optimisations. Feel free to add to it if needed or to voice criticisms and improvements.