You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SEGGER Ozone J-Trace Code Profile identified iterations over daint value
as hot path. The iterations show at the very top of code profile because
full iteration happens whenever there is any activity on endpoint.
Optimize daint handling loops so only set bits are iterated over. While
this optimization depends on find_lsb_set() efficiency, it seems to be
worth it solely on the basis that quite often only few bits are set.
After a bit deeper analysis, I was suprised that on ARM Cortex-M33 the
find_lsb_set() approach is faster than naive iteration even if all bits
are set (which is extreme case because USB applications are unlikely to
use all 16 IN and 16 OUT endpoints simultaneously). This is due to fact
that there is only one conditional jump CBNZ and find_lsb_set() - 1
translates to RBIT + CLZ and then clearing the bit uses LSL.W + BIC.W.
Whereas the naive itation uses ADDS + CMP + BNE for the loop handling
and also has LSR.W + LSLS + BPL (+ ADD.W instruction on each iteration
to add 16 for OUT endpoints) for the continue check. Therefore the
optimized code on ARM Cortex-M33 is never worse than naive iteration.
Signed-off-by: Tomasz Moń <tomasz.mon@nordicsemi.no>
0 commit comments