simple-stream: avoid memcpy calls in fragmented streams for constant sizes
In 3a24d6d ("simple_stream: add [[gnu::always_inline]]"), we
sprinkled [[gnu::always_inline]] to encourage constant propagation
of the size parameter in non-fragmented streams. When the size is a
constant (which it is, when reading serialized integrals), the memcpy()
call can be optimized into a single instruction to read memory.
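As an illustration of that effect, here is a minimal sketch. The helper names read_bytes() and read_u32() are hypothetical and are not taken from the patch. Once read_bytes() is inlined into read_u32(), the compiler sees size == sizeof(std::uint32_t) and can typically lower the memcpy() to a single load.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): copies `size` bytes.
// Forcing inlining lets the caller's constant size propagate into memcpy().
[[gnu::always_inline]] inline void read_bytes(const char* src, char* dst, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Hypothetical caller: the size passed down is a compile-time constant,
// so an optimizing compiler can replace the memcpy() call with a plain
// 4-byte load, even when `p` is unaligned.
inline std::uint32_t read_u32(const char* p) {
    std::uint32_t v;
    read_bytes(p, reinterpret_cast<char*>(&v), sizeof(v));
    return v;
}
```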
Here, we do the same for fragmented streams. Since the code is prepared
for the integral to span two (or more) fragments, it issues tiny memcpy()
calls, which spend a large number of instructions figuring out the right
path, only to copy memory with a single instruction and return.
To fix this, we do the following:
1. split for_each_fragment() into a fast path and a slow path.
2. select the fast path when the size is a constant and it happens
to fit into the first fragment (which is very likely, as constant
sizes are usually for the various integral types).
3. encourage inlining (without which we don't get constant propagation)
with [[gnu::always_inline]].
The fast path is guarded with __builtin_constant_p. For non-constant
sizes, we'll call memcpy() anyway, so we don't get much from splitting the
paths.
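The shape of the change can be sketched roughly as follows. This is not the code from the patch: fragment, fragmented_reader, read_slow() and read_integral() are hypothetical stand-ins, and the real for_each_fragment() has a different interface. The sketch assumes GCC or Clang, which provide __builtin_constant_p and [[gnu::always_inline]].

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for one fragment of a fragmented stream.
struct fragment {
    const char* data;
    std::size_t size;
};

// Hypothetical reader, for illustration only.
class fragmented_reader {
    std::vector<fragment> _frags;
    std::size_t _idx = 0; // index of the current fragment
    std::size_t _off = 0; // read offset within the current fragment

    // Slow path: the read may span several fragments. Kept out of line
    // so the fast path stays small.
    [[gnu::noinline]] void read_slow(char* dst, std::size_t size) {
        while (size > 0 && _idx < _frags.size()) {
            const fragment& f = _frags[_idx];
            const std::size_t n = std::min(size, f.size - _off);
            std::memcpy(dst, f.data + _off, n);
            dst += n;
            size -= n;
            _off += n;
            if (_off == f.size) { // fragment exhausted, move to the next
                ++_idx;
                _off = 0;
            }
        }
    }

public:
    explicit fragmented_reader(std::vector<fragment> frags) : _frags(std::move(frags)) {}

    // Inlined so that a constant `size` at the call site is visible here.
    [[gnu::always_inline]] void read(char* dst, std::size_t size) {
        // Fast path: constant size that fits in the current fragment.
        // With a constant size, this memcpy() can become a single load/store.
        if (__builtin_constant_p(size) && _idx < _frags.size()
                && size <= _frags[_idx].size - _off) {
            std::memcpy(dst, _frags[_idx].data + _off, size);
            _off += size;
            return;
        }
        read_slow(dst, size);
    }

    template <typename T>
    T read_integral() {
        T v;
        read(reinterpret_cast<char*>(&v), sizeof(T));
        return v;
    }
};
```

Both paths produce the same result, so the guard only affects which instructions are emitted. Without optimization, __builtin_constant_p may evaluate to false and every read then takes the slow path.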
A demonstration of the optimization is available in [1].
[1] https://godbolt.org/z/rWdMa7bfK