Generalize the grayscale image computation optimization
When decoding a grayscale image whose width is a multiple of the block size,
we can return the internal decoding buffer directly, without any computation or data copying.
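A minimal Rust sketch of the idea follows. It assumes the decoder stores a grayscale component with each row padded to a multiple of an 8-pixel block (`BLOCK_SIZE`), with any extra padding rows placed after the last image row. `line_stride` and `grayscale_output` are hypothetical names, not the decoder's actual API. When the stride equals the width, truncating the buffer is enough to return it without a copy. Otherwise, every row has to be copied to strip its right-hand padding.

```rust
/// Assumed edge length of a JPEG block, in pixels.
const BLOCK_SIZE: usize = 8;

/// Hypothetical helper: the row stride of the decoding buffer, assumed to be
/// `width` rounded up to the next multiple of BLOCK_SIZE.
fn line_stride(width: usize) -> usize {
    (width + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE
}

/// Hypothetical helper: turn the internal decoding buffer (rows of `stride`
/// bytes, possibly followed by padding rows) into a packed `width * height` image.
fn grayscale_output(mut buffer: Vec<u8>, width: usize, height: usize) -> Vec<u8> {
    let stride = line_stride(width);
    if stride == width {
        // Width is a multiple of the block size: rows are already contiguous,
        // so dropping the trailing padding rows is enough. No data is copied.
        buffer.truncate(width * height);
        buffer
    } else {
        // Each row carries padding on its right edge, so copy row by row.
        let mut out = Vec::with_capacity(width * height);
        for row in buffer.chunks(stride).take(height) {
            out.extend_from_slice(&row[..width]);
        }
        out
    }
}
```

In the fast path, `Vec::truncate` keeps the existing allocation, so the returned vector is the very same buffer the decoder wrote into.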
This gives a ~2% performance improvement on my machine when decoding a 512x512 grayscale image.
Before:
time: [896.42 us 900.54 us 904.92 us]
After:
time: [882.10 us 884.72 us 887.31 us]
When the width is a multiple of the block size but the height is not, the gain is even larger
relative to the previous version, which took this fast path only when both the width and the height
were multiples of the block size.
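Under the same assumed layout (8-pixel blocks, padding rows only after the last image row), the difference between the two conditions can be sketched as two predicates. `old_fast_path` and `new_fast_path` are hypothetical names used for illustration. Because the extra rows sit at the end of the buffer and are simply truncated, a height that is not a multiple of the block size no longer rules out the fast path.

```rust
/// Assumed edge length of a JPEG block, in pixels.
const BLOCK_SIZE: u32 = 8;

/// Condition of the previous version: both dimensions must be block-aligned.
fn old_fast_path(width: u32, height: u32) -> bool {
    width % BLOCK_SIZE == 0 && height % BLOCK_SIZE == 0
}

/// Generalized condition: only the width matters, since it alone decides
/// whether each row of the buffer carries right-hand padding.
fn new_fast_path(width: u32, _height: u32) -> bool {
    width % BLOCK_SIZE == 0
}
```

For example, a 512x500 image fails the old check but passes the new one, while a 500x512 image still fails both and needs the row-by-row copy.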