-
Notifications
You must be signed in to change notification settings - Fork 123
Improvements to the speed of latin1_to_string. #587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e3d7773
to
f9993c2
Compare
Out of curiosity how does this compare to something like https://docs.rs/encoding_rs/latest/encoding_rs/mem/fn.convert_latin1_to_str.html? |
Good catch! It actually is faster in my tests. now down to 5.4 ms. |
895dba4
to
c774a5c
Compare
I worked a bit more on it and now have simd acceleration for sse2 and avx. With the current testcase we go down from 5.4 ms to 406 ns. Quite an improvement. |
c1e2bdb
to
661283a
Compare
Can you please wait to implement the SIMD implementation in another PR and have this PR just be about the non-SIMD optimizations? That way we can separate out the two topics and if a revert is needed maybe only one of them will be affected. |
3f9d706
to
50541e7
Compare
Ok probably a wise choice. I hope it is fine to still have the fast copy function. Because it is not public it should be inlined anyway. And the multiple testcases. |
6298a9e
to
ff2b4c4
Compare
/// Copies chars to the string | ||
unsafe fn fast_copy(chars: &[u8]) -> String { | ||
let mut v = Vec::with_capacity(chars.len() * 2); | ||
v.set_len(chars.len() * 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is undefined behavior. See Safety requirements of set_len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some existing discussion over at encoding_rs hsivonen/encoding_rs#79
I also tested using the external api of encoding_rs::UTF_16BE.decode making sure not to duplicate the work that |
ff2b4c4
to
7cc0787
Compare
On the benchmark it improves the encoding from around 9 microsends to 6 microseconds. This PR also includes the benchmark and setup as criterion benchmark. Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
@Narfinger I pushed a commit which refactors the benchmark a bit, so that we run with different input sizes and can also test the slow path. I can confirm that also on my macbook |
The manual simd part still falls back to the encoding_rs::mem solution for basically everything that doesn't support sse2. I am not quite sure what you mean with fast and slow path here. The function is only called for codepoints that fit in the latin1 range, everything else is not valid behaviour. |
At the moment we have to use encoding_rs to do this encoding leading us to a bit of a cyclical testing. But tests should prepare for future refactoring, so I think this is fine. Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Ok I added more tests for checking latin1 completely. Currently this uses encoding_rs::mem because otherwise we cannot really convert it. This leads us to a bit of cyclical testing but I think this is fine so that anybody who in the future tries to improve this function can test against a sane baseline. In a "completely unrelated" news, my simd improvements do not work and produce wrong results. |
Improvements to the speed of latin1_to_string.
On the benchmark it improves the encoding from around 9 microsends to 6
microseconds.
This PR also includes the benchmark and setup as criterion benchmark.
Signed-off-by: Narfinger Narfinger@users.noreply.github.com