Skip to content

Improvements to the speed of latin1_to_string. #587

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Narfinger
Copy link

Improvements to the speed of latin1_to_string.
On the benchmark it improves the encoding from around 9 microsends to 6
microseconds.

This PR also includes the benchmark and setup as criterion benchmark.

Signed-off-by: Narfinger Narfinger@users.noreply.github.com

@Narfinger Narfinger force-pushed the latin1-improvements branch from e3d7773 to f9993c2 Compare June 27, 2025 09:42
@mrobinson
Copy link
Member

Out of curiosity how does this compare to something like https://docs.rs/encoding_rs/latest/encoding_rs/mem/fn.convert_latin1_to_str.html?

@Narfinger
Copy link
Author

Out of curiosity how does this compare to something like https://docs.rs/encoding_rs/latest/encoding_rs/mem/fn.convert_latin1_to_str.html?

Good catch! It actually is faster in my tests. now down to 5.4 ms.

@Narfinger Narfinger force-pushed the latin1-improvements branch 2 times, most recently from 895dba4 to c774a5c Compare June 30, 2025 15:10
@Narfinger
Copy link
Author

Narfinger commented Jun 30, 2025

I worked a bit more on it and now have simd acceleration for sse2 and avx. With the current testcase we go down from 5.4 ms to 406 ns. Quite an improvement.
However, when simd is stable in rust this should be switched to the encoding_rs path and use their flag.
I also added more testcases which should be on the boundary of the simd values.

@Narfinger Narfinger force-pushed the latin1-improvements branch 2 times, most recently from c1e2bdb to 661283a Compare July 1, 2025 08:12
@mrobinson
Copy link
Member

Can you please wait to implement the SIMD implementation in another PR and have this PR just be about the non-SIMD optimizations? That way we can separate out the two topics and if a revert is needed maybe only one of them will be affected.

@Narfinger Narfinger force-pushed the latin1-improvements branch 2 times, most recently from 3f9d706 to 50541e7 Compare July 1, 2025 08:35
@Narfinger
Copy link
Author

Narfinger commented Jul 1, 2025

Ok probably a wise choice. I hope it is fine to still have the fast copy function. Because it is not public it should be inlined anyway. And the multiple testcases.

@Narfinger Narfinger force-pushed the latin1-improvements branch from 6298a9e to ff2b4c4 Compare July 1, 2025 08:53
/// Copies chars to the string
unsafe fn fast_copy(chars: &[u8]) -> String {
let mut v = Vec::with_capacity(chars.len() * 2);
v.set_len(chars.len() * 2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is undefined behavior. See Safety requirements of set_len

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some existing discussion over at encoding_rs hsivonen/encoding_rs#79

@Narfinger
Copy link
Author

I also tested using the external api of encoding_rs::UTF_16BE.decode making sure not to duplicate the work that JS_DeprecatedStringHasLatin1Chars does and it seems that that path is still slower then the current path. I am not sure why.
As for the undefined behavior above, jonathan wanted to see if he can use MaybeUninit to make the api safer.

@Narfinger Narfinger force-pushed the latin1-improvements branch from ff2b4c4 to 7cc0787 Compare July 9, 2025 10:15
Narfinger and others added 2 commits July 9, 2025 12:17
On the benchmark it improves the encoding from around 9 microsends to 6
microseconds.

This PR also includes the benchmark and setup as criterion benchmark.

Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
@jschwe
Copy link
Member

jschwe commented Jul 10, 2025

@Narfinger I pushed a commit which refactors the benchmark a bit, so that we run with different input sizes and can also test the slow path.

I can confirm that also on my macbook encoding_rs::mem::decode_latin1 is slower (although it depends on the input sizes, at 1K on the fastpath it seems to be quite a bit faster). For the slowpath with high bytes decode_latin1 is quite a bit slower than the current solution in your PR.
In any case, it doesn't seem like we can easily / quickly change encoding_rs. Did your manual SIMD implementation also still use encoding_rs, or would that be UB free then?

@Narfinger
Copy link
Author

The manual simd part still falls back to the encoding_rs::mem solution for basically everything that doesn't support sse2.
I tested and did not find an improvement for the phone case but I would be interested in if it is faster on apple chips.

I am not quite sure what you mean with fast and slow path here. The function is only called for codepoints that fit in the latin1 range, everything else is not valid behaviour.

At the moment we have to use encoding_rs to do this encoding leading us
to a bit of a cyclical testing. But tests should prepare for future
refactoring, so I think this is fine.

Signed-off-by: Narfinger <Narfinger@users.noreply.github.com>
@Narfinger
Copy link
Author

Ok I added more tests for checking latin1 completely. Currently this uses encoding_rs::mem because otherwise we cannot really convert it. This leads us to a bit of cyclical testing but I think this is fine so that anybody who in the future tries to improve this function can test against a sane baseline.

In a "completely unrelated" news, my simd improvements do not work and produce wrong results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants