Commit 3063e4c

Merge pull request #24 from rusticstuff/v.next
Prepare v0.1.1
2 parents 84b79cf + acea3c2 commit 3063e4c

32 files changed: +26044 -16448 lines changed

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "simdutf8"
-version = "0.1.0"
+version = "0.1.1"
 authors = ["Hans Kratz <hans@appfour.com>"]
 edition = "2018"
 description = "SIMD-accelerated UTF-8 validation."
@@ -16,8 +16,10 @@ exclude = ["/.github", "/.vscode", "/bench", "/afl", "/fuzz", "/img", "expected-
 [features]
 default = ["std"]
 
+# enable CPU feature detection, on by default, turn off for no-std support
 std = []
 
+# expose SIMD implementations in basic::imp::* and compat::imp::*
 public_imp = []
 
 # use branch hints - requires nightly
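
Note on the `std` feature comment above: runtime CPU feature detection on x86 is typically done with the standard library's `is_x86_feature_detected!` macro, which is only available with `std`. The following is a minimal sketch of that kind of selection, not the crate's actual dispatch code. The function `select_implementation` and the returned labels are hypothetical names chosen for illustration.

```rust
/// Returns a label for the implementation a runtime check would pick on
/// this machine. Hypothetical helper for illustration only; it is not
/// part of the simdutf8 API.
fn select_implementation() -> &'static str {
    // The detection macro only exists on x86 and x86-64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Prefer the widest supported instruction set.
        if is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    // No supported SIMD extension detected (or a non-x86 target).
    "fallback"
}

fn main() {
    println!("selected implementation: {}", select_implementation());
}
```

Without `std` this macro is unavailable, which is consistent with the README's statement that no-std builds select the implementation at compile time from the targeted CPU features.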

README.md

Lines changed: 34 additions & 26 deletions
@@ -8,15 +8,14 @@ Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, b
 [simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs).
 
 ## Disclaimer
-This software should be considered alpha quality and should not (yet) be used in production, though it has been tested
-with sample data as well as a fuzzer and there are no known bugs. It will be tested more rigorously before the first
-production release.
+This software should not (yet) be used in production, though it has been tested with sample data as well as
+fuzzing and there are no known bugs.
 
 ## Features
 * `basic` API for the fastest validation, optimized for valid UTF-8
 * `compat` API as a fully compatible replacement for `std::str::from_utf8()`
-* Up to twenty times faster than the std library on non-ASCII, up to twice as fast on ASCII
-* Up to 28% faster on non-ASCII input compared to the original simdjson implementation
+* Up to 22 times faster than the std library on non-ASCII, up to three times faster on ASCII
+* As fast as or faster than the original simdjson implementation
 * Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64. ARMv7 and ARMv8 neon support is planned
 * Selects the fastest implementation at runtime based on CPU support
 * Written in pure Rust
@@ -28,7 +27,7 @@ production release.
 Add the dependency to your Cargo.toml file:
 ```toml
 [dependencies]
-simdutf8 = { version = "0.1.0" }
+simdutf8 = { version = "0.1.1" }
 ```
 
 Use `simdutf8::basic::from_utf8` as a drop-in replacement for `std::str::from_utf8()`.
@@ -59,7 +58,8 @@ is not valid UTF-8. `simdutf8::basic::Utf8Error` is a zero-sized error struct.
 
 ### Compat flavor
 The `compat` flavor is fully API-compatible with `std::str::from_utf8`. In particular, `simdutf8::compat::from_utf8()`
-returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.
+returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for
+verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.
 
 It also fails early: errors are checked on-the-fly as the string is processed and once
 an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
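
Note on the Compat flavor hunk above: the replacement-character use case can be sketched with `valid_up_to()` and `error_len()` alone. The example below uses `std::str::from_utf8` so that it runs without extra dependencies. Since the README describes the `compat` flavor as API-compatible, swapping in `simdutf8::compat::from_utf8` should be a one-line change, but that substitution is an assumption and is not exercised here. The function name `to_string_lossy` is hypothetical.

```rust
use std::str::from_utf8;

/// Decodes `input`, replacing each invalid byte sequence with U+FFFD.
/// Relies only on `valid_up_to()` and `error_len()` of the returned error.
fn to_string_lossy(mut input: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Everything before `valid_up_to()` is known to be valid.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(from_utf8(valid).expect("prefix was reported valid"));
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid sequence and continue after it.
                    Some(len) => input = &rest[len..],
                    // The input ended in the middle of a character.
                    None => return out,
                }
            }
        }
    }
}

fn main() {
    println!("{}", to_string_lossy(b"ab\xFFcd"));
}
```

For streamed data, the `None` case is the interesting one: it signals that the trailing bytes may become valid once more input arrives, so a streaming caller would keep them rather than emit a replacement character.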
@@ -75,47 +75,56 @@ For no-std support (compiled with `--no-default-features`) the implementation is
 the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
 for the SSE 4.2 implementation.
 
-If you want to be able to call A SIMD implementation directly, use the `public_imp` feature flag. The validation
+If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
 implementations are then accessible via `simdutf8::(basic|compat)::imp::x86::(avx2|sse42)::validate_utf8()`.
 
 ## When not to use
-If you are only processing short byte sequences (less than 64 bytes), the excellent scalar algorithm in the standard
-library is likely faster. Also, this library uses unsafe code which has not been battle-tested and should not (yet)
-be used in production.
+This library uses unsafe code which has not been battle-tested and should not (yet) be used in production.
 
 ## Minimum Supported Rust Version (MSRV)
 This crate's minimum supported Rust version is 1.38.0.
 
 ## Benchmarks
-
 The benchmarks have been done with [criterion](https://bheisler.github.io/criterion.rs/book/index.html), the tables
 are created with [critcmp](https://github.com/BurntSushi/critcmp). Source code and data are in the
 [bench directory](https://github.com/rusticstuff/simdutf8/tree/main/bench).
 
 The name schema is id-charset/size. _0-empty_ is the empty byte slice, _x-error/66536_ is a 64KiB slice where the very
 first character is invalid UTF-8. All benchmarks were run on a laptop with an Intel Core i7-10750H CPU (Comet Lake) on
-Windows with Rust 1.51.0. Library versions are simdutf8 v0.1.0 and simdjson v0.9.2.
+Windows with Rust 1.51.0 if not otherwise stated. Library versions are simdutf8 v0.1.1 and simdjson v0.9.2. When comparing
+with simdjson simdutf8 is compiled with `#inline(never)`.
 
 ### simdutf8 basic vs std library UTF-8 validation
-![critcmp stimdutf8 basic vs std lib](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-std.png)
-simdutf8 performs better except for inputs ≤ 64 bytes.
+![critcmp stimdutf8 v0.1.1 basic vs std lib](https://user-images.githubusercontent.com/3736990/116121179-a8271f80-a6c0-11eb-9b2b-6233c3c824f2.png)
+simdutf8 performs better or as well as the std library.
+
+### simdutf8 basic vs simdjson UTF-8 validation on Intel Comet Lake
+![critcmp stimdutf8 v0.1.1 basic vs simdjson WSL](https://user-images.githubusercontent.com/3736990/116121748-38656480-a6c1-11eb-8cb4-385c7516a46a.png)
+simdutf8 beats simdjson on almost all inputs on this CPU. This benchmark is run on
+[WSL](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
+since I could not get simdjson to reach maximum performance on Windows with any C++ toolchain (see also simdjson issues
+[847](https://github.com/simdjson/simdjson/issues/847) and [848](https://github.com/simdjson/simdjson/issues/848)).
+
+### simdutf8 basic vs simdjson UTF-8 validation on AMD Zen 2
+![critcmp stimdutf8 v0.1.1 basic vs simdjson AMD Zen 2](https://user-images.githubusercontent.com/3736990/116122729-731bcc80-a6c2-11eb-82a5-6e297778a1c4.png)
 
-### simdutf8 basic vs simdjson UTF-8 validation
-![critcmp st lib vs stimdutf8 basic](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-simdjson.png)
-simdutf8 is faster than simdjson except for some crazy optimization by clang for the pure ASCII
-loop (to be investigated). simdjson is compiled using clang and gcc from MSYS.
+On AMD Zen 2 aligning reads apparently does not matter at all. The extra step for aligning even hurts performance a bit around
+an input size of 4096.
 
 ### simdutf8 basic vs simdutf8 compat UTF-8 validation
-![critcmp st lib vs stimdutf8 basic](https://raw.githubusercontent.com/rusticstuff/simdutf8/main/img/basic-vs-compat.png)
+![image](https://user-images.githubusercontent.com/3736990/116122427-0dc7db80-a6c2-11eb-8434-f9879742d90d.png)
 There is a small performance penalty to continuously checking the error status while processing data, but detecting
 errors early provides a huge benefit for the _x-error/66536_ benchmark.
 
 ## Technical details
-The implementation is similar to the one in simdjson except that it aligns reads to the block size of the
-SIMD extension, which leads to better peak performance compared to the implementation in simdjson. This alignment
-means that an incomplete block needs to be processed before the aligned data is read, which would lead to worse
-performance on short byte sequences. Thus, aligned reads are only used with 2048 bytes of data or more. Incomplete
-reads for the first unaligned and the last incomplete block are done in two aligned 64-byte buffers.
+On X86 for inputs shorter than 64 bytes validation is delegated to `core::str::from_utf8()`.
+
+The SIMD implementation is similar to the one in simdjson except that it aligns reads to the block size of the
+SIMD extension, which leads to better peak performance compared to the implementation in simdjson on some CPUs.
+This alignment means that an incomplete block needs to be processed before the aligned data is read, which
+leads to worse performance on byte sequences shorter than 2048 bytes. Thus, aligned reads are only used with
+2048 bytes of data or more. Incomplete reads for the first unaligned and the last incomplete block are done in
+two aligned 64-byte buffers.
 
 For the compat API we need to check the error buffer on each 64-byte block instead of just aggregating it. If an
 error is found, the last bytes of the previous block are checked for a cross-block continuation and then
@@ -137,5 +146,4 @@ the MIT license and Apache 2.0 license.
 simdjson itself is distributed under the Apache License 2.0.
 
 ## References
-
 John Keiser, Daniel Lemire, [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090), Software: Practice and Experience 51 (5), 2021
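
Note on the Technical details hunk of README.md above: the two size thresholds it describes (64 bytes for delegating to `core::str::from_utf8()`, 2048 bytes for aligned reads) can be pictured as a small decision function. This is only a sketch of the decision as the README words it, not the crate's code. The `Strategy` enum, `choose_strategy`, and the 32-byte `BLOCK_SIZE` (the width of an AVX 2 register) are assumptions made for illustration.

```rust
// Thresholds as stated in the README text of this commit.
const SMALL_INPUT_LIMIT: usize = 64;
const ALIGNED_READ_LIMIT: usize = 2048;
// Assumption for illustration: a 32-byte SIMD block (AVX 2 register width).
const BLOCK_SIZE: usize = 32;

/// Hypothetical description of how an input would be processed.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Delegate to `core::str::from_utf8()`.
    Scalar,
    /// SIMD validation with unaligned reads.
    SimdUnaligned,
    /// SIMD validation with aligned reads; `prefix_len` bytes precede
    /// the first block-aligned address and form an incomplete block.
    SimdAligned { prefix_len: usize },
}

fn choose_strategy(input: &[u8]) -> Strategy {
    if input.len() < SMALL_INPUT_LIMIT {
        Strategy::Scalar
    } else if input.len() < ALIGNED_READ_LIMIT {
        Strategy::SimdUnaligned
    } else {
        // Number of bytes to skip until the pointer is BLOCK_SIZE-aligned.
        let prefix_len = input.as_ptr().align_offset(BLOCK_SIZE);
        Strategy::SimdAligned { prefix_len }
    }
}

fn main() {
    let data = vec![0u8; 4096];
    println!("{:?}", choose_strategy(&data));
}
```

The unaligned prefix is the "incomplete block" the README mentions. Handling it costs extra work up front, which is why the aligned path is reserved for larger inputs.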

TODO.md

Lines changed: 0 additions & 1 deletion
@@ -9,4 +9,3 @@
 * investigate aarch64 support
 
 # NEXT
-* v0.1.1 benchmarks

bench/BENCHMARKING.md

Lines changed: 2 additions & 3 deletions
@@ -46,6 +46,5 @@ Adding `-- --save-baseline some_name` to the bench commandline and then using [c
 * Beware of BD PROCHOT on aged machines, can cause severe throttling
 
 ### Test machines
-* Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (Sandy bridge)
-* Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz (Skylake)
-* Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (Comet Lake)
+* Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz (Comet Lake)
+* AMD Ryzen 7 PRO 3700 8-Core Processor @ 3.60 GHz (Zen 2)

bench/Cargo.toml

Lines changed: 4 additions & 0 deletions
@@ -24,6 +24,10 @@ simdjson-utf8 = { version = "*", path = "simdjson-utf8", optional = true }
 name = "throughput_basic"
 harness = false
 
+[[bench]]
+name = "throughput_basic_noinline"
+harness = false
+
 [[bench]]
 name = "throughput_compat"
 harness = false
