Commit b969c34

Auto merge of #99487 - bmacnaughton:is_whitespace_updates, r=thomcc
is_whitespace() performance improvements

This is my first rust PR, so if I miss anything obvious please let me know and I'll do my best to fix it.

This was a bit more of a challenge than I realized because, while I made working code locally and tested it against the native `is_whitespace()`, this PR required changing `src/tools/unicode-table-generator`, the code that generated the code.

I have benchmarked this locally, using criterion, and have seen meaningful performance improvements. I can add those outputs to this if you'd like, but am guessing that the perf run that `@fmease` recommended is what's needed.

I have run `./x.py test --stage 0 library/std` after building it locally after executing `./x.py build library`. I didn't try to build the whole compiler, but maybe I should have - any guidance would be appreciated.

If this general approach makes sense, I'll take a look at some other candidate categories, e.g., `Cc`, in the future.

Oh, and I wasn't sure whether the generated code should be included in this PR or not. I did include it.
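
For readers who want to reproduce the "tested it against the native `is_whitespace()`" step, here is a minimal sketch of an exhaustive comparison. It is not the author's actual test: `candidate_is_whitespace` and `mismatches` are hypothetical names, and the candidate simply spells out the Unicode `White_Space` code points so the example stays self-contained. Any alternative implementation could be swapped in for it.

```rust
/// Hypothetical candidate under test: the White_Space code points
/// written out explicitly (an assumption for illustration, not the PR's code).
fn candidate_is_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}'
            | '\u{0020}'
            | '\u{0085}'
            | '\u{00A0}'
            | '\u{1680}'
            | '\u{2000}'..='\u{200A}'
            | '\u{2028}'
            | '\u{2029}'
            | '\u{202F}'
            | '\u{205F}'
            | '\u{3000}'
    )
}

/// Walks every Unicode scalar value and collects the ones where the
/// candidate disagrees with the standard library.
fn mismatches() -> Vec<char> {
    (0..=0x10FFFFu32)
        .filter_map(char::from_u32) // skips the surrogate range
        .filter(|&c| candidate_is_whitespace(c) != c.is_whitespace())
        .collect()
}
```

An empty result from `mismatches()` means the two predicates agree on every `char`. This only checks correctness; timing comparisons would still need a benchmark harness such as the criterion runs mentioned above.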
2 parents 7477d78 + d81c823 commit b969c34

core/src/unicode/unicode_data.rs

Lines changed: 18 additions & 10 deletions
@@ -544,18 +544,26 @@ pub mod uppercase {
 
 #[rustfmt::skip]
 pub mod white_space {
-    static SHORT_OFFSET_RUNS: [u32; 4] = [
-        5760, 18882560, 23080960, 40972289,
-    ];
-    static OFFSETS: [u8; 21] = [
-        9, 5, 18, 1, 100, 1, 26, 1, 0, 1, 0, 11, 29, 2, 5, 1, 47, 1, 0, 1, 0,
+    static WHITESPACE_MAP: [u8; 256] = [
+        2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+        0, 0, 0, 0, 0, 0, 0, 0, 0,
     ];
+    #[inline]
     pub fn lookup(c: char) -> bool {
-        super::skip_search(
-            c as u32,
-            &SHORT_OFFSET_RUNS,
-            &OFFSETS,
-        )
+        match c as u32 >> 8 {
+            0 => WHITESPACE_MAP[c as usize & 0xff] & 1 != 0,
+            22 => c as u32 == 0x1680,
+            32 => WHITESPACE_MAP[c as usize & 0xff] & 2 != 0,
+            48 => c as u32 == 0x3000,
+            _ => false,
+        }
     }
 }
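
The new `lookup` shares one 256-entry table between two 256-code-point pages: bit 0 (value 1) marks whitespace in U+0000..=U+00FF, and bit 1 (value 2) marks whitespace in U+2000..=U+20FF. An entry of 3 therefore means the same low byte is whitespace in both pages (for example U+0009 and U+2009). The following is a minimal sketch of that encoding, not the generator's actual output path: `build_whitespace_map` is a hypothetical helper that rebuilds the table from the `White_Space` code points instead of embedding the literal array.

```rust
/// Hypothetical helper (not part of the commit): rebuilds the shared table
/// from the White_Space code points that fall in pages 0x00 and 0x20.
const fn build_whitespace_map() -> [u8; 256] {
    let mut map = [0u8; 256];

    // Bit 0 (value 1): whitespace in U+0000..=U+00FF.
    let page_00: [u32; 8] = [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0];
    let mut i = 0;
    while i < page_00.len() {
        map[(page_00[i] & 0xff) as usize] |= 1;
        i += 1;
    }

    // Bit 1 (value 2): whitespace in U+2000..=U+20FF.
    let page_20: [u32; 15] = [
        0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007,
        0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F,
    ];
    let mut j = 0;
    while j < page_20.len() {
        map[(page_20[j] & 0xff) as usize] |= 2;
        j += 1;
    }

    map
}

static WHITESPACE_MAP: [u8; 256] = build_whitespace_map();

/// Same shape as the `lookup` in the diff: dispatch on the bits above the
/// low byte, then test the bit that belongs to that page.
pub fn lookup(c: char) -> bool {
    match c as u32 >> 8 {
        0 => WHITESPACE_MAP[c as usize & 0xff] & 1 != 0,
        22 => c as u32 == 0x1680, // page 0x16 holds a single whitespace char
        32 => WHITESPACE_MAP[c as usize & 0xff] & 2 != 0,
        48 => c as u32 == 0x3000, // page 0x30 likewise
        _ => false,
    }
}
```

The two pages that contain only one whitespace code point each (U+1680 and U+3000) are handled by direct comparison, so the table never needs more than two bits per entry.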
