Skip to content

Perf: Optimize comparison kernels for inlined views #7731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 23, 2025

Conversation

zhuqi-lucas
Copy link
Contributor

Which issue does this PR close?

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

Closes #7621

Rationale for this change

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

What changes are included in this PR?

Add fast path for empty buffer case to skip compute len, and use view(u128) to compare directly.

Are there any user-facing changes?

No

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 21, 2025
@zhuqi-lucas
Copy link
Contributor Author

zhuqi-lucas commented Jun 21, 2025

The performance benchmark result,

the lt scalar StringViewArray shows more than 30% performance faster, and no obvious regression for all string view testing.

lt scalar StringViewArray                                                                                1.00     20.9±1.35ms        ? ?/sec    1.31     27.4±1.23ms        ? ?/sec
critcmp  main fast_path_view --filter "iew"
group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.01  1257.0±18.64µs        ? ?/sec    1.00  1249.7±10.10µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00  1235.9±18.20µs        ? ?/sec    1.00  1230.0±16.30µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.03   697.8±32.61µs        ? ?/sec    1.00    677.9±8.49µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01    613.6±9.26µs        ? ?/sec    1.00   605.7±11.02µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      4.2±0.05ms        ? ?/sec    1.06      4.4±0.10ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.01   510.0±13.43µs        ? ?/sec    1.00    503.6±7.12µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      4.0±0.03ms        ? ?/sec    1.01      4.0±0.04ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.05      4.2±0.04ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.05      4.2±0.06ms        ? ?/sec
like_utf8view scalar complex                                                                             1.00     82.6±0.54ms        ? ?/sec    1.00     82.8±0.89ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.7±0.56ms        ? ?/sec    1.01     77.2±0.63ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     19.0±0.18ms        ? ?/sec    1.00     19.0±0.30ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     19.1±0.22ms        ? ?/sec    1.01     19.3±0.19ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.01     19.2±0.27ms        ? ?/sec    1.00     19.1±0.27ms        ? ?/sec
like_utf8view scalar equals                                                                              1.00     14.1±0.13ms        ? ?/sec    1.00     14.1±0.13ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.01     18.6±0.58ms        ? ?/sec    1.00     18.5±0.61ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.8±0.15ms        ? ?/sec    1.01     11.0±0.22ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.7±0.34ms        ? ?/sec    1.00     18.8±0.41ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.00    619.0±3.76µs        ? ?/sec    1.00    616.8±3.57µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.02  1857.3±74.05µs        ? ?/sec    1.00  1815.2±22.35µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    609.7±7.74µs        ? ?/sec    1.00    611.9±6.24µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.01    195.6±3.62µs        ? ?/sec    1.00    193.5±3.08µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.04   695.1±82.83µs        ? ?/sec    1.00    671.2±6.65µs        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.00   491.0±11.87µs        ? ?/sec    1.00    489.8±5.67µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     20.9±1.35ms        ? ?/sec    1.31     27.4±1.23ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.04   516.6±19.16µs        ? ?/sec    1.00    499.0±6.16µs        ? ?/sec

@zhuqi-lucas zhuqi-lucas changed the title Fast path view Perf: Optimize comparison kernels for inlined views Jun 21, 2025
@zhuqi-lucas
Copy link
Contributor Author

Hi @Dandandan
The fuzz testing failed, i am not sure if we can compare the view directly.

Based on what we’re seeing, it looks like directly comparing the raw u128 views can violate true lexicographic ordering under little‑endian layout—fuzz has caught cases where padding and byte‑order make u128.cmp() pick the wrong winner.

@Dandandan
Copy link
Contributor

Dandandan commented Jun 21, 2025

Hi @Dandandan The fuzz testing failed, i am not sure if we can compare the view directly.

Based on what we’re seeing, it looks like directly comparing the raw u128 views can violate true lexicographic ordering under little‑endian layout—fuzz has caught cases where padding and byte‑order make u128.cmp() pick the wrong winner.

I think we have to make sure only to compare the inlined string, not the entire u128/view value!

@Dandandan
Copy link
Contributor

I think for it to work we probably should:

  • Add some benchmarks for arrays with only inlined string views (12 bytes or smaller)
  • Change the implementation of comparison views to not be on individual level, but on array level - so outside of the loop we can decide on whether to use the fast path or not.

@zhuqi-lucas
Copy link
Contributor Author

Thank you @Dandandan , addressed in latest PR, and add benchmark, the result is amazing, 3.x faster and 10x faster.

critcmp  main fast_path_view --filter "inlined bytes"
group                                               fast_path_view                         main
-----                                               --------------                         ----
eq StringViewArray StringViewArray inlined bytes    1.00      3.7±0.09ms        ? ?/sec    3.33     12.3±0.15ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes    1.00      5.2±0.13ms        ? ?/sec    9.80     51.3±1.76ms        ? ?/sec

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (c089841) to 1ededfe diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=StringView
BENCH_BRANCH_NAME=fast_path_view
Results will be posted here when complete

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks good to me -- thank you @zhuqi-lucas

I am rerunning the benchmarks to double check performance but it looks great

let r_bytes = unsafe { GenericByteViewArray::<T>::inline_value(r_view, 12) };
return l_bytes.cmp(r_bytes).is_eq();
}

// # Safety
// The index is within bounds as it is checked in value()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think it would be good to move the Safety comment up along with the calls to unsafe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for review, addressed it in latest PR.

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

🤖: Benchmark completed

Details

group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.03      2.9±0.04ms        ? ?/sec    1.00      2.8±0.03ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      3.0±0.03ms        ? ?/sec    1.02      3.0±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.03      2.3±0.06ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.00      2.2±0.04ms        ? ?/sec    1.01      2.2±0.07ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     25.0±0.24ms        ? ?/sec    1.00     24.9±0.24ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     23.6±0.14ms        ? ?/sec  
eq long same prefix strings StringViewArray                                                              1.00    921.5±5.29µs        ? ?/sec    1.00    923.2±6.22µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.06     17.7±0.08ms        ? ?/sec    1.00     16.8±0.15ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.06     17.9±0.18ms        ? ?/sec    1.00     17.0±0.15ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.06     17.9±0.12ms        ? ?/sec    1.00     17.0±0.10ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     24.6±0.25ms        ? ?/sec  
lt long same prefix strings StringViewArray                                                              1.02    899.1±3.97µs        ? ?/sec    1.00    881.0±4.03µs        ? ?/sec
lt scalar StringViewArray                                                                                1.01     58.3±0.11ms        ? ?/sec    1.00     57.7±0.11ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    918.3±4.04µs        ? ?/sec    1.01    925.2±4.53µs        ? ?/sec

@@ -566,12 +566,18 @@ impl<'a, T: ByteViewType> ArrayOrd for &'a GenericByteViewArray<T> {
type Item = (&'a GenericByteViewArray<T>, usize);

fn is_eq(l: Self::Item, r: Self::Item) -> bool {
let l_view = unsafe { l.0.views().get_unchecked(l.1) };
let r_view = unsafe { r.0.views().get_unchecked(r.1) };
if l.0.data_buffers().is_empty() && r.0.data_buffers().is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code probably depends on inlining / moving this check outside of the loop?
Should we add #[inline(always)] to it to make sure it keeps doing that? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Dandandan for this good suggestion, i addressed it in latest PR.

Copy link
Contributor Author

@zhuqi-lucas zhuqi-lucas Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the #[inline(always)] to eq(), lt() and compare_byte_view(), i am sure if it's the right way.

@Dandandan
Copy link
Contributor

My results match the 3x / 10x improvement 🚀

eq StringViewArray StringViewArray inlined bytes
                        time:   [6.5763 ms 6.6128 ms 6.6497 ms]
                        change: [−62.258% −62.065% −61.859%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

lt StringViewArray StringViewArray inlined bytes
                        time:   [7.9021 ms 7.9405 ms 7.9766 ms]
                        change: [−89.725% −89.674% −89.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low severe
  7 (7.00%) low mild

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fast_path_view (dddf67a) to 1ededfe diff
BENCH_NAME=comparison_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench comparison_kernels
BENCH_FILTER=StringView
BENCH_BRANCH_NAME=fast_path_view
Results will be posted here when complete

if left.data_buffers().is_empty() && right.data_buffers().is_empty() {
// Only need to compare the inlined bytes
let l_bytes = unsafe {
GenericByteViewArray::<T>::inline_value(left.views().get_unchecked(left_idx), 12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think this might not be correct for > / < in case of zeros in the values (e.g. https://en.wikipedia.org/wiki/Null_character).
I think we should add the length to the comparison (or do something smarter? I am not sure if that's possible).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns greater (instead of Equal):

> println!("{:?}", "\0".cmp(""))
> Greater

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @Dandandan , let me find a good way to handle it!

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Promising results!
I think we need some test to show it is correct with null characters and different lengths, I think we need to modify the implementation a bit.

@alamb
Copy link
Contributor

alamb commented Jun 22, 2025

🤖: Benchmark completed

Details

group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.00      2.9±0.04ms        ? ?/sec    1.00      2.9±0.07ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.00      3.0±0.04ms        ? ?/sec    1.02      3.0±0.04ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.00      2.2±0.05ms        ? ?/sec    1.03      2.3±0.08ms        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.01      2.2±0.05ms        ? ?/sec    1.00      2.2±0.04ms        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00     24.9±0.25ms        ? ?/sec    1.00     24.9±0.27ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00     23.9±0.25ms        ? ?/sec  
eq long same prefix strings StringViewArray                                                              1.00    919.5±5.05µs        ? ?/sec    1.00    920.2±4.66µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.06     17.6±0.09ms        ? ?/sec    1.00     16.7±0.16ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.04     17.8±0.17ms        ? ?/sec    1.00     17.1±0.09ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.05     17.8±0.13ms        ? ?/sec    1.00     16.9±0.11ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     24.6±0.28ms        ? ?/sec  
lt long same prefix strings StringViewArray                                                              1.03    907.7±8.25µs        ? ?/sec    1.00    879.4±4.85µs        ? ?/sec
lt scalar StringViewArray                                                                                1.01     58.3±0.16ms        ? ?/sec    1.00     57.7±0.17ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.00    915.7±4.42µs        ? ?/sec    1.01    921.6±4.44µs        ? ?/sec

@alamb alamb self-requested a review June 22, 2025 12:42
@zhuqi-lucas
Copy link
Contributor Author

Promising results! I think we need some test to show it is correct with null characters and different lengths, I think we need to modify the implementation a bit.

Thank you @Dandandan , updated the code, and we still can gain some performance improvement for latest code.

@zhuqi-lucas
Copy link
Contributor Author

zhuqi-lucas commented Jun 22, 2025

@Dandandan @alamb
The benchmark result, i run locally for latest PR:

critcmp  main fast_path_view --filter "iew"
group                                                                                                    fast_path_view                         main
-----                                                                                                    --------------                         ----
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar complex        1.01  1252.0±14.91µs        ? ?/sec    1.00  1242.8±12.91µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar contains       1.01   1224.7±7.64µs        ? ?/sec    1.00  1215.0±22.97µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar ends with      1.01    673.7±3.22µs        ? ?/sec    1.00    670.3±2.76µs        ? ?/sec
StringViewArray: regexp_matches_utf8view scalar benchmarks/regexp_matches_utf8view scalar starts with    1.02    620.5±7.92µs        ? ?/sec    1.00    609.3±5.52µs        ? ?/sec
eq StringViewArray StringViewArray                                                                       1.00      4.3±0.02ms        ? ?/sec    1.10      4.8±0.03ms        ? ?/sec
eq StringViewArray StringViewArray inlined bytes                                                         1.00      8.2±0.05ms        ? ?/sec    1.57     12.8±0.10ms        ? ?/sec
eq long same prefix strings StringViewArray                                                              1.04   521.7±11.72µs        ? ?/sec    1.00   500.1±12.99µs        ? ?/sec
eq scalar StringViewArray 13 bytes                                                                       1.00      3.8±0.04ms        ? ?/sec    1.01      3.9±0.04ms        ? ?/sec
eq scalar StringViewArray 4 bytes                                                                        1.00      4.0±0.04ms        ? ?/sec    1.02      4.1±0.03ms        ? ?/sec
eq scalar StringViewArray 6 bytes                                                                        1.00      4.0±0.08ms        ? ?/sec    1.00      4.0±0.03ms        ? ?/sec
like_utf8view scalar complex                                                                             1.01     80.5±0.69ms        ? ?/sec    1.00     80.0±0.40ms        ? ?/sec
like_utf8view scalar contains                                                                            1.00     76.5±0.65ms        ? ?/sec    1.00     76.7±0.63ms        ? ?/sec
like_utf8view scalar ends with 13 bytes                                                                  1.00     18.8±0.21ms        ? ?/sec    1.01     19.1±0.46ms        ? ?/sec
like_utf8view scalar ends with 4 bytes                                                                   1.00     18.9±0.19ms        ? ?/sec    1.00     18.9±0.27ms        ? ?/sec
like_utf8view scalar ends with 6 bytes                                                                   1.01     18.9±0.23ms        ? ?/sec    1.00     18.8±0.19ms        ? ?/sec
like_utf8view scalar equals                                                                              1.02     14.3±0.62ms        ? ?/sec    1.00     14.0±0.17ms        ? ?/sec
like_utf8view scalar starts with 13 bytes                                                                1.00     18.1±0.19ms        ? ?/sec    1.01     18.2±0.12ms        ? ?/sec
like_utf8view scalar starts with 4 bytes                                                                 1.00     10.5±0.10ms        ? ?/sec    1.00     10.5±0.16ms        ? ?/sec
like_utf8view scalar starts with 6 bytes                                                                 1.00     18.4±0.20ms        ? ?/sec    1.00     18.4±0.15ms        ? ?/sec
long same prefix strings like_utf8view scalar complex                                                    1.01   624.2±32.90µs        ? ?/sec    1.00   615.5±13.97µs        ? ?/sec
long same prefix strings like_utf8view scalar contains                                                   1.02  1849.8±73.20µs        ? ?/sec    1.00  1813.0±19.86µs        ? ?/sec
long same prefix strings like_utf8view scalar ends with                                                  1.00    593.1±4.95µs        ? ?/sec    1.01   598.3±13.30µs        ? ?/sec
long same prefix strings like_utf8view scalar equals                                                     1.03    195.4±5.64µs        ? ?/sec    1.00    190.2±1.29µs        ? ?/sec
long same prefix strings like_utf8view scalar starts with                                                1.00    671.4±5.58µs        ? ?/sec    1.01   676.8±20.18µs        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     47.6±1.05ms        ? ?/sec    1.11     52.8±1.38ms        ? ?/sec
lt long same prefix strings StringViewArray                                                              1.06    512.5±9.08µs        ? ?/sec    1.00    482.1±5.10µs        ? ?/sec
lt scalar StringViewArray                                                                                1.00     26.8±0.92ms        ? ?/sec    1.04     28.0±1.56ms        ? ?/sec
neq long same prefix strings StringViewArray                                                             1.03    521.1±9.77µs        ? ?/sec    1.00    503.7±9.93µs        ? ?/sec

Still have 57% and %11 for the inlined case:

eq StringViewArray StringViewArray inlined bytes                                                         1.00      8.2±0.05ms        ? ?/sec    1.57     12.8±0.10ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes                                                         1.00     47.6±1.05ms        ? ?/sec    1.11     52.8±1.38ms        ? ?/sec

let r_len = *r_view as u32;
// This is a fast path for equality check.
// We don't need to look at the actual bytes to determine if they are equal.
if l_len != r_len {
return false;
}

// When both len are same, we can compare the inlined bytes, this can handle the corner case after
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also compare earlier based on actual length instead like in the other place?

i am also not sure comparing length here is really a fast path, because slice equality also checks on slice first before accessing the element.

Copy link
Contributor

@Dandandan Dandandan Jun 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think we can do the eq case on the u128 value directly @zhuqi-lucas (compare both length and value to be equal)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you @Dandandan , thank you for catching this! Addressed in latest PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, eq optimize a lot, 4.75 faster.

critcmp  main fast_path_view --filter "inlined bytes"
group                                               fast_path_view                         main
-----                                               --------------                         ----
eq StringViewArray StringViewArray inlined bytes    1.00      2.7±0.01ms        ? ?/sec    4.75     12.8±0.10ms        ? ?/sec
lt StringViewArray StringViewArray inlined bytes    1.00     47.3±0.55ms        ? ?/sec    1.12     52.8±1.38ms        ? ?/sec

@zhuqi-lucas
Copy link
Contributor Author

Similar changes for datafusion side, it shows good result for Q11 sort-tpch which is mostly using inlined bytes to compare.

apache/datafusion#16509

let r_view = unsafe { r.0.views().get_unchecked(r.1) };
let l_len = *l_view as u32 as usize;
let r_len = *r_view as u32 as usize;
let l_bytes = unsafe { GenericByteViewArray::<T>::inline_value(l_view, l_len) };
Copy link
Contributor

@Dandandan Dandandan Jun 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the main thing I see is what we could do next is do inline_value on a u128 level rather than &[u8] and compare the values on u128 and length if equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Dandandan , interesting, i will investigate this idea!

@Dandandan
Copy link
Contributor

Dandandan commented Jun 23, 2025

All in all it's good, I think we can improve it further, I'll let you decide @zhuqi-lucas if you want to do it in this PR or later!

I think we can also improve the non-inlined case with prefixes in the views, I will create an issue about it later.

@zhuqi-lucas
Copy link
Contributor Author

Thank you @Dandandan , let's create the issues for further improvement, i am willing to investigate and take the related issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize comparison kernels for inlined views
3 participants