-
Notifications
You must be signed in to change notification settings - Fork 958
Perf: Optimize comparison kernels for inlined views #7731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
4bc4122
0e4a6cd
8a552a4
399297f
e9b5e44
c089841
dddf67a
4fa53b5
5eec94b
fb82e8b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -565,24 +565,46 @@ impl<'a, T: ByteViewType> ArrayOrd for &'a GenericByteViewArray<T> { | |
/// Item.0 is the array, Item.1 is the index | ||
type Item = (&'a GenericByteViewArray<T>, usize); | ||
|
||
#[inline(always)] | ||
fn is_eq(l: Self::Item, r: Self::Item) -> bool { | ||
// # Safety | ||
// The index is within bounds as it is checked in value() | ||
let l_view = unsafe { l.0.views().get_unchecked(l.1) }; | ||
let l_len = *l_view as u32; | ||
|
||
let r_view = unsafe { r.0.views().get_unchecked(r.1) }; | ||
let l_len = *l_view as u32; | ||
let r_len = *r_view as u32; | ||
// This is a fast path for equality check. | ||
// We don't need to look at the actual bytes to determine if they are equal. | ||
if l_len != r_len { | ||
return false; | ||
} | ||
|
||
// When both len are same, we can compare the inlined bytes, this can handle the corner case after | ||
// the len compare, for example: | ||
// > println!("{:?}", "\0".cmp("")) | ||
// > Greater | ||
if l.0.data_buffers().is_empty() && r.0.data_buffers().is_empty() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the code probably depends on inlining / moving this check outside of the loop? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @Dandandan for this good suggestion, i addressed it in latest PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I add the #[inline(always)] to eq(), lt() and compare_byte_view(), i am sure if it's the right way. |
||
// If the lengths are equal, we can compare the inlined bytes | ||
// Only need to compare the inlined bytes | ||
let l_bytes = unsafe { GenericByteViewArray::<T>::inline_value(l_view, 12) }; | ||
let r_bytes = unsafe { GenericByteViewArray::<T>::inline_value(r_view, 12) }; | ||
return l_bytes.cmp(r_bytes).is_eq(); | ||
} | ||
|
||
// # Safety | ||
// The index is within bounds as it is checked in value() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: I think it would be good to move the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @alamb for review, addressed it in latest PR. |
||
unsafe { GenericByteViewArray::compare_unchecked(l.0, l.1, r.0, r.1).is_eq() } | ||
} | ||
|
||
#[inline(always)] | ||
fn is_lt(l: Self::Item, r: Self::Item) -> bool { | ||
if l.0.data_buffers().is_empty() && r.0.data_buffers().is_empty() { | ||
let l_view = unsafe { l.0.views().get_unchecked(l.1) }; | ||
let r_view = unsafe { r.0.views().get_unchecked(r.1) }; | ||
let l_len = *l_view as u32 as usize; | ||
let r_len = *r_view as u32 as usize; | ||
let l_bytes = unsafe { GenericByteViewArray::<T>::inline_value(l_view, l_len) }; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps the main thing I see is what we could do next is do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @Dandandan , interesting, i will investigate this idea! |
||
let r_bytes = unsafe { GenericByteViewArray::<T>::inline_value(r_view, r_len) }; | ||
return l_bytes.cmp(r_bytes).is_lt(); | ||
} | ||
// # Safety | ||
// The index is within bounds as it is checked in value() | ||
unsafe { GenericByteViewArray::compare_unchecked(l.0, l.1, r.0, r.1).is_lt() } | ||
|
@@ -618,6 +640,7 @@ impl<'a> ArrayOrd for &'a FixedSizeBinaryArray { | |
} | ||
|
||
/// Compares two [`GenericByteViewArray`] at index `left_idx` and `right_idx` | ||
#[inline(always)] | ||
pub fn compare_byte_view<T: ByteViewType>( | ||
left: &GenericByteViewArray<T>, | ||
left_idx: usize, | ||
|
@@ -626,6 +649,15 @@ pub fn compare_byte_view<T: ByteViewType>( | |
) -> std::cmp::Ordering { | ||
assert!(left_idx < left.len()); | ||
assert!(right_idx < right.len()); | ||
if left.data_buffers().is_empty() && right.data_buffers().is_empty() { | ||
let l_view = unsafe { left.views().get_unchecked(left_idx) }; | ||
let r_view = unsafe { right.views().get_unchecked(right_idx) }; | ||
let l_len = *l_view as u32 as usize; | ||
let r_len = *r_view as u32 as usize; | ||
let l_bytes = unsafe { GenericByteViewArray::<T>::inline_value(l_view, l_len) }; | ||
let r_bytes = unsafe { GenericByteViewArray::<T>::inline_value(r_view, r_len) }; | ||
return l_bytes.cmp(r_bytes); | ||
} | ||
unsafe { GenericByteViewArray::compare_unchecked(left, left_idx, right, right_idx) } | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can also compare earlier based on actual length instead like in the other place?
i am also not sure comparing length here is really a fast path, because slice equality also checks on slice first before accessing the element.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think we can do the
eq
case on the u128 value directly @zhuqi-lucas (compare both length and value to be equal)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you @Dandandan , thank you for catching this! Addressed in latest PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After this change, eq optimize a lot, 4.75 faster.