Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion arrow-array/src/array/boolean_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -498,7 +498,7 @@ impl BooleanArray {
///
/// The iterator must be [`TrustedLen`](https://doc.rust-lang.org/std/iter/trait.TrustedLen.html).
/// I.e. that `size_hint().1` correctly reports its length. Note that this is a stronger
/// guarantee that `ExactSizeIterator` provides which could still report a wrong length.
/// guarantee than `ExactSizeIterator` provides, which could still report a wrong length.
///
/// # Panics
///
Expand Down
11 changes: 9 additions & 2 deletions arrow-array/src/array/primitive_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1458,14 +1458,21 @@ impl<T: ArrowPrimitiveType, Ptr: Into<NativeAdapter<T>>> FromIterator<Ptr> for P

impl<T: ArrowPrimitiveType> PrimitiveArray<T> {
/// Creates a [`PrimitiveArray`] from an iterator of trusted length.
///
/// # Safety
///
/// The iterator must be [`TrustedLen`](https://doc.rust-lang.org/std/iter/trait.TrustedLen.html).
/// I.e. that `size_hint().1` correctly reports its length.
/// I.e. that `size_hint().1` correctly reports its length. Note that this is a stronger
/// guarantee than `ExactSizeIterator` provides, which could still report a wrong length.
///
/// # Panics
///
/// Panics if the iterator does not report an upper bound on `size_hint()`.
#[inline]
pub unsafe fn from_trusted_len_iter<I, P>(iter: I) -> Self
where
P: std::borrow::Borrow<Option<<T as ArrowPrimitiveType>::Native>>,
I: IntoIterator<Item = P>,
I: ExactSizeIterator<Item = P>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rationale for this change? I am thinking that maybe someone who had an iterator that knew its length but did not implement ExactSizeIterator would have to change their code

Maybe that is ok, but the implementation doesn't even seem to use any of the methods on ExactSizeIterator

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we just mentioned that as a side note in the last PR. I think it has two advantages:

  • Users have at least some help from the type system to avoid safety issues.
  • It's consistent with BooleanArray

However, I am also happy with the old API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that some of the functions could make use of len instead of size_hint. If we choose to keep ExactSizeIterator I'll go through them.

I think another cause of incompatibility is caused by users needing to call .into_iter() manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbrobbel Do you have any more thoughts on which iterator bound we should use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use TrustedLen, but we can't because it's nightly-only. So the next best things is ExactSizeIterator for iterators that know their exact length. However there is a note in the docs:

Note that this trait is a safe trait and as such does not and cannot guarantee that the returned length is correct. This means that unsafe code must not rely on the correctness of Iterator::size_hint. The unstable and unsafe TrustedLen trait gives this additional guarantee.

Since this method is marked unsafe we require users to make sure their iterators are TrustedLen. This means:

The iterator reports a size hint where it is either exact (lower bound is equal to upper bound), or the upper bound is None. The upper bound must only be None if the actual iterator length is larger than usize::MAX. In that case, the lower bound must be usize::MAX, resulting in an Iterator::size_hint() of (usize::MAX, None).

The motivation to use ExactSizeIterator here (I think) is to use ExactSizeIterator::len instead of Iterator::size_hint. This has the same "can't be None" requirement on the upper bound reported by the size hint.

For the bound we can use I: IntoIterator<Item = P, IntoIter: ExactSizeIterator>.

{
let iterator = iter.into_iter();
let (_, upper) = iterator.size_hint();
Expand Down
8 changes: 4 additions & 4 deletions arrow-cast/src/cast/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@ pub(crate) fn parse_string_view<P: Parser>(
fn parse_string_iter<
'a,
P: Parser,
I: Iterator<Item = Option<&'a str>>,
I: ExactSizeIterator<Item = Option<&'a str>>,
F: FnOnce() -> Option<NullBuffer>,
>(
iter: I,
Expand Down Expand Up @@ -156,7 +156,7 @@ pub(crate) fn cast_view_to_timestamp<T: ArrowTimestampType>(

fn cast_string_to_timestamp_impl<
'a,
I: Iterator<Item = Option<&'a str>>,
I: ExactSizeIterator<Item = Option<&'a str>>,
T: ArrowTimestampType,
Tz: TimeZone,
>(
Expand Down Expand Up @@ -310,7 +310,7 @@ fn cast_string_to_interval_impl<'a, I, ArrowType, F>(
parse_function: F,
) -> Result<ArrayRef, ArrowError>
where
I: Iterator<Item = Option<&'a str>>,
I: ExactSizeIterator<Item = Option<&'a str>>,
ArrowType: ArrowPrimitiveType,
F: Fn(&str) -> Result<ArrowType::Native, ArrowError> + Copy,
{
Expand All @@ -331,7 +331,7 @@ where
// 20% performance improvement
// Soundness:
// The iterator is trustedLen because it comes from an `StringArray`.
unsafe { PrimitiveArray::<ArrowType>::from_trusted_len_iter(vec) }
unsafe { PrimitiveArray::<ArrowType>::from_trusted_len_iter(vec.into_iter()) }
};
Ok(Arc::new(interval_array) as ArrayRef)
}
Expand Down
40 changes: 23 additions & 17 deletions arrow/benches/array_from.rs
Original file line number Diff line number Diff line change
Expand Up @@ -206,40 +206,46 @@ fn array_from_vec_benchmark(c: &mut Criterion) {
});
}

fn gen_option_vector<TItem: Copy>(item: TItem, len: usize) -> Vec<Option<TItem>> {
hint::black_box(
repeat_n(item, len)
.enumerate()
.map(|(idx, item)| if idx % 3 == 0 { None } else { Some(item) })
.collect(),
)
fn gen_option_iter<TItem: Clone + 'static>(
item: TItem,
len: usize,
) -> Box<dyn ExactSizeIterator<Item = Option<TItem>>> {
hint::black_box(Box::new(repeat_n(item, len).enumerate().map(
|(idx, item)| {
if idx % 3 == 0 {
None
} else {
Some(item)
}
},
)))
}

fn from_iter_benchmark(c: &mut Criterion) {
const ITER_LEN: usize = 16_384;

// All ArrowPrimitiveType use the same implementation
c.bench_function("Int64Array::from_iter", |b| {
let values = gen_option_vector(1, ITER_LEN);
b.iter(|| hint::black_box(Int64Array::from_iter(values.iter())));
b.iter(|| hint::black_box(Int64Array::from_iter(gen_option_iter(1, ITER_LEN))));
});
c.bench_function("Int64Array::from_trusted_len_iter", |b| {
let values = gen_option_vector(1, ITER_LEN);
b.iter(|| unsafe {
// SAFETY: values.iter() is a TrustedLenIterator
hint::black_box(Int64Array::from_trusted_len_iter(values.iter()))
// SAFETY: gen_option_iter returns a TrustedLen iterator
hint::black_box(Int64Array::from_trusted_len_iter(gen_option_iter(
1, ITER_LEN,
)))
});
});

c.bench_function("BooleanArray::from_iter", |b| {
let values = gen_option_vector(true, ITER_LEN);
b.iter(|| hint::black_box(BooleanArray::from_iter(values.iter())));
b.iter(|| hint::black_box(BooleanArray::from_iter(gen_option_iter(true, ITER_LEN))));
});
c.bench_function("BooleanArray::from_trusted_len_iter", |b| {
let values = gen_option_vector(true, ITER_LEN);
b.iter(|| unsafe {
// SAFETY: values.iter() is a TrustedLenIterator
hint::black_box(BooleanArray::from_trusted_len_iter(values.iter()))
// SAFETY: gen_option_iter returns a TrustedLen iterator
hint::black_box(BooleanArray::from_trusted_len_iter(gen_option_iter(
true, ITER_LEN,
)))
});
});
}
Expand Down
Loading