Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on apache/datafusion#12092, which switched DataFusion to use StringViewArray rather than StringArray, I found that one of the queries got much slower.
Profiling revealed that the time difference was almost entirely explained by the time spent in StringViewArray::slice()
Here is the flamegraph with StringArray: [flamegraph image]
Here is the same query with StringViewArray: [flamegraph image]
I am pretty sure the additional time in StringViewArray::slice is spent allocating, copying, and deallocating the Vec of buffers. By contrast, calling slice on a StringArray can be done with a few Arc increments (see arrow-rs/arrow-array/src/array/byte_array.rs, lines 88 to 91 in 3490639).
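To illustrate the difference in cost, here is a minimal sketch using only the standard library. It is not the arrow-rs implementation: `Buffer`, `OwnedBuffers`, and `SharedBuffers` are hypothetical stand-ins I made up for this example. `OwnedBuffers` models an array that holds a `Vec` of reference-counted buffers, so slicing has to clone the `Vec` (one allocation plus one refcount bump per buffer). `SharedBuffers` models the "few Arc increments" case by putting the buffer list itself behind an `Arc`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow buffer: cloning only bumps a refcount.
type Buffer = Arc<Vec<u8>>;

/// Hypothetical array layout that owns a `Vec` of buffers.
struct OwnedBuffers {
    buffers: Vec<Buffer>,
    offset: usize,
    len: usize,
}

impl OwnedBuffers {
    fn new(buffers: Vec<Buffer>, len: usize) -> Self {
        Self { buffers, offset: 0, len }
    }

    /// Slicing clones the `Vec`: a fresh heap allocation, plus one
    /// refcount increment per buffer, later undone when the slice is dropped.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffers: self.buffers.clone(),
            offset: self.offset + offset,
            len,
        }
    }
}

/// Hypothetical alternative layout: the buffer list is itself shared.
struct SharedBuffers {
    buffers: Arc<[Buffer]>,
    offset: usize,
    len: usize,
}

impl SharedBuffers {
    fn new(buffers: Vec<Buffer>, len: usize) -> Self {
        // Moves the buffers into a single shared allocation, paid once.
        Self { buffers: Arc::from(buffers), offset: 0, len }
    }

    /// Slicing is a single refcount increment, regardless of how many
    /// buffers the array references. No new allocation is made.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffers: Arc::clone(&self.buffers),
            offset: self.offset + offset,
            len,
        }
    }
}
```

With this sketch, the cost of `OwnedBuffers::slice` grows with the number of buffers, while `SharedBuffers::slice` stays constant. Whether a shared buffer list is the right trade-off for StringViewArray is an open question; the sketch only shows where the allocation comes from.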
Describe the solution you'd like
I would like StringViewArray::slice to be faster (i.e., to not allocate).
Describe alternatives you've considered
We can (and probably should) change DataFusion not to use slice in this case (see apache/datafusion#6906, specifically apache/datafusion#6906 (comment)), but I think making slice faster / non-allocating for StringViewArray will be useful in general.
Additional context