
Make StringViewArray::slice() and BinaryViewArray::slice() faster / non allocating #6408

@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on apache/datafusion#12092, which switched DataFusion to use StringViewArray rather than StringArray, I found that one of the queries got much slower.

Profiling revealed that the time difference was almost entirely explained by the time spent in StringViewArray::slice().

[Screenshot from 2024-09-16]

Here is the flamegraph with StringArray:
[Flamegraph image: flamegraph-main]

Here is the same query with StringViewArray:
[Flamegraph image: flamegraph-string-view]

I am pretty sure the additional time is spent allocating, copying, and deallocating the Vecs of buffers here:

By contrast, calling slice on a StringArray only needs a few Arc increments, since its fields are:

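```rust
// Fields of GenericByteArray, which backs StringArray
pub struct GenericByteArray<T: ByteArrayType> {
    data_type: DataType,
    value_offsets: OffsetBuffer<T::Offset>,
    value_data: Buffer,
    nulls: Option<NullBuffer>,
}
```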
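To make the contrast concrete, here is a minimal std-only sketch. ByteArrayModel, ViewArrayModel, the buf helper and the Buffer alias are hypothetical simplifications, not the arrow-rs types. Every field of the StringArray model is an Arc, so slicing only bumps reference counts; the view model is assumed to keep its data buffers in a Vec (mirroring the Vecs of buffers mentioned above), so each slice allocates a new Vec and bumps one refcount per buffer.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's Buffer: reference-counted bytes.
type Buffer = Arc<[u8]>;

// Hypothetical helper to build a Buffer from a byte slice.
fn buf(bytes: &[u8]) -> Buffer {
    Arc::from(bytes)
}

// Simplified StringArray model: every field is reference counted, so a
// slice only bumps refcounts and records a new (offset, len) window.
struct ByteArrayModel {
    value_offsets: Arc<[i32]>, // len + 1 offsets into value_data
    value_data: Buffer,
    offset: usize, // first logical element visible in this array
    len: usize,    // number of logical elements visible in this array
}

impl ByteArrayModel {
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            value_offsets: Arc::clone(&self.value_offsets), // refcount bump only
            value_data: Arc::clone(&self.value_data),       // refcount bump only
            offset: self.offset + offset,
            len,
        }
    }

    fn value(&self, i: usize) -> &str {
        assert!(i < self.len, "index out of bounds");
        let idx = self.offset + i;
        let start = self.value_offsets[idx] as usize;
        let end = self.value_offsets[idx + 1] as usize;
        std::str::from_utf8(&self.value_data[start..end]).expect("valid UTF-8")
    }
}

// Simplified StringViewArray model that keeps its data buffers in a Vec:
// cloning the Vec on every slice allocates a new heap block and bumps the
// refcount of each buffer, so the cost grows with the number of buffers.
struct ViewArrayModel {
    views: Arc<[u128]>,
    buffers: Vec<Buffer>,
    offset: usize,
    len: usize,
}

impl ViewArrayModel {
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            buffers: self.buffers.clone(), // allocates + one refcount bump per buffer
            offset: self.offset + offset,
            len,
        }
    }
}

fn main() {
    let strings = ByteArrayModel {
        value_offsets: Arc::from(vec![0i32, 5, 10, 13]),
        value_data: buf(b"helloworldfoo"),
        offset: 0,
        len: 3,
    };
    let sliced = strings.slice(1, 2);
    println!("{} {}", sliced.value(0), sliced.value(1)); // prints "world foo"

    let views = ViewArrayModel {
        views: Arc::from(vec![0u128; 3]),
        buffers: vec![buf(b"a"), buf(b"b")],
        offset: 0,
        len: 3,
    };
    let sliced_views = views.slice(1, 2);
    // The sliced view array owns a separate Vec allocation.
    println!("new Vec allocated: {}", views.buffers.as_ptr() != sliced_views.buffers.as_ptr());
}
```

In this model the StringArray slice costs the same no matter how large the array is, while the view slice does work proportional to the number of data buffers on every call.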

Describe the solution you'd like
I would like StringViewArray::slice to be faster (i.e., not allocate).
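One possible way to make slice allocation-free is to keep the data buffers behind a single shared Arc<[Buffer]>, so slicing clones one Arc instead of a whole Vec. The sketch below uses hypothetical std-only types (SharedViewArrayModel and a Buffer alias); it is an assumption about a possible design, not necessarily the change arrow-rs adopted.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's Buffer: reference-counted bytes.
type Buffer = Arc<[u8]>;

// Hypothetical view-array model that stores its data buffers behind one
// shared Arc<[Buffer]> instead of a Vec<Buffer>.
struct SharedViewArrayModel {
    views: Arc<[u128]>,
    buffers: Arc<[Buffer]>,
    offset: usize,
    len: usize,
}

impl SharedViewArrayModel {
    fn new(views: Vec<u128>, buffers: Vec<Buffer>) -> Self {
        let len = views.len();
        // Vec -> Arc<[T]> moves the elements; inner refcounts are unchanged.
        Self { views: views.into(), buffers: buffers.into(), offset: 0, len }
    }

    // O(1) and allocation-free: two refcount increments, independent of
    // how many data buffers the array holds.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            views: Arc::clone(&self.views),
            buffers: Arc::clone(&self.buffers),
            offset: self.offset + offset,
            len,
        }
    }

    fn len(&self) -> usize {
        self.len
    }
}

fn main() {
    let buffers: Vec<Buffer> = (0..1000)
        .map(|_| -> Buffer { Arc::from(vec![0u8; 16]) })
        .collect();
    let array = SharedViewArrayModel::new(vec![0; 10], buffers);
    let sliced = array.slice(2, 5);
    println!("len = {}", sliced.len());
    // Both arrays point at the same buffer list; no per-buffer work was done.
    println!("buffers shared: {}", Arc::ptr_eq(&array.buffers, &sliced.buffers));
    println!("first buffer refcount: {}", Arc::strong_count(&array.buffers[0]));
}
```

The trade-off in this sketch is that the buffer list becomes immutable once shared; code that wants to append a buffer would have to copy the list into a new Arc<[Buffer]>.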

Describe alternatives you've considered

We can (and probably should) change DataFusion not to use slice in this case (see apache/datafusion#6906, specifically apache/datafusion#6906 (comment)), but I think making slice faster / non-allocating for StringViewArray will be useful in general.

Additional context

Metadata

Assignees: No one assigned

Labels: arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of an entry in the changelog), help wanted
