Optimize the join operators #16710

@MrPowers

DataFusion underperforms the Polars streaming engine on some join queries run locally (1e8 rows of data on a MacBook M3 with 16 GB of RAM):

Image

Here are the join queries.

I am guessing the join operator can be optimized, similar to how the filtering and aggregation operations were optimized.

Here is an example of how the median function was made faster: #13550

See this epic for more info: #13548
