
[FEA] Use post_traversal to populate "base" column statistics #19390

@rjzamora

Description


Implement a `post_traversal` pass over the un-lowered IR graph to populate the `dict[IR, dict[str, ColumnStats]]` and `dict[IR, RowCount]` data structures with base (i.e. source) statistics. The necessary statistics classes were added in #19276.

This traversal will not update the `ColumnStats.unique_stats` attribute of each column yet. Its goal is to make sure that `DataSourceInfo` and source-based row-count estimates are fully propagated through the graph.
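A minimal sketch of the pass described above, assuming toy stand-ins: `DataSourceInfo`, `ColumnStats`, `RowCount`, the IR node classes, `post_traversal`, and `collect_base_stats` below are hypothetical simplifications, not the cudf-polars API. The walk visits each node once in post-order, so a node's children already have entries when the node itself is processed. Source-backed `ColumnStats` are carried through by column name, and row-count estimates move upward. `unique_stats` is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


# Hypothetical, simplified stand-ins for the statistics classes added in #19276.
@dataclass
class DataSourceInfo:
    row_count: Optional[int]  # row count known from source metadata, if any


@dataclass
class ColumnStats:
    name: str
    source_info: Optional[DataSourceInfo] = None  # None: not backed by a source column


@dataclass(frozen=True)
class RowCount:
    value: Optional[int]
    exact: bool = False  # False when value is only an estimate or upper bound


class IR:
    """Toy IR node; plain objects hash by identity, so nodes work as dict keys."""

    def __init__(self, schema, *children):
        self.schema = tuple(schema)
        self.children = children


class Scan(IR):
    def __init__(self, schema, source: DataSourceInfo):
        super().__init__(schema)
        self.source = source


class Filter(IR):
    def __init__(self, child: IR):
        super().__init__(child.schema, child)


class Select(IR):
    def __init__(self, schema, child: IR):
        super().__init__(schema, child)


def post_traversal(root: IR) -> Iterator[IR]:
    """Yield every node exactly once, children before parents."""
    seen: set[IR] = set()
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            yield node  # all children have already been yielded
        elif node not in seen:
            seen.add(node)
            stack.append((node, True))
            stack.extend((child, False) for child in reversed(node.children))


def collect_base_stats(
    root: IR,
) -> tuple[dict[IR, dict[str, ColumnStats]], dict[IR, RowCount]]:
    column_stats: dict[IR, dict[str, ColumnStats]] = {}
    row_count: dict[IR, RowCount] = {}
    for node in post_traversal(root):
        if isinstance(node, Scan):
            column_stats[node] = {
                name: ColumnStats(name, node.source) for name in node.schema
            }
            known = node.source.row_count
            row_count[node] = RowCount(known, exact=known is not None)
            continue
        # Children were visited first, so their entries already exist.
        inherited: dict[str, ColumnStats] = {}
        for child in node.children:
            inherited.update(column_stats[child])
        column_stats[node] = {
            name: inherited.get(name, ColumnStats(name)) for name in node.schema
        }
        # Carry the first child's estimate forward; a Filter may drop rows,
        # so its count becomes an upper bound rather than an exact value.
        child_rows = row_count[node.children[0]]
        exact = child_rows.exact and not isinstance(node, Filter)
        row_count[node] = RowCount(child_rows.value, exact)
    return column_stats, row_count
```

Because the walk is post-order, a node only ever reads entries that belong to its children. This is what lets one pass push source information all the way to the root, even when a subtree is shared by several parents.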

We can also use this traversal to call `add_unique_stats_column` for known `GroupBy` and `Distinct` key columns. This way, the first call to `DataSourceInfo.unique_stats(*)` (expected during a later IR-graph traversal) will collect row-group information for all known `GroupBy`/`Distinct` keys.
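The deferred collection above could look roughly like the sketch below. Everything in it is hypothetical: this toy `DataSourceInfo` keeps values in memory as a stand-in for parquet row groups, and the signatures `add_unique_stats_column(column)` and `unique_stats(column)` are assumptions rather than the real cudf-polars methods. Registration only records interest in a column. The first `unique_stats` call then reads the row groups once for every registered column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniqueStats:
    count: int  # distinct values observed
    fraction: float  # distinct values / total rows


class DataSourceInfo:
    """Hypothetical source-statistics object; in-memory dicts stand in for row groups."""

    def __init__(self, row_groups: list[dict[str, list]]):
        self._row_groups = row_groups
        self._requested: set[str] = set()
        self._unique: dict[str, UniqueStats] = {}
        self.reads = 0  # number of passes made over the row groups

    def add_unique_stats_column(self, column: str) -> None:
        # Cheap: only record that unique stats for this column will be wanted.
        self._requested.add(column)

    def unique_stats(self, column: str) -> UniqueStats:
        if column not in self._unique:
            self._requested.add(column)
            self._collect()
        return self._unique[column]

    def _collect(self) -> None:
        # One pass over the row groups covers every requested column.
        self.reads += 1
        distinct: dict[str, set] = {c: set() for c in self._requested}
        total_rows = 0
        for rg in self._row_groups:
            total_rows += len(next(iter(rg.values())))
            for column in self._requested:
                distinct[column].update(rg[column])
        self._unique = {
            c: UniqueStats(len(v), len(v) / max(total_rows, 1))
            for c, v in distinct.items()
        }


class IR:
    """Toy IR node; plain objects hash by identity, so nodes work as dict keys."""

    def __init__(self, schema, *children):
        self.schema = tuple(schema)
        self.children = children


class Scan(IR):
    def __init__(self, schema, source: DataSourceInfo):
        super().__init__(schema)
        self.source = source


class GroupBy(IR):
    def __init__(self, keys, schema, child: IR):
        super().__init__(schema, child)
        self.keys = tuple(keys)


class Distinct(IR):
    def __init__(self, subset, child: IR):
        super().__init__(child.schema, child)
        self.keys = tuple(subset)


def post_traversal(node: IR, seen=None):
    """Yield each node once, children before parents."""
    seen = set() if seen is None else seen
    if node in seen:
        return
    seen.add(node)
    for child in node.children:
        yield from post_traversal(child, seen)
    yield node


def register_key_columns(root: IR) -> None:
    # Track which source each column came from, then register GroupBy/Distinct keys.
    sources: dict[IR, dict[str, DataSourceInfo]] = {}
    for node in post_traversal(root):
        if isinstance(node, Scan):
            sources[node] = {c: node.source for c in node.schema}
            continue
        inherited: dict[str, DataSourceInfo] = {}
        for child in node.children:
            inherited.update(sources[child])
        sources[node] = {c: inherited[c] for c in node.schema if c in inherited}
        if isinstance(node, (GroupBy, Distinct)):
            for key in node.keys:
                if key in inherited:
                    inherited[key].add_unique_stats_column(key)
```

With the `GroupBy`/`Distinct` keys registered up front, the first lookup pays for a single pass over the row-group data, and lookups for the other registered keys are served from that same pass. An unregistered column still works, but triggers another read.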

Metadata


Assignees: No one assigned

Labels: Python (Affects Python cuDF API), cudf-polars (Issues specific to cudf-polars), feature request (New feature or request)

Type: No type

Projects: Status In Progress

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
