Evaluate `IS_NULL` at row group and page level in Parquet filtering #20144
base: main
Two quick questions, everything else looks great. Regarding the `has_is_null_operator` name, it feels a bit wordy, but I couldn’t come up with a better alternative.
Description
Closes #20125
This PR allows evaluating `IS_NULL(col)` and, by extension, `NOT(IS_NULL(col))` expressions while filtering Parquet row groups and data pages (the latter requires the page index) using the corresponding statistics. The PR also includes optimizations for the case where the host columns containing page stats don't contain any nulls.
Checklist
`page_stats_caster` and `row_group_stats_caster` to use `has_is_null_operator`
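
For readers unfamiliar with stats-based pruning, the idea in the Description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in this PR: `ChunkStats`, `may_match_is_null`, `may_match_is_not_null`, and `surviving_chunks` are hypothetical names, and a "chunk" stands for either a row group or a data page whose statistics record a row count and, optionally, a null count.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChunkStats:
    """Hypothetical per-row-group (or per-page) statistics for one column."""
    num_rows: int
    null_count: Optional[int]  # None when the writer did not record a null count


def may_match_is_null(stats: ChunkStats) -> bool:
    """Return False only if the stats prove IS_NULL(col) matches no row."""
    if stats.null_count is None:
        return True  # nothing can be proven, so the chunk must be kept
    return stats.null_count > 0


def may_match_is_not_null(stats: ChunkStats) -> bool:
    """Return False only if the stats prove NOT(IS_NULL(col)) matches no row."""
    if stats.null_count is None:
        return True  # nothing can be proven, so the chunk must be kept
    return stats.null_count < stats.num_rows


def surviving_chunks(chunks: List[ChunkStats],
                     predicate: Callable[[ChunkStats], bool]) -> List[int]:
    """Indices of the chunks that still have to be read after pruning."""
    return [i for i, stats in enumerate(chunks) if predicate(stats)]


chunks = [
    ChunkStats(num_rows=100, null_count=0),     # no nulls
    ChunkStats(num_rows=100, null_count=100),   # all nulls
    ChunkStats(num_rows=100, null_count=7),     # mixed
    ChunkStats(num_rows=100, null_count=None),  # null count not recorded
]
print(surviving_chunks(chunks, may_match_is_null))      # [1, 2, 3]
print(surviving_chunks(chunks, may_match_is_not_null))  # [0, 2, 3]
```

The sketch is deliberately conservative: a chunk is dropped only when its statistics prove the predicate cannot match, so a missing null count always keeps the chunk.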