🚧 Add an example read a parquet file with highly selective filter using the next-gen parquet reader #19469

Draft: wants to merge 57 commits into base branch-25.10

mhaseeb123 (Member) commented Jul 23, 2025

Description

🚧 This PR adds an example of reading a parquet file subject to a highly selective string point-lookup filter using the new next-gen (hybrid scan) reader. The same file is also read with the legacy parquet reader, and the timings of the two readers are compared.
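
To make the comparison concrete, here is a minimal, self-contained sketch in plain C++. It is not the libcudf API and not the code in this PR: `toy_table`, `read_then_filter`, `filter_then_read`, and `time_ms` are hypothetical names, and the sketch assumes the benefit of a highly selective point lookup comes from evaluating the filter column first and materializing payload values only for matching rows.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a two-column file: a string key column
// used by the filter and a payload column that we pretend is costly to decode.
struct toy_table {
  std::vector<std::string> keys;
  std::vector<long long> payload;
};

// Read-then-filter path (assumed baseline): copy ("decode") every payload
// value, then keep only the rows whose key equals the needle.
std::vector<long long> read_then_filter(toy_table const& t, std::string const& needle)
{
  std::vector<long long> decoded(t.payload.begin(), t.payload.end());
  std::vector<long long> out;
  for (std::size_t i = 0; i < t.keys.size(); ++i) {
    if (t.keys[i] == needle) { out.push_back(decoded[i]); }
  }
  return out;
}

// Filter-first path (assumed technique): scan only the key column to find the
// surviving row indices, then read payload values for those rows alone.
std::vector<long long> filter_then_read(toy_table const& t, std::string const& needle)
{
  std::vector<std::size_t> surviving;
  for (std::size_t i = 0; i < t.keys.size(); ++i) {
    if (t.keys[i] == needle) { surviving.push_back(i); }
  }
  std::vector<long long> out;
  out.reserve(surviving.size());
  for (std::size_t const i : surviving) { out.push_back(t.payload[i]); }
  return out;
}

// Wall-clock time of a callable in milliseconds, for a rough comparison.
template <typename F>
double time_ms(F&& f)
{
  auto const start = std::chrono::steady_clock::now();
  f();
  auto const stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Both functions return the same rows; wrapping each call in `time_ms` gives a rough timing comparison in the spirit of the one described above. Any difference measured on this toy says nothing about the actual readers.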

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

mhaseeb123 and others added 30 commits July 7, 2025 22:36
…put-nullmasks-for-pruned-pages' into fea/materialize-hybrid-scan-columns
github-actions bot added the libcudf and CMake labels Jul 23, 2025
mhaseeb123 changed the title from "Fea/hybrid-scan-example" to "🚧 Add an example read a parquet file with highly selective filter using the next-gen parquet reader" Jul 23, 2025
mhaseeb123 added the 2 - In Progress, non-breaking, and feature request labels Jul 23, 2025
mhaseeb123 added the DO NOT MERGE label Jul 25, 2025
mhaseeb123 changed the base branch from branch-25.08 to branch-25.10 July 25, 2025 02:16
    out_columns.emplace_back(make_column(_output_buffers[i], &col_name, metadata, _stream));
  } else {
    out_columns.emplace_back(make_column(_output_buffers[i], nullptr, metadata, _stream));
  }
}

out_columns =
Member Author

I copied this from the same function in reader_impl.cpp to keep the two versions in sync. I believe it adds optimizations for structs.

Labels

  • 2 - In Progress: Currently a work in progress
  • CMake: CMake build issue
  • DO NOT MERGE: Hold off on merging; see PR for details
  • feature request: New feature or request
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
1 participant