Commit 12485be

fix: check leaf column is root column in Parquet schema (#1347)
## Which issue does this PR close?

N/A

## What changes are included in this PR?

This PR fixes a bug in the check for whether a leaf column is a root column in the Parquet schema. The current approach compares the leaf column's index with its root column's index. This is not reliable, because the two indices no longer correspond once a nested column appears earlier in the schema. For example:

- col A: String
- col B: Map
  - key B.K: String
  - value B.V: String
- col C: String

For columns A, B.K and B.V, the current logic works well: it does not return an error for A but does for B.K and B.V, which is correct. For column C, however, it returns an error because its leaf column index (3) does not match its root column index (2). This is incorrect: column C is a valid leaf column to use in a predicate.

I think `get_column_root().is_group()` may be a more reliable check in this situation. By checking whether a leaf column's root column is a group column, we can determine whether the leaf column is actually at the root.

## Are these changes tested?

Yes
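The index mismatch for column C can be reproduced with a small, self-contained sketch. This is a toy model and not the `parquet` crate's API: `Field`, `Schema`, `column_root_idx`, `column_root`, `example_schema`, `old_check_rejects` and `new_check_rejects` are hypothetical names made up for illustration, and the map is assumed to use the usual Parquet layout of a group wrapping a repeated `key_value` group.

```rust
// Toy model of a Parquet-like schema. All names here are hypothetical
// stand-ins for illustration; this is not the parquet crate's API.
#[derive(Debug)]
enum Field {
    Primitive(&'static str),
    Group(&'static str, Vec<Field>),
}

impl Field {
    fn name(&self) -> &'static str {
        match self {
            Field::Primitive(name) | Field::Group(name, _) => *name,
        }
    }

    fn is_group(&self) -> bool {
        matches!(self, Field::Group(..))
    }

    // Number of leaf (primitive) columns underneath this field.
    fn leaf_count(&self) -> usize {
        match self {
            Field::Primitive(_) => 1,
            Field::Group(_, children) => children.iter().map(Field::leaf_count).sum(),
        }
    }
}

struct Schema {
    roots: Vec<Field>,
}

impl Schema {
    // Index of the root field that owns the leaf column `leaf_idx`.
    fn column_root_idx(&self, leaf_idx: usize) -> usize {
        let mut seen = 0;
        for (root_idx, root) in self.roots.iter().enumerate() {
            seen += root.leaf_count();
            if leaf_idx < seen {
                return root_idx;
            }
        }
        panic!("leaf index {} out of range", leaf_idx);
    }

    fn column_root(&self, leaf_idx: usize) -> &Field {
        &self.roots[self.column_root_idx(leaf_idx)]
    }
}

// Schema from the description: A: String, B: Map<String, String>, C: String.
// Leaf columns in order: A (0), B.K (1), B.V (2), C (3).
fn example_schema() -> Schema {
    Schema {
        roots: vec![
            Field::Primitive("A"),
            Field::Group(
                "B",
                vec![Field::Group(
                    "key_value",
                    vec![Field::Primitive("key"), Field::Primitive("value")],
                )],
            ),
            Field::Primitive("C"),
        ],
    }
}

// Old check: reject when the root index differs from the leaf index.
fn old_check_rejects(schema: &Schema, leaf_idx: usize) -> bool {
    schema.column_root_idx(leaf_idx) != leaf_idx
}

// New check: reject when the leaf's root field is a group.
fn new_check_rejects(schema: &Schema, leaf_idx: usize) -> bool {
    schema.column_root(leaf_idx).is_group()
}

fn main() {
    let schema = example_schema();
    for leaf_idx in 0..4 {
        println!(
            "leaf {} (root {}): old rejects = {}, new rejects = {}",
            leaf_idx,
            schema.column_root(leaf_idx).name(),
            old_check_rejects(&schema, leaf_idx),
            new_check_rejects(&schema, leaf_idx),
        );
    }
}
```

In this model column C has leaf index 3 but root index 2, so the index comparison rejects it even though its root is a primitive field. The group check depends only on the kind of the root field, so it is unaffected by how many leaves precede it.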
1 parent fbc3716 commit 12485be

1 file changed: +1 -1 lines changed


crates/iceberg/src/arrow/reader.rs

Lines changed: 1 addition & 1 deletion
@@ -933,7 +933,7 @@ impl PredicateConverter<'_> {
     fn bound_reference(&mut self, reference: &BoundReference) -> Result<Option<usize>> {
         // The leaf column's index in Parquet schema.
         if let Some(column_idx) = self.column_map.get(&reference.field().id) {
-            if self.parquet_schema.get_column_root_idx(*column_idx) != *column_idx {
+            if self.parquet_schema.get_column_root(*column_idx).is_group() {
                 return Err(Error::new(
                     ErrorKind::DataInvalid,
                     format!(
