fix: check leaf column is root column in Parquet schema (#1347)
## Which issue does this PR close?
<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->
N/A
## What changes are included in this PR?
<!--
Provide a summary of the modifications in this PR. List the main changes
such as new features, bug fixes, refactoring, or any other updates.
-->
This PR fixes a bug in the check for whether a leaf column is a root column
in the Parquet schema. The current check compares the leaf column's index
with its root column's index. This is unreliable because the two indices no
longer line up after a nested column. For example:
- col A: String
- col B: Map
  - key B.K: String
  - value B.V: String
- col C: String
For columns A, B.K and B.V, the current logic works: it returns no error
for A and returns an error for B.K and B.V, which is correct. For column C,
however, it returns an error because the leaf column index (3) does not
match the root column index (2). This is incorrect: column C is a valid
leaf column to use in a predicate.
I think `get_column_root().is_group()` is a more reliable check in this
situation. If a leaf column's root column is not a group column, the leaf
column is itself at the root.
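
To make the example above concrete, here is a minimal, self-contained sketch of the idea. It does not use the `parquet` crate: `Field`, `Leaf`, `leaves`, `is_root_level` and `example_schema` are hypothetical names invented for illustration, and `Leaf::root_is_group` only stands in for what `get_column_root().is_group()` would report on a real schema.

```rust
// Hypothetical, simplified model of a Parquet schema tree.
// These types are illustrative only; they are not the `parquet` crate's API.
enum Field {
    Primitive(&'static str),
    Group(&'static str, Vec<Field>),
}

impl Field {
    fn is_group(&self) -> bool {
        matches!(self, Field::Group(..))
    }
}

/// One leaf column: its dotted path, its leaf index, the index of its
/// root field, and whether that root field is a group.
#[derive(Debug, PartialEq)]
struct Leaf {
    path: String,
    leaf_idx: usize,
    root_idx: usize,
    root_is_group: bool,
}

// Depth-first walk that appends every primitive under `field` as a leaf.
fn collect(field: &Field, prefix: &str, root_idx: usize, root_is_group: bool, out: &mut Vec<Leaf>) {
    match field {
        Field::Primitive(name) => {
            // Leaf indices are assigned in traversal order.
            let leaf_idx = out.len();
            out.push(Leaf { path: format!("{prefix}{name}"), leaf_idx, root_idx, root_is_group });
        }
        Field::Group(name, children) => {
            let nested = format!("{prefix}{name}.");
            for child in children {
                collect(child, &nested, root_idx, root_is_group, out);
            }
        }
    }
}

/// Flatten the root fields into leaf columns.
fn leaves(roots: &[Field]) -> Vec<Leaf> {
    let mut out = Vec::new();
    for (root_idx, root) in roots.iter().enumerate() {
        collect(root, "", root_idx, root.is_group(), &mut out);
    }
    out
}

/// The check proposed in this PR, expressed on the toy model:
/// a leaf is at the root when its root field is not a group.
fn is_root_level(leaf: &Leaf) -> bool {
    !leaf.root_is_group
}

/// The schema from the example: A, B (map with K and V), C.
fn example_schema() -> Vec<Field> {
    vec![
        Field::Primitive("A"),
        Field::Group("B", vec![Field::Primitive("K"), Field::Primitive("V")]),
        Field::Primitive("C"),
    ]
}
```

Calling `leaves(&example_schema())` yields four leaves. Column C gets leaf index 3 and root index 2, so an index comparison rejects it, while `is_root_level` accepts A and C and rejects B.K and B.V.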
## Are these changes tested?
<!--
Specify what test covers (unit test, integration test, etc.).
If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Yes