Fix a potential segfault in PQ reader's number of rows per source calculation (#18906)
This PR fixes a potential segfault caused by an illegal memory access (out-of-range vector index) when computing the number of rows read from each source in the parquet reader. The segfault should manifest when the `num_rows` option leads us to not read from all data sources.
Consider this example: we have 5 files with 1000 rows each, and we set skip rows to 1500 and num rows to 1000. The `partial_sum_nrows_source` vector (at L725 in `reader_impl.cpp`) would look like this: [0, 500, 1000, 1000, 1000, 1000]. Here, computing `end_iter` (end_row = 1000) with `std::upper_bound` gives us `std::end(..)` instead of `std::begin(..) + 2`, because no element is strictly greater than 1000. Indexing with that iterator is out of range, which should lead to a segfault.
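The iterator behavior described in the example can be reproduced with a small standalone sketch. It is not the code from `reader_impl.cpp`. The helper names (`example_partial_sums`, `upper_bound_index`, `lower_bound_index`) are hypothetical, and `std::lower_bound` is shown only as a contrast that lands on `std::begin(..) + 2`. It is an assumption that it resembles the change made in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: the partial-sum vector from the example above.
inline std::vector<std::size_t> example_partial_sums()
{
  return {0, 500, 1000, 1000, 1000, 1000};
}

// Index of the first element strictly greater than `row`.
// Equals v.size() (i.e. std::end) when no such element exists.
inline std::size_t upper_bound_index(std::vector<std::size_t> const& v, std::size_t row)
{
  auto const it = std::upper_bound(v.begin(), v.end(), row);
  return static_cast<std::size_t>(std::distance(v.begin(), it));
}

// Index of the first element greater than or equal to `row`.
inline std::size_t lower_bound_index(std::vector<std::size_t> const& v, std::size_t row)
{
  auto const it = std::lower_bound(v.begin(), v.end(), row);
  return static_cast<std::size_t>(std::distance(v.begin(), it));
}

// Checks the two results for end_row = 1000 from the example.
inline void check_example()
{
  auto const sums = example_partial_sums();
  // No element exceeds 1000, so upper_bound returns the end iterator.
  assert(upper_bound_index(sums, 1000) == sums.size());
  // lower_bound stops at the first 1000, which is begin + 2.
  assert(lower_bound_index(sums, 1000) == 2);
}
```

Calling `check_example()` exercises both lookups. Any index derived from the end iterator must not be used to access the vector.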
Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
Approvers:
- Bradley Dice (https://github.com/bdice)
- Vyas Ramasubramani (https://github.com/vyasr)
- Vukasin Milovanovic (https://github.com/vuule)
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
URL: #18906