[Parquet] Optimize the performance of the record reader #8607
+121
−39
Which issue does this PR close?
Related to:
Rationale for this change
Improve the performance of `ParquetRecordBatchReader`, especially when the `RowSelector`s are short (each covers only a few rows).

What changes are included in this PR?
- `parquet/src/column/reader/decoder.rs`: replace the hashmap with an array indexed by the encoding enum.
- `parquet/src/arrow/array_reader/cached_array_reader.rs`: update the hash function.
- `parquet/src/column/reader/decoder.rs`: update the `counts` logic so that `read_value_num` is returned directly by the decoder, saving the time spent counting values on the byte array.

Are these changes tested?
The hashmaps are already covered by existing tests.
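The first change swaps a `HashMap` of decoders keyed by encoding for a fixed-size array indexed by the enum discriminant, so each lookup becomes a plain array index instead of a hash and probe. A minimal sketch of that pattern, using a simplified, hypothetical `Encoding` enum and `Decoder` trait rather than the crate's real types:

```rust
/// Hypothetical stand-in for the Parquet encodings (not the crate's enum).
/// Explicit discriminants let each variant double as an array index.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Encoding {
    Plain = 0,
    RleDictionary = 1,
    DeltaBinaryPacked = 2,
}

/// Number of variants; sizes the lookup array.
const NUM_ENCODINGS: usize = 3;

/// Hypothetical decoder trait used only for this illustration.
trait Decoder {
    fn name(&self) -> &'static str;
}

struct PlainDecoder;
impl Decoder for PlainDecoder {
    fn name(&self) -> &'static str {
        "plain"
    }
}

struct DictDecoder;
impl Decoder for DictDecoder {
    fn name(&self) -> &'static str {
        "dict"
    }
}

/// Decoders stored in an array indexed by `Encoding as usize`
/// instead of a `HashMap<Encoding, Box<dyn Decoder>>`.
struct Decoders {
    slots: [Option<Box<dyn Decoder>>; NUM_ENCODINGS],
}

impl Decoders {
    fn new() -> Self {
        // Every slot starts empty.
        Self { slots: std::array::from_fn(|_| None) }
    }

    fn insert(&mut self, enc: Encoding, dec: Box<dyn Decoder>) {
        // Direct index: no hashing, just a bounds-checked array access.
        self.slots[enc as usize] = Some(dec);
    }

    fn get(&self, enc: Encoding) -> Option<&dyn Decoder> {
        self.slots[enc as usize].as_deref()
    }
}
```

Because the set of encodings is small and fixed, the array never needs to grow, and a missing decoder is simply a `None` slot.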
Updated the related tests in `definition_levels.rs`.
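The third change has the decoder report how many values it read instead of the caller recounting them from a byte buffer afterwards. The sketch below illustrates that idea on a simplified, hypothetical definition-level decoder, assuming the count was previously derived by scanning a validity bitmap; `decode_levels` and `count_set_bits` are illustrative names, not functions in the crate.

```rust
/// Hypothetical definition-level decoder: writes a validity bitmap (one bit
/// per level, least-significant bit first) and returns the number of
/// non-null values counted during the same pass.
fn decode_levels(levels: &[i16], max_def_level: i16) -> (Vec<u8>, usize) {
    let mut bitmap = vec![0u8; (levels.len() + 7) / 8];
    let mut values_read = 0;
    for (i, &level) in levels.iter().enumerate() {
        if level == max_def_level {
            // Non-null slot: set its bit and count it here, so the caller
            // gets the count directly from the decoder.
            bitmap[i / 8] |= 1u8 << (i % 8);
            values_read += 1;
        }
    }
    (bitmap, values_read)
}

/// The extra pass being avoided: re-scanning the bitmap to count set bits.
fn count_set_bits(bitmap: &[u8], len: usize) -> usize {
    (0..len)
        .filter(|&i| (bitmap[i / 8] & (1u8 << (i % 8))) != 0)
        .count()
}
```

For example, `decode_levels(&[1, 0, 1], 1)` returns a count of 2 without a second scan over the bitmap.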
Also tested manually by reading Parquet files.
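The second change updates the hash function used by the cache in `cached_array_reader.rs`. The PR text does not say which hasher it adopts, so this is only an illustration of the general technique, assuming `usize` batch-id keys: a custom `Hasher` plugged into `std::collections::HashMap` through `BuildHasherDefault`, replacing the default SipHash with a single multiply. `IntHasher` and `BatchCache` are hypothetical names.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical cheap hasher for integer keys such as batch ids.
/// Uses a multiplicative (Fibonacci) hash; not necessarily what the PR uses.
#[derive(Default)]
struct IntHasher(u64);

impl Hasher for IntHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Simple byte-folding fallback for keys that are not plain integers.
        for &b in bytes {
            self.0 = (self.0 ^ b as u64).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn write_usize(&mut self, n: usize) {
        // `usize::hash` calls this; one multiply by an odd constant spreads
        // sequential ids across the 64-bit hash space.
        self.0 = (n as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Cache keyed by batch id, using the cheap hasher instead of SipHash.
type BatchCache<V> = HashMap<usize, V, BuildHasherDefault<IntHasher>>;
```

The trade-off is that a non-randomized hash gives up SipHash's resistance to adversarial keys, which matters little for internally generated integer ids.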
Are there any user-facing changes?
No
Performance results for the `arrow_reader_row_filter.rs` benchmark on my 3950X: