Why does the parquet reader currently not read some specific bytes? #3634
For this parquet file :- the parquet reader will read :- The first 4 bytes are not read because they contain 'PAR1'.
Answered by
tustvold
Jan 31, 2023
Could you provide a code sample of how you are reading the file?
The remaining bytes are most likely index information (the page index), which the reader skips because you aren't telling it to read it (there is no point reading it if you aren't doing predicate pushdown).
It can be enabled with https://docs.rs/parquet/latest/parquet/file/serialized_reader/struct.ReadOptionsBuilder.html#method.with_page_index
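A minimal sketch of what that could look like, assuming the `parquet` crate is a dependency and that `data.parquet` (a hypothetical path) exists. It only shows how to opt in to the page index when opening the file; how the loaded index is exposed afterwards differs between crate versions, so check the docs for the version you use.

```rust
use std::fs::File;

use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::{ReadOptionsBuilder, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical path, replace with your own file.
    let file = File::open("data.parquet")?;

    // Ask the reader to also load the page index; by default it is skipped.
    let options = ReadOptionsBuilder::new().with_page_index().build();
    let reader = SerializedFileReader::new_with_options(file, options)?;

    // The footer metadata is available either way; with the option above
    // the index bytes are read from the file as well.
    println!("row groups: {}", reader.metadata().num_row_groups());
    Ok(())
}
```

If you are not filtering rows with predicates, leaving the option off avoids reading bytes you will not use.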
https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/ may also be informative
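To see for yourself which byte ranges belong to the fixed parts of the file, here is a rough standard-library-only sketch. It relies on the Parquet layout of a 4-byte `PAR1` magic at the start, and a footer ending in a 4-byte little-endian metadata length followed by `PAR1` again. The names `FooterInfo` and `inspect_footer` are made up for this example and are not part of the `parquet` crate.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Magic bytes at the start and end of an unencrypted Parquet file.
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper type: where the footer metadata sits in the file.
#[derive(Debug, PartialEq)]
struct FooterInfo {
    file_len: u64,
    metadata_len: u32,
    metadata_start: u64,
}

/// Checks both magic markers and reads the footer length field.
fn inspect_footer<R: Read + Seek>(mut src: R) -> io::Result<FooterInfo> {
    let invalid = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidData, msg);

    let file_len = src.seek(SeekFrom::End(0))?;
    if file_len < 12 {
        return Err(invalid("too short to be a Parquet file"));
    }

    // First 4 bytes: leading magic.
    let mut head = [0u8; 4];
    src.seek(SeekFrom::Start(0))?;
    src.read_exact(&mut head)?;

    // Last 8 bytes: metadata length (little-endian u32) + trailing magic.
    let mut tail = [0u8; 8];
    src.seek(SeekFrom::End(-8))?;
    src.read_exact(&mut tail)?;

    if head != *MAGIC || &tail[4..] != &MAGIC[..] {
        return Err(invalid("missing PAR1 magic"));
    }

    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    let metadata_start = file_len
        .checked_sub(8 + u64::from(metadata_len))
        .filter(|start| *start >= 4)
        .ok_or_else(|| invalid("metadata length exceeds file size"))?;

    Ok(FooterInfo { file_len, metadata_len, metadata_start })
}

fn main() -> io::Result<()> {
    // Fake in-memory "file": magic, 10 data bytes, 5 metadata bytes, footer.
    let mut bytes = b"PAR1".to_vec();
    bytes.extend_from_slice(&[0u8; 10]);
    bytes.extend_from_slice(&[1u8; 5]);
    bytes.extend_from_slice(&5u32.to_le_bytes());
    bytes.extend_from_slice(b"PAR1");

    let info = inspect_footer(Cursor::new(bytes))?;
    println!("{:?}", info);
    Ok(())
}
```

On a real file you would pass a `std::fs::File` instead of the `Cursor`. Bytes between the end of the column data and `metadata_start` that your reader never touches are a good candidate for the index information mentioned above.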