perf(v2): static parquet page buffer size #4208
Conversation
```go
const (
	// Each 2MB translates to an I/O read op.
	parquetReadBufferSize      = 2 << 20
	parquetPageWriteBufferSize = 1 << 20
)
```
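For context, a minimal sketch of how constants like these could be consumed – the `parquet.PageBufferSize` option and the `bufio` wrapping are plausible-usage assumptions, not necessarily how this PR wires them up:

```go
package storage

import (
	"bufio"
	"io"

	"github.com/parquet-go/parquet-go"
)

const (
	parquetReadBufferSize      = 2 << 20 // from the const block above
	parquetPageWriteBufferSize = 1 << 20
)

// Row is a placeholder schema used only for this illustration.
type Row struct {
	SeriesID  int64 `parquet:"series_id"`
	Timestamp int64 `parquet:"timestamp"`
}

// newWriter caps column page buffers at parquetPageWriteBufferSize
// (assumption: the constant is applied via parquet.PageBufferSize).
func newWriter(w io.Writer) *parquet.GenericWriter[Row] {
	return parquet.NewGenericWriter[Row](w,
		parquet.PageBufferSize(parquetPageWriteBufferSize))
}

// newBufferedReader issues reads against the underlying storage in
// parquetReadBufferSize chunks, so each buffer fill is one I/O read op.
func newBufferedReader(r io.Reader) *bufio.Reader {
	return bufio.NewReaderSize(r, parquetReadBufferSize)
}
```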
It does make sense to increase `parquetReadBufferSize` further, e.g., to 4MB. However, we need to estimate the actual impact on memory consumption. I'm less confident about `parquetPageWriteBufferSize`.
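A back-of-envelope sketch of how such an estimate might look; the concurrency figures below are made-up placeholders, not measured values:

```go
package main

import "fmt"

func main() {
	const (
		readBufferSize    = 4 << 20 // proposed 4MB read buffer
		columnsPerQuery   = 8       // assumption: column readers held per query
		concurrentQueries = 32      // assumption: in-flight queries per replica
	)
	// Worst case: every in-flight query holds one read buffer per column.
	total := readBufferSize * columnsPerQuery * concurrentQueries
	fmt.Printf("worst-case buffer memory: %d MiB\n", total>>20) // 1024 MiB
}
```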
Regarding the previous comment on gaps in row ranges: this is expected due to the row ordering – first by `series_id`, then by timestamp (see the sketch below). As a result, when the time range isn't fully covered, we must skip some rows for each series. Unfortunately, due to the way parquet-go is implemented, this almost always leads to a request to object storage – it's perfectly possible that a full scan is cheaper in this case. Also, see #2192: I would consider swapping the timestamp and `series_id` sort order.

While investigating the issue, I discovered that #4036 causes the parquet reader to perform …

Another interesting finding: …

I'll create issues.
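To make the row-range gaps concrete, here's a toy sketch (not Pyroscope code; the row layout and numbers are invented) of how a partial time range maps to disjoint row ranges when rows are sorted by `series_id` first:

```go
package main

import "fmt"

type row struct{ seriesID, timestamp int64 }

// selectRows returns the row ranges matching [from, to), given rows sorted
// first by seriesID, then by timestamp. With N series, a partial time range
// yields up to N disjoint ranges.
func selectRows(rows []row, from, to int64) [][2]int {
	var ranges [][2]int
	start := -1
	for i, r := range rows {
		in := r.timestamp >= from && r.timestamp < to
		switch {
		case in && start < 0:
			start = i
		case !in && start >= 0:
			ranges = append(ranges, [2]int{start, i})
			start = -1
		}
	}
	if start >= 0 {
		ranges = append(ranges, [2]int{start, len(rows)})
	}
	return ranges
}

func main() {
	rows := []row{
		{1, 10}, {1, 20}, {1, 30}, // series 1
		{2, 10}, {2, 20}, {2, 30}, // series 2
	}
	// Query [15, 35) matches rows 1..3 and 4..6: two ranges, one gap.
	fmt.Println(selectRows(rows, 15, 35)) // [[1 3] [4 6]]
}
```

Each gap between the returned ranges may force the reader to seek, which, given how parquet-go reads pages, typically translates into a separate object storage request.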
Theoretically, we could also separate different …
LGTM! Thanks for tweaking the buffer sizes and for the investigation and trialing around it.
I think we should make sure that we don't miss the ideas from that issue and create follow-up tasks.
In this PR, I'm increasing the parquet buffers and making them static.
Currently, parquet buffer sizes are determined by the table size, with lower and upper bounds of 64KB and 1MB, respectively. This approach helps prevent high memory consumption in the segment-writer, compaction-worker, and query-backend services, which typically handle many small tables concurrently. While this strategy works well in the common case – keeping memory usage low – it can negatively impact query performance in certain scenarios and is generally overly conservative.
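For comparison, a sketch of the two strategies; the dynamic-sizing function below is a simplified reconstruction of the described clamping behavior (the actual scaling factor is an assumption):

```go
package main

import "fmt"

const (
	minBufferSize = 64 << 10 // 64KB lower bound
	maxBufferSize = 1 << 20  // 1MB upper bound
)

// dynamicBufferSize is the old strategy: scale with table size,
// clamped to [64KB, 1MB]. The /64 scaling is an assumption.
func dynamicBufferSize(tableSize int64) int64 {
	size := tableSize / 64
	if size < minBufferSize {
		return minBufferSize
	}
	if size > maxBufferSize {
		return maxBufferSize
	}
	return size
}

// staticBufferSize is the new strategy: a fixed size, regardless of table size.
func staticBufferSize() int64 { return 1 << 20 }

func main() {
	for _, tableSize := range []int64{1 << 20, 256 << 20, 4 << 30} {
		fmt.Printf("table %4d MiB: dynamic %4d KiB, static %4d KiB\n",
			tableSize>>20, dynamicBufferSize(tableSize)>>10, staticBufferSize()>>10)
	}
}
```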
Here is an example of what the performance degradation looks like and how it can be identified.

Consider a query trace with a query-backend Invoke span (3ac41dbe32f4c111c67181c236628a7e) that spent 2.3s of CPU time but took 17s of real time:



The span profile shows that CPU time was spent in object storage I/O and parquet decoding (both the amd64-avx2 and arm64-naïve implementations demonstrate comparable performance; the profile was collected on arm64).
The bottleneck is `RepeatedRowColumnIterator`, where we actually read pages – 52 events (the last one is a summary):



The event log suggests that it matches expectations: pages are 1MB, the data range is contiguous, and the data chunk is of a manageable size (50MB). Although I find it weird that the number of rows per page varies this much.
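As a quick sanity check of those numbers (pure arithmetic on the figures quoted above):

```go
package main

import "fmt"

func main() {
	const (
		chunkSize = 50 << 20 // ~50MB contiguous data chunk
		pageSize  = 1 << 20  // 1MB page buffer
	)
	// ≈50 page reads expected; the event log shows 52 events,
	// the last of which is a summary – in the right ballpark.
	fmt.Println("expected page reads:", chunkSize/pageSize) // 50
}
```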
The change won't resolve the problem: we trade memory for I/O throughput.