Skip to content

Conversation

@DuanWeiFan
Copy link
Contributor

Rationale for this change

As discussed in issue, pqarrow parquet writer when writing more than 1 row group, it cannot track the bytesWritten & compressedbytesWritten for closed row group.
The method RowGroupTotalCompressedBytes & RowGroupTotalBytesWritten only returns the current row group.
The ideal is to introduce TotalCompressedBytes() & TotalBytesWritten() as an enhancement for user to track all row group written bytes & compressed bytes.

What changes are included in this PR?

As part of making this change, I notice the totalCompressedBytes is not being populated correctly in row_group_writer.go.
It should follow the same approach as bytesWritten where it tracks all the closed column_writer.

Are these changes tested?

Yes. I added test cases to ensure those metrics are populated correctly

Are there any user-facing changes?

Yes. There are two new methods introduced:

  1. TotalCompressedBytes()
  2. TotalBytesWritten()

It won't impact on existing users.

@DuanWeiFan DuanWeiFan requested a review from zeroshade as a code owner October 25, 2025 20:03
Comment on lines +118 to +119
fw.totalCompressedBytes += fw.rgw.TotalCompressedBytes()
fw.totalBytesWritten += fw.rgw.TotalBytesWritten()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In FileWriter.Close when we write the metadata, we should probably add the size of the metadata we write to the total bytes written, yea?

Comment on lines +174 to +175
assert.Greater(t, writer.TotalCompressedBytes(), int64(0))
assert.Greater(t, writer.TotalBytesWritten(), int64(0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're creating the test and know the settings, we can probably know the exact size to test for and have the test be explicit on the size we expect so that we know if we break anything in the future. Make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants