ParquetMetaData memory size is not reported accurately when encryption is enabled #8472

@alamb

Describe the bug
While working on #8470 I noticed that the API for reporting memory usage undercounts the memory actually used when encryption is enabled.

ParquetMetaData::memory_size is used for memory accounting in in-memory Parquet caches, and thus should be accurate.

To Reproduce
Specifically, this function:

pub fn memory_size(&self) -> usize {
    std::mem::size_of::<Self>()
        + self.file_metadata.heap_size()
        + self.row_groups.heap_size()
        + self.column_index.heap_size()
        + self.offset_index.heap_size()
}

does not account for the heap allocations in the file_decryptor field:

file_decryptor: Option<FileDecryptor>,

Expected behavior
ParquetMetaData::memory_size should report its actual heap allocation size (by implementing the HeapSize trait for FileDecryptor and all its subfields).
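
A rough sketch of what the fix could look like, not the actual patch. The HeapSize trait below is a local stand-in for the crate's internal trait, and the FileDecryptor and Metadata structs are hypothetical simplifications (the fields footer_key, aad_prefix, column_keys and footer are assumptions made for illustration, not the real layout). The point is only that each owned allocation inside the decryptor gets summed and that memory_size adds the result.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Local stand-in for the crate's heap-size trait (assumption: the real
/// trait is internal to the parquet crate and may differ in detail).
trait HeapSize {
    /// Bytes owned on the heap, excluding `size_of::<Self>()`.
    fn heap_size(&self) -> usize;
}

impl HeapSize for Vec<u8> {
    fn heap_size(&self) -> usize {
        // Count allocated capacity, not just the initialized length.
        self.capacity()
    }
}

impl<T: HeapSize> HeapSize for Option<T> {
    fn heap_size(&self) -> usize {
        // `None` owns no heap memory.
        self.as_ref().map_or(0, |inner| inner.heap_size())
    }
}

/// Hypothetical, simplified decryptor; the real field layout differs.
struct FileDecryptor {
    footer_key: Vec<u8>,
    aad_prefix: Vec<u8>,
    column_keys: HashMap<String, Vec<u8>>,
}

impl HeapSize for FileDecryptor {
    fn heap_size(&self) -> usize {
        // Lower-bound estimate for the map: one (String, Vec<u8>) slot per
        // entry plus the bytes each key and value own. Hash table overhead
        // and unused map capacity are ignored here.
        let column_keys: usize = self
            .column_keys
            .iter()
            .map(|(name, key)| {
                size_of::<String>() + size_of::<Vec<u8>>() + name.capacity() + key.heap_size()
            })
            .sum();
        self.footer_key.heap_size() + self.aad_prefix.heap_size() + column_keys
    }
}

/// Hypothetical stand-in for ParquetMetaData, reduced to two fields.
struct Metadata {
    footer: Vec<u8>,
    file_decryptor: Option<FileDecryptor>,
}

impl Metadata {
    fn memory_size(&self) -> usize {
        size_of::<Self>()
            + self.footer.heap_size()
            // The term that the function quoted in this issue is missing.
            + self.file_decryptor.heap_size()
    }
}
```

With this shape, a Metadata value holding Some(FileDecryptor) reports a strictly larger memory_size than one holding None, which is the behavior a cache doing memory accounting would rely on.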

Additional context
