Is your feature request related to a problem? Please describe.
Currently, there is no clean way to get the size in bytes of all buffers owned by a const cudf::table or a cudf::column without requiring estimation and d->h copies.
We can use the following APIs, which also work with a cudf::table_view:
- cudf::bitmask_allocation_size_bytes, to estimate the null mask buffer size.
- cudf::strings_column_view::chars_size, for the character data size of any string columns. However, this requires a d->h copy to read a single element.
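For illustration, the estimation approach above can be sketched on the host. This is a sketch under stated assumptions, not cudf code: `estimate_null_mask_bytes` and `estimate_chars_bytes` are hypothetical helpers. The first re-implements the rounding rule assumed for cudf::bitmask_allocation_size_bytes (one bit per row, rounded up to whole bytes, then padded to a 64-byte boundary by default); the second assumes the character data size of a strings column equals the last entry of its offsets array, which in cudf lives in device memory and would explain why reading it costs a d->h copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a cudf API): mirrors the rounding rule assumed for
// cudf::bitmask_allocation_size_bytes. One bit per row, rounded up to whole
// bytes, then padded up to a multiple of `padding_boundary` (64 by default).
std::size_t estimate_null_mask_bytes(std::size_t num_rows,
                                     std::size_t padding_boundary = 64) {
  assert(padding_boundary > 0);
  const std::size_t necessary_bytes = (num_rows + 7) / 8;  // ceil(rows / 8)
  const std::size_t padded_units =
      (necessary_bytes + padding_boundary - 1) / padding_boundary;
  return padded_units * padding_boundary;
}

// Hypothetical helper (not a cudf API): assumes the character data size of a
// strings column is the final entry of its offsets array. In cudf the offsets
// live in device memory, so this single read would need a d->h copy; here the
// offsets are already a host vector.
std::size_t estimate_chars_bytes(const std::vector<std::int32_t>& offsets) {
  return offsets.empty() ? 0 : static_cast<std::size_t>(offsets.back());
}
```

Under these assumptions, 100 rows need 13 bytes of bits, padded to 64, while 513 rows need 65 bytes, padded to 128. Both results are estimates of what a column should own, not a measurement of the buffers it actually holds (for example, a column without nulls may own no null mask at all), which is the gap this request is about.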
Describe the solution you'd like
The desired API should return the total size in bytes, computed by summing the sizes of all device_buffers owned by every constituent column (including child columns) of a cudf::table.
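A minimal sketch of the requested behaviour, assuming a simplified ownership model: `Buffer`, `Column`, `Table`, and `owned_bytes` are hypothetical stand-ins for rmm::device_buffer, cudf::column, cudf::table, and the desired API, not real cudf types. Because the traversal only needs const access to each buffer's size, it works on a const table without releasing or reassembling anything.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for rmm::device_buffer: only the allocation size
// matters for this sketch.
struct Buffer {
  std::size_t bytes = 0;
  std::size_t size() const { return bytes; }
};

// Hypothetical model of a cudf::column: it owns a data buffer, a null-mask
// buffer, and any number of child columns.
struct Column {
  Buffer data;
  Buffer null_mask;
  std::vector<std::unique_ptr<Column>> children;
};

// Hypothetical model of a cudf::table: a list of owned columns.
struct Table {
  std::vector<std::unique_ptr<Column>> columns;
};

// Sums every buffer owned by `col` and, recursively, by its children.
// Only const access is needed, so nothing is released or rebuilt.
std::size_t owned_bytes(const Column& col) {
  std::size_t total = col.data.size() + col.null_mask.size();
  for (const auto& child : col.children) {
    assert(child && "child columns are expected to be non-null");
    total += owned_bytes(*child);
  }
  return total;
}

// Sums the owned bytes of every column in the table.
std::size_t owned_bytes(const Table& table) {
  std::size_t total = 0;
  for (const auto& col : table.columns) {
    total += owned_bytes(*col);
  }
  return total;
}
```

For example, a table holding a fixed-width column (400 data bytes plus a 64-byte null mask) and a strings-like column (8 bytes of characters plus a 20-byte offsets child) reports 492 bytes. A real implementation would presumably read each rmm::device_buffer::size() in the same recursive way.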
Describe alternatives you've considered
Currently, we work around this by disassembling, inspecting, and reassembling the cudf::table and cudf::column, like so:
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

#include <cudf/column/column.hpp>
#include <cudf/table/table.hpp>

// Returns the total bytes owned by `column` (data + null mask + all children)
// together with an equivalent column reassembled from the released buffers.
std::pair<uint64_t, std::unique_ptr<cudf::column>> getColumnSize(
    std::unique_ptr<cudf::column> column) {
  // Store column metadata (type, null count, and size) before releasing it,
  // as the release() operation transfers ownership of the underlying buffers
  // and invalidates access to these properties.
  auto type = column->type();
  auto nullCount = column->null_count();
  auto size = column->size();
  auto contents = column->release();
  uint64_t bytes = contents.data->size() + contents.null_mask->size();
  // Recursively get the size of the children columns.
  std::vector<std::unique_ptr<cudf::column>> children;
  for (auto& child : contents.children) {
    auto [childBytes, childColumn] = getColumnSize(std::move(child));
    bytes += childBytes;
    children.push_back(std::move(childColumn));
  }
  // Reassemble the column with the original metadata. Move out of the buffers
  // through the unique_ptrs (instead of calling release() on them) so the
  // heap-allocated rmm::device_buffer objects are still freed afterwards.
  auto reconstitutedColumn = std::make_unique<cudf::column>(
      type,
      size,
      std::move(*contents.data),
      std::move(*contents.null_mask),
      nullCount,
      std::move(children));
  return std::make_pair(bytes, std::move(reconstitutedColumn));
}

// Applies getColumnSize to every column of `table` and rebuilds the table.
std::pair<uint64_t, std::unique_ptr<cudf::table>> getTableSize(
    std::unique_ptr<cudf::table>&& table) {
  auto columns = table->release();
  std::vector<std::unique_ptr<cudf::column>> columnsOut;
  uint64_t totalBytes = 0;
  for (auto& column : columns) {
    auto [bytes, columnOut] = getColumnSize(std::move(column));
    totalBytes += bytes;
    columnsOut.push_back(std::move(columnOut));
  }
  return std::make_pair(
      totalBytes, std::make_unique<cudf::table>(std::move(columnsOut)));
}