[FEA] Add a method to cudf::table and cudf::column to get its size in bytes without kernel launches or d -> h memcpy #18462

Closed
@devavret

Description

Is your feature request related to a problem? Please describe.
Currently, there is no clean way to get the size in bytes of all buffers owned by a const cudf::table or a cudf::column without requiring estimation and d->h copies.
We can use the following APIs which can also work with a cudf::table_view:

  • cudf::bitmask_allocation_size_bytes to estimate the null mask buffer size from the row count.
  • cudf::strings_column_view::chars_size for the character-data size of a strings column. However, this requires a d->h copy to read a single element.
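To illustrate why the null mask figure is only an estimate, here is a minimal stand-alone sketch of the arithmetic, assuming cudf::bitmask_allocation_size_bytes rounds the bit count up to whole bytes and then pads to a 64-byte boundary by default. The function name padded_bitmask_bytes is hypothetical and is not part of libcudf; it only mirrors that assumed behaviour.

```cpp
#include <cstddef>

// Hypothetical stand-in for the estimate: one validity bit per row,
// rounded up to whole bytes, then padded to a multiple of padding_boundary.
// Assumption: a 64-byte default padding boundary.
constexpr std::size_t padded_bitmask_bytes(std::size_t num_rows,
                                           std::size_t padding_boundary = 64) {
  std::size_t const needed_bytes = (num_rows + 7) / 8;
  return padding_boundary *
         ((needed_bytes + padding_boundary - 1) / padding_boundary);
}

// Worked examples, checked at compile time.
static_assert(padded_bitmask_bytes(0) == 0, "no rows, no mask");
static_assert(padded_bitmask_bytes(512) == 64, "512 bits fit exactly in 64 bytes");
static_assert(padded_bitmask_bytes(513) == 128, "65 bytes pad up to 128");
static_assert(padded_bitmask_bytes(1000) == 128, "125 bytes pad up to 128");
```

The result depends only on the row count, so it cannot tell whether the column actually owns a mask of that size. A column without a null mask, for instance, may own an empty buffer while the estimate still reports the padded size.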

Describe the solution you'd like
The desired API should return the total size in bytes, computed by summing the sizes of all device_buffers owned by all constituent columns of a cudf::table (including nested child columns), without launching kernels or copying from device to host.
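As a rough sketch of the requested behaviour, the following models the ownership structure with plain host-side stand-in types and sums the buffer sizes through a const reference. Everything here is hypothetical: buffer_model, column_model, and owned_bytes are illustrative names, not libcudf types or functions, and the byte counts in the example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for an owning device allocation; only the byte count is modelled.
struct buffer_model {
  std::size_t bytes{0};
};

// Stand-in for an owning column: data buffer, null mask, and child columns.
struct column_model {
  buffer_model data;
  buffer_model null_mask;
  std::vector<std::unique_ptr<column_model>> children;
};

// Hypothetical accessor: bytes owned by a column, including all descendants.
// It takes a const reference, so nothing is released or rebuilt.
inline std::size_t owned_bytes(column_model const& col) {
  std::size_t total = col.data.bytes + col.null_mask.bytes;
  for (auto const& child : col.children) {
    total += owned_bytes(*child);
  }
  return total;
}

// Hypothetical table-level accessor: sum over all top-level columns.
inline std::size_t owned_bytes(
    std::vector<std::unique_ptr<column_model>> const& table) {
  std::size_t total = 0;
  for (auto const& col : table) {
    total += owned_bytes(*col);
  }
  return total;
}

// Example with illustrative sizes: a nested column next to a flat one.
inline std::size_t example_total() {
  auto child = std::make_unique<column_model>();
  child->data.bytes = 404;

  auto nested = std::make_unique<column_model>();
  nested->data.bytes = 1000;
  nested->null_mask.bytes = 64;
  nested->children.push_back(std::move(child));

  auto flat = std::make_unique<column_model>();
  flat->data.bytes = 400;

  std::vector<std::unique_ptr<column_model>> table;
  table.push_back(std::move(nested));
  table.push_back(std::move(flat));

  std::size_t const total = owned_bytes(table);
  assert(total == 1000 + 64 + 404 + 400);
  return total;
}
```

Because each buffer already records its own size on the host, a traversal like this needs no kernel launch and no d->h copy, which is the property the request asks for.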

Describe alternatives you've considered
Currently, we are able to work around this by disassembling, inspecting, and reassembling the cudf::table and cudf::column like so:

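#include <cudf/column/column.hpp>
#include <cudf/table/table.hpp>
#include <rmm/device_buffer.hpp>

#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

std::pair<uint64_t, std::unique_ptr<cudf::column>> getColumnSize(
    std::unique_ptr<cudf::column> column) {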
  // Store column metadata (type, null count, and size) before releasing it,
  // as the release() operation transfers ownership of the underlying buffers
  // and invalidates access to these properties.
  auto type = column->type();
  auto nullCount = column->null_count();
  auto size = column->size();

  auto contents = column->release();
  auto bytes = contents.data->size() + contents.null_mask->size();

  // Recursively get the size of the children columns.
  std::vector<std::unique_ptr<cudf::column>> children;
  for (auto& child : contents.children) {
    auto [childBytes, childColumn] = getColumnSize(std::move(child));
    bytes += childBytes;
    children.push_back(std::move(childColumn));
  }

  // Reassemble the column with the original metadata.
  auto reconstitutedColumn = std::make_unique<cudf::column>(
      type,
      size,
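      // Move out of the buffers while the unique_ptrs keep ownership, so the
      // moved-from device_buffer objects are still destroyed (calling
      // release() on the unique_ptr here would leak them).
      std::move(*contents.data),
      std::move(*contents.null_mask),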
      nullCount,
      std::move(children));

  return std::make_pair(bytes, std::move(reconstitutedColumn));
}

std::pair<uint64_t, std::unique_ptr<cudf::table>> getTableSize(
    std::unique_ptr<cudf::table>&& table) {
  auto columns = table->release();
  std::vector<std::unique_ptr<cudf::column>> columnsOut;
  uint64_t totalBytes = 0;

  for (auto& column : columns) {
    auto [bytes, columnOut] = getColumnSize(std::move(column));
    totalBytes += bytes;
    columnsOut.push_back(std::move(columnOut));
  }
  return std::make_pair(
      totalBytes, std::make_unique<cudf::table>(std::move(columnsOut)));
}

Labels: feature request, libcudf