[FEA] Add a method to cudf::table and cudf::column to get its size in bytes without kernel launches or d -> h memcpy

**Is your feature request related to a problem? Please describe.**
Currently, there is no clean way to get the size in bytes of all buffers owned by a const `cudf::table` or `cudf::column` without resorting to estimation or d->h copies.
The following APIs get part of the way there, and they also work with a `cudf::table_view`:
- `cudf::bitmask_allocation_size_bytes` to **estimate** the null mask buffer size.
- `cudf::strings_column_view::chars_size` to get the character data size of a string column. However, this needs a d->h copy to read a single element.
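
To illustrate why the null mask figure is only an estimate: it is derived from the row count rather than from the buffer the column actually owns. The sketch below is a standalone approximation of that calculation. `paddedMaskBytes` is a hypothetical helper, not a cudf function, and the rounding rule (one bit per row, padded up to a 64-byte boundary) is my assumption about how the estimate is computed.

```cpp
#include <cstddef>

// Hypothetical helper, not part of cudf: estimates a null mask size from the
// row count alone. Assumes one validity bit per row, rounded up to whole bytes
// and then padded to a multiple of paddingBoundary bytes.
constexpr std::size_t paddedMaskBytes(std::size_t numRows,
                                      std::size_t paddingBoundary = 64) {
  std::size_t const neededBytes = (numRows + 7) / 8;
  return paddingBoundary *
         ((neededBytes + paddingBoundary - 1) / paddingBoundary);
}
```

Because the result depends only on the row count, it cannot tell whether a column has a null mask allocated at all, so it may not match the bytes the column really holds.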

**Describe the solution you'd like**
The desired API should return the total size in bytes by summing the sizes of all `device_buffer`s owned by all constituent columns of a `cudf::table`.
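
As a rough sketch of the intended semantics (not a proposed signature), the summation could look like the following. `MockColumn`, `allocSizeBytes`, and `tableAllocSizeBytes` are hypothetical stand-ins so the example compiles without cudf. A real implementation would read the sizes of the buffers owned by `cudf::column` directly.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for cudf::column: records only the byte sizes of the
// buffers a column owns, plus its child columns.
struct MockColumn {
  std::size_t data_bytes;
  std::size_t null_mask_bytes;
  std::vector<std::unique_ptr<MockColumn>> children;
};

// Sum the buffer sizes of a column and, recursively, of all its children.
// Only host-side metadata is read: no kernel launch and no d->h copy.
std::size_t allocSizeBytes(MockColumn const& column) {
  std::size_t total = column.data_bytes + column.null_mask_bytes;
  for (auto const& child : column.children) {
    total += allocSizeBytes(*child);
  }
  return total;
}

// Hypothetical stand-in for the table-level query: sum over all columns.
std::size_t tableAllocSizeBytes(
    std::vector<std::unique_ptr<MockColumn>> const& columns) {
  std::size_t total = 0;
  for (auto const& column : columns) {
    total += allocSizeBytes(*column);
  }
  return total;
}
```

The key property is that both functions take their argument by const reference, so the caller keeps ownership and nothing has to be released or reassembled.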

**Describe alternatives you've considered**
Currently, we're able to work around this by disassembling, inspecting, and reassembling the `cudf::table` and `cudf::column` like so:
```c++
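#include <cudf/column/column.hpp>
#include <cudf/table/table.hpp>

#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

std::pair<uint64_t, std::unique_ptr<cudf::column>> getColumnSize(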
    std::unique_ptr<cudf::column> column) {
  // Store column metadata (type, null count, and size) before releasing it,
  // as the release() operation transfers ownership of the underlying buffers
  // and invalidates access to these properties.
  auto type = column->type();
  auto nullCount = column->null_count();
  auto size = column->size();

  auto contents = column->release();
  auto bytes = contents.data->size() + contents.null_mask->size();

  // Recursively get the size of the children columns.
  std::vector<std::unique_ptr<cudf::column>> children;
  for (auto& child : contents.children) {
    auto [childBytes, childColumn] = getColumnSize(std::move(child));
    bytes += childBytes;
    children.push_back(std::move(childColumn));
  }

  // Reassemble the column with the original metadata.
  auto reconstitutedColumn = std::make_unique<cudf::column>(
      type,
      size,
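      // Move out of the buffers while the unique_ptrs keep ownership, so the
      // moved-from device_buffer objects are still destroyed (calling
      // release() on the unique_ptr here would leak them).
      std::move(*contents.data),
      std::move(*contents.null_mask),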
      nullCount,
      std::move(children));

  return std::make_pair(bytes, std::move(reconstitutedColumn));
}

std::pair<uint64_t, std::unique_ptr<cudf::table>> getTableSize(
    std::unique_ptr<cudf::table>&& table) {
  auto columns = table->release();
  std::vector<std::unique_ptr<cudf::column>> columnsOut;
  uint64_t totalBytes = 0;

  for (auto& column : columns) {
    auto [bytes, columnOut] = getColumnSize(std::move(column));
    totalBytes += bytes;
    columnsOut.push_back(std::move(columnOut));
  }
  return std::make_pair(
      totalBytes, std::make_unique<cudf::table>(std::move(columnsOut)));
}
```
