Description
Is your feature request related to a problem or challenge?
Running SELECT COUNT(1) when using iceberg-datafusion results in a full table scan. This can be avoided by implementing ExecutionPlan::statistics. DataFusion does this for its built-in Parquet scanner by fetching the statistics from the Parquet metadata when constructing the ExecutionPlan. I was looking to implement this in a similar way (at least for tables without deletes) by iterating over the ManifestEntry values and summing their record_count fields. I have a draft PR, but I wanted to confirm that this approach is acceptable before putting in the work to clean it up.
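To make the proposal concrete, here is a minimal sketch of the summation step. It is not the draft PR and does not use the real iceberg-rust or DataFusion types: `ManifestEntry`, `EntryStatus`, `ContentType`, `RowCount`, and `row_count_from_manifests` are hypothetical stand-ins defined locally, with `record_count` flattened onto the entry for brevity. `RowCount` loosely mirrors the exact/absent distinction in DataFusion's statistics, and the sketch assumes that any delete file makes the summed count unreliable.

```rust
// Hypothetical stand-ins for manifest metadata; the real iceberg-rust
// structs carry more fields and expose them differently.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum EntryStatus {
    Existing,
    Added,
    Deleted,
}

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

#[derive(Clone, Debug)]
pub struct ManifestEntry {
    pub status: EntryStatus,
    pub content: ContentType,
    // Assumption: flattened here; in Iceberg this lives on the data file.
    pub record_count: u64,
}

// Local stand-in for "exact row count known" vs "no statistic available".
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum RowCount {
    Exact(usize),
    Absent,
}

/// Sums `record_count` over live data files without reading any data.
/// Returns `Absent` as soon as a live delete file is seen, because the
/// summed count would then overstate the number of visible rows.
pub fn row_count_from_manifests(entries: &[ManifestEntry]) -> RowCount {
    let mut total: u64 = 0;
    for entry in entries {
        // Entries marked deleted are no longer part of the snapshot.
        if entry.status == EntryStatus::Deleted {
            continue;
        }
        // Only handle tables without deletes, as proposed above.
        if entry.content != ContentType::Data {
            return RowCount::Absent;
        }
        total = match total.checked_add(entry.record_count) {
            Some(sum) => sum,
            None => return RowCount::Absent,
        };
    }
    match usize::try_from(total) {
        Ok(n) => RowCount::Exact(n),
        Err(_) => RowCount::Absent,
    }
}
```

In a real implementation, the result would presumably be reported as the row count from ExecutionPlan::statistics, so that the planner can answer the count from metadata when the value is exact and fall back to a scan otherwise.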
Describe the solution you'd like
count(*) in DataFusion does not perform a table scan.
Willingness to contribute
I would be willing to contribute to this feature with guidance from the Iceberg Rust community