Description
Apache Iceberg Rust version
Current, v0.4.0
Describe the bug
The current implementation for partition filtering treats a missing lower_bound/upper_bound value as though all rows are null:
Current manifest_evaluator.rs:
fn less_than(
    &mut self,
    reference: &BoundReference,
    datum: &Datum,
    _predicate: &BoundPredicate,
) -> crate::Result<bool> {
    let field = self.field_summary_for_reference(reference);
    match &field.lower_bound {
        Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
        Some(_) => ROWS_MIGHT_MATCH,
        None => ROWS_CANNOT_MATCH,
    }
}
This means that if bounds were not recorded for a given partition file, that file is always excluded from scans that filter on the affected column, even though it may contain matching rows.
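The effect can be illustrated with a minimal, self-contained model of the match arms quoted above. This is only a sketch: `FieldSummary` here is a hypothetical stand-in that uses `i64` bounds instead of `Datum`, `less_than_current` is an illustrative name, and the two constants are assumed to be plain booleans.

```rust
// Assumed to be plain booleans for the purposes of this sketch.
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Hypothetical stand-in for a per-field partition summary.
/// The real summary stores `Datum` bounds; `i64` keeps the sketch small.
struct FieldSummary {
    lower_bound: Option<i64>,
}

/// Mirrors the match arms quoted above for a `column < datum` predicate.
fn less_than_current(field: &FieldSummary, datum: i64) -> bool {
    match field.lower_bound {
        // Every value is >= bound >= datum, so nothing can be < datum.
        Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
        Some(_) => ROWS_MIGHT_MATCH,
        // Problem case: no bound recorded, yet the file is pruned.
        None => ROWS_CANNOT_MATCH,
    }
}
```

With this model, `less_than_current(&FieldSummary { lower_bound: None }, 10)` evaluates to `ROWS_CANNOT_MATCH`, so a file without recorded bounds is pruned regardless of its contents.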
For comparison, the Java implementation handles this correctly:
T lower = lowerBound(term);
if (null == lower || NaNUtil.isNaN(lower)) {
  // NaN indicates unreliable bounds. See the InclusiveMetricsEvaluator docs for more.
  return ROWS_MIGHT_MATCH;
}
That is, it treats a null lower bound as indicating that rows might match.
To Reproduce
No response
Expected behavior
A partition file with a missing lower_bound or upper_bound for a column should not be excluded from scans that filter on that column with <, <=, >, or >=.
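A rough sketch of the expected behavior for all four operators follows. It is not the actual patch: `FieldSummary` is a hypothetical simplified type with `i64` bounds rather than `Datum`, the free functions stand in for the evaluator methods, and the NaN check from the Java code is omitted because integers have no NaN.

```rust
// Assumed to be plain booleans for the purposes of this sketch.
const ROWS_MIGHT_MATCH: bool = true;
const ROWS_CANNOT_MATCH: bool = false;

/// Hypothetical stand-in for a per-field partition summary.
struct FieldSummary {
    lower_bound: Option<i64>,
    upper_bound: Option<i64>,
}

/// `column < datum`: prune only when a known lower bound rules it out.
fn less_than(field: &FieldSummary, datum: i64) -> bool {
    match field.lower_bound {
        Some(bound) if datum <= bound => ROWS_CANNOT_MATCH,
        // Covers both a usable bound and a missing bound.
        _ => ROWS_MIGHT_MATCH,
    }
}

/// `column <= datum`: prune only when the lower bound is above the datum.
fn less_than_or_eq(field: &FieldSummary, datum: i64) -> bool {
    match field.lower_bound {
        Some(bound) if datum < bound => ROWS_CANNOT_MATCH,
        _ => ROWS_MIGHT_MATCH,
    }
}

/// `column > datum`: prune only when a known upper bound rules it out.
fn greater_than(field: &FieldSummary, datum: i64) -> bool {
    match field.upper_bound {
        Some(bound) if datum >= bound => ROWS_CANNOT_MATCH,
        _ => ROWS_MIGHT_MATCH,
    }
}

/// `column >= datum`: prune only when the upper bound is below the datum.
fn greater_than_or_eq(field: &FieldSummary, datum: i64) -> bool {
    match field.upper_bound {
        Some(bound) if datum > bound => ROWS_CANNOT_MATCH,
        _ => ROWS_MIGHT_MATCH,
    }
}
```

Collapsing the `Some(_)` and `None` arms into a single wildcard makes "might match" the default, so pruning happens only when a recorded bound positively excludes the predicate.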
Willingness to contribute
I can contribute a fix for this bug independently