fix: PushDownFilter for GROUP BY on uppercase col names #16049

Open: wants to merge 2 commits into base: main

57 changes: 56 additions & 1 deletion datafusion/optimizer/src/push_down_filter.rs
@@ -941,7 +941,11 @@ impl OptimizerRule for PushDownFilter {
                 let group_expr_columns = agg
                     .group_expr
                     .iter()
-                    .map(|e| Ok(Column::from_qualified_name(e.schema_name().to_string())))
+                    .map(|e| {
+                        Ok(Column::from_qualified_name_ignore_case(
+                            e.schema_name().to_string(),
+                        ))
+                    })
                     .collect::<Result<HashSet<_>>>()?;
 
                 let predicates = split_conjunction_owned(filter.predicate);
@@ -4123,4 +4127,55 @@ mod tests {
         "
         )
     }
+
+    /// Create a test table scan with uppercase column names for case sensitivity testing
+    fn test_table_scan_with_uppercase_columns() -> Result<LogicalPlan> {
+        let schema = Schema::new(vec![
+            Field::new("a", DataType::UInt32, false),
+            Field::new("A", DataType::UInt32, false),
+            Field::new("B", DataType::UInt32, false),
+            Field::new("C", DataType::UInt32, false),
+        ]);
+        table_scan(Some("test"), &schema, None)?.build()
+    }
+
+    #[test]
+    fn filter_agg_case_insensitive() -> Result<()> {
+        let table_scan = test_table_scan_with_uppercase_columns()?;
@xudong963 (Member) commented on Jun 23, 2025:

If the table also has a column named 'a', what'll happen?

Contributor Author replied:

Great question. I just tried this and it works as expected for both the uppercase and the lowercase column, even when both are present in the schema at the same time. I added another test; let me know whether we should keep it or whether it's overkill.

+        let plan = LogicalPlanBuilder::from(table_scan)
+            .aggregate(
+                vec![col(r#""A""#)],
+                vec![sum(col(r#""B""#)).alias("total_salary")],
+            )?
+            .filter(col(r#""A""#).gt(lit(10i64)))?
+            .build()?;
+
+        assert_optimized_plan_equal!(
+            plan,
+            @r"
+        Aggregate: groupBy=[[test.A]], aggr=[[sum(test.B) AS total_salary]]
+          TableScan: test, full_filters=[test.A > Int64(10)]
+        "
+        )
+    }
+
+    #[test]
+    fn filter_agg_mix_case_insensitive() -> Result<()> {
+        let table_scan = test_table_scan_with_uppercase_columns()?;
+        let plan = LogicalPlanBuilder::from(table_scan)
+            .aggregate(
+                vec![col("a")],
+                vec![sum(col(r#""B""#)).alias("total_salary")],
+            )?
+            .filter(col("a").gt(lit(10i64)))?
+            .build()?;
+
+        assert_optimized_plan_equal!(
+            plan,
+            @r"
+        Aggregate: groupBy=[[test.a]], aggr=[[sum(test.B) AS total_salary]]
+          TableScan: test, full_filters=[test.a > Int64(10)]
+        "
+        )
+    }
 }
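
Note on why the one-line change matters: the rule collects the GROUP BY columns into a set and only pushes a filter below the aggregate when every column the filter references is in that set. The sketch below is a simplified, std-only model of that membership check, not DataFusion's code. `ColumnRef`, `parse_column`, and `can_push_down` are hypothetical names, and the parser ignores quoting and escaped dots. It assumes that the non-`ignore_case` path lowercases unquoted identifiers while the `ignore_case` path keeps the text as written.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a qualified column reference
/// (hypothetical type, not DataFusion's `Column`).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct ColumnRef {
    relation: Option<String>,
    name: String,
}

/// Parse `relation.name` or a bare `name`.
/// With `ignore_case == false` each part is lowercased, modelling SQL
/// identifier normalization; with `true` the text is kept verbatim.
/// Assumption: quoted identifiers and escaped dots are not handled here.
fn parse_column(flat_name: &str, ignore_case: bool) -> ColumnRef {
    let fix = |s: &str| {
        if ignore_case {
            s.to_string()
        } else {
            s.to_lowercase()
        }
    };
    match flat_name.split_once('.') {
        Some((rel, name)) => ColumnRef {
            relation: Some(fix(rel)),
            name: fix(name),
        },
        None => ColumnRef {
            relation: None,
            name: fix(flat_name),
        },
    }
}

/// A filter may move below the aggregate only if every column it
/// references is one of the group-by columns.
fn can_push_down(filter_cols: &[ColumnRef], group_cols: &HashSet<ColumnRef>) -> bool {
    filter_cols.iter().all(|c| group_cols.contains(c))
}
```

In this model, a filter on `test.A` is not found in a set built from the lowercased `test.a`, so the filter stays above the aggregate. Building the set with case preserved makes the lookup succeed, which matches the `full_filters=[test.A > Int64(10)]` line in the expected plan above.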