-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: extend recursive protection to prevent stack overflows in additional functions #16506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 |
🤖: Benchmark completed Details
|
Which issue does this PR close?
Fixes stack overflows caused by deeply nested query plans during recursive function calls in various optimizer and expression evaluation paths.
Rationale for this change
DataFusion was experiencing stack overflow panics when processing deeply nested query plans. The issue occurred because many functions throughout the codebase perform recursive operations on query plans and expressions without stack overflow protection. When these operations encounter deeply nested structures, they can exhaust the call stack, causing the application to crash.
This change extends the existing recursive protection mechanism to additional functions across the optimizer, physical expression planner, and other core components that were previously unprotected.
Note that #16404 also helped mitigate the stack overflows I encountered with my reproducer, very likely because cloning
Expr
is not stack friendly especially whenExpr
is deeply nested. This is another data point in favour of #9577.What changes are included in this PR?
#[cfg_attr(feature = "recursive_protection", recursive::recursive)]
annotations to various recursive functions across multiple cratesrecursive_protection
feature todatafusion-physical-expr
anddatafusion-substrait
cratesAre these changes tested?
The changes are primarily defensive additions that activate existing recursion protection when the
recursive_protection
feature is enabled. The recursion protection mechanism itself is already used in the codebase. Ideally we should work on automating the detection of lacking recursion protection.About the reproducer: it's a 5MB+ Substrait JSON plan that's deeply nested (but valid). I don't see value in sharing it publicly as it contains various custom extensions and can't be used without an adequate Datadog-specific setup. However It helped me debug and validate the proposed changes.
Are there any user-facing changes?
No