-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: UDF, UDAF, UDWF with_alias(..) should wrap the inner function fully #12098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
datafusion/expr/src/udaf.rs
Outdated
@@ -442,7 +442,7 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { | |||
/// not implement the method, returns an error. Order insensitive and hard | |||
/// requirement aggregators return `Ok(None)`. | |||
fn with_beneficial_ordering( | |||
self: Arc<Self>, | |||
&self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Arc<Self>
was hard to call from the aliased version. Is there a reason why it was Arc'd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that passing through the &Arc
allows the result to be created without a deep copy., though maybe it is being overly optimized 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, dunno if there is a difference between passing an Arc and &self perf wise - neither should require copy. But I got it work with the Arc<> (by adding a .clone() there, but that should only clone the pointer and not the self so I think that's fine as well), so then I don't need to change this and that's probably safer 😅
fixed in 33b89ce
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the Arc
in the API is wonky -- it definitely might be better to remove it. Thank you for considering doing so in a follow on PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @Blizzara -- the only question I have is related to the change of the Arc
in the trait.
Perhaps we can contemplate that change as a separate PR so we can merge this one in?
datafusion/expr/src/udaf.rs
Outdated
@@ -442,7 +442,7 @@ pub trait AggregateUDFImpl: Debug + Send + Sync { | |||
/// not implement the method, returns an error. Order insensitive and hard | |||
/// requirement aggregators return `Ok(None)`. | |||
fn with_beneficial_ordering( | |||
self: Arc<Self>, | |||
&self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea is that passing through the &Arc
allows the result to be created without a deep copy., though maybe it is being overly optimized 🤔
@@ -608,6 +608,60 @@ impl AggregateUDFImpl for AliasedAggregateUDFImpl { | |||
&self.aliases | |||
} | |||
|
|||
fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<Field>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 nice - to avoid similar bugs in the future maybe we should also add a comment to AggregateUDFImpl
that tries to remind people to add an implementation here.
It would be cool if we could tell Rust not to create a default implementation for a particular impl
so the compiler could do the check for us, but I don't know how to express that in Rust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added notes in 175b8d2. And yep - it would be nice to have a way of enforcing this, but I also couldn't figure out any way to do that :/
I took the liberty of pushing a commit to flx clippy |
Compilation is failing due to #12108 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thanks @Blizzara
Thanks for the review, fixing clippy, and fixing main! :) |
Which issue does this PR close?
Rationale for this change
While calling
with_alias
onMakeArray
, we noticed some weird type failures:That turned out to be because the AliasedScalarUDFImpl doesn't implement
coerce_types
, so instead of usingMakeArray::coerce_types
we'd end up using the default implementation of ScalarUDFImpl which just throws (that throw is then ignored later on, making this a bit hard to debug).What changes are included in this PR?
Implement the full set of functions in Aliased[Scalar/Aggregate/Window]FunctionUDFImpl to pass through to inner UDF.
Are these changes tested?
No.. any ideas on how to smartly test these? I can add a simple test for the case we hit somewhere, but is there any generic way of guaranteeing that the aliased versions re-implement all of the *UDFImpl methods?
Are there any user-facing changes?
Mostly just fixes