Skip to content

Derive UDF equality from PartialEq, Hash #16842

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

findepi
Copy link
Member

@findepi findepi commented Jul 21, 2025

Reduce boilerplate in cases where implementation of {ScalarUDFImpl,AggregateUDFImpl,WindowUDFImpl}::{equals,hash_value} can be derived using standard PartialEq and Hash traits.

This is code complexity reduction. Follows #16781

While valuable on its own, this also prepares for more automatic derivation of UDF equals/hash and/or removal of default implementations (which currently are error-prone) -- #16677

@findepi findepi marked this pull request as draft July 21, 2025 14:20
@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Jul 21, 2025
@findepi findepi force-pushed the findepi/derive-udf-equality-from-partialeq-hash-c0f600 branch from e6bf41d to 234c978 Compare July 21, 2025 14:22
@findepi
Copy link
Member Author

findepi commented Jul 21, 2025

This is POC only for now.
@alamb @kosiew @timsaucer PTAL and let me know if you agree with the direction.

Note the PR goals:

  • reduce complexity of existing code, by deriving ScalarUDFImpl::equals and ScalarUDFImpl::hash_code from PartialEq and Hash implementations (same for AggregateUDFImpl and WindowUDFImpl)

  • if we decide to remove default implementations of ScalarUDFImpl::equals and ScalarUDFImpl::hash_code (ScalarUDFImpl::equals default implementation is error-prone #16677), something like this would be even more valuable. If we decide to keep them, it's still a good change.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@@ -1260,6 +1260,24 @@ pub fn collect_subquery_cols(
})
}

#[macro_export]
macro_rules! udf_equals_hash {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this macro needed if we proceed with the proposal in #16677 (comment) ?

It seems like that proposal is pretty awesome -- is it possible to do incrementally, or does it require one massive PR ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this macro needed if we proceed with the proposal in #16677 (comment) ?

No
But it's a good transition step.
It allows us to introduce PartialEq, Hash implementations for functions where PartialEq, Hash may not be trivial (Hashmap, Arc<Fn>).
On the usage side, the temporary udf_equals_hash!(...) line is low cost for that and easy to remove in the second step.

It seems like that proposal is pretty awesome -- is it possible to do incrementally, or does it require one massive PR ?

The change proposed in this PR -- delegation to PartialEq, Hash with udf_equals_hash macro -- can be incremental.

The change proposed in #16677 (comment) is not incremental.

Reduce boilerplate in cases where implementation of
`{ScalarUDFImpl,AggregateUDFImpl,WindowUDFImpl}::{equals,hash_code}` can
be derived using standard `PartialEq` and `Hash` traits.

This is code complexity reduction.

While valuable on its own, this also prepares for more automatic
derivation of UDF equals/hash and/or removal of default implementations
(which currently are error-prone).
@findepi findepi force-pushed the findepi/derive-udf-equality-from-partialeq-hash-c0f600 branch from 234c978 to 493ac60 Compare July 23, 2025 08:30
@findepi findepi marked this pull request as ready for review July 23, 2025 08:30
@github-actions github-actions bot added sql SQL Planner proto Related to proto crate ffi Changes to the ffi crate labels Jul 23, 2025
@findepi
Copy link
Member Author

findepi commented Jul 23, 2025

Update. This PR covers ScalarUDFImpl that today have explicit equals/hash implementations (= closes #16865)

Next steps are defined as sub-issues in #16677

@findepi findepi requested a review from kosiew July 23, 2025 15:45
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi -- this is a nice improvement I think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate ffi Changes to the ffi crate logical-expr Logical plan and expressions proto Related to proto crate sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Derive UDF (ScalarUDFImpl) equality from PartialEq, Hash
2 participants