Skip to content

Add hash-based SUM_ANSI aggregation for INT64 values #19403

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 10 commits into
base: branch-25.10
Choose a base branch
from

Conversation

PointKernel
Copy link
Member

Description

Contributes to #19243

This PR introduces a new aggregation kind, SUM_ANSI, which returns a STRUCT containing the sum and a boolean indicating overflow. If an overflow occurs, the corresponding row in the overflow boolean column will be set to true.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Jul 16, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 16, 2025
@PointKernel PointKernel added non-breaking Non-breaking change feature request New feature or request labels Jul 16, 2025
@PointKernel
Copy link
Member Author

/ok to test aa77170

Copy link
Contributor

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API looks great, but I am a bit surprised that this is targeted for 25.08. I thought it was not going to happen until 25.10. I'll need some time to run tests and benchmarks to see if it does what I want/need.

I don't see a corresponding reduction operations, but that is probably okay for now. I have implemented another work around for the time being which has okay performance. Because I have another workaround, I was not expecting this so quickly, it may take me a few days to get all of that done.

@davidwendt
Copy link
Contributor

The API looks great, but I am a bit surprised that this is targeted for 25.08. I thought it was not going to happen until 25.10. I'll need some time to run tests and benchmarks to see if it does what I want/need.

The branch-25.10 does not exist yet. Once it does, this will likely move there since burndown starts today.

/**
* @brief Derived class for specifying a sum_ansi aggregation
*/
class sum_ansi_aggregation final : public groupby_aggregation, public groupby_scan_aggregation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we just add an overflow_check member to the existing sum_aggregation and default it to false?
I'm not sure ANSI communicates the behavior here very well.
We have a few examples of aggregations with member data like quantile:

interpolation _interpolation; ///< Desired interpolation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I'm not too familiar with the setup of quantile aggregation, so I'll need to take a closer look. My main concern for now is that the overflow_check parameter is determined at runtime, which influences the output type and, consequently, how the aggregation is performed. Since we typically treat the aggregation kind as a template parameter and specialize for each case, introducing runtime branching could affect performance. Hash-based aggregation is quite register-sensitive, so we’ll need to carefully evaluate the overhead of any runtime dispatching.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @PointKernel for looking into the parameter suggestion. If there are concrete benefits from introducing a new aggregation kind then that would seem fine to me. We might want to call if "SUM_WITH_OVERFLOW" or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for naming the aggregation kind something like "SUM_WITH_OVERFLOW" (or "SUM_WITH_OVERFLOW_CHECK" if we are feeling wordy 😄)
IMO, having this operation as a new aggregation kind makes sense if overflow check is less likely to be turned on than off. Sum is a popular aggregation type, and if there's a performance impact from having the overflow_check parameter determined at runtime, then I'd support having a template specialization for this aggregation kind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was picturing the overflow flag could be turned into a template variable internally but that may more complicated to fit in the current design. Also, the different output type is a good argument for a separate aggregation type.

@GregoryKimball GregoryKimball moved this to Burndown in libcudf Jul 17, 2025
@GregoryKimball GregoryKimball requested a review from shrshi July 17, 2025 18:03
Copy link
Contributor

@shrshi shrshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions -

@@ -1348,15 +1371,15 @@ struct target_type_impl<Source,

constexpr bool is_sum_product_agg(aggregation::Kind k)
{
return (k == aggregation::SUM) || (k == aggregation::PRODUCT) ||
return (k == aggregation::SUM) || (k == aggregation::SUM_ANSI) || (k == aggregation::PRODUCT) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we need to add the new aggregation kind to this function since we are explicitly disregarding it in the requires clause of target_type_impl

if (agg == cudf::aggregation::SUM_ANSI) {
if (col.size() == 0) {
// For empty columns, create empty struct column manually
std::vector<std::unique_ptr<cudf::column>> children;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we move the children vector construction into a lambda?

// Create struct column with the children
// For SUM_ANSI, make struct nullable if input has nulls (same as other aggregations)
if (nullable) {
// Start with ALL_NULL, results will be marked valid during aggregation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something basic here, but why do we start with all null for the struct column and its children if the input column is nullable?

// Start with ALL_NULL, results will be marked valid during aggregation
auto null_mask = cudf::create_null_mask(col.size(), cudf::mask_state::ALL_NULL, stream);
auto null_count = col.size(); // All null initially
return make_structs_column(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibly naive question - can we use create_structs_hierarchy here instead of make_structs_column? We can save on null sanitization costs if possible.

{
// For SUM_ANSI, target is a struct with sum value at child(0) and overflow flag at child(1)
auto sum_column = target.child(0);
auto overflow_column = target.child(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: can we exit early if overflow value at target index is already true? If Spark is either erroring out or returning null on overflow, I'm wondering if we need to compute the entire aggregation result?

@ttnghia
Copy link
Contributor

ttnghia commented Jul 21, 2025

Sorry I have to say that I don't like the idea of having a separate _ANSI aggregation like this. We will need to check overflow not only just for SUM but also other aggs such as MULTIPLY and more. I imagine that having a lot of pairs AGG_NAME and AGG_NAME_ANSI would be very awkward.

Although I don't have a perfect solution for it, I feel that adding an optional overflow_check to the aggregation constructor seems a better solution for the long term.

@GregoryKimball GregoryKimball moved this from Burndown to Slip in libcudf Jul 23, 2025
Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checking for overflow uses UB. For an approach that doesn't do so, and gives you the same information with a branch-free kernel please see (and possible promote to a shared header somewhere) the saturating implementation in src/rolling/detail/range_utils.cuh

@@ -94,6 +94,7 @@ class aggregation {
*/
enum Kind {
SUM, ///< sum reduction
SUM_ANSI, ///< sum reduction with ANSI overflow semantics
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: link to "ANSI overflow semantics". An external user of this API may well have no idea what this means.

Comment on lines +178 to +181
// Check for overflow: if old_sum and source_value have same sign but result has different sign
auto const new_sum = old_sum + source_value;
auto const overflow = ((old_sum > 0 && source_value > 0 && new_sum < 0) ||
(old_sum < 0 && source_value < 0 && new_sum > 0));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: signed integer overflow is UB, so the compiler has no obligation to generate code for these exceptional cases, and may (will in some future version) optimise this to auto const overflow = false.

9223372036854775800L, // Close to INT64_MAX
9223372036854775800L, // Close to INT64_MAX
1L, // Small value
400L}; // Small value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some test conditions that produce overflow in the "other" direction.

Comment on lines +128 to +132
// SUM_ANSI returns a struct with sum (int64_t) and overflow (bool) children
std::vector<std::unique_ptr<cudf::column>> children;
children.push_back(make_empty_column(cudf::data_type{cudf::type_id::INT64}));
children.push_back(make_empty_column(cudf::data_type{cudf::type_id::BOOL8}));
return make_structs_column(0, std::move(children), 0, {}, stream, mr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What input types are supported for sum_ansi? It seems only signed integers. However, I don't see any exceptions reported if an unsupported input type is provided?

Comment on lines +104 to +112
thrust::fill(rmm::exec_policy_nosync(stream),
sum_col.begin<int64_t>(),
sum_col.end<int64_t>(),
int64_t{0});

thrust::fill(rmm::exec_policy_nosync(stream),
overflow_col.begin<bool>(),
overflow_col.end<bool>(),
false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: do we need two kernels, or can we fill both columns using a zip iterator?

@PointKernel PointKernel changed the base branch from branch-25.08 to branch-25.10 July 25, 2025 18:28
@davidwendt
Copy link
Contributor

We will need to check overflow not only just for SUM but also other aggs such as MULTIPLY and more. I imagine that having a lot of pairs AGG_NAME and AGG_NAME_ANSI would be very awkward.

This is another concern in that adding overflow detection to one aggregation opens up doubling libcudf aggregation types to support overflow (or other checks) for all of them.

@davidwendt
Copy link
Contributor

Added a comment to the original issue about Spark possibly using the HOST_UDF aggregator type.
#19243 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: Slip
Development

Successfully merging this pull request may close these issues.

7 participants