Add hash-based SUM_ANSI aggregation for INT64 values #19403
base: branch-25.10
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test aa77170
The API looks great, but I am a bit surprised that this is targeted for 25.08. I thought it was not going to happen until 25.10. I'll need some time to run tests and benchmarks to see if it does what I want/need.
I don't see a corresponding reduction operations, but that is probably okay for now. I have implemented another work around for the time being which has okay performance. Because I have another workaround, I was not expecting this so quickly, it may take me a few days to get all of that done.
The branch-25.10 does not exist yet. Once it does, this will likely move there since burndown starts today. |
/**
 * @brief Derived class for specifying a sum_ansi aggregation
 */
class sum_ansi_aggregation final : public groupby_aggregation, public groupby_scan_aggregation {
Could we just add an `overflow_check` member to the existing `sum_aggregation` and default it to `false`?
I'm not sure ANSI communicates the behavior here very well.
We have a few examples of aggregations with member data, like quantile:
interpolation _interpolation; ///< Desired interpolation
Thanks for the suggestion. I'm not too familiar with the setup of the quantile aggregation, so I'll need to take a closer look. My main concern for now is that the `overflow_check` parameter is determined at runtime, which influences the output type and, consequently, how the aggregation is performed. Since we typically treat the aggregation kind as a template parameter and specialize for each case, introducing runtime branching could affect performance. Hash-based aggregation is quite register-sensitive, so we'll need to carefully evaluate the overhead of any runtime dispatching.
Thank you @PointKernel for looking into the parameter suggestion. If there are concrete benefits from introducing a new aggregation kind, then that would seem fine to me. We might want to call it "SUM_WITH_OVERFLOW" or something.
+1 for naming the aggregation kind something like "SUM_WITH_OVERFLOW" (or "SUM_WITH_OVERFLOW_CHECK" if we are feeling wordy 😄)
IMO, having this operation as a new aggregation kind makes sense if the overflow check is less likely to be turned on than off. Sum is a popular aggregation type, and if there's a performance impact from having the `overflow_check` parameter determined at runtime, then I'd support having a template specialization for this aggregation kind.
I was picturing that the overflow flag could be turned into a template variable internally, but that may be more complicated to fit into the current design. Also, the different output type is a good argument for a separate aggregation type.
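The idea of lifting a runtime flag into a template parameter can be sketched as follows. This is a minimal host-side illustration with hypothetical names, not libcudf's actual dispatch machinery: the runtime bool is branched on exactly once at the dispatch boundary, so the hot loop is compiled separately for each case with no per-element branch.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: the overflow check is a compile-time bool, so the
// non-checking instantiation carries no overflow logic at all.
template <bool CheckOverflow>
int64_t sum_impl(std::vector<int64_t> const& values, bool& overflowed)
{
  int64_t sum = 0;
  for (auto v : values) {
    if constexpr (CheckOverflow) {
      // GCC/Clang builtin: wraps the result and reports overflow, without UB.
      if (__builtin_add_overflow(sum, v, &sum)) { overflowed = true; }
    } else {
      sum += v;  // no check compiled into this instantiation
    }
  }
  return sum;
}

// Single runtime branch at the dispatch boundary, not in the hot loop.
int64_t sum_dispatch(std::vector<int64_t> const& values, bool check_overflow, bool& overflowed)
{
  overflowed = false;
  return check_overflow ? sum_impl<true>(values, overflowed)
                        : sum_impl<false>(values, overflowed);
}
```

The cost is one extra instantiation per flag value, which is the binary-size/compile-time trade-off the thread is weighing against a separate aggregation kind.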
A few questions -
@@ -1348,15 +1371,15 @@ struct target_type_impl<Source,

constexpr bool is_sum_product_agg(aggregation::Kind k)
{
-  return (k == aggregation::SUM) || (k == aggregation::PRODUCT) ||
+  return (k == aggregation::SUM) || (k == aggregation::SUM_ANSI) || (k == aggregation::PRODUCT) ||
I'm wondering if we need to add the new aggregation kind to this function, since we are explicitly disregarding it in the `requires` clause of `target_type_impl`.
if (agg == cudf::aggregation::SUM_ANSI) {
  if (col.size() == 0) {
    // For empty columns, create empty struct column manually
    std::vector<std::unique_ptr<cudf::column>> children;
Nit: Can we move the children vector construction into a lambda?
// Create struct column with the children
// For SUM_ANSI, make struct nullable if input has nulls (same as other aggregations)
if (nullable) {
  // Start with ALL_NULL, results will be marked valid during aggregation
I might be missing something basic here, but why do we start with all null for the struct column and its children if the input column is nullable?
// Start with ALL_NULL, results will be marked valid during aggregation
auto null_mask  = cudf::create_null_mask(col.size(), cudf::mask_state::ALL_NULL, stream);
auto null_count = col.size();  // All null initially
return make_structs_column(
Another possibly naive question - can we use `create_structs_hierarchy` here instead of `make_structs_column`? We can save on null sanitization costs if possible.
{
  // For SUM_ANSI, target is a struct with sum value at child(0) and overflow flag at child(1)
  auto sum_column      = target.child(0);
  auto overflow_column = target.child(1);
Question: can we exit early if overflow value at target index is already true? If Spark is either erroring out or returning null on overflow, I'm wondering if we need to compute the entire aggregation result?
Sorry, I have to say that I don't like the idea of having a separate aggregation kind for this. Although I don't have a perfect solution for it, I feel that adding an optional parameter would be a better fit.
The checking for overflow uses UB. For an approach that doesn't do so, and gives you the same information with a branch-free kernel, please see (and possibly promote to a shared header somewhere) the saturating implementation in `src/rolling/detail/range_utils.cuh`.
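A UB-free check in that spirit can be sketched like this. This is an illustrative stand-in, not the actual `range_utils.cuh` code: the addition is performed in unsigned arithmetic, where wraparound is well defined, and the overflow bit is recovered from the sign bits.

```cpp
#include <cstdint>

struct add_result {
  int64_t value;
  bool overflow;
};

// Branch-free signed addition with overflow detection, with no signed-overflow
// UB. The unsigned-to-signed cast of an out-of-range value is well defined in
// C++20 (two's complement) and behaves that way on GCC/Clang in practice.
constexpr add_result checked_add(int64_t a, int64_t b)
{
  auto const ua  = static_cast<uint64_t>(a);
  auto const ub  = static_cast<uint64_t>(b);
  auto const us  = ua + ub;  // well-defined wraparound
  auto const sum = static_cast<int64_t>(us);
  // Overflow iff a and b have the same sign and the sum's sign differs;
  // adding values of opposite signs can never overflow.
  bool const overflow = ((a >= 0) == (b >= 0)) && ((sum >= 0) != (a >= 0));
  return {sum, overflow};
}
```

The comparisons compile to flag tests rather than divergent branches, which is what makes this shape attractive inside a kernel.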
@@ -94,6 +94,7 @@ class aggregation {
   */
  enum Kind {
    SUM,      ///< sum reduction
+   SUM_ANSI, ///< sum reduction with ANSI overflow semantics
nit: link to "ANSI overflow semantics". An external user of this API may well have no idea what this means.
// Check for overflow: if old_sum and source_value have same sign but result has different sign
auto const new_sum  = old_sum + source_value;
auto const overflow = ((old_sum > 0 && source_value > 0 && new_sum < 0) ||
                       (old_sum < 0 && source_value < 0 && new_sum > 0));
issue: signed integer overflow is UB, so the compiler has no obligation to generate code for these exceptional cases, and may (will, in some future version) optimise this to `auto const overflow = false;`.
9223372036854775800L,  // Close to INT64_MAX
9223372036854775800L,  // Close to INT64_MAX
1L,                    // Small value
400L};                 // Small value
Please add some test conditions that produce overflow in the "other" direction.
// SUM_ANSI returns a struct with sum (int64_t) and overflow (bool) children
std::vector<std::unique_ptr<cudf::column>> children;
children.push_back(make_empty_column(cudf::data_type{cudf::type_id::INT64}));
children.push_back(make_empty_column(cudf::data_type{cudf::type_id::BOOL8}));
return make_structs_column(0, std::move(children), 0, {}, stream, mr);
What input types are supported for `sum_ansi`? It seems only signed integers. However, I don't see any exception thrown if an unsupported input type is provided.
thrust::fill(rmm::exec_policy_nosync(stream),
             sum_col.begin<int64_t>(),
             sum_col.end<int64_t>(),
             int64_t{0});

thrust::fill(rmm::exec_policy_nosync(stream),
             overflow_col.begin<bool>(),
             overflow_col.end<bool>(),
             false);
question: do we need two kernels, or can we fill both columns using a zip iterator?
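The zip-iterator version the question suggests could look roughly like this (a hypothetical sketch against the variables in the snippet above, not tested code): a zip iterator's reference is a tuple of references, so a single `thrust::fill` with a tuple value writes both children in one kernel launch.

```cuda
#include <thrust/fill.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>

// Assumes sum_col and overflow_col are the same mutable views as above,
// with sum_col.size() == overflow_col.size().
auto zipped = thrust::make_zip_iterator(
  thrust::make_tuple(sum_col.begin<int64_t>(), overflow_col.begin<bool>()));
thrust::fill(rmm::exec_policy_nosync(stream),
             zipped,
             zipped + sum_col.size(),
             thrust::make_tuple(int64_t{0}, false));
```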
This is another concern: adding overflow detection to one aggregation opens the door to doubling libcudf's aggregation types in order to support overflow (or other checks) for all of them.
Added a comment to the original issue about Spark possibly using the
Description
Contributes to #19243
This PR introduces a new aggregation kind, `SUM_ANSI`, which returns a `STRUCT` containing the sum and a boolean indicating overflow. If an overflow occurs, the corresponding row in the overflow boolean column will be set to `true`.
Checklist