Skip to content

Conversation

@kutay25
Copy link

@kutay25 kutay25 commented Sep 24, 2025

Added MaxContributions to privacy-on-beam/pbeam/count.go and the associated codepath at go/dpagg.

The proposal can be found in this document.

@miracvbasaran miracvbasaran self-requested a review September 24, 2025 14:31
Copy link
Collaborator

@miracvbasaran miracvbasaran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG overall, thanks a lot for working on this!

return nil
}

// CheckContributionBoundingOptions returns an error if exactly one of MaxContributions and MaxPartitionsContributed is not set.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "returns an error unless exactly one of MaxContributions and MaxPartitionsContributed is set" or "checks that exactly one of MaxContributions or MaxPartitionsContributed is set".

return fmt.Errorf("MaxContributions and MaxPartitionsContributed cannot be set at the same time")
}
if maxContributions <= 0 && maxPartitionsContributed <= 0 {
return fmt.Errorf("MaxContributions and MaxPartitionsContributed cannot be both 0 at the same time")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "MaxContributions and MaxPartitionsContributed are both unset, exactly one should be set."

}

// CheckContributionBoundingOptions returns an error if exactly one of MaxContributions and MaxPartitionsContributed is not set.
func CheckContributionBoundingOptions(maxContributions, maxPartitionsContributed int64) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check for Lower & Upper here as well. Maybe we have two functions, one w/o Lower & Upper, and one with. But AFAICT, we actually only need the one w/ Lower & Upper

PreThreshold int64
// MaxPartitionsContributed is the number of distinct partitions a single
// privacy unit can contribute to. Required.
// privacy unit can contribute to.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting MaxContributions and setting MaxPartitionsContributed is exactly the same though, if you look at lines 161-166. So maybe instead of introducing a new parameter, we could simply mention in the documentation for MaxPartitionsContributed that for PreAggSelectPartitions, you can use MaxPartitionsContributed instead of MaxContributions.

// How many times may a single privacy unit contribute in total to all partitions?
// Currently only used for Count aggregation function.
//
// Mutually exclusive with MaxPartitionsContributed, Lower and Upper. One of the two options is required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not very clear. Suggestion: "Mutually exclusive with set of {MaxPartitionsContributed, Lower, Upper}. Required when {MaxPartitionsContributed, Lower, Upper} are not set."

Similarly, for consistency, in the documentation of MaxPartitionsContributed, Lower and Upper, we could say "Mutually exclusive with MaxContributions. Required to be specified along with Lower and Upper when MaxContributions is not set." (change Lower and Upper to other two for each parameter ofc)

Comment on lines +84 to +85
// MaxPartitionsContributed and MaxValue are mutually exclusive with MaxContributions.
// One of the two options is required.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion here about documentation in dpagg

// Third, now that contribution bounding is done, remove the privacy keys,
// decode the value, and sum all the counts bounded by MaxValue.
// Fourth, now that contribution bounding is done, remove the privacy keys,
// decode the value, and sum all the counts bounded by MaxValue or MaxContributions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is confusing. When MaxContributions are used, we don't actually do any further bounding here. The dpagg.Add functions used internally shouldn't clamp anything because we have already clamped in the first step above. Does that make sense? TBH, I am actually debating if we should get rid of setting upper = opt.MaxContributions in dpagg.Sum and explicitly mention that no bounding will happen there but I am not sure if that's the best option.

So here maybe we can say something like "... bounded by MaxValue.\n\nIf MaxContributions are set, because we are already finished with per-privacy identifier contribution bounding, the bounding done after this should be no-op."

Comment on lines 85 to 87
// How many times may a single privacy unit contribute to a single partition?
// Defaults to 1. This is only needed for other aggregation functions using BoundedSum;
// which is why the option is not exported.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also mention that setting this doesn't work for MaxContributions

Comment on lines +110 to +113
// The total contribution a single privacy identifier can make across all
// partitions. This is currently only supported for BoundedSumInt64, and is
// unexported.
maxContributions int64
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this in this PR

// checkContributionBounding checks that either MaxContributions or MaxValue/MaxPartitionsContributed is set, but not both.
// If MaxContributions is set, MaxValue and MaxPartitionsContributed must be 0.
// If MaxValue/MaxPartitionsContributed is set, MaxContributions must be 0.
func checkContributionBounding(maxContributions int64, maxValue int64, maxPartitionsContributed int64) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not duplicate code as much as possible, so maybe we can reuse the go/checks/checks.go function we implemented here? If need be, we could add two more parameters (lower/upper) there, or create a new function with those two new parameters. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants