-
Notifications
You must be signed in to change notification settings - Fork 405
Added MaxContributions to privacy-on-beam/pbeam/count.go #337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ciated codepath at go/dpagg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG overall, thanks a lot for working on this!
| return nil | ||
| } | ||
|
|
||
| // CheckContributionBoundingOptions returns an error if exactly one of MaxContributions and MaxPartitionsContributed is not set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "returns an error unless exactly one of MaxContributions and MaxPartitionsContributed is set" or "checks that exactly one of MaxContributions or MaxPartitionsContributed is set".
| return fmt.Errorf("MaxContributions and MaxPartitionsContributed cannot be set at the same time") | ||
| } | ||
| if maxContributions <= 0 && maxPartitionsContributed <= 0 { | ||
| return fmt.Errorf("MaxContributions and MaxPartitionsContributed cannot be both 0 at the same time") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "MaxContributions and MaxPartitionsContributed are both unset, exactly one should be set."
| } | ||
|
|
||
| // CheckContributionBoundingOptions returns an error if exactly one of MaxContributions and MaxPartitionsContributed is not set. | ||
| func CheckContributionBoundingOptions(maxContributions, maxPartitionsContributed int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to check for Lower & Upper here as well. Maybe we have two functions, one w/o Lower & Upper, and one with. But AFAICT, we actually only need the one w/ Lower & Upper
| PreThreshold int64 | ||
| // MaxPartitionsContributed is the number of distinct partitions a single | ||
| // privacy unit can contribute to. Required. | ||
| // privacy unit can contribute to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting MaxContributions and setting MaxPartitionsContributed is exactly the same though, if you look at lines 161-166. So maybe instead of introducing a new parameter, we could simply mention in the documentation for MaxPartitionsContributed that for PreAggSelectPartitions, you can use MaxPartitionsContributed instead of MaxContributions.
| // How many times may a single privacy unit contribute in total to all partitions? | ||
| // Currently only used for Count aggregation function. | ||
| // | ||
| // Mutually exclusive with MaxPartitionsContributed, Lower and Upper. One of the two options is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not very clear. Suggestion: "Mutually exclusive with set of {MaxPartitionsContributed, Lower, Upper}. Required when {MaxPartitionsContributed, Lower, Upper} are not set."
Similarly, for consistency, in the documentation of MaxPartitionsContributed, Lower and Upper, we could say "Mutually exclusive with MaxContributions. Required to be specified along with Lower and Upper when MaxContributions is not set." (change Lower and Upper to other two for each parameter ofc)
| // MaxPartitionsContributed and MaxValue are mutually exclusive with MaxContributions. | ||
| // One of the two options is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same suggestion here about documentation in dpagg
| // Third, now that contribution bounding is done, remove the privacy keys, | ||
| // decode the value, and sum all the counts bounded by MaxValue. | ||
| // Fourth, now that contribution bounding is done, remove the privacy keys, | ||
| // decode the value, and sum all the counts bounded by MaxValue or MaxContributions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is confusing. When MaxContributions are used, we don't actually do any further bounding here. The dpagg.Add functions used internally shouldn't clamp anything because we have already clamped in the first step above. Does that make sense? TBH, I am actually debating if we should get rid of setting upper = opt.MaxContributions in dpagg.Sum and explicitly mention that no bounding will happen there but I am not sure if that's the best option.
So here maybe we can say something like "... bounded by MaxValue.\n\nIf MaxContributions are set, because we are already finished with per-privacy identifier contribution bounding, the bounding done after this should be no-op."
| // How many times may a single privacy unit contribute to a single partition? | ||
| // Defaults to 1. This is only needed for other aggregation functions using BoundedSum; | ||
| // which is why the option is not exported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also mention that setting this doesn't work for MaxContributions
| // The total contribution a single privacy identifier can make across all | ||
| // partitions. This is currently only supported for BoundedSumInt64, and is | ||
| // unexported. | ||
| maxContributions int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this in this PR
| // checkContributionBounding checks that either MaxContributions or MaxValue/MaxPartitionsContributed is set, but not both. | ||
| // If MaxContributions is set, MaxValue and MaxPartitionsContributed must be 0. | ||
| // If MaxValue/MaxPartitionsContributed is set, MaxContributions must be 0. | ||
| func checkContributionBounding(maxContributions int64, maxValue int64, maxPartitionsContributed int64) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not duplicate code as much as possible, so maybe we can reuse the go/checks/checks.go function we implemented here? If need be, we could add two more parameters (lower/upper) there, or create a new function with those two new parameters. WDYT?
Added MaxContributions to privacy-on-beam/pbeam/count.go and the associated codepath at go/dpagg.
The proposal can be found in this document.