docs(ADR): extends the fractional operator to support up to .001% distributions #1800
@@ -0,0 +1,279 @@
---
# Valid statuses: draft | proposed | rejected | accepted | superseded
status: draft
author: Michael Beemer
created: 2025-09-10
updated: 2025-09-10
---

# High-Precision Fractional Bucketing for Sub-Percent Traffic Allocation

This ADR proposes enhancing the fractional operation to support high-precision traffic allocation down to 0.001% granularity by increasing the internal bucket count from 100 to 100,000 while maintaining the existing weight-based API.

## Background

The current fractional operation in flagd uses a 100-bucket system that maps hash values to percentages in the range [0, 100].
This approach works well for most use cases but has significant limitations in high-throughput environments where precise sub-percent traffic allocation is required.

Currently, the smallest allocation possible is 1%, which is insufficient for:

- Gradual rollouts in ultra-high-traffic systems where 1% could represent millions of users
- A/B testing scenarios requiring precise control over small experimental groups
- Canary deployments where operators need to start with very small traffic percentages (e.g., 0.1% or 0.01%)

The current implementation in `fractional.go` calculates bucket assignment using:

```go
bucket := hashRatio * 100 // in range [0, 100]
```

This limits granularity to 1% increments, making it impossible to achieve the precision required for sophisticated traffic management strategies.

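To make the granularity limit concrete, the sketch below scales two hash ratios that are less than 1% apart using 100 and then 100,000 buckets. The helper `bucketFor` is hypothetical and exists only for this illustration; it is not part of `fractional.go`.

```go
package main

import "fmt"

// bucketFor scales a hash ratio onto bucketCount buckets by truncation.
// Hypothetical helper for illustration only.
func bucketFor(hashRatio float64, bucketCount int) int {
	return int(hashRatio * float64(bucketCount))
}

func main() {
	// Two ratios that are exactly representable in binary floating point.
	a, b := 1.0/1024, 1.0/128 // 0.0009765625 and 0.0078125

	// With 100 buckets both ratios collapse into bucket 0.
	fmt.Println("100 buckets:", bucketFor(a, 100), bucketFor(b, 100))
	// prints: 100 buckets: 0 0

	// With 100,000 buckets they land in distinct buckets.
	fmt.Println("100000 buckets:", bucketFor(a, 100000), bucketFor(b, 100000))
	// prints: 100000 buckets: 97 781
}
```

With only 100 buckets, any split below 1% cannot separate these two inputs; with 100,000 buckets they can be assigned to different variants.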
## Requirements

- Support traffic allocation precision down to 0.001% (3 decimal places)
- Maintain backwards compatibility with existing weight-based API
- Preserve deterministic bucketing behavior (same hash input always produces same bucket)
- Ensure consistent bucket assignment across different programming languages
- Support weight values up to a reasonable maximum that works across multiple languages
- Maintain current performance characteristics
- Prevent users from being moved between buckets when only distribution percentages change
- Guarantee that any variant with weight > 0 receives some traffic allocation
- Handle edge cases gracefully without silent failures
- Validate weight configurations and provide clear error messages for invalid inputs

## Considered Options

- **Option 1: 10,000 buckets (0.01% precision)** - 1 in every 10,000 users, better but still not sufficient for many high-throughput use cases
- **Option 2: 100,000 buckets (0.001% precision)** - 1 in every 100,000 users, meets most high-precision needs
- **Option 3: 1,000,000 buckets (0.0001% precision)** - 1 in every 1,000,000 users, likely overkill and could impact performance

## Proposal

Implement a 100,000-bucket system that provides 0.001% precision while maintaining the existing integer weight-based API.

### API changes

No API changes are required. The existing fractional operation syntax remains unchanged:

```json
"fractional": [
  { "cat": [{ "var": "$flagd.flagKey" }, { "var": "email" }] },
  ["red", 50],
  ["blue", 30],
  ["green", 20]
]
```

### Implementation Changes

1. **Bucket Count**: Change from 100 to 100,000 buckets by modifying bucket calculation from `hashRatio * 100` to `hashRatio * 100000`
2. **Minimum Allocation Guarantee**: Any variant with weight > 0 receives at least 1 bucket (0.001%)
3. **Excess Bucket Handling**: Remove excess buckets from the largest variant to maintain exactly 100,000 total buckets
4. **Weight Sum Validation**: Reject configurations where the total weight exceeds the maximum 32-bit signed integer value (2,147,483,647)
5. **Maximum Weight Sum**: Use language-specific maximum 32-bit signed integer constants for cross-platform compatibility

### Minimum Allocation Guarantee

To prevent silent configuration failures, any variant with a positive weight will receive at least 0.001% allocation (1 bucket), even if the calculated percentage would round to zero. This ensures predictable behavior where positive weights always result in some traffic allocation.

**Example**: Configuration `["variant-a", 1], ["variant-b", 1000000]`

- Without guarantee: variant-a gets 0% (never selected)
- With guarantee: variant-a gets 0.001%, variant-b gets 99.999%

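The arithmetic behind this example can be checked with a short sketch. The helper names `proportionalBuckets` and `withMinimum` are hypothetical; they mirror the integer division and the one-bucket floor described above, assuming non-negative weights.

```go
package main

import "fmt"

const bucketCount = 100000

// proportionalBuckets returns the truncated share of bucketCount for a weight.
// Hypothetical helper for illustration only.
func proportionalBuckets(weight, totalWeight int64) int64 {
	return weight * bucketCount / totalWeight
}

// withMinimum applies the proposed guarantee: a positive weight never
// receives fewer than one bucket.
func withMinimum(weight, totalWeight int64) int64 {
	if weight <= 0 {
		return 0
	}
	if b := proportionalBuckets(weight, totalWeight); b > 1 {
		return b
	}
	return 1
}

func main() {
	const total = 1 + 1000000 // sum of both weights

	fmt.Println(proportionalBuckets(1, total), withMinimum(1, total))
	// prints: 0 1
	fmt.Println(proportionalBuckets(1000000, total), withMinimum(1000000, total))
	// prints: 99999 99999
}
```

One bucket out of 100,000 is 0.001%, and 99,999 buckets are 99.999%, which matches the percentages in the example.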
### Excess Bucket Management

When minimum allocations cause the total to exceed 100,000 buckets, excess buckets are removed from the variant with the largest allocation.
This approach:

- Maintains the minimum guarantee for small variants
- Has minimal impact on large variants (small relative reduction)
- Preserves deterministic behavior
- Prevents bucket count overflow

Comment on lines +87 to +88:

The description of how excess buckets are handled is slightly inconsistent. This section states that excess buckets are removed from 'the variant with the largest allocation' (singular), while the 'Edge Case Handling' section on line 127 refers to it as 'Excess distributed fairly among largest variants' (plural). The code example shows a sequential removal process. For clarity and consistency, I suggest refining the description to accurately reflect the implementation, for example: 'Excess buckets are removed sequentially from variants with the largest allocations, starting with the largest, until the total bucket count is exactly 100,000.'

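A case where the minimum guarantee actually produces excess can be sketched as follows. The function `allocate` is a hypothetical, simplified stand-in for the proposed allocation logic: it assumes non-negative weights and breaks ties by input order rather than by variant name.

```go
package main

import (
	"fmt"
	"sort"
)

const bucketCount = 100000

// allocate returns bucket counts per weight: truncated proportional share,
// at least 1 bucket per positive weight, then excess removed from the
// largest allocations first. Illustrative sketch only.
func allocate(weights []int64) []int64 {
	var total int64
	for _, w := range weights {
		total += w
	}

	buckets := make([]int64, len(weights))
	var allocated int64
	for i, w := range weights {
		if w > 0 {
			buckets[i] = w * bucketCount / total
			if buckets[i] < 1 {
				buckets[i] = 1
			}
		}
		allocated += buckets[i]
	}

	excess := allocated - bucketCount
	if excess <= 0 {
		return buckets
	}

	// Visit variants from largest to smallest allocation.
	order := make([]int, len(buckets))
	for i := range order {
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return buckets[order[a]] > buckets[order[b]]
	})

	for _, idx := range order {
		if excess == 0 {
			break
		}
		minAllowed := int64(0)
		if weights[idx] > 0 {
			minAllowed = 1
		}
		remove := buckets[idx] - minAllowed
		if remove > excess {
			remove = excess
		}
		buckets[idx] -= remove
		excess -= remove
	}
	return buckets
}

func main() {
	// One dominant variant and three tiny ones: the initial allocation is
	// 99999 + 1 + 1 + 1 = 100002 buckets, so 2 buckets are in excess.
	fmt.Println(allocate([]int64{1000000000, 1, 1, 1}))
	// prints: [99997 1 1 1]
}
```

In this configuration the two excess buckets both come out of the dominant variant, while every tiny variant keeps its single guaranteed bucket.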
### Weight Sum Validation

When the total weight sum exceeds the maximum 32-bit signed integer value, the fractional evaluation will return a validation error with a clear message.
This prevents integer overflow issues and provides immediate feedback to users about invalid configurations.

```go
import (
	"fmt"
	"math"
)

func validateWeightSum(variants []fractionalEvaluationVariant) error {
	var totalWeight int64 = 0
	for _, variant := range variants {
		totalWeight += int64(variant.weight)
		if totalWeight > math.MaxInt32 {
			return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d",
				totalWeight, math.MaxInt32)
		}
	}
	return nil
}
```

Implementations should prefer built-in language constants (e.g., `math.MaxInt32` in Go, `Integer.MAX_VALUE` in Java, `int.MaxValue` in C#) rather than hardcoded values to ensure maintainability and clarity.

### Edge Case Handling

The implementation addresses several edge cases:

1. **All weights are 0**: Returns empty string (maintains current behavior)
2. **Negative weights**: Treated as 0 (maintains current validation behavior)
3. **Single variant**: Receives all 100,000 buckets regardless of its (positive) weight value
4. **Empty variants**: Returns error (maintains current validation behavior)
5. **Weight sum overflow**: Returns validation error with clear message
6. **Multiple variants with minimum allocation**: Excess distributed fairly among largest variants

### Maximum Weight Considerations

To ensure cross-language compatibility, we establish a maximum total weight sum equal to the maximum 32-bit signed integer value (2,147,483,647). This limit:

- Works reliably across all target languages (Go, Java, .NET, JavaScript, Python)
- Provides more than sufficient range for any practical use case
- Prevents integer overflow issues in 32-bit signed integer systems
- Allows for extremely fine-grained control (individual weights can be 1 out of 2+ billion)
- Uses language-native constants for better maintainability

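One way to sanity-check this limit is to compute the largest intermediate product that the proportional allocation can produce, assuming implementations multiply a weight by the bucket count before dividing, as the Go code in this ADR does. The names `largestProduct` and `jsMaxSafeInteger` are introduced here for illustration only.

```go
package main

import (
	"fmt"
	"math"
)

const bucketCount = 100000

// jsMaxSafeInteger is JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1).
const jsMaxSafeInteger = 1<<53 - 1

// largestProduct is the biggest weight * bucketCount value that can occur
// when the weight sum is capped at the 32-bit signed maximum.
func largestProduct() int64 {
	return int64(math.MaxInt32) * bucketCount
}

func main() {
	fmt.Println(largestProduct())
	// prints: 214748364700000
	fmt.Println(largestProduct() < jsMaxSafeInteger)
	// prints: true
	fmt.Println(largestProduct() < math.MaxInt64)
	// prints: true
}
```

Under that assumption the product stays well inside both a 64-bit signed integer and the range of integers a JavaScript number can represent exactly, although it does not fit in a 32-bit integer, so the multiplication itself needs a wider type.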
### Code Changes

The following shows how the core logic in `fractional.go` would be modified.

```go
const bucketCount = 100000

// bucketAllocation represents the number of buckets allocated to a variant
type bucketAllocation struct {
	variant string
	buckets int
}

func (fe *Fractional) Evaluate(values, data any) any {
	valueToDistribute, feDistributions, err := parseFractionalEvaluationData(values, data)
	if err != nil {
		fe.Logger.Warn(fmt.Sprintf("parse fractional evaluation data: %v", err))
		return nil
	}

	if err := validateWeightSum(feDistributions.weightedVariants); err != nil {
		fe.Logger.Warn(fmt.Sprintf("weight validation failed: %v", err))
		return nil
	}

	return distributeValue(valueToDistribute, feDistributions)
}

func validateWeightSum(variants []fractionalEvaluationVariant) error {
	var totalWeight int64 = 0
	for _, variant := range variants {
		totalWeight += int64(variant.weight)
		if totalWeight > math.MaxInt32 {
			return fmt.Errorf("total weight sum %d exceeds maximum allowed value %d",
				totalWeight, math.MaxInt32)
		}
	}
	return nil
}

func calculateBucketAllocations(variants []fractionalEvaluationVariant, totalWeight int) []bucketAllocation {
	allocations := make([]bucketAllocation, len(variants))
	totalAllocated := 0

	// Calculate initial allocations
	for i, variant := range variants {
		if variant.weight == 0 {
			allocations[i] = bucketAllocation{variant: variant.variant, buckets: 0}
		} else {
			// Calculate proportional allocation
			proportional := int((int64(variant.weight) * bucketCount) / int64(totalWeight))
			// Ensure minimum allocation of 1 bucket for any positive weight
			buckets := max(1, proportional)
			allocations[i] = bucketAllocation{variant: variant.variant, buckets: buckets}
		}
		totalAllocated += allocations[i].buckets
	}

	// Handle excess buckets by removing from largest allocation
	excess := totalAllocated - bucketCount
	if excess > 0 {
		// Sort indices by bucket count (descending) to find largest allocation
		indices := make([]int, len(allocations))
		for i := range indices {
			indices[i] = i
		}
		sort.Slice(indices, func(i, j int) bool {
			if allocations[indices[i]].buckets == allocations[indices[j]].buckets {
				return allocations[indices[i]].variant < allocations[indices[j]].variant // Tie-break by variant name
			}
			return allocations[indices[i]].buckets > allocations[indices[j]].buckets
		})

		// Remove excess from largest allocation, respecting minimum guarantee
		for _, idx := range indices {
			if excess <= 0 {
				break
			}

			// Don't reduce below 1 bucket if original weight > 0
			minAllowed := 0
			if variants[idx].weight > 0 {
				minAllowed = 1
			}

			canRemove := allocations[idx].buckets - minAllowed
			toRemove := min(excess, canRemove)
			allocations[idx].buckets -= toRemove
			excess -= toRemove
		}
	}

	return allocations
}
```

Comment on `func calculateBucketAllocations`:

The [...]

Properly supporting guaranteed bucketing may add more complexity than I'd like. I'm sure I can address this issue but I'd like feedback on if it's worth supporting this at all. The reason I added this is to avoid configurations like [...]

**5. Replace the distribution logic:**

```go
func distributeValue(value string, feDistribution *fractionalEvaluationDistribution) string {
	if feDistribution.totalWeight == 0 {
		return ""
	}

	allocations := calculateBucketAllocations(feDistribution.weightedVariants, feDistribution.totalWeight)

	hashValue := int32(murmur3.StringSum32(value))
	hashRatio := math.Abs(float64(hashValue)) / math.MaxInt32
	// In range [0, bucketCount]; the upper bound itself is reached only
	// when |hashValue| >= math.MaxInt32.
	bucket := int(hashRatio * bucketCount)

	currentBucket := 0
	for _, allocation := range allocations {
		currentBucket += allocation.buckets
		if bucket < currentBucket {
			return allocation.variant
		}
	}

	return ""
}
```

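The cumulative lookup at the end of `distributeValue` can be exercised on its own. The sketch below uses a hypothetical `pickVariant` helper and the bucket counts that the 50/30/20 example from the API section works out to (50,000, 30,000 and 20,000 buckets).

```go
package main

import "fmt"

// bucketAllocation mirrors the struct proposed in the ADR.
type bucketAllocation struct {
	variant string
	buckets int
}

// pickVariant walks the allocations in order and returns the first variant
// whose cumulative upper bound exceeds the bucket index. It returns "" when
// the index is beyond the total. Hypothetical helper for illustration only.
func pickVariant(bucket int, allocations []bucketAllocation) string {
	upper := 0
	for _, a := range allocations {
		upper += a.buckets
		if bucket < upper {
			return a.variant
		}
	}
	return ""
}

func main() {
	allocations := []bucketAllocation{
		{"red", 50000},
		{"blue", 30000},
		{"green", 20000},
	}
	for _, b := range []int{0, 49999, 50000, 79999, 80000, 99999, 100000} {
		fmt.Printf("%d -> %q\n", b, pickVariant(b, allocations))
	}
	// prints:
	// 0 -> "red"
	// 49999 -> "red"
	// 50000 -> "blue"
	// 79999 -> "blue"
	// 80000 -> "green"
	// 99999 -> "green"
	// 100000 -> ""
}
```

Each variant owns a contiguous, half-open range of bucket indices, so the boundaries at 50,000 and 80,000 belong to the next variant, and an index equal to the total falls through to the empty string.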
### Consequences

- Good, because it enables precise traffic control for high-throughput environments
- Good, because it matches industry-standard precision offered by leading vendors
- Good, because it maintains API backwards compatibility
- Good, because integer weights remain simple to understand and configure
- Good, because it prevents silent configuration failures through minimum allocation guarantee
- Good, because excess handling is predictable and fair
- Good, because weight validation provides clear error messages for invalid configurations
- Bad, because it represents a behavioral breaking change for existing configurations
- Bad, because it slightly increases memory usage for bucket calculations
- Bad, because actual percentages may differ slightly from configured weights due to minimum allocations

### Implementation Plan

1. Update flagd-testbed with comprehensive test cases for high-precision fractional bucketing across all evaluation modes
2. Implement core logic in flagd to support 100,000-bucket system with minimum allocation guarantee and excess handling
3. Update flagd providers to ensure consistent behavior and testing across language implementations
4. Documentation updates, migration guides, and example configurations to demonstrate the new precision capabilities

Comment:

I went back and forth on this. It isn't necessary if the flag is configured properly but I'm afraid that it wouldn't be that obvious that there's a misconfiguration. This basically prevents 0% distribution if a weight is defined.