-
Notifications
You must be signed in to change notification settings - Fork 559
Description
Component(s)
collector
Describe the issue you're reporting
The documentation comment in apis/v1alpha1/opentelemetrycollector_types.go
is outdated and contradicts the actual capabilities of the Target Allocator, particularly regarding the per-node allocation strategy's
support for multiple replicas.
Current Misleading Documentation
In apis/v1alpha1/opentelemetrycollector_types.go
, the Replicas
field comment states:
// Replicas is the number of pod instances for the underlying TargetAllocator.
// This should only be set to a value other than 1 if a strategy that allows
// for high availability is chosen. Currently, the only allocation strategy
// that can be run in a high availability mode is consistent-hashing.
This comment directly contradicts:
- Maintainer confirmation in Issue Target Allocator multiple Replicas with per-node allocationStrategy #2900 where @swiatekm stated:
"It should work correctly... we don't check Replicas at admission, so you can really set it to whatever you want" - PR Allow setting of PDB for per-node #2932 merged June 13, 2024 which specifically added PodDisruptionBudget support for per-node strategy, explicitly enabling HA configurations
- The technical reality that per-node allocation is stateless and deterministic, making it inherently safe for multiple replicas
Impact
This outdated comment may cause users to:
- Avoid implementing HA configurations for per-node Target Allocators
- Miss opportunities to eliminate single points of failure in critical observability infrastructure
- Question whether officially supported configurations are actually valid
- Potentially experience incidents due to lack of HA (we experienced this in June 2024)
Expected Behavior
The comment should be updated to reflect that:
- Per-node allocation strategy does support multiple replicas
- Multiple replicas require operator v0.103.0+ (which includes PR Allow setting of PDB for per-node #2932 for PDB support)
- Both consistent-hashing and per-node strategies can run in HA mode
Proposed Documentation Fix
// Replicas is the number of pod instances for the underlying TargetAllocator.
// This should only be set to a value other than 1 if a strategy that allows
// for high availability is chosen.
//
// Allocation strategies supporting high availability:
// - consistent-hashing: Supported (distributes targets via hashing)
// - per-node: Supported (requires operator v0.103.0+, deterministic allocation)
// - least-weighted: Not recommended for HA (may cause allocation conflicts)
//
// When using multiple replicas, configure a PodDisruptionBudget to ensure
// availability during cluster maintenance operations.
Additional Context
- This comment appears to be 3+ years old and predates the PDB support PR
- Issue Target Allocator multiple Replicas with per-node allocationStrategy #2900 has extensive maintainer discussion confirming per-node works with multiple replicas
- Users relying on this documentation are being misled about the operator's actual capabilities
- The Target Allocator's stateless, deterministic design makes it inherently safe for HA with per-node strategy
References
- Issue Target Allocator multiple Replicas with per-node allocationStrategy #2900: Target Allocator multiple Replicas with per-node allocationStrategy #2900
- PR Allow setting of PDB for per-node #2932: Allow setting of PDB for per-node #2932
- Source file: https://github.com/open-telemetry/opentelemetry-operator/blob/main/apis/v1alpha1/opentelemetrycollector_types.go
Would appreciate if this documentation could be updated to reflect the current reality and prevent others from experiencing the same confusion and potential production incidents.
Tip
React with 👍 to help prioritize this issue. Please use comments to provide useful context, avoiding +1
or me too
, to help us triage it. Learn more here.