-
Notifications
You must be signed in to change notification settings - Fork 467
Description
Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration
I'd like to revisit a proposal I initially made about eight years ago regarding N-dimensional histograms in ROOT. At that time, we explored some aspects of THn with Axel, but the implementation remained partially "frozen." and as result of our work only THn was created.
Now, with the increasingly pressing memory consumption challenges within the Hyperloop framework in O2, the need for an optimized N-dimensional histogram representation has become urgent.
A dynamic alternative to THn
and THnSparse
Motivation
Current ROOT histograms (THn
and THnSparse
) force users into a trade-off:
THn
(dense): Fast, cache-friendly, but huge memory footprint for sparse data.THnSparse
(sparse): Efficient for low occupancy, but slow fill/access and merging as density increases.
Merging multiple THnSparse
histograms with moderate occupancy is particularly costly due to pointer-heavy structures and hash collisions.
Proposal
Introduce an adaptive (elastic) N-D histogram that dynamically chooses the optimal internal storage format based on occupancy and access patterns, providing:
- Memory efficiency for sparse data.
- High-speed access and merging for dense or aggregated data.
- Automatic switching between internal formats without user intervention.
Internal Representations
The histogram can exist in three states, switching dynamically:
-
SparseMap (low occupancy, <~1-5%)
- Structure:
std::unordered_map<BinIndex, Weight>
- Optimal for: Initial fills, very sparse datasets.
- Structure:
-
ChunkedDense (intermediate occupancy, ~5–30% - to benchmark)
- Structure: Dense chunks of N-D arrays for high-density regions, sparse elsewhere.
- Trigger: Local occupancy > threshold.
- Benefit: Locality + reduced hash overhead.
-
FullDense (high occupancy >30%, or during merging)
- Structure: Flat 1-D array (like
THn
). - Optimal for: Merging or fully dense datasets.
- Benefit: Maximum cache locality, fastest arithmetic operations.
- Structure: Flat 1-D array (like
Adaptive Logic
- Start:
SparseMap
for minimal memory. - Promote: Switch to
ChunkedDense
orFullDense
when occupancy thresholds are reached. - Merge: Auto-promote to
FullDense
for merging speed, optionally optimize back if sparse regions dominate.
Key Benefits
-
Performance:
- Fast fill and query (cache-friendly in dense mode).
- Merge O(N) in dense mode vs. pointer-heavy O(N log N) in sparse structures.
-
Memory Efficiency:
- Uses memory proportional to actual data, avoiding dense overhead for sparse cases.
-
User Simplicity:
- One API, no need to choose
THn
vsTHnSparse
. - Compatible with existing ROOT I/O ?
- One API, no need to choose
API Sketch
ElasticTHn hist("ElasticHist", ndim, bins, xmin, xmax);
hist.Fill({x1, x2, ..., xn}, weight);
hist.Merge(otherElasticTHn);
Optional config:
hist.SetThresholds({sparseToChunk=0.01, chunkToDense=0.3});
Next Steps
- Prototype in C++ (wrap as
ROOT::Experimental::ElasticTHn
). - Benchmark vs THn/THnSparse for speed, memory, and merging performance.
- Integration Plan: Add as an alternative in ROOT 6/7 with I/O compatible serialization.
Impact
This proposal solves a longstanding problem in HEP analysis: multidimensional histograms that scale efficiently with data sparsity. It removes the user burden of manual optimization while enabling modern workflows (fast merging, dynamic rebinning, adaptive compression).