Skip to content

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration #14543

@miranov25

Description

@miranov25

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration

I'd like to revisit a proposal I initially made about eight years ago regarding N-dimensional histograms in ROOT. At that time, we explored some aspects of THn with Axel, but the implementation remained partially "frozen." and as result of our work only THn was created.

Now, with the increasingly pressing memory consumption challenges within the Hyperloop framework in O2, the need for an optimized N-dimensional histogram representation has become urgent.

A dynamic alternative to THn and THnSparse

Motivation

Current ROOT histograms (THn and THnSparse) force users into a trade-off:

  • THn (dense): Fast, cache-friendly, but huge memory footprint for sparse data.
  • THnSparse (sparse): Efficient for low occupancy, but slow fill/access and merging as density increases.

Merging multiple THnSparse histograms with moderate occupancy is particularly costly due to pointer-heavy structures and hash collisions.


Proposal

Introduce an adaptive (elastic) N-D histogram that dynamically chooses the optimal internal storage format based on occupancy and access patterns, providing:

  • Memory efficiency for sparse data.
  • High-speed access and merging for dense or aggregated data.
  • Automatic switching between internal formats without user intervention.

Internal Representations

The histogram can exist in three states, switching dynamically:

  1. SparseMap (low occupancy, <~1-5%)

    • Structure: std::unordered_map<BinIndex, Weight>
    • Optimal for: Initial fills, very sparse datasets.
  2. ChunkedDense (intermediate occupancy, ~5–30% - to benchmark)

    • Structure: Dense chunks of N-D arrays for high-density regions, sparse elsewhere.
    • Trigger: Local occupancy > threshold.
    • Benefit: Locality + reduced hash overhead.
  3. FullDense (high occupancy >30%, or during merging)

    • Structure: Flat 1-D array (like THn).
    • Optimal for: Merging or fully dense datasets.
    • Benefit: Maximum cache locality, fastest arithmetic operations.

Adaptive Logic

  • Start: SparseMap for minimal memory.
  • Promote: Switch to ChunkedDense or FullDense when occupancy thresholds are reached.
  • Merge: Auto-promote to FullDense for merging speed, optionally optimize back if sparse regions dominate.

Key Benefits

  • Performance:

    • Fast fill and query (cache-friendly in dense mode).
    • Merge O(N) in dense mode vs. pointer-heavy O(N log N) in sparse structures.
  • Memory Efficiency:

    • Uses memory proportional to actual data, avoiding dense overhead for sparse cases.
  • User Simplicity:

    • One API, no need to choose THn vs THnSparse.
    • Compatible with existing ROOT I/O ?

API Sketch

ElasticTHn hist("ElasticHist", ndim, bins, xmin, xmax);
hist.Fill({x1, x2, ..., xn}, weight);
hist.Merge(otherElasticTHn);

Optional config:

hist.SetThresholds({sparseToChunk=0.01, chunkToDense=0.3});

Next Steps

  1. Prototype in C++ (wrap as ROOT::Experimental::ElasticTHn).
  2. Benchmark vs THn/THnSparse for speed, memory, and merging performance.
  3. Integration Plan: Add as an alternative in ROOT 6/7 with I/O compatible serialization.

Impact

This proposal solves a longstanding problem in HEP analysis: multidimensional histograms that scale efficiently with data sparsity. It removes the user burden of manual optimization while enabling modern workflows (fast merging, dynamic rebinning, adaptive compression).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions