Elastic N-Dimensional Histogram for O2+ROOT  -> Hyperloop consideration

###  Elastic N-Dimensional Histogram for O2+ROOT  -> Hyperloop consideration

I'd like to revisit a proposal I initially made about eight years ago regarding N-dimensional histograms in ROOT. At that time, we explored some aspects of THn with Axel, but the implementation remained partially "frozen." and as result of our work only THn was created.

Now, with the increasingly pressing memory consumption challenges within the Hyperloop framework in O2, the need for an optimized N-dimensional histogram representation has become urgent. 

*A dynamic alternative to `THn` and `THnSparse`*
---

### **Motivation**

Current ROOT histograms (`THn` and `THnSparse`) force users into a trade-off:

* `THn` (dense): Fast, cache-friendly, but **huge memory footprint** for sparse data.
* `THnSparse` (sparse): Efficient for low occupancy, but **slow fill/access and merging** as density increases.

Merging multiple `THnSparse` histograms with moderate occupancy is particularly costly due to pointer-heavy structures and hash collisions.

---

### **Proposal**

Introduce an **adaptive (elastic) N-D histogram** that dynamically chooses the optimal internal storage format based on **occupancy and access patterns**, providing:

* **Memory efficiency** for sparse data.
* **High-speed access and merging** for dense or aggregated data.
* **Automatic switching** between internal formats without user intervention.

---

### **Internal Representations**

The histogram can exist in **three states**, switching dynamically:

1. **SparseMap** (low occupancy, <~1-5%)

   * Structure: `std::unordered_map<BinIndex, Weight>`
   * Optimal for: Initial fills, very sparse datasets.

2. **ChunkedDense** (intermediate occupancy, ~5–30% - to benchmark)

   * Structure: Dense chunks of N-D arrays for high-density regions, sparse elsewhere.
   * Trigger: Local occupancy > threshold.
   * Benefit: Locality + reduced hash overhead.

3. **FullDense** (high occupancy >30%, or during merging)

   * Structure: Flat 1-D array (like `THn`).
   * Optimal for: Merging or fully dense datasets.
   * Benefit: Maximum cache locality, fastest arithmetic operations.

---

### **Adaptive Logic**

* **Start:** `SparseMap` for minimal memory.
* **Promote:** Switch to `ChunkedDense` or `FullDense` when occupancy thresholds are reached.
* **Merge:** Auto-promote to `FullDense` for merging speed, optionally optimize back if sparse regions dominate.

---

### **Key Benefits**

* **Performance:**
  * Fast fill and query (cache-friendly in dense mode).
  * Merge O(N) in dense mode vs. pointer-heavy O(N log N) in sparse structures.
* **Memory Efficiency:**

  * Uses memory proportional to actual data, avoiding dense overhead for sparse cases.
* **User Simplicity:**
  * One API, no need to choose `THn` vs `THnSparse`.
  * Compatible with existing ROOT I/O ?
---

### **API Sketch**

```cpp
ElasticTHn hist("ElasticHist", ndim, bins, xmin, xmax);
hist.Fill({x1, x2, ..., xn}, weight);
hist.Merge(otherElasticTHn);
```

Optional config:

```cpp
hist.SetThresholds({sparseToChunk=0.01, chunkToDense=0.3});
```

---

### **Next Steps**

1. **Prototype in C++** (wrap as `ROOT::Experimental::ElasticTHn`).
2. **Benchmark vs THn/THnSparse** for speed, memory, and merging performance.
3. **Integration Plan:** Add as an alternative in ROOT 6/7 with I/O compatible serialization.

---

### **Impact**

This proposal solves a **longstanding problem in HEP analysis**: multidimensional histograms that scale efficiently with data sparsity. It removes the user burden of manual optimization while enabling modern workflows (fast merging, dynamic rebinning, adaptive compression).



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration #14543

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration

A dynamic alternative to `THn` and `THnSparse`

Motivation

Proposal

Internal Representations

Adaptive Logic

Key Benefits

API Sketch

Next Steps

Impact

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration #14543

Description

Elastic N-Dimensional Histogram for O2+ROOT -> Hyperloop consideration

A dynamic alternative to THn and THnSparse

Motivation

Proposal

Internal Representations

Adaptive Logic

Key Benefits

API Sketch

Next Steps

Impact

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

A dynamic alternative to `THn` and `THnSparse`