
feat: Nvidia GPU Direct Storage Support for reading RNTuple #1426


Open · wants to merge 30 commits into main

Conversation


@fstrug fstrug commented Apr 22, 2025

This PR adds support for reading RNTuple data from storage directly into GPU memory via RDMA on GPU Direct Storage (GDS) enabled systems. On systems without GDS, cuFile runs in compatibility mode and reads are performed by the CPU via POSIX. Currently, only RNTuple data compressed with the Zstandard algorithm is supported.

Changes are made to uproot.behaviors.RNTuple.HasFields.arrays() to steer array building via two new optional arguments, backend and use_GDS.
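The steering described above can be sketched as a small dispatch function. This is illustrative only, not uproot's actual implementation: the argument names backend and use_GDS come from this PR, but the bodies and return values are hypothetical.

```python
# Hypothetical sketch of how backend/use_GDS could steer the read path.
# Not the PR's code; only the argument names are taken from the PR.
def arrays(backend="cpu", use_GDS=False):
    if use_GDS and backend != "cuda":
        # GDS reads land in GPU memory, so a non-GPU backend is inconsistent
        raise ValueError("use_GDS=True requires backend='cuda'")
    if use_GDS:
        # GDS path: cuFile/kvikIO reads go directly into GPU buffers
        return "gds"
    # standard path: POSIX reads into host memory, then an optional
    # transfer to the requested backend
    return backend
```

A caller would then select the read path with, e.g., arrays(backend="cuda", use_GDS=True).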

There are known issues with nvcomp's implementation of the Zstandard algorithm that crash nvcomp (and uproot) when decompressing certain buffers. This behavior is non-deterministic and the cause is unknown. Fixes are expected in the next release of nvcomp.

@nsmith- nsmith- requested review from lgray and nsmith- April 22, 2025 15:03
@nsmith- (Member) left a comment

Some initial comments. I will look more closely at the implementation later.

@fstrug fstrug marked this pull request as draft April 24, 2025 15:30
fstrug commented May 14, 2025

Passing all RNTuple reading tests except four.

test_1411_rntuple_physlite_ATLAS.py::test_truth_muon_containers - a bug in awkward array, fixed in scikit-hep/awkward#3507. I had issues building awkward-cpp, so I haven't verified against the awkward main branch that the test passes; I got a missing awkward kernel error at the final assert.
test_1250_rntuple_improvements.py::test_iterate - failing at line 89. Iterate is not yet implemented for GDS, but this also fails with backend = "cpu" and use_GDS = False. The only change was to call HasFields._arrays() directly instead of the high-level HasFields.arrays(), so this seems unrelated to this PR right now.

I am still unsure how we want to solve the following tests.
test_0662_rntuple_stl_containers - CuPy does not support string dtypes. cuDF supports string manipulation, but string support in CuPy is not anticipated soon (cupy/cupy#8698 (comment)). Unsure how we might want to implement this.
test_1223_empty_struct.py::test_invalid_variant() - CuPy does not support dtype = object. Calling ak.to_backend(a, "cuda") works, but I need to find a way to build a.variant with the cuda backend via CuPy arrays and ak.from_buffers().

@fstrug fstrug marked this pull request as ready for review June 10, 2025 17:15
fstrug and others added 6 commits June 10, 2025 20:36
Fixed 
```
error: Failed to parse entry in group `test`: `kvikio-cu12>=25.02.01; platform_system == "Linux" & python_version >= 3.10`
  Caused by: Unexpected character '&', expected 'and', 'or' or end of input
kvikio-cu12>=25.02.01; platform_system == "Linux" & python_version >= 3.10
```
Require numpy < 2.3 due to a bug.
@@ -29,6 +29,7 @@ test = [
"rangehttpserver",
"requests",
"s3fs",
'kvikio-cu12>=25.02.01; platform_system == "Linux" and python_version >= "3.10"',
Collaborator

Maybe it makes sense to add a test-gpu group?
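A separate group might look like the following pyproject.toml fragment. This is a sketch: the group name test-gpu and the table placement are assumptions, while the marker string is the one added by this PR.

```
# Hypothetical pyproject.toml fragment; group name and table are assumed
[project.optional-dependencies]
test-gpu = [
    'kvikio-cu12>=25.02.01; platform_system == "Linux" and python_version >= "3.10"',
]
```

CI could then install it with something like pip install ".[test-gpu]" only on GPU runners.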

Collaborator

yes, then it's clear what is failing (if something is failing :-)

Comment on lines +59 to +68
# https://github.com/root-project/root/blob/6dc4ff848329eaa3ca433985e709b12321098fe2/core/zip/inc/Compression.h#L93-L105
compression_settings_dict = {
-1: "Inherit",
0: "UseGlobal",
1: "ZLIB",
2: "LZMA",
3: "deflate",
4: "LZ4",
5: "zstd",
}
Collaborator

This is already in uproot.const

kZLIB = 1
kLZMA = 2
kOldCompressionAlgo = 3
kLZ4 = 4
kZSTD = 5
kUndefinedCompressionAlgorithm = 6
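Following the reviewer's suggestion, the id-to-name mapping could be derived from these existing constants instead of redefining the numbers. A sketch, with the constant values quoted above inlined here rather than imported from uproot.const:

```python
# Constants as quoted from uproot.const above, inlined for this sketch
kZLIB = 1
kLZMA = 2
kOldCompressionAlgo = 3
kLZ4 = 4
kZSTD = 5
kUndefinedCompressionAlgorithm = 6

# Build the id -> name mapping on top of the existing constants, so the
# numeric values live in exactly one place
compression_algorithm_names = {
    kZLIB: "ZLIB",
    kLZMA: "LZMA",
    kOldCompressionAlgo: "deflate",
    kLZ4: "LZ4",
    kZSTD: "zstd",
}
```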


return Cluster_Contents

def Deserialize_decompressed_content(
Collaborator

We should stick with the convention of snake_case function names



# GDS Helper Dataclasses
class cupy: # to appease the linter
Collaborator

I think there's probably a better alternative

Member

We could use a type alias

CupyArray = Any

which won't offer any type checks but at least hints at what is there
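The suggested alias could look like this. A sketch: CupyArray is the alias proposed in the comment, while the function and its body are illustrative, not code from the PR.

```python
from typing import Any

# No real type checking, but the annotation documents intent without
# importing cupy at module scope (useful when cupy is an optional dep)
CupyArray = Any

def scale(buf: CupyArray, factor: float) -> CupyArray:
    # illustrative elementwise operation on a device array
    return buf * factor
```

This keeps the module importable on machines without CuPy while still hinting at what the parameters are expected to be.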

@ianna (Collaborator) left a comment

@fstrug - nice work! Just a few minor comments - please check. Thanks!

@@ -76,7 +77,7 @@ dependencies = [
"awkward>=2.4.6",
"cramjam>=2.5.0",
"xxhash",
"numpy",
"numpy < 2.3",
Collaborator

@fstrug - no need to pin it now. The CI should pick up the latest awkward release that fixes the issue.


@nsmith- (Member) left a comment

A few code quality comments, no showstoppers regarding functionality as far as I can see

interpretation_executor=None,
filter_branch=unset,
):
"""
Member

Since this is now a private function we can omit the duplicate docstring (to avoid potential accidents later in editing the private one instead of the public one)

)[entry_start:entry_stop]

arrays = uproot.extras.awkward().to_backend(arrays, backend=backend)
Member

Since this is a new line, do we have a test that exercises the cuda backend when not using GDS?

filter_branch=unset,
):
"""
Current GDS support is limited to nvidia GPUs. The python library kvikIO is
Member

As above, this is a private method so the docstring might better be reduced to implementation-specific details

Comment on lines +1045 to +1077
for key in target_cols:
if "column" in key and "union" not in key:
key_nr = int(key.split("-")[1])

dtype_byte = self.ntuple.column_records[key_nr].type
content = content_dict[key_nr]

if "cardinality" in key:
content = cupy.diff(content)

if dtype_byte == uproot.const.rntuple_col_type_to_num_dict["switch"]:
kindex, tags = uproot.models.RNTuple._split_switch_bits(content)
# Find invalid variants and adjust buffers accordingly
invalid = numpy.flatnonzero(tags == -1)
if len(invalid) > 0:
kindex = numpy.delete(kindex, invalid)
tags = numpy.delete(tags, invalid)
invalid -= numpy.arange(len(invalid))
optional_index = numpy.insert(
numpy.arange(len(kindex), dtype=numpy.int64), invalid, -1
)
else:
optional_index = numpy.arange(len(kindex), dtype=numpy.int64)
container_dict[f"{key}-index"] = cupy.array(optional_index)
container_dict[f"{key}-union-index"] = cupy.array(kindex)
container_dict[f"{key}-union-tags"] = cupy.array(tags)
else:
# don't distinguish data and offsets
container_dict[f"{key}-data"] = content
container_dict[f"{key}-offsets"] = content
cluster_offset = cluster_starts[start_cluster_idx]
entry_start -= cluster_offset
entry_stop -= cluster_offset
Member

It looks like a lot of this code is similar to the non-GPU _arrays method implementation. Is there a way to share more code between them rather than duplicating?



@dataclasses.dataclass
class ColBuffers_Cluster:
Member

For the most part, this repo is using CamelCase for types and snake_case for functions, so here also a symbol rename may be in order

import uproot


class Source_CuFile:
Member

CuFileSource

Comment on lines +1726 to +1727
# if self.columns == []:
# self.columns = Cluster.columns
Member

Remove?

Comment on lines +1683 to +1690
key = ColBuffers_Cluster.key
self.columns.append(key)
self.data_dict[key] = ColBuffers_Cluster
self.algorithms[key] = ColBuffers_Cluster.algorithm
if ColBuffers_Cluster.isCompressed:
self.data_dict_comp[key] = ColBuffers_Cluster
else:
self.data_dict_uncomp[key] = ColBuffers_Cluster
Member

If this dataclass has multiple containers all indexed by the same key, it may be a sign that they belong together. It looks like they start out this way with ColBuffers_Cluster having a complete record and this destructures it. Any reason not to just keep a dict[str, ColBuffers_Cluster] member?
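The consolidation suggested here could look roughly like the following. This is a sketch with hypothetical, simplified fields (the real ColBuffers_Cluster also carries the buffers themselves); it shows a single dict keyed by column replacing the parallel columns/data_dict/algorithms/data_dict_comp/data_dict_uncomp members.

```python
import dataclasses

@dataclasses.dataclass
class ColBuffersCluster:
    # simplified stand-in for the PR's ColBuffers_Cluster record
    key: str
    algorithm: str
    is_compressed: bool

@dataclasses.dataclass
class ClusterContents:
    # one container keyed by column, instead of several parallel dicts
    columns: dict = dataclasses.field(default_factory=dict)

    def add(self, col: ColBuffersCluster) -> None:
        self.columns[col.key] = col

    def compressed(self) -> dict:
        # derive the compressed/uncompressed views on demand
        return {k: v for k, v in self.columns.items() if v.is_compressed}
```

Keeping the record whole avoids the dicts drifting out of sync, and the compressed/uncompressed splits can be computed where they are needed.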

else:
self.data_dict_uncomp[key] = ColBuffers_Cluster

def _decompress(self):
Member

It will be good to free any memory occupied by the compressed buffers once they are decompressed
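One way to do this, sketched with plain dicts (a hypothetical helper, not the PR's code): popping each entry drops the last reference to the compressed buffer, so the (GPU) allocator can reclaim it as soon as that buffer is decompressed.

```python
def decompress_and_release(data_dict_comp, decompress):
    """Decompress each buffer and drop the compressed original."""
    out = {}
    for key in list(data_dict_comp):
        # pop removes our reference to the compressed buffer; once
        # decompress() returns, nothing holds it and it can be freed
        out[key] = decompress(data_dict_comp.pop(key))
    return out
```

With CuPy-backed buffers this matters because device memory is scarce; holding both compressed and decompressed copies of every column doubles peak usage.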
