Random Crashes (No Error) with cupynumeric.load() for Large .npy File (~40GB) #4
-
I'm encountering intermittent crashes, without any error messages, when loading a ~40GB .npy file using cupynumeric.load(). The same file loads consistently with numpy.load(). Converting the NumPy-loaded array via cupynumeric.array() also crashes randomly in the same way. GPU memory should not be the issue: I am using Grace Hopper nodes on Vista with 96 GB of GPU memory each, and I can easily create matrices twice that size. The problem arises specifically when I load the .npy file or when I convert a NumPy array into a cupynumeric array (a minimal sketch is included below).

In addition, my longer-term objective is multi-node/multi-GPU distributed computation (QR/SVD) on datasets larger than a single node's RAM, so pre-loading with NumPy and then converting to cupynumeric in batches is not an ideal solution.

Questions:
Thank you so much for your help!
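For reference, a minimal sketch of the two code paths that crash (the file path is a placeholder):

```python
import numpy as np
import cupynumeric as cnp

path = "/path/to/matrix.npy"  # placeholder for the ~40GB .npy file

# Path 1: direct load through cupynumeric -- crashes intermittently with no error message.
a = cnp.load(path)

# Path 2: load with NumPy (works consistently), then convert -- also crashes intermittently.
a_np = np.load(path)
a = cnp.array(a_np)
```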
Replies: 6 comments
-
I don't have any immediate hunches; could you please share a reproducer? Secondarily, you could try using a debug build of Legate and running with legate --gdb myprog.py, which will start your execution inside gdb (assuming you have it installed), which in turn will hopefully catch the crash and allow you to print a backtrace at the point of failure. We don't upload debug builds very often (due to the large file sizes), but here are instructions to install the latest one I could find:
Generally no: the .npy format doesn't natively encode partition information.
I would suggest HDF5, for which we support distributed reads today. This example shows how to use the lower-level Legate interface: https://github.com/nv-legate/legate/blob/main/share/legate/examples/io/hdf5/ex1.py#L117. You can then wrap the result of from_file with cupynumeric.asarray.
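Roughly, the flow looks like this (the file path and dataset name are placeholders, and the exact keyword argument for the dataset name is an assumption; see the linked example for the authoritative usage):

```python
import cupynumeric as cnp
from legate.io.hdf5 import from_file

# Distributed read of one dataset from the HDF5 file
# ("/path/to/matrix.h5" and "data" are placeholder names).
store = from_file("/path/to/matrix.h5", dataset_name="data")

# Wrap the resulting Legate store as a cupynumeric array and use it as usual.
arr = cnp.asarray(store)
print(arr.shape, arr.dtype)
```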
-
Thank you so much for the reply! I will try the HDF5 format; if it works I will just start using HDF5, and if not I will try to get a reproducer. Meanwhile, I have another related question: I just noticed that cupynumeric.linalg.qr only supports a single GPU or a single CPU. Are there any plans to add multi-node/multi-GPU capability in the near future? That would be really useful!
-
Reading the HDF5 file with legate.io.hdf5.from_file and then wrapping it with cupynumeric.asarray worked. Thank you for that! I even tried it with data larger than the memory of a single node.
One note: I was only able to install the "nompi" version of h5py in the same environment as legate/cupynumeric.
-
Multi-GPU multi-node (MGMN) QR is on the roadmap, but we first need to resolve some packaging issues with cuSolverMp, as we intend to reuse https://docs.nvidia.com/cuda/cusolvermp/usage/functions.html#cusolvermpgeqrf for this. If you don't mind sharing, we'd love to hear more about your use case.
"nompi" version of h5py should be sufficient for the needs of cupynumeric (we only use h5py for reading the file metadata, then we manage parallel execution ourslves, and go directly to the individual file reading APIs of hdf5, not relying on their MPI-IO support). Is the "nompi" version of h5py causing issues for you? |
-
Thanks for the update! It's really great to hear that MGMN QR is planned, and I understand it depends on the cuSolverMp packaging. Do you maybe have a rough idea of when this MGMN QR support could be released?

About the h5py question: you're right, the 'nompi' version works fine for cupynumeric itself. My reason for wanting the MPI version is actually the step before using Legate. I have a use case where I generate a very large matrix, too big for one node's memory. I calculate parts of it on different nodes using another tool, which gives me NumPy arrays on each node. Because this matrix comes from outside Legate, my idea was to use the MPI version of h5py to save all these NumPy parts together into one big HDF5 file using MPI-IO, and then load that HDF5 file with Legate to do the distributed QR.

The small issue now is that I can't install MPI h5py and legate in the same environment, so I would have to save the file in one environment and then switch to another environment just to read it with legate and run the QR. It's not a big problem, just an extra step in the process. Let me know if you know of a workaround! Thanks again for explaining!
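For concreteness, here is a minimal sketch of the writing step I have in mind (shapes, file name, and dataset name are placeholders; it assumes an MPI-enabled h5py build and mpi4py):

```python
import numpy as np
import h5py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rows_per_rank, n_cols = 1000, 500  # placeholder block sizes
# Stand-in for the block actually computed on this node by the external tool.
local_block = np.random.rand(rows_per_rank, n_cols)

# Open one shared file with the MPI-IO driver; all ranks participate.
with h5py.File("matrix.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("data", shape=(rows_per_rank * size, n_cols), dtype="f8")
    # Each rank writes its own row slab into the shared dataset.
    dset[rank * rows_per_rank:(rank + 1) * rows_per_rank, :] = local_block
```

The resulting single HDF5 file would then be read back with legate.io.hdf5.from_file in the other environment for the distributed QR.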
-
I was able to reproduce your package conflict. It appears to happen because our packages depend on a recent version of hdf5, whose OpenMPI-compatible build depends on openmpi>5, which our packages are not compatible with. I have asked our package maintainers whether it's possible to loosen our version restriction on the hdf5 and/or openmpi packages.
I don't have a firm estimate at this point, probably at least 1 month away.