Hi,

I am working on an optimization code where we have very memory-intensive objectives. I managed to use mpi4py to distribute the computations to multiple devices and execute them at the same time. (Note: each objective takes different-sized arrays and does completely different operations, so JAX's built-in SPMD parallelization doesn't help.)

Example structure of the code is:
```python
import os

num_device = 3
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(1 / (num_device + 2))
# os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

from mpi4py import MPI
import jax

if __name__ == "__main__":
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()

    # Let's say I have 3 expensive objective functions
    obj0 = Object0()
    obj1 = Object1()
    obj2 = Object2()

    # place all the data related to the objectives on the corresponding GPUs
    obj0 = jax.device_put(obj0, device=jax.devices("gpu")[0])
    obj1 = jax.device_put(obj1, device=jax.devices("gpu")[1])
    obj2 = jax.device_put(obj2, device=jax.devices("gpu")[2])

    # do_something_on_gpu() only uses the class data,
    # so these calls are embarrassingly parallel
    if rank == 0:
        obj0.do_something_on_gpu()
    if rank == 1:
        obj1.do_something_on_gpu()
    if rank == 2:
        obj2.do_something_on_gpu()
    ...
    # rest of the code concatenates results and does additional things
```
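For reference, the concatenation step afterwards is roughly along these lines (a minimal sketch continuing the script above; `result` and the concatenation axis are placeholders, not the real code):

```python
import numpy as np

# pull this rank's result back to host memory before communicating
local_result = np.asarray(result)  # `result` stands in for whatever do_something_on_gpu() produced

# the root rank collects the per-objective results and concatenates them
gathered = MPI.COMM_WORLD.gather(local_result, root=0)
if rank == 0:
    combined = np.concatenate(gathered, axis=0)
```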
Currently I am using the platform allocator (XLA_PYTHON_CLIENT_ALLOCATOR = "platform") for memory allocation, and it works. But I wanted to ask whether there is a better way to allocate memory for each process on the different GPUs. Something like:
Rank 0 -> 99% of GPU0, 0.5% of GPU1 and GPU2
Rank 1 -> 99% of GPU1, 0.5% of GPU0 and GPU2
Rank 2 -> 99% of GPU2, 0.5% of GPU0 and GPU1
Each process needs to be able to see and access the memory of each GPU for problem setup, but most memory will be used during do_something_on_gpu().
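To make the question concrete, a per-rank setup along these lines is the kind of thing I am after. This is only a minimal sketch using the documented XLA_PYTHON_CLIENT_PREALLOCATE flag, which as far as I understand applies per process rather than per device, so it cannot express the 99% / 0.5% split above:

```python
import os
from mpi4py import MPI  # import MPI first so the rank is known before JAX initializes

rank = MPI.COMM_WORLD.Get_rank()

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"           # every rank still sees all three GPUs
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"  # allocate on demand instead of preallocating 75% up front

import jax  # must be imported after the environment variables are set

# with on-demand allocation, most memory ends up on the one GPU this rank
# actually computes on, while the other GPUs only hold the small setup arrays
print(f"rank {rank} sees devices: {jax.devices('gpu')}")
```

Thank you!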