Hi,

I am working on an optimization code where we have very memory-intensive objectives. I managed to use mpi4py to distribute the computations to multiple devices and execute them at the same time. (Note: each objective takes different-sized arrays and does completely different operations, so JAX's built-in SPMD parallelization doesn't help.)

Example structure of the code is:
```python
import os

num_device = 3
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"
# os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = str(1 / (num_device + 2))
# os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"] = "platform"

from mpi4py import MPI
import jax

if __name__ == "__main__":
    rank = MPI.COMM_WORLD.Get_rank()
    size = MPI.COMM_WORLD.Get_size()

    # Let's say I have 3 expensive objective functions
    obj0 = Object0()
    obj1 = Object1()
    obj2 = Object2()

    # place all the data related to the objectives on the corresponding GPUs
    obj0 = jax.device_put(obj0, device=jax.devices("gpu")[0])
    obj1 = jax.device_put(obj1, device=jax.devices("gpu")[1])
    obj2 = jax.device_put(obj2, device=jax.devices("gpu")[2])

    # do_something_on_gpu() only uses the class data,
    # so these calls are embarrassingly parallel
    if rank == 0:
        obj0.do_something_on_gpu()
    if rank == 1:
        obj1.do_something_on_gpu()
    if rank == 2:
        obj2.do_something_on_gpu()
    ...
    # rest of the code concatenates results and does additional things
```
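For reference, the concatenation step afterwards is roughly along these lines (a minimal sketch continuing the script above; `result` and the concatenation axis are placeholders, not the real code):

```python
import numpy as np

# pull this rank's result back to host memory before communicating
local_result = np.asarray(result)  # `result` stands in for whatever do_something_on_gpu() produced

# the root rank collects the per-objective results and concatenates them
gathered = MPI.COMM_WORLD.gather(local_result, root=0)
if rank == 0:
    combined = np.concatenate(gathered, axis=0)
```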
Currently I am using the platform allocator (XLA_PYTHON_CLIENT_ALLOCATOR = "platform") for memory allocation, and it works. But I wanted to ask whether there is a better way to allocate memory for each process on the different GPUs. Something like:
Rank 0 -> 99% of GPU0, 0.5% of GPU1 and GPU2
Rank 1 -> 99% of GPU1, 0.5% of GPU0 and GPU2
Rank 2 -> 99% of GPU2, 0.5% of GPU0 and GPU1
Each process needs to be able to see and access the memory of each GPU for problem setup, but most memory will be used during do_something_on_gpu().
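To make the question concrete, a per-rank setup along these lines is the kind of thing I am after. This is only a minimal sketch using the documented XLA_PYTHON_CLIENT_PREALLOCATE flag, which as far as I understand applies per process rather than per device, so it cannot express the 99% / 0.5% split above:

```python
import os
from mpi4py import MPI  # import MPI first so the rank is known before JAX initializes

rank = MPI.COMM_WORLD.Get_rank()

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2"           # every rank still sees all three GPUs
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"  # allocate on demand instead of preallocating 75% up front

import jax  # must be imported after the environment variables are set

# with on-demand allocation, most memory ends up on the one GPU this rank
# actually computes on, while the other GPUs only hold the small setup arrays
print(f"rank {rank} sees devices: {jax.devices('gpu')}")
```

Thank you!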