-
Consider working with a Colab-provided 8-device TPU. The pmap cookbook shows an example of doing parallel work with pmap by placing data across devices and then doing some calculation, e.g.:
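(Roughly like this; a minimal sketch in the spirit of the cookbook, with illustrative matrix shapes:)

```python
import jax
import jax.numpy as jnp
from jax import pmap, random

# One PRNG key per device (8 on a Colab TPU).
keys = random.split(random.PRNGKey(0), jax.local_device_count())

# Each device draws its own random matrix, then multiplies it by its transpose.
matrices = pmap(lambda k: random.normal(k, (100, 200)))(keys)
result = pmap(lambda x: jnp.dot(x, x.T))(matrices)
print(result.shape)  # (8, 100, 100)
```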
If we try to do this with a leading dimension that's not equal to the number of devices, we get an error.
If the leading dim is a multiple of the number of devices, we can get around this by 1) reshaping to inject a dummy axis, 2) further mapping across the dummy axis with a vmap inside the pmap, and 3) reshaping the final result to get rid of the dummy axis, as in the sketch below.
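(A minimal sketch of that workaround, assuming 16 items across 8 devices; the sampled shapes are just placeholders:)

```python
import jax
import jax.numpy as jnp
from jax import pmap, vmap, random

n_devices = jax.local_device_count()  # 8 on a Colab TPU

# 16 keys: a multiple of the device count, but pmap alone fails here
# because the leading dimension (16) exceeds the number of devices (8).
keys = random.split(random.PRNGKey(0), 16)

f = lambda k: random.normal(k, (100, 200))

# 1) inject a dummy axis: (16, ...) -> (8, 2, ...)
keys = keys.reshape((n_devices, -1) + keys.shape[1:])
# 2) vmap over the dummy axis inside the pmap: result is (8, 2, 100, 200)
out = pmap(vmap(f))(keys)
# 3) reshape the final result to get rid of the dummy axis: (16, 100, 200)
out = out.reshape((-1,) + out.shape[2:])
print(out.shape)  # (16, 100, 200)
```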
This works for my use case; I just wanted to make sure I wasn't missing something about pmap that could do this for me?
-
This is an annoying wart of pmap, which we hope to revise soon! We have a prototype replacement checked in, called gmap (from #4006), which will allow schedulable maps, so that you can control how the map is evaluated as a combination of parallelization, vectorization, and iteration (like your manual pmap+vmap, but without requiring the reshape, and without requiring you to have two separate axis names). But while that's the long-term solution, it's not ready yet (in particular because it doesn't work efficiently with ShardedDeviceArrays). (cc @apaszke)

However, I suggested you ask about this on GitHub because there is another (older) prototype you could try: soft_pmap (…). First enable omnistaging:

```python
from jax.config import config
config.enable_omnistaging()
```

In terms of your example, you should be able to write this (where I effectively merged the first two lines):

```python
import jax.numpy as jnp
from jax import random
from jax import soft_pmap
from jax.config import config

config.enable_omnistaging()

# 16 keys across 8 devices: soft_pmap handles the mismatch for you.
keys = random.split(random.PRNGKey(0), 16)
matrices = soft_pmap(lambda k: random.normal(k, (100, 200)))(keys)
result = soft_pmap(lambda x: jnp.dot(x, x.T))(matrices)

print(result.shape)  # (16, 100, 100)
print(type(result))  # <class 'jax.interpreters.pxla.ShardedDeviceArray'>
```
-
That's perfect! I'll switch to soft_pmap.