pmap approach when workload is a multiple of TPU num_devices #4198

Answered by mattjj
matpalm asked this question in General

Twitter context.

This is an annoying wart of pmap, which we hope to revise soon! We have a prototype replacement checked in, called gmap (from #4006), which will allow schedulable maps: you control how the map is evaluated as a combination of parallelization, vectorization, and iteration. That is like your manual pmap+vmap, but without requiring the reshape and without requiring two separate axis names. While that's the long-term solution, it's not ready yet, in particular because it doesn't work efficiently with ShardedDeviceArrays. (cc @apaszke)
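For readers following along, the manual pmap+vmap workaround referred to above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: `per_example` and `pmap_over_multiple` are hypothetical names, and the batch size is assumed to be an exact multiple of `jax.local_device_count()`.

```python
import jax
import jax.numpy as jnp


def per_example(x):
    # Hypothetical stand-in for the real per-example computation.
    return jnp.sum(x ** 2)


def pmap_over_multiple(fn, batch):
    """Apply fn to every row of batch using pmap across devices and vmap within each device.

    Assumes batch.shape[0] is an exact multiple of the local device count.
    """
    n_dev = jax.local_device_count()
    n = batch.shape[0]
    if n % n_dev != 0:
        raise ValueError(f"batch size {n} is not a multiple of {n_dev} devices")
    # The reshape the answer mentions: (n, ...) -> (n_dev, n // n_dev, ...).
    sharded = batch.reshape((n_dev, n // n_dev) + batch.shape[1:])
    # Outer pmap splits the leading axis across devices;
    # inner vmap vectorizes over the per-device chunk.
    out = jax.pmap(jax.vmap(fn))(sharded)
    # Collapse the two mapped axes back into a single batch axis.
    return out.reshape((n,) + out.shape[2:])


# Build a batch whose size is 4x the device count, so it always divides evenly.
n_examples = 4 * jax.local_device_count()
batch = jnp.arange(n_examples * 3, dtype=jnp.float32).reshape(n_examples, 3)
result = pmap_over_multiple(per_example, batch)
```

The explicit reshape before the call and the un-reshape after it are the bookkeeping that a schedulable map is meant to remove.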

However, I suggested you ask about this on GitHub because there is another (older) prototype you could try: soft_pmap (…

Replies: 2 comments 3 replies

Answer selected by matpalm
3 replies
@tchaton

@froystig

@Chillee

5 participants