
Data seems not sharded across processes in multi-host single-slice setting #271

Answered by tengyifei
weirayao asked this question in Q&A

Hi @weirayao, thanks for checking out our codebase. I think what you're seeing is consistent with the distributed execution model of torchprime, although the behavior is not intuitive. Here's what's happening:

  • torchprime uses SPMD instead of multi-processing-based distributed execution.
  • The torchprime trainer instantiates a stack of dataloaders, each wrapping the previous one.
  • The base DataLoader outputs local tensors stored on the CPU.
  • The MpDeviceLoader outputs sharded global tensors stored on the TPU. Printing its output shows the full global tensor on every host, which gives the appearance of duplication even though each device only holds its own shard.

We can get a better understanding of this process by instrumenting the data loaders. I've uploaded an example at
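For illustration, here is a minimal sketch of what such instrumentation could look like (this is not the uploaded example): the throwaway TensorDataset, batch size, and one-dimensional "data" mesh are placeholders rather than torchprime's actual configuration, and the sharding lookup goes through the internal `torch_xla._XLAC._get_xla_sharding_spec` helper, so treat the exact calls as illustrative.

```python
import numpy as np
import torch
import torch_xla
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.spmd as xs
from torch.utils.data import DataLoader, TensorDataset

xr.use_spmd()  # torchprime runs in SPMD mode, not multi-processing

# Placeholder dataset; torchprime builds its loaders from the training config.
dataset = TensorDataset(torch.randn(1024, 16))
base_loader = DataLoader(dataset, batch_size=64)

# Layer 1: the base DataLoader yields local tensors on the CPU.
cpu_batch = next(iter(base_loader))[0]
print("base DataLoader:", cpu_batch.device, tuple(cpu_batch.shape))

# Layer 2: MpDeviceLoader moves batches to the TPU and annotates them with an
# input sharding so XLA splits the batch dimension across all devices.
num_devices = xr.global_runtime_device_count()
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ("data",))
device_loader = pl.MpDeviceLoader(
    base_loader,
    xm.xla_device(),
    input_sharding=xs.ShardingSpec(mesh, ("data", None)),
)

# Printing this tensor shows the full global batch on every host, which looks
# like duplication, but the sharding annotation confirms each device only
# holds one shard of the batch dimension.
tpu_batch = next(iter(device_loader))[0]
print("MpDeviceLoader:", tpu_batch.device, tuple(tpu_batch.shape))
print("sharding:", torch_xla._XLAC._get_xla_sharding_spec(tpu_batch))
```

If you run something like this on every host in the slice, each host prints the same global shape from the MpDeviceLoader, while the sharding string shows the batch dimension split across the devices in the mesh, which is exactly the "duplicated-looking" behavior described above.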

This discussion was converted from issue #267 on June 02, 2025 20:37.