
Random access with lookahead/concurrent loading, for model training with torch #3320

Answered by pimdh
pimdh asked this question in Q&A


Thanks for coming back to me!

I'd prefer a solution in which we can define a map-style dataset that fetches rows, and then use the standard PyTorch DataLoader with multiple worker processes to fetch the data; a minimal sketch follows the list below. This is the default paradigm in PyTorch and has three advantages:

  • If we need to do some data preprocessing, it is computed in parallel, concurrently with training. For example, this could be building the neighbourhood lists for point clouds.
  • If we do distributed training, it's more straightforward to set up the sampler correctly so that each process samples different rows.
  • Many ML frameworks/libraries, such as Ray Train, assume we're using the default DataLoader.
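
To make the shape of this concrete, here is a minimal sketch of the map-style pattern. The `fetch_rows` function is a hypothetical stand-in for whatever storage call actually retrieves rows by index (it is not an existing API); the `Dataset`/`DataLoader`/`DistributedSampler` pieces are standard PyTorch:

```python
import torch
from torch.utils.data import Dataset, DataLoader, DistributedSampler


def fetch_rows(indices):
    # Placeholder for the real storage call (hypothetical, not a real API):
    # returns one float feature vector per requested row index.
    return torch.randn(len(indices), 16)


class RowDataset(Dataset):
    """Map-style dataset: __getitem__ fetches one row by index, so the
    standard DataLoader machinery (shuffling, samplers, workers) applies."""

    def __init__(self, num_rows: int):
        self.num_rows = num_rows

    def __len__(self):
        return self.num_rows

    def __getitem__(self, idx):
        row = fetch_rows([idx])[0]
        # CPU-side preprocessing (e.g. building point-cloud neighbourhood
        # lists) would run here, in parallel across worker processes.
        return row


dataset = RowDataset(num_rows=100_000)

# Single-node training: workers prefetch and preprocess concurrently
# with the training loop.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

# Distributed training: DistributedSampler gives each rank a disjoint
# subset of rows (requires torch.distributed to be initialised first):
# sampler = DistributedSampler(dataset, shuffle=True)
# loader = DataLoader(dataset, batch_size=32, sampler=sampler, num_workers=4)

for batch in loader:
    pass  # the training step would consume `batch` here
```

Because `__getitem__` runs inside the DataLoader worker processes, both the row fetch and any preprocessing overlap with the training loop on the main process.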

I prefer to read the…

Replies: 1 comment · 2 replies (@westonpace, @pimdh)

Answer selected by pimdh