-
I have recently picked up JAX and converted a project I'm working on from PyTorch. It's a bit of a learning curve, but I love functional programming, so it's fun. I discovered that Google Colab offers an 8-TPU session, and I want to understand how to optimise my sharding for the calculations I'm doing. Are there any good resources for reading about how sharding is commonly thought about?

From what I understand, I shard the inputs to a calculation and then let the compiler figure out how to handle the rest. For example, I can shard a batch into 8 parts, replicate the params across all devices, and perform the calculation in parallel. Could I also shard the parameters across different devices, and would that yield a speed-up? If I have 4 layers that run sequentially, I can't imagine how forcing the data to be passed between devices would be quicker. On the other hand, if I shard the param matrices, perform the calculation in parallel, and then combine the results, perhaps that would be quicker?

Any answers or direction would be appreciated :)
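To make the data-parallel case concrete, here is a minimal sketch of what I mean, using `jax.sharding.Mesh` and `NamedSharding`; the axis name `"data"`, the shapes, and the toy `forward` function are made up just for illustration:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Toy sizes, just for illustration
batch, d_in, d_out = 32, 128, 64

# 1-D mesh over all available devices (8 on a Colab TPU runtime)
mesh = Mesh(jax.devices(), axis_names=("data",))

# Shard the batch dimension across devices, replicate the params
x = jax.device_put(jnp.ones((batch, d_in)),
                   NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((d_in, d_out)),
                   NamedSharding(mesh, P()))  # fully replicated

@jax.jit
def forward(x, w):
    # The compiler sees the input shardings and runs each device's
    # slice of the batched matmul in parallel.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.sharding)  # the output keeps the batch dimension sharded over "data"
```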
-
Hey!
I would recommend reading these docs:
For the techniques you mentioned, it really depends on what you are trying to do. Maybe https://jax-ml.github.io/scaling-book/training/ can help? This doc covers the techniques you were asking about: data parallelism, FSDP (fully-sharded data parallelism), and tensor parallelism (TP).
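To illustrate the tensor-parallel variant you asked about (sharding the parameter matrices themselves), here is a minimal sketch with made-up shapes; only the `PartitionSpec`s change compared to the data-parallel case, and the axis name `"model"` is an assumption for illustration:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

batch, d_in, d_out = 32, 128, 64  # toy sizes for illustration

mesh = Mesh(jax.devices(), axis_names=("model",))

# Tensor parallelism: replicate the activations, shard the weight's
# output dimension across devices. Each device computes a slice of the
# output features, so a column split needs no all-reduce afterwards.
x = jax.device_put(jnp.ones((batch, d_in)),
                   NamedSharding(mesh, P()))               # replicated
w = jax.device_put(jnp.ones((d_in, d_out)),
                   NamedSharding(mesh, P(None, "model")))  # column-sharded

@jax.jit
def forward(x, w):
    return x @ w

y = forward(x, w)
print(y.sharding)  # output features come out sharded along "model"
```

Whether this beats plain data parallelism depends on the sizes involved: sharding the params saves memory and can help when the weight matrices are large relative to the batch, but it trades replicated compute for inter-device communication, which is exactly the trade-off the scaling-book chapter walks through.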