Serving LLaMA 3-70B on TPUs | How To Scale Your Model #10

jacobaustin123 · 2025-02-03T02:23:42Z

jacobaustin123
Feb 3, 2025
Maintainer

Serving LLAMA on TPUs!

samacqua · 2025-02-11T16:37:51Z

samacqua
Feb 11, 2025 — with giscus

"Notably, at all batch sizes greater than 2k, FLOPs is always smaller than our KV loading time in this regime." Is this a typo, where it should be 200 instead of 2k?

1 reply

jacobaustin123 Feb 11, 2025 — with giscus
Maintainer Author

Rather, it meant to say "Notably, at all sequence lengths greater than 2048, we spend more time on KV cache loading than we do on FLOPs! So while we can improve our hardware utilization by increasing batch size, at long context lengths KV loading always dominates the total step time." I've gone ahead and updated the doc.

gitnicos · 2025-03-13T03:56:54Z

gitnicos
Mar 13, 2025 — with giscus

I think Question 3 answer has mistaken wording:
If we were to run on a 4x4, we’d still be fine ICI-wise, so our latency would drop to 17 / 2 = 8.5ms, but our throughput would remain the same.
instead the throughput is twice as big but per chip will remain the same

1 reply

jacobaustin123 Mar 13, 2025
Maintainer Author

Fixed, not says "but our throughput per chip would remain"

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Serving LLaMA 3-70B on TPUs | How To Scale Your Model #10

Uh oh!

{{title}}

Uh oh!

Replies: 2 comments 2 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Serving LLaMA 3-70B on TPUs | How To Scale Your Model #10

Uh oh!

jacobaustin123 Feb 3, 2025 Maintainer

Replies: 2 comments · 2 replies

Uh oh!

samacqua Feb 11, 2025 — with giscus

Uh oh!

jacobaustin123 Feb 11, 2025 — with giscus Maintainer Author

Uh oh!

gitnicos Mar 13, 2025 — with giscus

Uh oh!

jacobaustin123 Mar 13, 2025 Maintainer Author

jacobaustin123
Feb 3, 2025
Maintainer

Replies: 2 comments 2 replies

samacqua
Feb 11, 2025 — with giscus

jacobaustin123 Feb 11, 2025 — with giscus
Maintainer Author

gitnicos
Mar 13, 2025 — with giscus

jacobaustin123 Mar 13, 2025
Maintainer Author