Input on mpirun Launcher for JAX Support in Kubeflow Trainer V2 #30498
mahdikhashan asked this question in Q&A
Hi everyone,
As part of a KEP to integrate JAX into Kubeflow Trainer V2, I'm evaluating whether the mpirun launcher is a good fit. One of my main concerns is its suitability for TPUs, which seem to rely on gRPC for coordination rather than MPI.
More generally, I'm also interested in understanding the performance characteristics of MPI, particularly in terms of latency. Are there any known limitations or considerations when using MPI in latency-sensitive workloads?
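For context on the TPU point: my understanding is that on TPU pods, `jax.distributed.initialize()` can usually be called with no arguments, since coordination runs over the TPU runtime's own gRPC channels, which is part of why I'm unsure mpirun adds much there. For CPU/GPU jobs, here is a rough sketch of what an mpirun-based bootstrap might look like; the `COORDINATOR_ADDRESS` variable, the address, and the port are illustrative assumptions on my part, not anything specified in the KEP:

```python
# Minimal sketch (not the KEP's implementation): bootstrapping JAX's
# distributed runtime in a process launched by `mpirun`.
# OMPI_COMM_WORLD_RANK / OMPI_COMM_WORLD_SIZE are Open MPI conventions;
# the coordinator endpoint below is an illustrative assumption.
import os
import jax

rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))
world_size = int(os.environ.get("OMPI_COMM_WORLD_SIZE", "1"))

# Rank 0 acts as the coordinator; its address must be reachable by every
# rank (e.g. via a headless Service in Kubernetes).
coordinator = os.environ.get("COORDINATOR_ADDRESS", "trainer-node-0:1234")

jax.distributed.initialize(
    coordinator_address=coordinator,
    num_processes=world_size,
    process_id=rank,
)

print(f"process {rank}/{world_size}: local devices = {jax.local_devices()}")
```

One thing this highlights: mpirun gives each process its rank and the world size, but the JAX coordinator endpoint still has to come from somewhere (the launcher, a Kubernetes Service, or MPI itself), which is one of the trade-offs I'd like input on.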
Any guidance, experiences, or references would be greatly appreciated.
Here’s the KEP for context: kubeflow/trainer#2643
Thanks
Mahdi