This is the repository for DisTrO (Distributed Training Over-The-Internet), a family of low-latency distributed optimizers that reduce inter-GPU communication requirements by three to four orders of magnitude.
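DisTrO's actual method and code are not yet published (see "Coming Soon" below), but a back-of-the-envelope sketch can show what a three-to-four order-of-magnitude reduction means in practice. Everything below is illustrative, not DisTrO's algorithm: the `keep_fraction` sparsity and the (index, value) encoding are assumptions chosen only to make the arithmetic concrete.

```python
# Back-of-the-envelope illustration (NOT DisTrO's actual algorithm, which is
# unpublished): communication cost of one optimizer step for a 15B-parameter
# model under a naive full-gradient sync vs. a hypothetical sparse update.

N_PARAMS = 15_000_000_000   # parameter count, matching the 15b DisTrO run
BYTES_PER_FP32 = 4

# Baseline: standard data-parallel training all-reduces a full fp32 gradient.
full_sync_bytes = N_PARAMS * BYTES_PER_FP32

# Hypothetical compressed update: transmit only a tiny fraction of
# coordinates, encoded as (int32 index, fp32 value) pairs.
keep_fraction = 1e-4        # assumed sparsity, chosen purely for illustration
compressed_bytes = int(N_PARAMS * keep_fraction) * (4 + 4)

print(f"full gradient sync : {full_sync_bytes / 1e9:8.1f} GB / step")
print(f"compressed update  : {compressed_bytes / 1e9:8.3f} GB / step")
print(f"reduction factor   : {full_sync_bytes / compressed_bytes:,.0f}x")
```

Under these assumptions the per-step payload drops from 60 GB to about 0.012 GB, a roughly 5,000x (between three and four orders of magnitude) reduction.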
- Aug. 26th, 2024: DisTrO (Preliminary Report)
- Dec. 2nd, 2024: DeMo Optimization (Paper) (Code), the original seed research for DisTrO
- Dec. 2nd, 2024: Nous trains a 15B model using DisTrO
- May 14th, 2025: Psyche Network
- May 14th, 2025: Nous Consilience 40B LLM, Hugging Face
- Coming Soon: DisTrO Paper and Code
Join us on Discord if you're interested in helping research and build the future of distributed training.