I build scalable LLM training/inference systems across distributed multi-GPU setups (PyTorch DDP/FSDP, DeepSpeed ZeRO, Ray, MosaicML LLM Foundry) and decentralised swarms (Hivemind, Petals), and I write experiments up so others can reproduce them.
- Distributed Multi-GPU LLM Fine-Tuning (monorepo) — PyTorch (DDP, FSDP), DeepSpeed (ZeRO Offload, Pipeline Parallelism), Ray (Train, Tune), MosaicML; W&B/MLflow + HF Hub for fully traceable runs. Minimal FSDP sketch after this list.
- Hivemind fine-tuning (Qwen2-0.5B) — Internet-scale data parallelism over a DHT with fault tolerance; measured validation-loss reductions. Hivemind wiring sketch after this list.
- Petals (LLaMA-2-70B) — Decentralised inference + deep prompt-tuning via swarm model parallelism. Petals client sketch after this list.
- Overview: (meta repo) https://github.com/sparklerz/petals-llama2-70b
- Part 1: https://medium.com/@kannansarat9/part-1-inferencing-llama-2-70b-using-petals-swarm-model-parallelism-over-the-internet-a29de8f8aef3
- Part 2: https://medium.com/@kannansarat9/part-2-prompt-tuning-llama-2-70b-using-petals-model-parallelism-over-the-internet-89cdee667840
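
To give a flavour of the multi-GPU setup, here is a minimal FSDP sketch. The toy model, data, and hyperparameters are placeholders, not the monorepo's actual training code; launch it with `torchrun --nproc_per_node=<num_gpus>`.

```python
# Minimal FSDP sketch: toy model + random data, not the monorepo's training scripts.
# Launch with: torchrun --nproc_per_node=<num_gpus> fsdp_toy.py
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def main():
    dist.init_process_group("nccl")                # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(                         # stand-in for the real LLM
        nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
    ).cuda()
    model = FSDP(model, device_id=local_rank)      # shard params, grads, optimizer state across ranks
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                            # toy training loop on random data
        x = torch.randn(8, 1024, device="cuda")
        loss = model(x).pow(2).mean()
        loss.backward()
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Unlike DDP, which replicates the full model on every GPU, FSDP shards parameters, gradients, and optimizer state across ranks, which is what makes larger models fit.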
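
The Hivemind run boils down to wrapping a normal optimizer. The sketch below follows the Hivemind quickstart API; the `run_id`, batch sizes, and toy model are illustrative placeholders, not the Qwen2-0.5B experiment's actual settings.

```python
# Rough Hivemind wiring sketch; run_id, batch sizes and the toy model are placeholders.
import hivemind
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)                      # stand-in for Qwen2-0.5B
base_opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

dht = hivemind.DHT(start=True)                     # first peer; others pass initial_peers=[...]
print("Join with initial_peers =", [str(a) for a in dht.get_visible_maddrs()])

opt = hivemind.Optimizer(
    dht=dht,
    run_id="qwen2_finetune",                       # peers with the same run_id train together
    batch_size_per_step=8,                         # samples this peer contributes per step
    target_batch_size=4096,                        # global batch size before peers average
    optimizer=base_opt,
    use_local_updates=True,                        # apply local steps, average state periodically
    matchmaking_time=3.0,
    averaging_timeout=10.0,
    verbose=True,
)

for _ in range(100):                               # toy loop on random data
    loss = model(torch.randn(8, 1024)).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Peers join and leave freely: the DHT handles discovery, and the averaging rounds tolerate stragglers and dropouts.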
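
The Petals write-ups reduce to a short client script. The sketch below follows the Petals examples (`AutoDistributedModelForCausalLM`, `tuning_mode="deep_ptune"`); it assumes a reachable swarm serving the model and HF access to the Llama 2 weights, and the prompt, prefix length, and hyperparameters are placeholders.

```python
# Petals client sketch: swarm inference + deep prompt tuning. Assumes a swarm serving
# the model is reachable and you have HF access to the (gated) Llama 2 weights.
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Inference: embeddings/head run locally, transformer blocks run on swarm peers.
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)
inputs = tokenizer("A recipe for distributed inference:", return_tensors="pt")["input_ids"]
print(tokenizer.decode(model.generate(inputs, max_new_tokens=32)[0]))

# Deep prompt tuning: only the trainable prefix prompts get gradients;
# the 70B weights stay frozen on the swarm.
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="deep_ptune", pre_seq_len=16
)
opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)
batch = tokenizer("Example training text", return_tensors="pt")["input_ids"]
loss = model(input_ids=batch, labels=batch).loss   # placeholder single-step update
loss.backward()
opt.step()
```
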
Medium: https://medium.com/@kannansarat9
DMs open: https://x.com/saratkannan