Fsoft-AIC/CompeteSMoE

CompeteSMoE - Statistically Guaranteed Mixture of Experts Training via Competition

Authors: Nam V. Nguyen, Huy Nguyen, Quang Pham, Van Nguyen, Savitha Ramasamy, Nhat Ho

📌 About

Sparse mixture of experts (SMoE) offers an appealing way to scale up model capacity beyond the means of increasing the network's depth or width. However, we argue that effective SMoE training remains challenging because the routing process is suboptimal: the experts that perform the computation do not directly contribute to the routing decision. In this work, we propose competition, a novel mechanism that routes tokens to the experts with the highest neural response. Theoretically, we show that the competition mechanism enjoys better sample efficiency than traditional softmax routing. Furthermore, we develop CompeteSMoE, a simple yet effective algorithm for training large language models that deploys a router to learn the competition policy, achieving strong performance at low training overhead. Our extensive empirical evaluations on both visual instruction tuning and language pre-training tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.
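
For intuition, below is a minimal sketch of the competition idea in PyTorch. It is illustrative only and not the repository's implementation: the expert architecture, the use of the expert output norm as the "neural response" score, the default top-k of 2, and the class name CompetitionRoutingSketch are assumptions made for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CompetitionRoutingSketch(nn.Module):
    """Illustrative sketch of competition-based routing (not the repo's code)."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])
        # Router that is trained to imitate the competition policy.
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x: torch.Tensor, use_competition: bool = True) -> torch.Tensor:
        # x: (num_tokens, d_model)
        expert_outs = torch.stack([expert(x) for expert in self.experts], dim=1)  # (T, E, D)
        if use_competition:
            # Competition: score each expert by the magnitude of its response
            # (the output norm is an assumed proxy for "neural response").
            scores = expert_outs.norm(dim=-1)  # (T, E)
        else:
            # Cheap path: the learned router predicts the scores instead.
            scores = self.router(x)  # (T, E)
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)  # (T, k)
        weights = F.softmax(topk_scores, dim=-1)  # (T, k)
        chosen = torch.gather(
            expert_outs, dim=1,
            index=topk_idx.unsqueeze(-1).expand(-1, -1, expert_outs.size(-1)),
        )  # (T, k, D)
        # CompeteSMoE additionally trains the router to match the competition
        # scores (a distillation-style objective); that loss is omitted here.
        return (weights.unsqueeze(-1) * chosen).sum(dim=1)  # (T, D)

layer = CompetitionRoutingSketch(d_model=512, n_experts=8)
y = layer(torch.randn(16, 512))  # route 16 tokens via competition

In this sketch, use_competition=True routes by the experts' actual responses, while the learned router can take over when recomputing every expert is too expensive; aligning the router with the competition policy is the training step CompeteSMoE adds on top.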

🏆 Performance & Comparisons

CompeteSMoE Performance

Highlights:

  • CompeteSMoE-5.1B demonstrates strong performance against a range of MoE routing strategies, including both standard and state-of-the-art routing methods.
  • It achieves competitive results compared to recent MoE architectures such as SharedE-V2 and SharedE-V3 (inspired by DeepSeek).
  • Despite the architectural innovations of these models (e.g., shared experts), CompeteSMoE-5.1B consistently delivers comparable or superior results.

🗓️ Release Notes

Date - Release Notes
2025-05-20 - Released CompeteSMoE-5.1B, trained on the LLaVA-665K dataset. ✅
2025-05-20 - Published the CompeteSMoE paper and open-sourced the code. ✅

📌 Citation

If you find this repository useful, please consider citing our paper:

@misc{nguyen2025competesmoestatisticallyguaranteed,
      title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition}, 
      author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
      year={2025},
      eprint={2505.13380},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.13380}
}
