OpenSeek Project Phase I Completion Announcement & Phase II Preview
(Dev-v0.1 Released | Docker Support | Competition Collaboration)
Key Updates:
- Project Restructuring Complete:
  - Added a Docker-based environment setup to simplify onboarding.
  - Released pipeline documentation that enables training the 1.4B-0.4A model in fewer than 5 steps.
  - Published full details of OpenSeek-Small-v1 (W&B logs, evaluations, and configs).
  - Added guides for data-ratio experiments, along with benchmark results.
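The "1.4B-0.4A" label appears to follow the usual mixture-of-experts naming style: 1.4B total parameters, of which about 0.4B are activated per token. The sketch below assumes that "<total>B-<active>A" convention and shows how such a label could be parsed and turned into an activated-parameter fraction. The helper names `parse_moe_label` and `active_fraction` are hypothetical illustrations and are not part of the OpenSeek codebase.

```python
import re

# Hypothetical pattern for labels of the assumed form "<total>B-<active>A",
# where both numbers are given in billions of parameters.
_LABEL = re.compile(r"^(?P<total>\d+(?:\.\d+)?)B-(?P<active>\d+(?:\.\d+)?)A$")


def parse_moe_label(label: str) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions for a label like '1.4B-0.4A'."""
    m = _LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    total = float(m.group("total"))
    active = float(m.group("active"))
    # In an MoE model, the parameters used per token are a subset of all parameters.
    if active > total:
        raise ValueError("active parameters cannot exceed total parameters")
    return total, active


def active_fraction(label: str) -> float:
    """Fraction of all parameters activated per token, under the assumed convention."""
    total, active = parse_moe_label(label)
    return active / total


if __name__ == "__main__":
    total, active = parse_moe_label("1.4B-0.4A")
    print(f"total={total}B active={active}B fraction={active_fraction('1.4B-0.4A'):.1%}")
    # prints: total=1.4B active=0.4B fraction=28.6%
```

Under this reading, only about 28.6% of the weights take part in each token's forward pass. This is why an MoE model's per-token compute cost sits much closer to that of a 0.4B dense model than to that of a 1.4B dense model.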
Phase II: Panyu Algorithm Competition ("Beyond Cup")
Co-hosted with the Pazhou Algorithm Contest.
The competition provides compute resources, detailed documentation, and cash prizes.
Finals: collaborative training of a 16B-parameter model based on the DeepSeek-v3 architecture.
Working Group Progress
🔹 Data Working Group
- CCI4.0 Datasets (35TB in total, including 5.2TB of Chinese data):
  - CCI4.0-M2-BASE: available on ModelScope
  - CCI4.0-M2-CoT: available on ModelScope
  - CCI4.0-M2-Extra: available on ModelScope
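To make the composition figures concrete, the short sketch below works out what share of CCI4.0 the Chinese portion represents. It assumes that the 35TB total covers all three subsets listed above and that both figures use the same unit. The constant and function names are illustrative only.

```python
# Figures taken from the announcement (raw dataset size in TB).
CCI40_TOTAL_TB = 35.0
CCI40_CHINESE_TB = 5.2


def language_share(part_tb: float, total_tb: float) -> float:
    """Return the fraction of the total occupied by one portion of the data."""
    if total_tb <= 0:
        raise ValueError("total size must be positive")
    if not 0 <= part_tb <= total_tb:
        raise ValueError("portion must lie between 0 and the total size")
    return part_tb / total_tb


if __name__ == "__main__":
    share = language_share(CCI40_CHINESE_TB, CCI40_TOTAL_TB)
    remainder_tb = CCI40_TOTAL_TB - CCI40_CHINESE_TB
    print(f"Chinese: {share:.1%} ({CCI40_CHINESE_TB} TB); remainder: {remainder_tb:.1f} TB")
    # prints: Chinese: 14.9% (5.2 TB); remainder: 29.8 TB
```

By this arithmetic, Chinese text makes up roughly 15% of the corpus by size. The announcement does not break down the remaining 29.8TB.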
🔹 Algorithm Working Group
- OpenSeek-Small-v1 Model: available on Hugging Face
- Baseline-100B Dataset: available on Hugging Face
- Baseline Model: available on Hugging Face
🔹 Systems Working Group
- Added DeepSeek-v3 architecture support to FlagScale (available on GitHub)
Acknowledgments
"Phase I concludes with gratitude to all contributors. We now advance to Phase II with enhanced capabilities and community collaboration."