Primus is a flexible, high-performance training framework for large-scale foundation model training and inference. It supports pretraining, posttraining, and reinforcement learning workflows, and is compatible with multiple backends, including Megatron and ROCm-optimized components.
- [2025/06/18] Added TorchTitan backend support.
- [2025/05/16] Added benchmark suite for performance evaluation across models and hardware.
- [2025/04/18] Added Preflight cluster sanity checker to verify environment readiness.
- [2025/04/14] Integrated hipBLASLt autotuning for optimized GPU kernel performance.
- [2025/04/09] Extended Megatron model configs with support for LLaMA2, LLaMA3, and DeepSeek-V2/V3.
- [2025/03/04] Released Megatron trainer module for flexible and efficient large model training.
Primus leverages AMD’s ROCm Docker images to provide a consistent, ready-to-run environment optimized for AMD GPUs, eliminating the need for manual dependency and environment configuration.
- AMD ROCm drivers (version ≥ 6.0 recommended)
- Docker (version ≥ 24.0) with ROCm support
- ROCm-compatible AMD GPUs (e.g., Instinct MI300 series)
- Proper permissions for Docker and GPU device access
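Before pulling the image, it is worth confirming the host meets these requirements. The checks below are a minimal sketch: `rocm-smi` ships with the ROCm stack, and GPU device access is typically granted through the `video` and `render` groups (group names may vary by distribution):

```bash
# Verify the ROCm driver stack can enumerate the GPUs
rocm-smi

# Confirm Docker is installed and recent enough (>= 24.0)
docker --version

# GPU device access usually requires membership in these groups
groups | grep -E 'video|render'

# Running Docker without sudo requires the docker group
groups | grep docker
```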
- Pull the latest Docker image:

  ```bash
  docker pull docker.io/rocm/megatron-lm:v25.7_py310
  ```
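  Pulling the image alone does not start a container. The sketch below shows a typical interactive launch; the device and group flags are the standard way to expose AMD GPUs to Docker, but mounts, shared-memory size, and other flags should be adjusted to your environment:

  ```bash
  # Launch an interactive ROCm container with GPU access
  # (flags are typical for ROCm workloads; tune to your cluster)
  docker run -it \
    --device=/dev/kfd --device=/dev/dri \
    --group-add video --ipc=host --shm-size 16G \
    -v $HOME/Primus:/workspace/Primus \
    docker.io/rocm/megatron-lm:v25.7_py310
  ```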
- Clone the repository:

  ```bash
  git clone --recurse-submodules https://github.com/AMD-AIG-AIMA/Primus.git
  ```
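  The `--recurse-submodules` flag pulls in the repository's git submodules along with the main tree. If you already cloned without it, the missing submodules can be fetched afterwards with the standard git command:

  ```bash
  # Fetch any submodules missed during the initial clone
  git submodule update --init --recursive
  ```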
- Run pretraining:

  ```bash
  cd Primus && pip install -r requirements.txt
  EXP=examples/megatron/configs/llama2_7B-pretrain.yaml bash ./examples/run_local_pretrain.sh
  ```
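  The `EXP` environment variable selects which experiment YAML drives the run, so switching models is a matter of pointing it at a different config. A sketch follows; the filename is illustrative, so list `examples/megatron/configs/` for the configs that actually ship:

  ```bash
  # Hypothetical config name; see examples/megatron/configs/ for real options
  EXP=examples/megatron/configs/llama3_8B-pretrain.yaml bash ./examples/run_local_pretrain.sh
  ```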
For more detailed usage instructions, configuration options, and examples, please refer to `examples/README.md`.
- Support for Primus-RL (training and inference modules for RLHF, OnlineDPO, GRPO, etc.)
- Support for more model architectures and backends