Full-stack infrastructure software, from PyTorch to GPUs, for the LLM era.
Decouple AI infrastructure from specific hardware vendors.
Virtualize all GPUs/NPUs in a cluster for higher utilization and failover.
Scale to thousands of GPUs/NPUs with automatic parallelization and optimization.
Support any multi-billion or multi-trillion parameter model for training and serving.
Designed to unlock the full potential of your AI infrastructure!
The moai-examples repository is designed to work with a cluster where the MoAI Platform is installed.
To test these scripts, please contact us.
The optimized MAF version, Torch version, and flavor for each model are as follows:
Model | MAF Version | Torch Version | Python Version | Flavor | Train Batch | Eval Batch | Name in {model} |
---|---|---|---|---|---|---|---|
huggyllama/llama-30b | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-16 | 32 | 16 | llama_30b |
meta-llama/Llama-2-7b-chat-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 64 | 32 | llama2_7b |
meta-llama/Llama-2-13b-chat-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 32 | 8 | llama2_13b |
meta-llama/Llama-2-70b-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 256 | 64 | llama2_70b |
meta-llama/Meta-Llama-3-8B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 64 | 32 | llama3_8b |
meta-llama/Meta-Llama-3-70B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 256 | 64 | llama3_70b |
meta-llama/Meta-Llama-3-70B-Instruct (with LoRA) | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 16 | 16 | llama3_70b_lora |
Qwen/Qwen2-72B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 32 | 32 | qwen2_72b |
You can set up the fine-tuning environment with the commands below.
cd moai-examples/
uv sync
Note: This repository uses version 25.5.3005 for set A (recommended).
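As a quick sanity check, you can confirm that the synced environment reports the Torch version listed in the table above (a minimal sketch; assumes `uv sync` completed successfully):

```bash
# Print the Torch version from the synced environment; the table above expects 2.1.0.
uv run python -c "import torch; print(torch.__version__)"
```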
To fine-tune the model, run the training script as follows:
cd finetuning_codes
bash scripts/train_{model}.sh
By specifying one of the model names listed in the `Name in {model}` column above, you can run the other examples as well. For example, the `llama3_8b` script looks like the following:
#!/bin/bash

export TOKENIZERS_PARALLELISM=false
export TRANSFORMERS_VERBOSITY=info
# Select the accelerator flavor for this run (see the table above).
export ACCELERATOR_PLATFORM_FLAVOR=flavor-default-8

uv run accelerate launch \
    --config_file config.yaml \
    train.py \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --dataset bitext/Bitext-customer-support-llm-chatbot-training-dataset \
    --lr 0.00001 \
    --train-batch-size 64 \
    --eval-batch-size 32 \
    --block-size 1024 \
    --num-epochs 1 \
    --max-steps -1 \
    --log-interval 20 \
    --save-path $SAVE_DIR  # SAVE_DIR must point to the checkpoint output directory
The script above assumes execution from the `moai-examples/finetuning_codes` directory.
Please contact the owner of the MoAI Platform you wish to use for instructions on how to create an endpoint.
Supported Models | Model Max Length | TP Size (MI250) | TP Size (MI308x) |
---|---|---|---|
huggyllama/llama-30b | 2048 | 1 | 1 |
meta-llama/Llama-2-7b-chat-hf | 4096 | 1 | 1 |
meta-llama/Llama-2-13b-chat-hf | 4096 | 1 | 1 |
meta-llama/Llama-2-70b-hf | 4096 | 4 | 2 |
meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 1 | 1 |
meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 4 | 2 |
Qwen/Qwen2-72B-Instruct | 32768 | 4 | 2 |
Qwen/QwQ-32B | 40960 | 4 | 2 |
deepseek-ai/DeepSeek-R1 | 163840 | - | 8 |
The following is the command to create an endpoint. Most models use the same command, but some may require additional settings. The command includes several environment variables:

- `$MODEL`: the name or path of the model
- `$MODEL_MAX_LEN`: the maximum context length supported by the model
- `$TP_SIZE`: the tensor parallel size, which must match the number of GPUs used

Please note that each model may require different values for these variables. Be sure to adjust them according to the model you are using.
vllm serve $MODEL \
    --max-model-len $MODEL_MAX_LEN \
    --trust-remote-code \
    --tensor-parallel-size $TP_SIZE \
    --gpu-memory-utilization 0.95 \
    --quantization None \
    --block-size 16 \
    --max-num-batched-tokens $MODEL_MAX_LEN \
    --enable-chunked-prefill False
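For example, to serve `meta-llama/Meta-Llama-3-8B-Instruct`, the variables would be set from the table above as follows (a sketch; the `export` style is illustrative):

```bash
# Values taken from the supported-models table above.
export MODEL=meta-llama/Meta-Llama-3-8B-Instruct
export MODEL_MAX_LEN=8192   # model max length for Llama 3 8B
export TP_SIZE=1            # TP size is 1 on both MI250 and MI308x

vllm serve $MODEL \
    --max-model-len $MODEL_MAX_LEN \
    --trust-remote-code \
    --tensor-parallel-size $TP_SIZE \
    --gpu-memory-utilization 0.95 \
    --quantization None \
    --block-size 16 \
    --max-num-batched-tokens $MODEL_MAX_LEN \
    --enable-chunked-prefill False
```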
Commands for these specific models are described separately below:
`deepseek-ai/DeepSeek-R1`: some arguments are unnecessary for DeepSeek.
vllm serve deepseek-ai/DeepSeek-R1 \
    --max-model-len 163840 \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --gpu-memory-utilization 0.95 \
    --quantization None
The following is an example of a request command using `curl`. This format is compatible with the OpenAI Chat Completions API.
# Note: the JSON body is single-quoted, so $MODEL is spliced in with '"$MODEL"';
# alternatively, replace it with the literal model name.
curl $ENDPOINT_URL/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "'"$MODEL"'",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "max_tokens": 200
}'
If the request is sent correctly, you should receive a response similar to the one below.
Note that the exact response content may vary depending on the model and its configuration.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1744950029,
  "model": "$MODEL",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": " The winning team of the 2020 World Series was the Los Angeles Dodgers. They beat the Tampa Bay Rays in the seven-game series, which was held in stringent COVID-19 protocols to ensure the safety of players, staff, and fans.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 35,
    "total_tokens": 91,
    "completion_tokens": 56,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
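To extract just the assistant's reply from the response, you can pipe the result through `jq` (a minimal sketch; assumes `jq` is installed and `$ENDPOINT_URL`/`$MODEL` are set as above):

```bash
# Print only the assistant's message content from the completion response.
curl -s $ENDPOINT_URL/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "'"$MODEL"'",
    "messages": [{"role": "user", "content": "Who won the world series in 2020?"}],
    "max_tokens": 200
}' | jq -r '.choices[0].message.content'
```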
You can check the current MoAI version and flavor through `ap-smi`.
ap-smi
+-----------------------------------------------------------------------------+
| Accelerator platform Version : v0.0.30 |
| MoAI Framework Version : 25.5.3005 Latest Version : 25.5.3005 |
+---------+---------+------------------------+----------------+---------------+
| # | NAME | FLAVOR | MEM USAGE(%) | GPU UTIL(%) |
+---------+---------+------------------------+----------------+---------------+
| 0 | 1779 | flavor-default-8 | 43 | 100 |
+---------+---------+------------------------+----------------+---------------+
The structure of the entire repository is as follows:
moai-examples
├── README.md          # Project overview and instructions
├── checkpoints        # Directory to store model checkpoints during fine-tuning
├── finetuning_codes   # Code related to model fine-tuning
├── git-hooks          # Git hooks for code formatting and other pre-/post-commit tasks
├── pyproject.toml     # Project metadata
└── uv.lock            # Lockfile with exact information about the project's dependencies
The `finetuning_codes` directory contains the training code, model configs, and scripts necessary for fine-tuning.
finetuning_codes
├── config.yaml   # Config file for accelerate
├── logs          # Directory for training logs
├── scripts       # Directory containing shell scripts for different fine-tuning setups
├── train.py      # Main Python script for initiating the fine-tuning process
└── utils.py      # Utility functions for train.py
Section | Description |
---|---|
Portal | Overview of technologies and company |
ModelHub | Chatbot using the MoAI Platform solution |
To train or run inference with ultralytics YOLO models on the MoAI Platform, which uses AMD accelerators, follow the steps below:
# Be sure to use the ROCm version of torch/torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install ultralytics
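After installation, the `yolo` CLI bundled with the ultralytics package can be used for inference and training. A minimal sketch (the checkpoint, image path, and epoch count are illustrative placeholders):

```bash
# Run inference with a pretrained checkpoint on a local image (placeholder path).
yolo predict model=yolov8n.pt source=path/to/image.jpg

# Fine-tune the same checkpoint on the coco8 example dataset bundled with ultralytics.
yolo train model=yolov8n.pt data=coco8.yaml epochs=3
```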