
MoAI Platform

Full-stack infrastructure software spanning PyTorch to GPUs for the LLM era.
Decouple AI infrastructure from specific hardware vendors.
Virtualize all GPUs/NPUs in a cluster for higher utilization and failover.
Scale to thousands of GPUs/NPUs with automatic parallelization and optimization.
Support any multi-billion or multi-trillion parameter model for training and serving.


🚀 Designed to unlock the full potential of your AI infrastructure!


QuickStart

The moai-examples repository is designed to work with a cluster where the MoAI Platform is installed.
To test these scripts, please contact us.

🔷 Training

🔸 Supported Models

Recommended Specifications

The recommended MAF, Torch, and Python versions, along with the flavor and batch sizes for each model, are as follows:

| Model | MAF Version | Torch Version | Python Version | Flavor | Train Batch | Eval Batch | Name in {model} |
|-------|-------------|---------------|----------------|--------|-------------|------------|-----------------|
| huggyllama/llama-30b | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-16 | 32 | 16 | llama_30b |
| meta-llama/Llama-2-7b-chat-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 64 | 32 | llama2_7b |
| meta-llama/Llama-2-13b-chat-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 32 | 8 | llama2_13b |
| meta-llama/Llama-2-70b-hf | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 256 | 64 | llama2_70b |
| meta-llama/Meta-Llama-3-8B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 64 | 32 | llama3_8b |
| meta-llama/Meta-Llama-3-70B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 256 | 64 | llama3_70b |
| meta-llama/Meta-Llama-3-70B-Instruct (with LoRA) | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-8 | 16 | 16 | llama3_70b_lora |
| Qwen/Qwen2-72B-Instruct | 25.5.3005 | 2.1.0 | 3.10 | flavor-default-32 | 32 | 32 | qwen2_72b |

🔸 Environment Setup

You can set up the fine-tuning environment with the commands below.

cd moai-examples/
uv sync

💡 Note: This repository uses version 25.5.3005 for set A (recommended).

🔸 Run a fine-tuning script

To fine-tune the model, run the training script as follows:

cd finetuning_codes
bash scripts/train_{model}.sh

You can run the other examples by substituting one of the names from the Name in {model} column of the table above; for example, bash scripts/train_qwen2_72b.sh fine-tunes Qwen/Qwen2-72B-Instruct.

🔸 Example: train_llama3_8b.sh

#!/bin/bash
export TOKENIZERS_PARALLELISM=false
export TRANSFORMERS_VERBOSITY=info
export ACCELERATOR_PLATFORM_FLAVOR=flavor-default-8
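# NOTE: set SAVE_DIR to the checkpoint output directory (e.g. export SAVE_DIR=../checkpoints) before running this script.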

uv run accelerate launch \
        --config_file config.yaml \
        train.py \
        --model meta-llama/Meta-Llama-3-8B-Instruct \
        --dataset bitext/Bitext-customer-support-llm-chatbot-training-dataset \
        --lr 0.00001 \
        --train-batch-size 64 \
        --eval-batch-size 32 \
        --block-size 1024 \
        --num-epochs 1 \
        --max-steps -1 \
        --log-interval 20 \
        --save-path $SAVE_DIR

The script above assumes execution from the moai-examples/finetuning_codes directory.

🔷 Inference

Please contact the owner of the MoAI Platform you wish to use for instructions on creating an endpoint.

🔸 Supported Models

🔸 Endpoint Command

The following is the command to create an endpoint. Most models use the same command, but some models may require additional settings.
The command includes several environment variables:

  • $MODEL: the name or path of the model
  • $MODEL_MAX_LEN: the maximum context length supported by the model
  • $TP_SIZE: tensor parallel size, must match the number of GPUs used

Please note that each model may require different values for these variables; be sure to adjust them according to the model you are using.

vllm serve $MODEL --max-model-len $MODEL_MAX_LEN --trust-remote-code --tensor-parallel-size $TP_SIZE --gpu-memory-utilization 0.95 --quantization None --block-size 16 --max-num-batched-tokens $MODEL_MAX_LEN --enable-chunked-prefill False

Commands for these specific models are described separately below:

  • deepseek-ai/DeepSeek-R1: Some arguments are unnecessary for DeepSeek.
vllm serve deepseek-ai/DeepSeek-R1 --max-model-len 163840 --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --quantization None

🔸 Request Command

The following is an example request using curl; the format is compatible with the OpenAI Chat Completions API. Note that $ENDPOINT_URL is expanded by the shell, but "$MODEL" inside the single-quoted JSON body is not, so substitute the actual model name there.

curl $ENDPOINT_URL/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "$MODEL",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "max_tokens": 200
}'

If the request is sent correctly, you should receive a response similar to the one below.
Note that the exact response content may vary depending on the model and its configuration.

{
  "id":"chatcmpl-abc123",
  "object":"chat.completion",
  "created":1744950029,
  "model":"$MODEL",
  "choices":[
    {
      "index":0,
      "message":{
        "role":"assistant",
        "reasoning_content":null,
        "content":" The winning team of the 2020 World Series was the Los Angeles Dodgers. They beat the Tampa Bay Rays in the seven-game series, which was held in stringent COVID-19 protocols to ensure the safety of players, staff, and fans.",
        "tool_calls":[]
      },
      "logprobs":null,
      "finish_reason":"stop",
      "stop_reason":null
    }
  ],
  "usage":{
    "prompt_tokens":35,
    "total_tokens":91,
    "completion_tokens":56,
    "prompt_tokens_details":null
  },
  "prompt_logprobs":null
}
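
Since the endpoint is OpenAI-compatible, the same request can also be made with the official openai Python client. The following is a minimal sketch, assuming the openai package is installed; the ENDPOINT_URL environment variable, the API key, and the model name are placeholders to adapt to your deployment.

# Minimal sketch: querying the OpenAI-compatible endpoint with the openai client.
# ENDPOINT_URL and the model name are placeholders; adapt them to your deployment.
import os

from openai import OpenAI

client = OpenAI(
    base_url=f"{os.environ['ENDPOINT_URL']}/v1",  # same endpoint as the curl example
    api_key="EMPTY",  # vLLM does not require a real key unless one was configured
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # replace with your served model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)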

🔷 MoAI Accelerator

You can check the current MoAI Framework version and flavor with ap-smi.

ap-smi

+-----------------------------------------------------------------------------+
|                                     Accelerator platform Version : v0.0.30  |
|              MoAI Framework Version : 25.5.3005 Latest Version : 25.5.3005  |
+---------+---------+------------------------+----------------+---------------+
|      #  |  NAME   |  FLAVOR                |  MEM USAGE(%)  |  GPU UTIL(%)  |
+---------+---------+------------------------+----------------+---------------+
|      0  |  1779   |  flavor-default-8      |  43            |  100          |
+---------+---------+------------------------+----------------+---------------+

🔷 Directory and Code Details

🔸 Repo Structure

The structure of the entire repository is as follows:

moai-examples
├── README.md          # Project overview and instructions
├── checkpoints        # Directory to store model checkpoints during fine-tuning
├── finetuning_codes   # Code related to model fine-tuning
├── git-hooks          # Git hooks for code formatting and other pre/post-commit tasks
├── pyproject.toml     # Project metadata
└── uv.lock            # Lockfile that contains exact information about the project's dependencies

🔸 finetuning_codes

The finetuning_codes directory contains the training code, model configs, and scripts necessary for fine-tuning.

finetuning_codes
├── config.yaml   # Config file for accelerate
├── logs          # Directory for training logs
├── scripts       # Shell scripts for different fine-tuning setups
├── train.py      # Main Python script for initiating the fine-tuning process
└── utils.py      # Utility functions for train.py

Learn More

| Section | Description |
|----------|-------------|
| Portal | Overview of technologies and company |
| ModelHub | Chatbot using the MoAI Platform solution |

🔸 Running YOLO models on MoAI Platform

To train or run inference with ultralytics YOLO models on the MoAI Platform, which uses AMD accelerators, follow the steps below:

# Be sure to use the ROCm version of torch/torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.3
pip3 install ultralytics
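
Once installed, training and inference follow the standard Ultralytics Python API. Below is a minimal sketch; the weights file yolov8n.pt, the dataset config coco128.yaml, and the sample image URL are illustrative Ultralytics defaults, not files shipped with this repository.

# Minimal sketch of Ultralytics YOLO training and inference.
# yolov8n.pt, coco128.yaml, and the sample image are illustrative defaults, not repo files.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # load pretrained detection weights
model.train(data="coco128.yaml", epochs=1, imgsz=640)  # short sanity-check training run
results = model("https://ultralytics.com/images/bus.jpg")  # run inference on a sample image
print(results[0].boxes)  # detected bounding boxes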
