Model Converter Tool

A professional, API-first tool for machine learning model conversion and management. Supports ONNX, GGUF, MLX, TorchScript, GPTQ, AWQ, SafeTensors, HuggingFace, Megatron-LM, MTK, and more.

Features

  • Multi-format support: Convert between ONNX, GGUF, MLX, GPTQ, AWQ, SafeTensors, Megatron-LM, MTK, and more
  • Megatron-LM integration: Bidirectional conversion between HuggingFace and Megatron-LM formats
  • Tensor parallel support: Advanced conversion for distributed models with TP/PP (tensor/pipeline parallelism)
  • Smart auto-detection: Automatically detects model type, size, and parallel configuration
  • MTK integration: Convert HuggingFace models to MTK format for MediaTek platforms
  • Advanced quantization: Fine-grained control with GPTQ/AWQ configuration
  • muP-to-LLaMA scaling: Automatic parameter rescaling for LLaMA compatibility
  • Fake weights: Generate test models without downloading large parameters
  • Batch processing: Convert multiple models using YAML/JSON configuration
  • API-first design: Use via CLI or integrate into Python workflows

Installation

git clone https://github.com/duoyuncloud/ModelConverterTool.git
cd ModelConverterTool
chmod +x install.sh
./install.sh
source venv/bin/activate
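
To confirm the install, run the CLI from the activated environment (assuming the standard --help flag):

modelconvert --help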

Quick Start

# Convert a single model
modelconvert convert gpt2 onnx

# Convert with quantization
modelconvert convert facebook/opt-125m gptq --quant 4bit

# Convert muP model to LLaMA format
modelconvert convert path/to/mup_model safetensors --mup2llama

# Convert HuggingFace to Megatron-LM format
modelconvert convert OpenBMB/MiniCPM4-0.5B hf2megatron --model-type minicpm

# Convert Megatron-LM to HuggingFace format
modelconvert convert models/megatron_model hf --model-type minicpm

# Convert tensor parallel Megatron model (auto-detection)
modelconvert convert models/tp_megatron_model hf --model-type auto

# Convert to MTK format for MediaTek platforms
modelconvert convert OpenBMB/MiniCPM4-0.5B mtk --model-type text-generation

# Generate fake weights for testing
modelconvert convert gpt2 safetensors --fake-weight

# Batch convert multiple models
modelconvert batch configs/batch_config.yaml

# Inspect model details
modelconvert inspect gpt2

# Check model usability
modelconvert check outputs/model.onnx

# View conversion history
modelconvert history

Commands

convert

Convert a model to a different format.

Usage: modelconvert convert <input_model> <output_format> [options]

Options:

  • -o, --output-path - Output file/directory path
  • --quant - Quantization type (4bit, q4_k_m, etc.)
  • --quant-config - Advanced quantization config (JSON/YAML)
  • --mup2llama - Enable muP-to-LLaMA scaling
  • --model-type - Model type for Megatron conversions (minicpm, llama, auto)
  • --use-smart-converter - Force the smart converter for tensor parallel models
  • --use-legacy-converter - Force the legacy converter for compatibility
  • --fake-weight - Use zero weights for testing
  • --dtype - Output precision (fp16, fp32)
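  • --tp-size, --pp-size, --num-layer - Parallel layout for Megatron conversions (see Tensor Parallel Conversion below)
  • --fake-weight-config - Custom tensor shapes for fake weights (see Fake Weights below)

Flags combine as you would expect; for example, writing an fp16 SafeTensors copy to a chosen location (output path illustrative):

modelconvert convert gpt2 safetensors -o outputs/gpt2_fp16 --dtype fp16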

batch

Batch convert models using a configuration file.

Usage: modelconvert batch <config_path> [options]

Options:

  • --max-workers - Concurrent workers (default: 1)
  • --max-retries - Max retries per task (default: 1)
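
For example, four concurrent workers with up to two retries per task (config path as in Quick Start):

modelconvert batch configs/batch_config.yaml --max-workers 4 --max-retries 2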

inspect

Display detailed model information.

Usage: modelconvert inspect <model>

check

Test whether a model can be loaded and used for inference.

Usage: modelconvert check <model_path> [--format <format>] [--verbose]

history

Show conversion history.

Usage: modelconvert history

config

Manage configuration settings.

Usage: modelconvert config <show|get|set|list-presets> [args]
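
For example:

modelconvert config show
modelconvert config list-presets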

Advanced Features

Quantization Configuration

Fine-grained control for GPTQ and AWQ engines:

# Using config file
modelconvert convert model gptq --quant-config config.yaml

# Inline JSON
modelconvert convert model gptq --quant-config '{"bits": 4, "group_size": 128}'

Supported Parameters:

  • bits - Quantization bits (4, 8)
  • group_size - Group size (128, 256)
  • sym - Symmetric quantization (bool)
  • desc_act - Activation-order quantization, also known as act-order (bool)
  • damp_percent - Percent damping added to the Hessian diagonal
  • And more...
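
A minimal config.yaml sketch using the parameters above (values illustrative; key names follow the parameter list):

# illustrative GPTQ quantization config
bits: 4
group_size: 128
sym: true
desc_act: false
damp_percent: 0.01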

muP-to-LLaMA Scaling

Automatically convert muP-initialized models:

modelconvert convert mup_model safetensors --mup2llama

Megatron-LM Integration

Bidirectional conversion between HuggingFace and Megatron-LM formats with advanced tensor parallel support:

# HuggingFace to Megatron-LM
modelconvert convert OpenBMB/MiniCPM4-0.5B hf2megatron --model-type minicpm

# Megatron-LM to HuggingFace (basic)
modelconvert convert models/megatron_model hf --model-type minicpm

# Tensor Parallel Megatron-LM to HuggingFace (auto-detection)
modelconvert convert models/tp_megatron_model hf --model-type auto

# Force smart converter for complex models
modelconvert convert models/large_megatron_model hf --model-type minicpm --use-smart-converter

# Use legacy converter for compatibility
modelconvert convert models/old_megatron_model hf --model-type minicpm --use-legacy-converter

Supported Models:

  • MiniCPM Series: 0.5B, 1.5B, 3B, 8B, 14B (with TP/PP support)
  • MiniCPM-4: MoE (Mixture of Experts) models
  • Llama Series: 7B, 13B, 30B, 65B (with TP/PP support)
  • Mistral: Full bidirectional support

Tensor Parallel Features:

  • Auto-Detection: Automatically detects model type, size, and parallel configuration
  • Smart Conversion: Chooses an optimal conversion strategy based on model characteristics
  • TP/PP Support: Handles Tensor Parallel (TP) and Pipeline Parallel (PP) configurations
  • Fallback Strategy: Graceful fallback to legacy converters if needed
  • Model-Specific Optimizations: Specialized converters for common model sizes

MTK Integration

Convert HuggingFace models to MTK format for MediaTek platforms:

# Basic LLM conversion
modelconvert convert OpenBMB/MiniCPM4-0.5B mtk --model-type text-generation

# VLM conversion with custom platform
modelconvert convert vision-model mtk --model-type image-classification \
  --quantization-config '{"platform": "MT6897", "model_size": "1_6B"}'

# Advanced configuration
modelconvert convert model mtk --model-type text-generation \
  --quantization-config '{
    "platform": "MT6991",
    "model_size": "1_2B", 
    "weight_bit": 4,
    "mtk_cloud_path": "/custom/path/to/mtk_cloud"
  }'

Supported Platforms:

  • MT6991: High-end MediaTek platform
  • MT6989: Mid-range MediaTek platform
  • MT6897: Entry-level MediaTek platform

Supported Model Sizes:

  • 0_5B, 0_58B, 0_9B, 1_2B, 1_6B, 8B

Features:

  • Automatic model type detection (LLM vs VLM)
  • Real-time conversion progress display
  • Custom MTK cloud path configuration
  • Environment variable support (MTK_CLOUD_PATH)
  • Output validation with TFLite file detection

Dependencies:

  • MTK conversion requires a separate mtk_cloud repository with install.sh
  • The repository provides conversion scripts for LLM and VLM models
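
Because MTK_CLOUD_PATH is read from the environment, the mtk_cloud checkout can be configured once per shell instead of passing mtk_cloud_path in every command (path illustrative):

export MTK_CLOUD_PATH=/path/to/mtk_cloud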

Tensor Parallel Conversion

Advanced conversion for distributed models with tensor and pipeline parallelism:

# Auto-detect and convert tensor parallel model
modelconvert convert models/tp_megatron_model hf --model-type auto

# Force smart converter for complex distributed models
modelconvert convert models/large_tp_model hf --model-type minicpm --use-smart-converter

# Convert with specific parallel configuration
modelconvert convert models/custom_tp_model hf --model-type minicpm \
  --tp-size 4 --pp-size 2 --num-layer 32

# Convert MiniCPM-4 MoE model
modelconvert convert models/minicpm4_moe_model hf --model-type minicpm4

Supported Parallel Configurations:

  • Tensor Parallel (TP): 1, 2, 4, 8 ranks
  • Pipeline Parallel (PP): 1, 2, 4 stages
  • Hybrid TP/PP: Combined tensor and pipeline parallelism

Auto-Detection Features:

  • Model type detection (MiniCPM, Llama, MiniCPM-4)
  • Model size detection (0.5B, 1.5B, 3B, 8B, 14B, etc.)
  • Parallel configuration detection (TP/PP sizes)
  • MoE layer detection for MiniCPM-4 models

Fake Weights

Generate models with zero weights for testing:

# Zero weights
modelconvert convert gpt2 safetensors --fake-weight

# Custom shapes
modelconvert convert gpt2 safetensors --fake-weight --fake-weight-config shapes.yaml
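
The shapes file format is not documented here; a hypothetical shapes.yaml, assuming it maps tensor names to shapes (names and shapes are GPT-2-like but illustrative):

# hypothetical format: tensor name -> shape
transformer.wte.weight: [50257, 768]
transformer.wpe.weight: [1024, 768]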

Supported Formats

Conversion Matrix

Input Format    Output Formats
HuggingFace     All formats
SafeTensors     HuggingFace, SafeTensors
TorchScript     TorchScript
ONNX            ONNX
GGUF            GGUF
MLX             MLX
Megatron-LM     HuggingFace
MTK             MTK

Quantization Support

Format          Quantization Types
GPTQ            4bit, 8bit
AWQ             4bit, 8bit
GGUF            q4_k_m, q5_k_m, q8_0
MLX             q4_k_m, q5_k_m, q8_0
SafeTensors     fp16, fp32
MTK             4bit, 8bit

API Usage

from model_converter_tool.api import ModelConverterAPI

api = ModelConverterAPI()

# Convert with fake weights for testing
result = api.convert_model(
    model_name="gpt2",
    output_format="onnx",
    output_path="./gpt2.onnx",
    fake_weight=True
)
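
The call composes into ordinary scripts; a minimal sketch of a scripted batch built only on the documented convert_model call (model list and output paths illustrative):

from model_converter_tool.api import ModelConverterAPI

api = ModelConverterAPI()

# illustrative: convert several models in one script
for name, fmt in [("gpt2", "onnx"), ("facebook/opt-125m", "safetensors")]:
    api.convert_model(
        model_name=name,
        output_format=fmt,
        output_path=f"./outputs/{name.replace('/', '_')}_{fmt}",
        fake_weight=True,  # zero weights keep test runs cheap
    )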

Configuration Files

Create YAML/JSON files for batch processing:

models:
  - model_path: gpt2
    output_path: outputs/gpt2_onnx
    output_format: onnx
  
  - model_path: facebook/opt-125m
    output_path: outputs/opt_gptq
    output_format: gptq
    quantization: 4bit
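
Save this as configs/batch_config.yaml and run it with the batch command shown in Quick Start:

modelconvert batch configs/batch_config.yaml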

Testing

Run the test suite:

# All tests
pytest

# Integration tests
pytest tests/test_integration.py

# Specific test
pytest tests/test_cli.py
