A professional, API-first tool for machine learning model conversion and management. Supports ONNX, GGUF, MLX, TorchScript, GPTQ, AWQ, SafeTensors, HuggingFace, Megatron-LM, MTK, and more.
- Multi-format support: Convert between ONNX, GGUF, MLX, GPTQ, AWQ, SafeTensors, Megatron-LM, MTK, and more
- Megatron-LM integration: Bidirectional conversion between HuggingFace and Megatron-LM formats
- Tensor Parallel Support: Advanced conversion for distributed models with TP/PP (Tensor/Pipeline Parallelism)
- Smart Auto-Detection: Automatically detects model type, size, and parallel configuration
- MTK integration: Convert HuggingFace models to MTK format for MediaTek platforms
- Advanced quantization: Fine-grained control with GPTQ/AWQ configuration
- muP-to-LLaMA scaling: Automatic parameter rescaling for LLaMA compatibility
- Fake weights: Generate test models without downloading large parameters
- Batch processing: Convert multiple models using YAML/JSON configuration
- API-first design: Use via CLI or integrate into Python workflows
git clone https://github.com/duoyuncloud/ModelConverterTool.git
cd ModelConverterTool
chmod +x install.sh
./install.sh
source venv/bin/activate
# Convert a single model
modelconvert convert gpt2 onnx
# Convert with quantization
modelconvert convert facebook/opt-125m gptq --quant 4bit
# Convert muP model to LLaMA format
modelconvert convert path/to/mup_model safetensors --mup2llama
# Convert HuggingFace to Megatron-LM format
modelconvert convert OpenBMB/MiniCPM4-0.5B hf2megatron --model-type minicpm
# Convert Megatron-LM to HuggingFace format
modelconvert convert models/megatron_model hf --model-type minicpm
# Convert tensor parallel Megatron model (auto-detection)
modelconvert convert models/tp_megatron_model hf --model-type auto
# Convert to MTK format for MediaTek platforms
modelconvert convert OpenBMB/MiniCPM4-0.5B mtk --model-type text-generation
# Generate fake weights for testing
modelconvert convert gpt2 safetensors --fake-weight
# Batch convert multiple models
modelconvert batch configs/batch_config.yaml
# Inspect model details
modelconvert inspect gpt2
# Check model usability
modelconvert check outputs/model.onnx
# View conversion history
modelconvert history
Convert a model to a different format.
Usage: modelconvert convert <input_model> <output_format> [options]
Options:
- -o, --output-path: Output file/directory path
- --quant: Quantization type (4bit, q4_k_m, etc.)
- --quant-config: Advanced quantization config (JSON/YAML)
- --mup2llama: Enable muP-to-LLaMA scaling
- --model-type: Model type (minicpm, llama, auto for Megatron conversions; text-generation, image-classification for MTK)
- --use-smart-converter: Force use of the smart converter for tensor parallel models
- --use-legacy-converter: Force use of the legacy converter for compatibility
- --fake-weight: Use zero weights for testing
- --dtype: Output precision (fp16, fp32)
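These options can be combined in a single invocation; for example, a quantized GPTQ conversion written to an explicit output path (model and path are illustrative):
# Quantized conversion with an explicit output path
modelconvert convert facebook/opt-125m gptq -o outputs/opt125m_gptq --quant 4bit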
Batch convert models using a configuration file.
Usage: modelconvert batch <config_path> [options]
Options:
- --max-workers: Concurrent workers (default: 1)
- --max-retries: Max retries per task (default: 1)
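For example, to process a batch with two concurrent workers and up to two retries per task (values are illustrative):
# Batch conversion with custom concurrency and retry limits
modelconvert batch configs/batch_config.yaml --max-workers 2 --max-retries 2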
Display detailed model information.
Usage: modelconvert inspect <model>
Test if a model can be loaded and run inference.
Usage: modelconvert check <model_path> [--format <format>] [--verbose]
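For example (the path and format are illustrative):
# Verify that a converted ONNX model loads and runs, with verbose output
modelconvert check outputs/model.onnx --format onnx --verbose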
Show conversion history.
Usage: modelconvert history
Manage configuration settings.
Usage: modelconvert config <show|get|set|list-presets> [args]
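For example, the subcommands that take no further arguments:
# Show the current configuration and list the available presets
modelconvert config show
modelconvert config list-presets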
Fine-grained control for GPTQ and AWQ engines:
# Using config file
modelconvert convert model gptq --quant-config config.yaml
# Inline JSON
modelconvert convert model gptq --quant-config '{"bits": 4, "group_size": 128}'
Supported Parameters:
- bits: Quantization bits (4, 8)
- group_size: Group size (128, 256)
- sym: Symmetric quantization (bool)
- desc_act: Activation-order (act-order) quantization (bool)
- damp_percent: Damping percentage
- And more...
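For example, the parameters listed above can be passed together as an inline JSON config (the values shown are illustrative, not recommended defaults):
# GPTQ quantization with explicit parameters
modelconvert convert facebook/opt-125m gptq --quant-config '{"bits": 4, "group_size": 128, "sym": true, "desc_act": false, "damp_percent": 0.01}'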
Automatically convert muP-initialized models:
modelconvert convert mup_model safetensors --mup2llama
Bidirectional conversion between HuggingFace and Megatron-LM formats with advanced tensor parallel support:
# HuggingFace to Megatron-LM
modelconvert convert OpenBMB/MiniCPM4-0.5B hf2megatron --model-type minicpm
# Megatron-LM to HuggingFace (basic)
modelconvert convert models/megatron_model hf --model-type minicpm
# Tensor Parallel Megatron-LM to HuggingFace (auto-detection)
modelconvert convert models/tp_megatron_model hf --model-type auto
# Force smart converter for complex models
modelconvert convert models/large_megatron_model hf --model-type minicpm --use-smart-converter
# Use legacy converter for compatibility
modelconvert convert models/old_megatron_model hf --model-type minicpm --use-legacy-converter
Supported Models:
- MiniCPM Series: 0.5B, 1.5B, 3B, 8B, 14B (with TP/PP support)
- MiniCPM-4: MoE (Mixture of Experts) models
- Llama Series: 7B, 13B, 30B, 65B (with TP/PP support)
- Mistral: Full bidirectional support
Tensor Parallel Features:
- Auto-Detection: Automatically detects model type, size, and parallel configuration
- Smart Conversion: Chooses optimal conversion strategy based on model characteristics
- TP/PP Support: Handles Tensor Parallel (TP) and Pipeline Parallel (PP) configurations
- Fallback Strategy: Graceful fallback to legacy converters if needed
- Model-Specific Optimizations: Specialized converters for common model sizes
Convert HuggingFace models to MTK format for MediaTek platforms:
# Basic LLM conversion
modelconvert convert OpenBMB/MiniCPM4-0.5B mtk --model-type text-generation
# VLM conversion with custom platform
modelconvert convert vision-model mtk --model-type image-classification \
--quantization-config '{"platform": "MT6897", "model_size": "1_6B"}'
# Advanced configuration
modelconvert convert model mtk --model-type text-generation \
--quantization-config '{
"platform": "MT6991",
"model_size": "1_2B",
"weight_bit": 4,
"mtk_cloud_path": "/custom/path/to/mtk_cloud"
}'
Supported Platforms:
- MT6991: High-end MediaTek platform
- MT6989: Mid-range MediaTek platform
- MT6897: Entry-level MediaTek platform
Supported Model Sizes:
- 0_5B, 0_9B, 1_2B, 1_6B, 8B, 0_58B
Features:
- Automatic model type detection (LLM vs VLM)
- Real-time conversion progress display
- Custom MTK cloud path configuration
- Environment variable support (MTK_CLOUD_PATH); see the example below
- Output validation with TFLite file detection
Dependencies:
- MTK conversion requires a separate mtk_cloud repository with install.sh
- The repository provides conversion scripts for LLM and VLM models
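As a sketch of the MTK_CLOUD_PATH environment variable mentioned above (the path is a placeholder, and this assumes the variable is read in place of the mtk_cloud_path config key):
# Point the converter at a local mtk_cloud checkout via the environment
export MTK_CLOUD_PATH=/path/to/mtk_cloud
modelconvert convert OpenBMB/MiniCPM4-0.5B mtk --model-type text-generation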
Advanced conversion for distributed models with tensor and pipeline parallelism:
# Auto-detect and convert tensor parallel model
modelconvert convert models/tp_megatron_model hf --model-type auto
# Force smart converter for complex distributed models
modelconvert convert models/large_tp_model hf --model-type minicpm --use-smart-converter
# Convert with specific parallel configuration
modelconvert convert models/custom_tp_model hf --model-type minicpm \
--tp-size 4 --pp-size 2 --num-layer 32
# Convert MiniCPM-4 MoE model
modelconvert convert models/minicpm4_moe_model hf --model-type minicpm4
Supported Parallel Configurations:
- Tensor Parallel (TP): 1, 2, 4, 8 ranks
- Pipeline Parallel (PP): 1, 2, 4 stages
- Hybrid TP/PP: Combined tensor and pipeline parallelism
Auto-Detection Features:
- Model type detection (MiniCPM, Llama, MiniCPM-4)
- Model size detection (0.5B, 1.5B, 3B, 8B, 14B, etc.)
- Parallel configuration detection (TP/PP sizes)
- MoE layer detection for MiniCPM-4 models
Generate models with zero weights for testing:
# Zero weights
modelconvert convert gpt2 safetensors --fake-weight
# Custom shapes
modelconvert convert gpt2 safetensors --fake-weight --fake-weight-config shapes.yaml
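The schema of the shapes file is not documented here; a hypothetical shapes.yaml that overrides a couple of tensor shapes might look like the sketch below (the key/value layout is an assumption, not the tool's confirmed format; the tensor names are GPT-2's embedding tables):
# Hypothetical shapes.yaml: parameter name -> overridden shape
transformer.wte.weight: [50257, 768]
transformer.wpe.weight: [1024, 768]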
| Input Format | Output Formats |
|---|---|
| HuggingFace | All formats |
| SafeTensors | HuggingFace, SafeTensors |
| TorchScript | TorchScript |
| ONNX | ONNX |
| GGUF | GGUF |
| MLX | MLX |
| Megatron-LM | HuggingFace |
| MTK | MTK |
| Format | Quantization Types |
|---|---|
| GPTQ | 4bit, 8bit |
| AWQ | 4bit, 8bit |
| GGUF | q4_k_m, q5_k_m, q8_0 |
| MLX | q4_k_m, q8_0, q5_k_m |
| SafeTensors | fp16, fp32 |
| MTK | 4bit, 8bit |
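For example, a GGUF conversion using one of the quantization types listed above:
# Convert to GGUF with q4_k_m quantization
modelconvert convert gpt2 gguf --quant q4_k_m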
from model_converter_tool.api import ModelConverterAPI
api = ModelConverterAPI()
# Convert with fake weights for testing
result = api.convert_model(
    model_name="gpt2",
    output_format="onnx",
    output_path="./gpt2.onnx",
    fake_weight=True
)
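Assuming convert_model also mirrors the CLI's quantization options (the quantization keyword below is an assumption based on the batch-config field, not a confirmed signature), a quantized conversion might look like:
# Quantized conversion via the Python API (keyword argument is assumed, see note above)
result = api.convert_model(
    model_name="facebook/opt-125m",
    output_format="gptq",
    output_path="./opt125m_gptq",
    quantization="4bit"
)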
Create YAML/JSON files for batch processing:
models:
  - model_path: gpt2
    output_path: outputs/gpt2_onnx
    output_format: onnx
  - model_path: facebook/opt-125m
    output_path: outputs/opt_gptq
    output_format: gptq
    quantization: 4bit
Run the test suite:
# All tests
pytest
# Integration tests
pytest tests/test_integration.py
# Specific test
pytest tests/test_cli.py
- CLI Reference - Complete command documentation
- Configuration - Batch processing and advanced options
- Converter Engine - Technical details and architecture
- Megatron-LM Integration - Comprehensive guide for Megatron-LM conversions
- Tensor Parallel Conversion - Advanced guide for TP/PP model conversions
- Examples - Sample scripts and workflows