🎯 Reduce AI model size by up to 90% while maintaining performance quality
A comprehensive toolkit for pruning large language models and neural networks with multiple advanced strategies. Features both command-line and GUI interfaces for maximum flexibility.
- 🎯 Magnitude-Based Pruning - Remove smallest weights (unstructured)
- 🏗️ Structured Pruning - Remove entire neurons/attention heads
- ⏱️ Gradual Pruning - Iterative pruning with fine-tuning recovery
- 🎓 Knowledge Distillation - Train smaller student models
- 🔄 Comprehensive - Combined pipeline for maximum compression
- 💻 Command Line Interface - Perfect for scripting and automation
- 🎨 Modern GUI - User-friendly interface with real-time progress tracking
- 📊 Dynamic Quantization - Reduce memory footprint further
- 🧠 Real Dataset Integration - Uses WikiText for fine-tuning
- 🔍 Model Validation - Automatic output verification
- 📈 Detailed Reporting - Comprehensive pruning statistics
🖼️ View Input Model Folder Structure Guide
The AI Model Pruner is designed to work with HuggingFace-compatible models. When you select an input model folder, it should contain the standard files that HuggingFace models use.
your-model-folder/
├── 📄 config.json # ← REQUIRED: Model architecture configuration
├── 🧠 pytorch_model.bin # ← REQUIRED: Model weights (PyTorch format)
│ OR
├── 🧠 model.safetensors # ← ALTERNATIVE: Model weights (SafeTensors format)
├── 🔤 tokenizer.json # ← REQUIRED: Tokenizer configuration
├── 🔤 tokenizer_config.json # ← REQUIRED: Tokenizer settings
├── 📝 vocab.txt # ← REQUIRED: Vocabulary file
│ OR
├── 📝 vocab.json # ← ALTERNATIVE: Vocabulary (JSON format)
├── 📝 merges.txt # ← OPTIONAL: BPE merges (for some tokenizers)
└── 📋 special_tokens_map.json # ← OPTIONAL: Special token mappings
File | Purpose | Required? | What it Contains |
---|---|---|---|
config.json |
🏗️ Architecture | ✅ YES | Model dimensions, layer count, attention heads |
pytorch_model.bin |
🧠 Weights | ✅ YES | All trained parameters (billions of numbers) |
model.safetensors |
🧠 Weights | ✅ ALT | Same as above, but safer format |
tokenizer.json |
🔤 Text Processing | ✅ YES | How to convert text to numbers |
tokenizer_config.json |
⚙️ Tokenizer Settings | ✅ YES | Tokenizer behavior configuration |
vocab.txt / vocab.json |
📝 Vocabulary | ✅ YES | All words/tokens the model knows |
merges.txt |
🔗 BPE Rules | ❓ MAYBE | Word splitting rules (GPT-style models) |
special_tokens_map.json |
🏷️ Special Tokens | ❓ OPTIONAL | [CLS], [SEP], [PAD] token definitions |
bert-base-uncased/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
├── tokenizer_config.json
├── vocab.txt
└── special_tokens_map.json
gpt2/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
├── tokenizer_config.json
├── vocab.json
├── merges.txt
└── special_tokens_map.json
t5-small/
├── config.json
├── pytorch_model.bin
├── tokenizer.json
├── tokenizer_config.json
├── spiece.model # ← SentencePiece tokenizer
└── special_tokens_map.json
❌ pure-pytorch-model/
├── model.pth # Raw PyTorch state dict
└── custom_config.py # Custom Python configuration
❌ tensorflow-model/
├── saved_model.pb # TensorFlow format
└── variables/
❌ onnx-model/
├── model.onnx # ONNX format
└── config.yaml
❌ custom-format/
├── weights.dat # Custom binary format
└── architecture.xml # Custom config
import torch
from transformers import AutoConfig, AutoModel
# If you have a raw PyTorch model
state_dict = torch.load("model.pth")
# You need to create a compatible config
config = AutoConfig.from_pretrained("bert-base-uncased") # Use similar model as template
model = AutoModel.from_config(config)
model.load_state_dict(state_dict)
# Save in HuggingFace format
model.save_pretrained("./converted-model")
from transformers import TFAutoModel, AutoModel
# Load TensorFlow model
tf_model = TFAutoModel.from_pretrained("tf-model-path", from_tf=True)
# Convert to PyTorch
pytorch_model = AutoModel.from_pretrained("tf-model-path", from_tf=True)
pytorch_model.save_pretrained("./converted-model")
import os
from pathlib import Path
def check_model_folder(folder_path):
folder = Path(folder_path)
# Required files
required = ["config.json", "tokenizer_config.json"]
# Need either pytorch_model.bin OR model.safetensors
weights_files = ["pytorch_model.bin", "model.safetensors"]
# Need either vocab.txt OR vocab.json
vocab_files = ["vocab.txt", "vocab.json"]
print(f"🔍 Checking {folder}...")
# Check required files
for file in required:
if (folder / file).exists():
print(f"✅ {file} - Found")
else:
print(f"❌ {file} - Missing")
# Check weights
weights_found = any((folder / f).exists() for f in weights_files)
if weights_found:
found_weight = next(f for f in weights_files if (folder / f).exists())
print(f"✅ {found_weight} - Found")
else:
print(f"❌ No weight files found ({', '.join(weights_files)})")
# Check vocab
vocab_found = any((folder / f).exists() for f in vocab_files)
if vocab_found:
found_vocab = next(f for f in vocab_files if (folder / f).exists())
print(f"✅ {found_vocab} - Found")
else:
print(f"❌ No vocabulary files found ({', '.join(vocab_files)})")
# Usage
check_model_folder("./my-model-folder")
from transformers import AutoModel, AutoTokenizer, AutoConfig
def test_model_loading(model_path):
try:
print(f"🧪 Testing model loading from {model_path}...")
# Try to load config
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
print("✅ Config loaded successfully")
# Try to load model
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
print("✅ Model loaded successfully")
# Try to load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
print("✅ Tokenizer loaded successfully")
print(f"🎉 Model is compatible! Parameters: {model.num_parameters():,}")
return True
except Exception as e:
print(f"❌ Error loading model: {e}")
return False
# Usage
test_model_loading("./my-model-folder")
- Always test load your model before pruning
- Keep backups of original model files
- Check file sizes -
pytorch_model.bin
should be the largest file - Verify permissions - ensure the pruner can read all files
# Download a test model to get started
python -c "
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('distilbert-base-uncased')
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
model.save_pretrained('./test-model')
tokenizer.save_pretrained('./test-model')
print('✅ Test model saved to ./test-model')
"
Issue | Cause | Solution |
---|---|---|
"Config not found" | Missing config.json |
Download complete model from HuggingFace |
"Model weights not found" | Missing .bin or .safetensors |
Ensure model file downloaded completely |
"Tokenizer error" | Missing tokenizer files | Re-download model or copy tokenizer files |
"Trust remote code" | Custom model code | Add trust_remote_code=True parameter |
🎯 Remember: The AI Model Pruner expects the exact same format that HuggingFace uses. If you can load your model with AutoModel.from_pretrained()
, then it will work with the pruner!
Pruning Strategy | Minimum VRAM | Recommended VRAM | Use Case |
---|---|---|---|
🎯 Magnitude | 8 GB | 12+ GB | Quick experimentation |
🏗️ Structured | 8 GB | 12+ GB | Maximum inference speed |
⏱️ Gradual | 12 GB | 24+ GB | Best quality preservation |
🎓 Distillation | 16 GB | 24+ GB | Clean, optimized models |
🔄 Comprehensive | 12 GB | 24+ GB | Maximum compression |
- OS: Windows 10+, macOS 10.14+, or Linux
- Python: 3.8 or higher
- Storage: SSD recommended for faster model loading
- RAM: 16 GB minimum, 32 GB+ recommended
# Clone the repository
git clone https://github.com/LMLK-seal/ai-model-pruner.git
cd ai-model-pruner
# Install dependencies
pip install -r requirements.txt
python pruner_gui.py
- 📁 Select Input Model - Choose your HuggingFace model directory
- 📂 Choose Output Location - Where to save the pruned model
- ⚙️ Configure Settings - Pick strategy and compression ratio
▶️ Start Pruning - Watch real-time progress and logs
# Basic pruning with magnitude strategy
python model_pruner.py \
--input_path ./models/original-model \
--output_path ./models/pruned-model \
--strategy magnitude \
--target_reduction 0.75
# Advanced: Comprehensive pruning pipeline
python model_pruner.py \
--input_path ./models/large-model \
--output_path ./models/compressed-model \
--strategy comprehensive \
--target_reduction 0.85 \
--validate
# "Digital Weeding" - Remove smallest weights
pruner.magnitude_based_pruning(sparsity_ratio=0.5)
- ✅ Pros: Simple, fast, fine-grained control
- ❌ Cons: Creates sparse models, may need specialized hardware
- 🎯 Best for: Quick tests, when sparsity is desired
# "Architectural Demolition" - Remove entire components
pruner.structured_pruning(reduction_ratio=0.3)
- ✅ Pros: Hardware-friendly, directly reduces complexity
- ❌ Cons: Coarse-grained, bigger impact per change
- 🎯 Best for: Maximizing inference speed on standard GPUs
# "Slow and Steady" - Iterative pruning with recovery
pruner.gradual_magnitude_pruning(
initial_sparsity=0.1,
final_sparsity=0.7,
steps=10
)
- ✅ Pros: Best quality preservation, adaptive recovery
- ❌ Cons: Very slow, computationally expensive
- 🎯 Best for: When model accuracy is critical
# "Master and Apprentice" - Train new smaller model
pruner.knowledge_distillation_pruning(
student_ratio=0.5,
epochs=3
)
- ✅ Pros: Clean optimized models, high potential
- ❌ Cons: Requires training data, complex process
- 🎯 Best for: Creating highly optimized models from scratch
# "Full Treatment" - Combined strategies
pruner.optimize_model_size()
- ✅ Pros: Maximum compression, best overall results
- ❌ Cons: Most complex, longest processing time
- 🎯 Best for: Achieving maximum size reduction
Original Model | Strategy | Size Reduction | Speed Increase | Quality Loss |
---|---|---|---|---|
GPT-2 Medium (355M) | Structured | 60% ↓ | 2.3x ↑ | <5% ↓ |
BERT Base (110M) | Gradual | 75% ↓ | 1.8x ↑ | <3% ↓ |
RoBERTa Large (355M) | Distillation | 80% ↓ | 4.1x ↑ | <8% ↓ |
Custom Model (1.3B) | Comprehensive | 85% ↓ | 3.2x ↑ | <10% ↓ |
ai-model-pruner/
├── 📄 model_pruner.py # Core pruning logic
├── 🎨 pruner_gui.py # GUI interface
├── 📖 README.md # This file
├── 📋 requirements.txt # Dependencies
Please use the issue tracker to report bugs.
We'd love to hear your ideas! Open an issue with the enhancement
label.
This project is licensed under the MIT License - see the LICENSE file for details.
- 🤗 Hugging Face - For the transformers library
- 🔥 PyTorch - For the deep learning framework
- 📊 Datasets Library - For easy dataset integration
- 🎨 CustomTkinter - For the modern GUI framework
- 📖 Documentation: Wiki
- 💬 Discussions: GitHub Discussions
- 🐛 Issues: Issue Tracker
⭐ Star this repo if it helped you!
Made with ❤️ by LMLK-seal