The all-in-one, modular image dataset utility for ML, with a focus on HQ/LQ (high-quality/low-quality) image pairs for single-image super-resolution (SISR) and general computer vision. CLI-first, highly extensible, and packed with advanced tools for dataset curation, analysis, transformation, and validation.
Dataset Forge is a Python CLI tool for managing, analyzing, and transforming image datasets, especially HQ/LQ pairs for super-resolution and machine learning.
It covers the full workflow, from curation and analysis through transformation and validation, behind an intuitive, extensible interface.
- Clean and organize image datasets (HQ/LQ pairs for super-resolution)
- Analyze dataset quality and generate reports
- Process, augment, and transform images
- Modular, CLI-first, and highly extensible
- Robust parallel and GPU-accelerated processing
- 🌐 Global Command System: Context-aware help and instant quit from any menu
- 📚 Comprehensive Help: Menu-specific documentation and navigation assistance
- See all features
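As an illustration of what cleaning an HQ/LQ dataset involves, the sketch below finds images that exist in one folder but have no counterpart in the other. It is a minimal example under stated assumptions, not Dataset Forge's own implementation: the names `image_stems` and `find_unpaired`, the extension list, and the rule that pairs are matched by file stem are all hypothetical.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}


def image_stems(folder: Path) -> set[str]:
    """Return the file stems of all images directly inside `folder`."""
    return {
        p.stem
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    }


def find_unpaired(hq_dir: Path, lq_dir: Path) -> tuple[list[str], list[str]]:
    """Return (stems only in HQ, stems only in LQ), each sorted.

    Assumes an HQ image and its LQ counterpart share the same file stem,
    regardless of extension.
    """
    hq, lq = image_stems(hq_dir), image_stems(lq_dir)
    return sorted(hq - lq), sorted(lq - hq)
```

Calling `find_unpaired(Path("hq"), Path("lq"))` returns two lists of stems, which can then be reviewed, moved, or deleted before training.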
git clone https://github.com/Courage-1984/Dataset-Forge.git
cd Dataset-Forge
- See Getting Started for full instructions, then see Special Installation Instructions for additional setup steps.
- Getting Started
- Features
- Usage Guide
- Troubleshooting
- Contributing
- Development Standards - Menu system patterns and coding standards
- MCP Integration Guide - Enhanced development with AI assistance
- Full Documentation Index
- Python: 3.12+ (see requirements.txt)
- OS: Windows (primary)
- CUDA/cuDNN: For GPU acceleration (see Special Installation)
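The requirements above can be checked before installing with a short script. This is only a sketch: the helper names `python_ok` and `cuda_status` are hypothetical, and it assumes GPU acceleration is provided through PyTorch, which this section does not state. Refer to Special Installation for the authoritative setup.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 12)  # taken from the requirements list above


def python_ok(version: tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """Return True if `version` meets the minimum Python requirement."""
    return tuple(version[:2]) >= MIN_PYTHON


def cuda_status() -> str:
    """Report CUDA availability via PyTorch (assumption: PyTorch is the GPU backend)."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the script also runs without PyTorch

    return "cuda available" if torch.cuda.is_available() else "cpu only"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'too old'}")
    print(f"GPU: {cuda_status()}")
```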
- Thanks Kim2091 ❤️ for helpful-scripts
- Thanks umzi2 ❤️ for WTP Dataset Destroyer & PepeDP
- Thanks the-database ❤️ for traiNNer-redux
- Thanks Phhofm ❤️ for sisr
- PepeDP
- WTP Dataset Destroyer
- traiNNer-redux
- Getnative
- resdet
- ExifTool
- Oxipng
- Steghide
- zsteg
- IQA-PyTorch / py-iqa
- imagededup
- ffmpeg | ffmpeg builds
- GetFnative
- getfscaler
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC BY-SA 4.0). See LICENSE for details.
A simplified overview of Dataset Forge's modular architecture:
flowchart TD
A["🚀 CLI Entrypoint"] --> B["📋 Main Menu"]
B --> C["🎛️ Menu System"]
C --> D["⚡ Actions Layer"]
D --> E["🛠️ Core Utilities"]
D --> F["🔧 DPID Implementations"]
subgraph "Core Components"
G["📂 Dataset Management"]
H["🔍 Analysis & Validation"]
I["✨ Image Processing"]
J["🛠️ Utilities & Tools"]
end
subgraph "Supporting Systems"
K["💾 Memory Management"]
L["⚡ Parallel Processing"]
M["🎨 UI/CLI System"]
N["🔧 External Libraries"]
end
C --> G
C --> H
C --> I
C --> J
D --> K
D --> L
D --> M
D --> N
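The layering in the diagram (menu, then actions, then core utilities) can be sketched in a few lines of Python. Every name here (`count_images`, `unique_paths`, `analyze_action`, `dedupe_action`, `MENU`, `run_choice`) is hypothetical and chosen for illustration; the sketch shows the general pattern of a menu dispatching to action functions that call UI-free utilities, not the project's actual code.

```python
from typing import Callable


# Core utilities: small, reusable functions with no UI concerns.
def count_images(paths: list[str]) -> int:
    """Count paths that end in a common image extension."""
    return sum(p.lower().endswith((".png", ".jpg", ".jpeg")) for p in paths)


def unique_paths(paths: list[str]) -> list[str]:
    """Drop repeated paths while preserving first-seen order."""
    return list(dict.fromkeys(paths))


# Actions layer: combines utilities into one user-facing operation.
def analyze_action(paths: list[str]) -> str:
    return f"{count_images(paths)} image(s) found"


def dedupe_action(paths: list[str]) -> str:
    removed = len(paths) - len(unique_paths(paths))
    return f"{removed} duplicate path(s) removed"


# Menu system: maps a key to a label and the action it triggers.
MENU: dict[str, tuple[str, Callable[[list[str]], str]]] = {
    "1": ("Analysis & Validation", analyze_action),
    "2": ("Dataset Management", dedupe_action),
}


def run_choice(choice: str, paths: list[str]) -> str:
    """Dispatch a menu choice to its action and format the result."""
    entry = MENU.get(choice)
    if entry is None:
        return "Unknown option"
    label, action = entry
    return f"[{label}] {action(paths)}"
```

Keeping the menu as a plain mapping means a new feature needs only an action function and one extra entry, which is one way to get the extensibility the diagram implies.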
For the full roadmap and advanced usage, see the Documentation Home.