A simple desktop app to convert a raw CSV into user-selected training dataset formats with optional train/val/test splits and stratification.
- Browse or paste path to input CSV.
- Choose output format from many exporters (see below).
- Select output folder and base filename.
- Optional column filtering (keep only specified columns).
- Optional train/val/test split with optional stratification column.
- Neural processing mode (optional) with multi-engine support for auto-labeling:
  - Tasks: Detection or Classification
  - Engines: Ultralytics YOLO (Det/Cls), TorchVision (Classification)
  - Model path/name or preset (e.g., `yolo11n.pt`, `yolov8n.pt`, `resnet18`)
  - Confidence threshold (YOLO)
  - Overwrite existing labels toggle
  - Download Models button to fetch common weights for offline use, with a progress bar
- Logging panel for progress and validation messages.
- Preview first 50 rows of the CSV.
- Settings auto-save to reload your last used paths/options.
- About menu with version.
- Python 3.9+
- Core packages (installed via `requirements.txt`):
  - `pandas`
  - `pyarrow` (for Parquet)
  - `Pillow` (image IO for some exporters)

Optional extras (install only if you need the feature):
- `openpyxl`: Excel (XLSX) export
- `ultralytics`: Neural mode (YOLO detection/classification)
- `torch`, `torchvision`: TorchVision Classification engine (CPU wheels recommended)
- `tensorflow`: TFRecord export
Install dependencies:
pip install -r requirements.txt
# Optional (CPU-only wheels for TorchVision via PyTorch index):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python app.py
You can either run the portable EXE or run from source.
- Run the EXE (no install):
  - Build or download `dist/DatasetConverter/DatasetConverter.exe`
  - Double-click to launch. If SmartScreen warns: click “More info” → “Run anyway”.
- Run from source:
# 1) Install Python 3.10 or 3.11 (64-bit)
# 2) Create a virtual environment
py -m venv .venv
.\.venv\Scripts\activate
python -m pip install --upgrade pip
# 3) Install all dependencies
pip install -r requirements.txt
# Optional: on CPU-only machines, smaller wheels via PyTorch CPU index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# 4) Run the app
python app.py
Notes:
- Model weights (Ultralytics/TorchVision) download on first use. If offline, use the app’s Download Models button later when online, or place the `.pt` files next to the EXE or in your working directory.
- TensorFlow is only required for TFRecord export; if it fails to install on your Python/Windows version, remove it from `requirements.txt` and other features will still work.
- Some systems may prompt for the Visual C++ runtime; follow the prompt once if needed.
- Tabular: CSV, JSONL, Parquet, Feather, Excel (XLSX), SQLite
- Detection: COCO (Detection), YOLO TXT (Detection), Pascal VOC (XML), YOLO Dataset (images+labels)
- Classification: ImageFolder (class-per-subdir)
- Segmentation: COCO (Segmentation), YOLO TXT (Segmentation)
- ML/Hub friendly: Hugging Face Dataset (JSONL), WebDataset (tar shards)
- Other: TFRecord (requires TensorFlow), Audio Manifest (JSONL), TimeSeries Windows (Parquet)
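To make the YOLO TXT (Detection) target concrete, here is a minimal sketch of how one bounding box maps to one label line. The YOLO TXT convention is `class x_center y_center width height`, all normalized to the image size. The function name and the pixel-space input columns are assumptions for illustration; the CSV schema the app actually expects is not documented here.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Format one pixel-space box as a YOLO TXT detection line.

    Hypothetical helper: the inputs are assumed to be corner coordinates
    in pixels plus the image width and height.
    """
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


print(to_yolo_line(0, 100, 50, 300, 250, 640, 480))
```

Each image gets one `.txt` file with one such line per object.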
- Parquet requires `pyarrow`.
- Excel export requires `openpyxl`.
- TFRecord export requires `tensorflow`.
- Neural mode requires `ultralytics` and/or `torch` + `torchvision`. See below.
- Stratified split requires enough samples per class; otherwise the app falls back to a random split.
- Column list should be comma-separated without quotes, e.g.: `feature1, feature2, label`.
- Your settings are stored at `%APPDATA%/DatasetConverter/settings.json` on Windows.
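The stratified-split fallback noted above can be sketched roughly as follows. This is an illustration, not the app's actual code: the function name, the minimum of 3 rows per class, and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def split_rows(rows, label_key=None, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split rows into (train, val, test), stratifying by label_key when feasible."""
    rng = random.Random(seed)

    def cut(items):
        # Shuffle a copy, then slice by the train and val fractions;
        # whatever remains goes to test.
        items = list(items)
        rng.shuffle(items)
        n_train = int(len(items) * fractions[0])
        n_val = int(len(items) * fractions[1])
        return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

    if label_key is not None:
        groups = defaultdict(list)
        for row in rows:
            groups[row[label_key]].append(row)
        # Assumed threshold: stratify only if every class has at least 3 rows.
        if all(len(group) >= 3 for group in groups.values()):
            train, val, test = [], [], []
            for group in groups.values():
                g_train, g_val, g_test = cut(group)
                train += g_train
                val += g_val
                test += g_test
            return train, val, test

    # Too few samples in some class (or no stratify column): random split.
    return cut(rows)
```

Splitting each class separately keeps the class proportions roughly equal across train, val, and test.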
- Without split: `<output_folder>/<base>.{csv|jsonl|parquet}`
- With split: `<output_folder>/<base>_train.*`, `<base>_val.*`, `<base>_test.*`
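The naming scheme above can be expressed as a small helper. This is a sketch for illustration; `output_paths` is a hypothetical name, not a function from the app.

```python
from pathlib import Path


def output_paths(output_folder, base, ext, split=False):
    """Return the output file paths implied by the naming scheme (hypothetical helper)."""
    folder = Path(output_folder)
    if not split:
        return [folder / f"{base}.{ext}"]
    return [folder / f"{base}_{part}.{ext}" for part in ("train", "val", "test")]


for path in output_paths("data/processed", "dataset", "jsonl", split=True):
    print(path.name)
```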
- Click "Browse..." to pick `data/raw.csv` (or "Paste Path").
- Choose output folder, e.g., `data/processed/`.
- Select format: `JSONL` (or any from the dropdown).
- Set base filename: `dataset`.
- (Optional) Keep columns: `text,label`.
- Enable split: `Train=0.8, Val=0.1, Test=0.1`, Stratify: `label`.
- Click Convert.
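For readers who want to see what the column filter and JSONL export in this workflow amount to, here is a standard-library sketch. It is not the app's implementation (the app depends on pandas); both function names are hypothetical.

```python
import csv
import io
import json


def parse_column_list(text):
    """Parse a comma-separated column list such as 'text, label'."""
    return [name.strip() for name in text.split(",") if name.strip()]


def csv_to_jsonl(csv_text, keep_columns=None):
    """Convert CSV text to JSONL text, optionally keeping only some columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        if keep_columns:
            row = {name: row[name] for name in keep_columns}
        lines.append(json.dumps(row))
    return "\n".join(lines)


raw = "id,text,label\n1,hello,pos\n2,bye,neg\n"
print(csv_to_jsonl(raw, parse_column_list("text, label")))
```

Each output line is one JSON object, so the file can be streamed row by row.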
- Set Processing Mode to `Neural`.
- Choose Task `Detection` or `Classification`.
- Select `Engine`: Ultralytics YOLO (Det/Cls) or TorchVision (Cls).
- Pick a `Preset` or type a model/arch (e.g., `yolo11n.pt`, `yolov8n.pt`, `resnet18`).
- Adjust `Confidence` and `Overwrite` as needed (YOLO only).
- Click Convert: predictions are applied before any split/export.
- During inference, a progress dialog shows per-image progress.
- If CSV reading fails, ensure the file is not open/locked and is a valid CSV.
- If saving Parquet fails, install `pyarrow`.
- Large files: use a 64-bit Python and make sure enough RAM is available.
You can package this app as a single-folder Windows executable using PyInstaller.
./build.ps1
# or to clean previous builds
./build.ps1 -Clean
The EXE will be at `dist/DatasetConverter/DatasetConverter.exe`.
# Create venv if needed
py -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install -r requirements.txt
.\.venv\Scripts\python -m pip install pyinstaller
# Build (PowerShell continues a line with a trailing backtick, not a backslash)
.\.venv\Scripts\python -m PyInstaller `
  --name "DatasetConverter" `
  --noconfirm `
  --windowed `
  --clean `
  app.py
If Windows SmartScreen warns when running the EXE, click "More info" → "Run anyway" (you may sign the binary if distributing).
- Zip the entire folder `dist/DatasetConverter/` into `DatasetConverter-win.zip`.
- Upload the zip to a GitHub Release.
- In release notes, mention first-run notes (SmartScreen, model downloads/offline button).
- Optional: publish a SHA256 checksum:
certutil -hashfile DatasetConverter-win.zip SHA256
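Users on systems without `certutil` can check the published checksum with a few lines of Python. This is a minimal sketch; `sha256_of_file` is a hypothetical helper name.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: compare against the value published in the release notes.
# print(sha256_of_file("DatasetConverter-win.zip"))
```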
Neural mode auto-annotates your data using Ultralytics YOLO or TorchVision (classification-only).
# In your virtual environment (pick what you need)
# Ultralytics (YOLO engines)
pip install ultralytics
# TorchVision Classification (CPU wheels example)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
- Switch Processing Mode to `Neural`.
- Choose `Task` (Detection or Classification).
- Select `Engine`: Ultralytics YOLO (Det/Cls) or TorchVision (Cls).
- Pick a `Preset` or type a model/arch (e.g., `yolo11n.pt`, `yolov8n.pt`, `resnet18`).
- Adjust `Confidence` and `Overwrite` as needed (YOLO only).
- Click `Convert`: predictions are applied before any split/export.
- During inference, a progress dialog shows per-image progress.
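One plausible reading of how `Confidence` and `Overwrite` interact is sketched below. The app's real logic may differ: the function name, the `label` column, and the `(label, confidence)` prediction shape are all assumptions made for illustration.

```python
def apply_predictions(rows, predictions, conf_threshold=0.25, overwrite=False):
    """Return copies of rows with labels filled in from model predictions.

    Hypothetical sketch: `predictions` holds one (label, confidence) pair per row.
    """
    labeled = []
    for row, (label, confidence) in zip(rows, predictions):
        row = dict(row)  # copy so the input rows are left untouched
        has_label = bool(row.get("label"))
        # Keep confident predictions only; never replace an existing
        # label unless overwrite is enabled.
        if confidence >= conf_threshold and (overwrite or not has_label):
            row["label"] = label
        labeled.append(row)
    return labeled


rows = [{"image": "a.jpg", "label": ""}, {"image": "b.jpg", "label": "cat"}]
preds = [("dog", 0.9), ("dog", 0.8)]
print(apply_predictions(rows, preds))
```

In this sketch, rows that already have a label are left alone unless `overwrite=True`, and predictions below the threshold are discarded.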
- Use the `Download Models` button to prefetch: `yolo11n.pt`, `yolov8n.pt`, `yolov8n-cls.pt`, `resnet18`, `resnet50`, `mobilenet_v3_small`, `efficientnet_b0`.
- The downloader runs PowerShell and shows logs with a green progress bar and percentage.
- If you attempt inference without network and weights are missing, the app will prompt you to download common models.