Generate high-quality, YOLOv8-compatible datasets with ease by leveraging the power of Grounding DINO for zero-shot object detection and Segment Anything (SAM) for precise segmentation. This tool lets you annotate images with intuitive text prompts (e.g., "tomato", "onion") and automatically produces properly structured YOLO-format datasets, complete with labeled training, validation, and test sets and a `data.yaml` file ready for immediate use in model training.
- 🔍 Zero-shot object detection using class names (e.g. "potato", "car", etc.)
- ✏️ Segmentation masks via Meta's Segment Anything
- 🌀 Automatically generates `train`, `valid`, and `test` folders
- 🔢 YOLOv8-compatible `data.yaml` file
- 🚀 Easy-to-use, customizable CLI
```bash
git clone https://github.com/codeprnv/yolo-dataset-generator.git
cd yolo-dataset-generator
pip install -r requirements.txt
```
Make sure you have Python 3.9+ and CUDA installed.
- Clone the Grounding DINO repository and download the SwinT model:
```bash
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
mkdir weights

# Download the SwinT model
curl -L https://github.com/IDEA-Research/GroundingDINO/releases/download/0.1.0/groundingdino_swint_ogc.pth -o weights/groundingdino_swint_ogc.pth
```
- Clone the SAM repository and download the ViT-H model:

```bash
git clone https://github.com/facebookresearch/segment-anything.git
cd segment-anything
mkdir weights

# Download the ViT-H SAM model
curl -L https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -o weights/sam_vit_h_4b8939.pth
```
Update the paths in the script to point to your local GroundingDINO config and checkpoint, and the SAM checkpoint.
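For example, the path variables might look like the following sketch. The repo-relative locations shown are assumptions based on the clone steps above, not fixed requirements; point them wherever you placed the config and checkpoint files:

```python
# Example path configuration (adjust to where you cloned the repos;
# these locations are illustrative, not required by the script)
GROUNDING_DINO_CONFIG = "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py"
GROUNDING_DINO_CHECKPOINT = "GroundingDINO/weights/groundingdino_swint_ogc.pth"
SAM_CHECKPOINT = "segment-anything/weights/sam_vit_h_4b8939.pth"
SAM_MODEL_TYPE = "vit_h"  # must match the downloaded SAM checkpoint
```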
- Takes class names from user input (e.g., "potato, tomato")
- Detects objects using Grounding DINO with enhanced prompts (e.g., "all tomatoes")
- Segments each object using SAM
- Converts boxes to YOLO format
- Splits images and labels into `train`, `valid`, and `test` folders
- Creates a `data.yaml` file for training in YOLOv8
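The box-to-YOLO step above boils down to converting absolute corner coordinates to normalized center/width/height. A minimal sketch (the function name and exact signature are hypothetical, not necessarily what the script uses):

```python
def xyxy_to_yolo(box, img_w, img_h):
    """Convert an absolute (x1, y1, x2, y2) box to normalized
    YOLO (cx, cy, w, h), each value in [0, 1]."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w   # box center x, normalized by image width
    cy = (y1 + y2) / 2 / img_h   # box center y, normalized by image height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return cx, cy, w, h

# A box (100, 100, 300, 200) in a 400x400 image becomes (0.5, 0.375, 0.5, 0.25)
```

Each label line in the output files then pairs a class index with these four values.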
```bash
python gds_yolo_prep.py
```
You'll be prompted to:
- Enter class names (comma separated)
- Enter name for the output dataset folder
The script will:
- Process all images
- Detect, segment and convert to YOLO format
- Create folder structure:
```
dataset_name/
├── train/
│   ├── images/
│   └── labels/
├── valid/
│   ├── images/
│   └── labels/
├── test/
│   ├── images/
│   └── labels/
└── data.yaml
```
The generated `data.yaml` looks like:

```yaml
train: /path/to/train/images
val: /path/to/valid/images
nc: 3
names: ['tomato', 'potato', 'onion']
```
MIT License