Welcome, fellow digital alchemists and AI enthusiasts! Ever felt like your AI training runs were getting derailed by sneaky, subpar images or wonky captions? Fret no more! The Dragon Diffusion Dataset Validator is here to be your trusty sidekick, ensuring your datasets are as pristine and potent as a freshly brewed potion. ✨
This jolly little Gradio app helps you analyze your image datasets for quality, caption accuracy (from an AI's perspective!), and even face consistency. Say goodbye to wasted VRAM and hello to crisper, cleaner models!
Here’s how to get this dragon soaring on your machine in the correct order of spell-casting:
1. **Conjure Your Virtual Cauldron (Virtual Environment) 🧪**

   Before we unleash the dragon, let's brew a clean Python environment to keep your spells organized and avoid magical mishaps!

   ```bash
   python -m venv venv
   ```

   - On Windows:

     ```bash
     .\venv\Scripts\activate
     ```

   - On macOS/Linux:

     ```bash
     source venv/bin/activate
     ```

   Voilà! Your cauldron is bubbling, and you're ready for the next step.
2. **Summon the Mighty PyTorch! 🔥**

   The Dragon needs PyTorch to breathe fire, but it must match your system's CUDA magic (or lack thereof). Here are some example incantations:

   - For CUDA 12.6:

     ```bash
     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
     ```

   - For CUDA 11.8:

     ```bash
     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
     ```

   ⚠️ **A Mighty Warning!** Check your CUDA version with `nvcc --version` or `nvidia-smi`. If you're GPU-less, use the CPU version:

   ```bash
   pip install torch torchvision torchaudio
   ```

   Lost in the PyTorch forest? Consult the PyTorch installation guide for your perfect spell. Mismatched versions make for grumpy dragons!
3. **Install the Mystical Dependencies 📜**

   With PyTorch in place, conjure the remaining libraries from the sacred `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```
4. **Clone this Fiery Repo 📦**

   If you haven't already, summon the Dragon's lair to your machine:

   ```bash
   git clone https://github.com/DragonDiffusionbyBoyo/Dragon-Diffusion-Dataset-Validator.git
   cd Dragon-Diffusion-Dataset-Validator
   ```
5. **Prepare Your 'Input' Hoard! 📂**

   This is super important for smooth sailing, especially on Windows to avoid Gradio-related security quirks:

   - Create a folder named `input` directly in the root directory of this project (e.g., `Dragon-Diffusion-Dataset-Validator/input/`).
   - Place all the image folders you want to analyze inside this `input` folder. When you run the app, you'll specify the path to your dataset like `input/my_awesome_dataset/`.
6. **Unleash the Dragon! 🚀**

   Simply run the main script and watch the magic unfold!

   ```bash
   python dragon_validatorv8.py
   ```

   This will launch the Gradio web interface in your browser, ready to weave its analytical spells.
Once the Gradio app is up and running, you’ll see a friendly interface:
1. **Reference Image (Optional but Recommended for Faces):**
   - Upload a clear reference image of the face you want to check for consistency across your dataset. This is used specifically for the "Face Consistency Analysis." If your dataset doesn't involve consistent faces (e.g., landscapes), you can skip this.
2. **Dataset Folder Path:**
   - Enter the path to the folder containing your images and their corresponding `.txt` caption files. Remember the `input` folder rule! If your dataset is in `Dragon-Diffusion-Dataset-Validator/input/my_training_data/`, you'd enter `input/my_training_data/`.
3. **Analyse Dataset:**
   - Hit this button and let the Dragon do its thing! It will churn through your images, perform its multi-faceted analysis, and populate the table with glorious RAG (Red/Amber/Green) ratings.
4. **Select All Red:**
   - Feeling bold? This button automatically selects all images that scored a 'Red' in any of the analysis categories. A quick way to triage!
5. **Move Selected to Hold:**
   - This button performs the ultimate magic trick! It whisks away all the selected images (and their `.txt` caption files) into a newly created `hold` folder within your original dataset directory. This keeps your main dataset clean and ready for prime-time training.
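Since every image needs a matching `.txt` caption, it can save a wasted run to sanity-check the pairing before you hit Analyse. Here is a small, hypothetical pre-flight helper (`find_unpaired_images` is not part of the app; it just illustrates the expected image-plus-caption layout):

```python
from pathlib import Path

def find_unpaired_images(dataset_dir, exts=(".png", ".jpg", ".jpeg", ".webp")):
    """Return image filenames that lack a matching .txt caption file.

    Illustrative pre-flight check only -- not part of the validator itself.
    Assumes captions sit alongside images as image.png -> image.txt.
    """
    folder = Path(dataset_dir)
    unpaired = []
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() in exts:
            if not img.with_suffix(".txt").exists():
                unpaired.append(img.name)
    return unpaired
```

Run it against `input/my_training_data/` (or wherever your hoard lives) and fix any orphans it reports before analysis.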
This tool isn’t just about human eyes; it’s about seeing your data through an AI’s perceptive lens! Our brains are fantastic at filling in gaps and making sense of fuzzy images, but an AI’s learning process is far more literal and data-driven. If an image is subtly problematic for an AI, it can absolutely derail a training run, even if it looks okay to us.
The workflow and analysis are built on this core principle:
1. **Initial Curation & Captioning (Human Touch):** You, the brilliant human, start by curating a dataset and applying captions based on your understanding of the images.
2. **Test 1: Image Quality Analysis (The AI's Eye Test):**
   - The Dragon looks beyond superficial aesthetics. It assesses images for key qualities that affect an AI's ability to extract meaningful features:
     - **Blur Score (Laplacian Variance):** A low score means a blurry image, hard for an AI to define edges.
     - **Noise Score (Standard Deviation):** High noise means distracting visual static for the AI.
     - **Brightness & Contrast:** Too dark, too bright, or too flat makes it difficult for an AI to discern details.
   - Crucially, it also identifies images that are statistical outliers within your own dataset's overall visual characteristics. If an image is wildly different from the rest of your set, it might confuse the AI during training.
   - **RAG Rating:** Based on dynamic thresholds derived from your dataset's own statistics, images are rated Green (good), Amber (caution), or Red (problematic).
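For the curious, the two headline metrics are simple to compute. The blur score is the variance of the image's Laplacian (with OpenCV this is typically the one-liner `cv2.Laplacian(gray, cv2.CV_64F).var()`), and the noise score is just the standard deviation of pixel intensities. Here is a dependency-free sketch of both, assuming a grayscale image represented as a 2-D list of pixel values; the real app's thresholds are dynamic and per-dataset, so no cut-offs are shown:

```python
def laplacian_variance(gray):
    """Blur score: variance of the 4-neighbour Laplacian (low = blurry)."""
    h, w = len(gray), len(gray[0])
    laps = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian kernel: [[0,1,0],[1,-4,1],[0,1,0]]
            laps.append(gray[y - 1][x] + gray[y + 1][x]
                        + gray[y][x - 1] + gray[y][x + 1]
                        - 4 * gray[y][x])
    mean = sum(laps) / len(laps)
    return sum((v - mean) ** 2 for v in laps) / len(laps)

def noise_score(gray):
    """Noise score: plain standard deviation of pixel intensities."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
```

A perfectly flat image scores 0 on both; a sharp, high-contrast image scores high on the Laplacian variance. That is exactly why a low blur score flags an image the AI will struggle to find edges in.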
3. **Test 2: Caption Accuracy Analysis (Does the AI Agree with You?):**
   - This is a clever one! The tool leverages a powerful Vision-Language Model (Qwen2-VL) to generate its own detailed description of your image.
   - Then, using a Sentence Transformer model (all-MiniLM-L6-v2), it calculates the semantic similarity between your provided caption and the Qwen2-VL-generated description.
   - Why? Because if your caption doesn't semantically align with what an AI (Qwen2-VL in this case) 'sees' in the image, your training model might learn misleading associations.
   - **RAG Rating:** A high similarity score gets a Green; lower scores get Amber or Red.
   - **Diagnostic Export:** For the curious, a `qwen2vl_diagnostic_analysis.txt` file is generated in your dataset folder, showing the detailed Qwen2-VL output and similarity scores, helping you understand why a caption might have been flagged!
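Once both texts have been embedded (with sentence-transformers, roughly `util.cos_sim(model.encode(caption), model.encode(description))`), the similarity score is just the cosine of the angle between the two vectors. Here is a minimal sketch of that final step, with a RAG mapping whose 0.6/0.4 cut-offs are illustrative guesses, not the app's actual thresholds:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rag_for_similarity(score, green=0.6, amber=0.4):
    """Map a similarity score to a RAG rating.

    The green/amber cut-offs here are hypothetical, for illustration only.
    """
    if score >= green:
        return "Green"
    if score >= amber:
        return "Amber"
    return "Red"
```

A caption that says roughly what Qwen2-VL sees produces near-parallel embeddings and a score close to 1; an unrelated caption drifts toward 0 and lands in Red.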
4. **Test 3: Face Consistency Analysis (Keeping Those Characters Consistent!):**
   - If you're training a model to generate consistent characters, this is a lifesaver. The tool uses deepface (specifically the Facenet model) to generate an embedding of any detected face in the image.
   - It then compares this face embedding to the embedding of your provided "Reference Image."
   - Why? To ensure that if you're trying to train for a specific character, all the faces in your dataset belong to that character, preventing the AI from learning multiple identities.
   - **RAG Rating:** Based on cosine similarity, indicating how close the detected face is to your reference.
5. **The 'Hold' Folder Triage:**
   - Images flagged as problematic (especially Red) are moved to a `hold` folder. This quarantines them.
   - **Rescue & Re-Analyze:** Sometimes, these 'held' images can be rescued (e.g., by upscaling, de-noising, or minor edits). You can then feed them back into the Validator for another round of analysis to ensure the fix actually made the AI happy.
   - **Final Decision:** If an image still fails the AI's "sniff test" after rescue attempts, it's best to pull it out. Even if your human eyes can't quite pinpoint why, the AI's metrics are telling you it's not good training material.
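Mechanically, the hold-folder triage boils down to relocating each flagged image together with its caption. This sketch shows the idea (`move_to_hold` is a hypothetical name; the app's own implementation may differ):

```python
import shutil
from pathlib import Path

def move_to_hold(dataset_dir, flagged_names):
    """Move flagged images and their .txt captions into dataset_dir/hold.

    Illustrative sketch of the triage step, not the validator's actual code.
    """
    dataset = Path(dataset_dir)
    hold = dataset / "hold"
    hold.mkdir(exist_ok=True)
    for name in flagged_names:
        img = dataset / name
        if img.exists():
            shutil.move(str(img), str(hold / img.name))
        # Keep the caption with its image so the pair stays intact
        cap = (dataset / name).with_suffix(".txt")
        if cap.exists():
            shutil.move(str(cap), str(hold / cap.name))
    return hold
```

Because image and caption move together, a rescued pair can simply be moved back and re-analyzed without breaking the dataset's pairing.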
This entire workflow means you can blast through thousands of images in minutes, confidently removing the ‘crap’ that would otherwise derail your precious AI training runs. Happy training!