A comprehensive Python project for detecting UI elements in mobile applications using YOLOv8 and Cascade R-CNN. Provides bounding box detection and privacy protection through automatic blurring. Supports the VNIS dataset.
- YOLOv8: Fast, accurate object detection trained on the VNIS dataset (21 UI element classes)
- Cascade R-CNN: High-accuracy detector with multi-stage refinement
- Unified Interface: Single command to use either model
- UI element detection with bounding boxes
- VNIS dataset preparation (Pascal VOC XML → YOLO/COCO format)
- Privacy protection with automatic blurring (Gaussian, Median, Pixelate)
- Multi-category support (Android, iPhone, Rico, Wireframes, Uplabs)
- Customizable element filtering by type
- Model comparison and evaluation
# Install dependencies
pip install -r requirements.txt
# Note: For Detectron2 (Cascade R-CNN), you may need to install from source:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
The easiest way to use any model:
# YOLOv8 detection
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg
# Cascade R-CNN detection
python src/detect.py --model cascade_rcnn --weights runs/cascade_rcnn/model_final.pth --num_classes 21 --source image.jpg
# Privacy protection with any model
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg --blur --blur_type pixelate
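The same detections are available programmatically through the ultralytics Python API. A minimal sketch (paths and the confidence threshold are illustrative):

```python
from ultralytics import YOLO

# Load the trained detector (path follows the training output layout used above)
model = YOLO("runs/train/best.pt")

# Run inference on a single image; conf mirrors the CLI --conf flag
results = model("image.jpg", conf=0.25)
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{r.names[cls_id]} {float(box.conf[0]):.2f} ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```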
# Convert VNIS dataset to model-ready format
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data
# Process specific categories only
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data --categories Android iphone
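Under the hood, the conversion maps Pascal VOC corner boxes to YOLO's normalized center format. A minimal sketch of that step, assuming one XML file per image (function name simplified; see prepare_dataset.py for the real logic):

```python
import xml.etree.ElementTree as ET

def voc_to_yolo_rows(xml_path: str, class_names: list[str]) -> list[str]:
    """Convert one Pascal VOC annotation file to YOLO label rows."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        cls_id = class_names.index(obj.findtext("name"))
        b = obj.find("bndbox")
        x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
        x2, y2 = float(b.findtext("xmax")), float(b.findtext("ymax"))
        # YOLO label row: class cx cy w h, all normalized to [0, 1]
        cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
        w, h = (x2 - x1) / img_w, (y2 - y1) / img_h
        rows.append(f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return rows
```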
YOLOv8 - Fast, Real-time Detection
python models/yolov8/train.py --data ./data/dataset.yaml --epochs 100 --model yolov8n.pt
# For better accuracy, use larger model
python models/yolov8/train.py --data ./data/dataset.yaml --model yolov8m.pt --epochs 150 --batch 32
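Training can also be driven from Python via the ultralytics API, equivalent to the CLI above (hyperparameters are illustrative):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # swap in yolov8m.pt for higher accuracy
model.train(
    data="data/dataset.yaml",
    epochs=100,
    batch=16,
    imgsz=640,
    patience=50,  # early stopping (see the training tips below)
)
```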
python models/yolov8/detect.py \
--model ./runs/train/ui_detection/weights/best.pt \
--source /path/to/image.jpg \
--conf 0.25 --save_txt
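With `--save_txt`, each image gets a matching .txt file with one detection per line in normalized YOLO format (`class cx cy w h`); the values below are illustrative:

```
3 0.512000 0.274000 0.310000 0.058000
7 0.130000 0.905000 0.220000 0.071000
```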
# Use the unified interface (recommended)
python src/blur.py \
--model yolov8 \
--weights ./runs/train/ui_detection/weights/best.pt \
--source /path/to/image.jpg \
--classes EditText Text Modal \
--blur_type pixelate --blur_strength 71
Cascade R-CNN - High-Accuracy Detection
# First, convert YOLO dataset to COCO format (see models/common/yolo_to_coco.py)
python models/cascade_rcnn/train.py \
--data_dir ./data_coco \
--num_classes 21 \
--max_iter 10000 \
--batch_size 2
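Detectron2 requires the COCO-format dataset to be registered before training. A minimal sketch of what train.py presumably does internally (dataset names and JSON paths are assumptions based on the layout above):

```python
from detectron2.data.datasets import register_coco_instances

# register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances("ui_train", {}, "data_coco/annotations/train.json", "data_coco/train")
register_coco_instances("ui_val", {}, "data_coco/annotations/val.json", "data_coco/val")
```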
python models/cascade_rcnn/detect.py \
--model runs/cascade_rcnn/model_final.pth \
--source image.jpg \
--num_classes 21 \
--conf 0.5
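Programmatic inference follows Detectron2's standard DefaultPredictor pattern. A minimal sketch, assuming the model was trained from the Misc/cascade_mask_rcnn_R_50_FPN_3x base config (the project's train.py may use a different one):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.MASK_ON = False  # box-only detection
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
cfg.MODEL.WEIGHTS = "runs/cascade_rcnn/model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # mirrors --conf

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))  # DefaultPredictor expects a BGR array
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes.tensor, instances.scores, instances.pred_classes)
```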
# Use the unified interface (recommended)
python src/blur.py \
--model cascade_rcnn \
--weights runs/cascade_rcnn/model_final.pth \
--source image.jpg \
--num_classes 21 \
--blur_type median
ui-element-privacy/
├── src/                            # 🎯 Unified interfaces for all models
│   ├── blur.py                     # Privacy protection interface
│   └── detect.py                   # Detection interface
│
├── models/                         # Model-specific implementations
│   ├── common/                     # Shared utilities
│   │   ├── detection_format.py     # Standardized detection format
│   │   ├── blur_utils.py           # Common blur utilities
│   │   ├── sensitive_classes.py    # Privacy profile system
│   │   └── prepare_dataset.py      # VNIS → YOLO/COCO format conversion
│   │
│   ├── yolov8/                     # YOLOv8 implementation
│   │   ├── train.py                # Training script
│   │   └── detect.py               # Model-specific detection
│   │
│   └── cascade_rcnn/               # Cascade R-CNN implementation
│       ├── train.py                # Training script
│       └── detect.py               # Model-specific detection
│
├── config/                         # Configuration files
│   ├── sensitive_classes.yaml      # Privacy profiles
│   ├── ui_elements.yaml            # UI element definitions
│   └── README.md                   # Config documentation
│
├── data/                           # Processed dataset
│   ├── images/
│   ├── labels/
│   └── dataset.yaml
│
├── runs/                           # Training outputs
│   ├── train/                      # YOLOv8 training
│   └── cascade_rcnn/               # Cascade R-CNN training
│
├── output/                         # Inference outputs
│   ├── yolov8_privacy_protected/
│   └── cascade_rcnn_privacy_protected/
│
├── requirements.txt                # Dependencies
├── README.md                       # This file
└── .gitignore
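For reference, `data/dataset.yaml` follows the standard ultralytics dataset layout. A hypothetical example (class names truncated; the real file lists all 21 VNIS classes):

```yaml
path: ./data
train: images/train
val: images/val
names:
  0: Button
  1: Text
  2: Icon
  # ... remaining VNIS classes
```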
Feature | YOLOv8 | Cascade R-CNN |
---|---|---|
Speed | ⚡ Very Fast (~20-50 ms) | 🐌 Slow (~200-500 ms) |
Accuracy | ★★★★ High | ★★★★★ Highest |
Training Required | ✅ Yes | ✅ Yes |
Custom Classes | ✅ Full (21 VNIS classes) | ✅ Full (21 VNIS classes) |
GPU Memory | 🟢 Low (2-4 GB) | 🔴 High (8-16 GB) |
Training Time | ⚡ Fast (1-2 hours) | 🐌 Slow (4-8 hours) |
Best For | Production & Real-time | Maximum Accuracy & Research |
- Use YOLOv8 when: You need real-time performance, production deployment, or have limited GPU resources (2-4GB)
- Use Cascade R-CNN when: Maximum accuracy is critical, working on research, or have substantial compute resources (8-16GB GPU)
YOLOv8:
- Start with `yolov8n.pt` for experiments; scale up for production
- Use `--patience 50` for early stopping
- Monitor training with `tensorboard --logdir runs/train`
Cascade R-CNN:
- Requires significant GPU memory (8GB+ recommended)
- Use smaller batch sizes (2-4)
- Training takes 2-3x longer than YOLOv8
- Verify before deployment: use `--draw_boxes` to check what is being blurred
- Adjust blur strength: higher values (71-101) for sensitive content
- Selective blurring: use `--blur_classes` to blur only specific elements
- Blur types (see the sketch after this list):
  - `gaussian`: smooth, natural blur
  - `median`: good for removing text while preserving edges
  - `pixelate`: stylized, retro effect
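A minimal sketch of how these effects map to OpenCV calls, as a hypothetical helper mirroring models/common/blur_utils.py (the pixelation factor is an assumption):

```python
import cv2

def blur_region(img, box, blur_type="gaussian", strength=71):
    """Blur one (x1, y1, x2, y2) region of a BGR image in place."""
    x1, y1, x2, y2 = map(int, box)
    roi = img[y1:y2, x1:x2]
    k = strength if strength % 2 == 1 else strength + 1  # OpenCV kernel sizes must be odd
    if blur_type == "gaussian":
        img[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (k, k), 0)
    elif blur_type == "median":
        img[y1:y2, x1:x2] = cv2.medianBlur(roi, k)
    elif blur_type == "pixelate":
        # Downscale then upscale with nearest-neighbor to get blocky pixels
        h, w = roi.shape[:2]
        small = cv2.resize(roi, (max(1, w // 16), max(1, h // 16)))
        img[y1:y2, x1:x2] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return img
```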
Optimize your trained model for mobile deployment with INT8 quantization:
# Optimize for all platforms (Android, iOS, cross-platform)
python models/yolov8/optimize_for_mobile.py \
--model runs/train/best.pt \
--optimize all
# Android only (TensorFlow Lite)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize tflite
# iOS only (CoreML)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize coreml
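The underlying exports use the standard ultralytics export API. A minimal sketch (arguments are illustrative; see optimize_for_mobile.py for the actual settings):

```python
from ultralytics import YOLO

model = YOLO("runs/train/best.pt")

# Android: TensorFlow Lite with INT8 quantization (data= supplies calibration images)
model.export(format="tflite", int8=True, data="data/dataset.yaml")

# iOS: CoreML with FP16 weights
model.export(format="coreml", half=True)
```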
Results: Reduce model size by ~75% (6 MB → 1.5 MB) with INT8 quantization!
📖 Full Guide: See MOBILE_OPTIMIZATION.md for detailed instructions on:
- INT8/FP16 quantization
- Platform-specific deployment (Android/iOS)
- Integration examples (Kotlin, Swift, React Native)
- Performance benchmarking
# YOLO to COCO (for Cascade R-CNN)
python models/common/yolo_to_coco.py --input_dir ./data --output_dir ./data_coco
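The key difference between the two label formats is the box encoding: YOLO stores normalized centers, COCO stores absolute top-left corners. A minimal sketch of the per-box conversion yolo_to_coco.py has to perform:

```python
def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Normalized YOLO (cx, cy, w, h) -> absolute COCO [x_min, y_min, width, height]."""
    bw, bh = w * img_w, h * img_h
    return [cx * img_w - bw / 2, cy * img_h - bh / 2, bw, bh]
```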
# Process entire directory
python src/detect.py --model yolov8 --weights runs/train/best.pt --source ./images/ --blur
# YOLOv8 with custom config
python models/yolov8/train.py \
--data ./data/dataset.yaml \
--model yolov8l.pt \
--epochs 300 \
--batch 16 \
--imgsz 800 \
--optimizer AdamW \
--lr0 0.001
Issue: ScreenAI/OmniParser model not found
- Solution: These models may require HuggingFace authentication or may not be publicly available
- Alternative: Use `microsoft/pix2struct-screen2words-large`
Issue: Detectron2 installation fails
- Solution: Install from source: `python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'`
Issue: CUDA out of memory
- Solution: Reduce batch size, use smaller model, or enable gradient checkpointing
Issue: Poor detection results
- Solution:
- Train longer with more data
- Use larger model variant
- Adjust confidence threshold
- Check dataset quality
After preparation, you'll see UI element classes including:
- Navigation: UpperTaskBar, LowerTaskBar, NavigationBar
- Interactive: Button, TextButton, Switch, Slider, CheckBox, RadioButton
- Display: Text, Image, Icon, Modal
- Input: Input, TextInput, SearchBar
If you use this project, please cite:
@misc{ui_detection_privacy,
  title={Mobile UI Element Detection for Privacy Protection},
  author={Your Name},
  year={2024},
  howpublished={\url{https://github.com/mujacica/ui-element-privacy}}
}
Datasets and Models:
- VNIS Dataset: https://github.com/kevinwu/vins
- YOLOv8: https://github.com/ultralytics/ultralytics
- Detectron2: https://github.com/facebookresearch/detectron2
- ScreenAI: https://huggingface.co/google/screenai-1.0-ui
This project is for research and educational purposes. Please respect the licenses of:
- VNIS Dataset
- YOLOv8 (AGPL-3.0)
- Detectron2 (Apache-2.0)
- ScreenAI (Check HuggingFace model card)
- OmniParser (Check HuggingFace model card)
Contributions welcome! Areas for improvement:
- Enhanced response parsing for language models
- Additional blur/anonymization effects
- Fine-tuning guides for each model
- Benchmark suite for model comparison
- Web UI for inference
- Mobile deployment guides
- Video processing support
Star ⭐ this repo if you find it useful!