A comprehensive Python project for detecting UI elements in mobile applications using YOLOv8 and Cascade R-CNN. Provides bounding box detection and privacy protection through automatic blurring. Supports the VNIS dataset.
- YOLOv8: Fast, accurate object detection trained on the VNIS dataset (21 UI element classes)
- Cascade R-CNN: High-accuracy detector with multi-stage refinement
- Unified Interface: Single command to use either model
- UI element detection with bounding boxes
- VNIS dataset preparation (Pascal VOC XML → YOLO/COCO format)
- Privacy protection with automatic blurring (Gaussian, Median, Pixelate)
- Multi-category support (Android, iPhone, Rico, Wireframes, Uplabs)
- Customizable element filtering by type
- Model comparison and evaluation
# Install dependencies
pip install -r requirements.txt
# Note: For Detectron2 (Cascade R-CNN), you may need to install from source:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
The easiest way to use any model:
# YOLOv8 detection
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg
# Cascade R-CNN detection
python src/detect.py --model cascade_rcnn --weights runs/cascade_rcnn/model_final.pth --num_classes 21 --source image.jpg
# Privacy protection with any model
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg --blur --blur_type pixelate
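The same detections are available programmatically through the ultralytics Python API. A minimal sketch (paths and the confidence threshold are illustrative):

```python
from ultralytics import YOLO

# Load the trained detector (path follows the training output layout used above)
model = YOLO("runs/train/best.pt")

# Run inference on a single image; conf mirrors the CLI --conf flag
results = model("image.jpg", conf=0.25)
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls[0])
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"{r.names[cls_id]} {float(box.conf[0]):.2f} ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```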
# Convert VNIS dataset to model-ready format
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data
# Process specific categories only
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data --categories Android iphone
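Under the hood, the conversion maps Pascal VOC corner boxes to YOLO's normalized center format. A minimal sketch of that step, assuming one XML file per image (function name simplified; see prepare_dataset.py for the real logic):

```python
import xml.etree.ElementTree as ET

def voc_to_yolo_rows(xml_path: str, class_names: list[str]) -> list[str]:
    """Convert one Pascal VOC annotation file to YOLO label rows."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.findtext("size/width"))
    img_h = float(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        cls_id = class_names.index(obj.findtext("name"))
        b = obj.find("bndbox")
        x1, y1 = float(b.findtext("xmin")), float(b.findtext("ymin"))
        x2, y2 = float(b.findtext("xmax")), float(b.findtext("ymax"))
        # YOLO label row: class cx cy w h, all normalized to [0, 1]
        cx, cy = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h
        w, h = (x2 - x1) / img_w, (y2 - y1) / img_h
        rows.append(f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return rows
```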
YOLOv8 - Fast, Real-time Detection
python models/yolov8/train.py --data ./data/dataset.yaml --epochs 100 --model yolov8n.pt
# For better accuracy, use larger model
python models/yolov8/train.py --data ./data/dataset.yaml --model yolov8m.pt --epochs 150 --batch 32
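Training can also be driven from Python via the ultralytics API, equivalent to the CLI above (hyperparameters are illustrative):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # swap in yolov8m.pt for higher accuracy
model.train(
    data="data/dataset.yaml",
    epochs=100,
    batch=16,
    imgsz=640,
    patience=50,  # early stopping (see the training tips below)
)
```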
python models/yolov8/detect.py \
--model ./runs/train/ui_detection/weights/best.pt \
--source /path/to/image.jpg \
--conf 0.25 --save_txt
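With `--save_txt`, each image gets a matching .txt file with one detection per line in normalized YOLO format (`class cx cy w h`); the values below are illustrative:

```
3 0.512000 0.274000 0.310000 0.058000
7 0.130000 0.905000 0.220000 0.071000
```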
# Use the unified interface (recommended)
python src/blur.py \
--model yolov8 \
--weights ./runs/train/ui_detection/weights/best.pt \
--source /path/to/image.jpg \
--classes EditText Text Modal \
--blur_type pixelate --blur_strength 71
Cascade R-CNN - High-Accuracy Detection
# First, convert YOLO dataset to COCO format (see models/common/yolo_to_coco.py)
python models/cascade_rcnn/train.py \
--data_dir ./data_coco \
--num_classes 21 \
--max_iter 10000 \
--batch_size 2
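Detectron2 requires the COCO-format dataset to be registered before training. A minimal sketch of what train.py presumably does internally (dataset names and JSON paths are assumptions based on the layout above):

```python
from detectron2.data.datasets import register_coco_instances

# register_coco_instances(name, metadata, json_file, image_root)
register_coco_instances("ui_train", {}, "data_coco/annotations/train.json", "data_coco/train")
register_coco_instances("ui_val", {}, "data_coco/annotations/val.json", "data_coco/val")
```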
python models/cascade_rcnn/detect.py \
--model runs/cascade_rcnn/model_final.pth \
--source image.jpg \
--num_classes 21 \
--conf 0.5
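Programmatic inference follows Detectron2's standard DefaultPredictor pattern. A minimal sketch, assuming the model was trained from the Misc/cascade_mask_rcnn_R_50_FPN_3x base config (the project's train.py may use a different one):

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.MASK_ON = False  # box-only detection
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
cfg.MODEL.WEIGHTS = "runs/cascade_rcnn/model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # mirrors --conf

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))  # DefaultPredictor expects a BGR array
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes.tensor, instances.scores, instances.pred_classes)
```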
# Use the unified interface (recommended)
python src/blur.py \
--model cascade_rcnn \
--weights runs/cascade_rcnn/model_final.pth \
--source image.jpg \
--num_classes 21 \
--blur_type median
ui-element-privacy/
├── src/                            # 🎯 Unified interfaces for all models
│   ├── blur.py                     # Privacy protection interface
│   └── detect.py                   # Detection interface
│
├── models/                         # Model-specific implementations
│   ├── common/                     # Shared utilities
│   │   ├── detection_format.py     # Standardized detection format
│   │   ├── blur_utils.py           # Common blur utilities
│   │   ├── sensitive_classes.py    # Privacy profile system
│   │   └── prepare_dataset.py      # VNIS → YOLO/COCO format conversion
│   │
│   ├── yolov8/                     # YOLOv8 implementation
│   │   ├── train.py                # Training script
│   │   └── detect.py               # Model-specific detection
│   │
│   └── cascade_rcnn/               # Cascade R-CNN implementation
│       ├── train.py                # Training script
│       └── detect.py               # Model-specific detection
│
├── config/                         # Configuration files
│   ├── sensitive_classes.yaml      # Privacy profiles
│   ├── ui_elements.yaml            # UI element definitions
│   └── README.md                   # Config documentation
│
├── data/                           # Processed dataset
│   ├── images/
│   ├── labels/
│   └── dataset.yaml
│
├── runs/                           # Training outputs
│   ├── train/                      # YOLOv8 training
│   └── cascade_rcnn/               # Cascade R-CNN training
│
├── output/                         # Inference outputs
│   ├── yolov8_privacy_protected/
│   └── cascade_rcnn_privacy_protected/
│
├── requirements.txt                # Dependencies
├── README.md                       # This file
└── .gitignore
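For reference, `data/dataset.yaml` follows the standard ultralytics dataset layout. A hypothetical example (class names truncated; the real file lists all 21 VNIS classes):

```yaml
path: ./data
train: images/train
val: images/val
names:
  0: Button
  1: Text
  2: Icon
  # ... remaining VNIS classes
```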
Feature | YOLOv8 | Cascade R-CNN |
---|---|---|
Speed | ⚡ Very Fast (~20-50 ms) | 🐌 Slow (~200-500 ms) |
Accuracy | ★★★★ High | ★★★★★ Highest |
Training Required | ✅ Yes | ✅ Yes |
Custom Classes | ✅ Full (21 VNIS classes) | ✅ Full (21 VNIS classes) |
GPU Memory | 🟢 Low (2-4 GB) | 🔴 High (8-16 GB) |
Training Time | ⚡ Fast (1-2 hours) | 🐌 Slow (4-8 hours) |
Best For | Production & Real-time | Maximum Accuracy & Research |
- Use YOLOv8 when: You need real-time performance, production deployment, or have limited GPU resources (2-4GB)
- Use Cascade R-CNN when: Maximum accuracy is critical, working on research, or have substantial compute resources (8-16GB GPU)
YOLOv8:
- Start with `yolov8n.pt` for experiments; scale up for production
- Use `--patience 50` for early stopping
- Monitor training with `tensorboard --logdir runs/train`
Cascade R-CNN:
- Requires significant GPU memory (8GB+ recommended)
- Use smaller batch sizes (2-4)
- Training takes 2-3x longer than YOLOv8
- Verify before deployment: use `--draw_boxes` to check what is being blurred
- Adjust blur strength: higher values (71-101) for sensitive content
- Selective blurring: use `--blur_classes` to blur only specific elements
- Blur types (see the sketch after this list):
  - `gaussian`: smooth, natural blur
  - `median`: good for removing text while preserving edges
  - `pixelate`: stylized, retro effect
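A minimal sketch of how these effects map to OpenCV calls, as a hypothetical helper mirroring models/common/blur_utils.py (the pixelation factor is an assumption):

```python
import cv2

def blur_region(img, box, blur_type="gaussian", strength=71):
    """Blur one (x1, y1, x2, y2) region of a BGR image in place."""
    x1, y1, x2, y2 = map(int, box)
    roi = img[y1:y2, x1:x2]
    k = strength if strength % 2 == 1 else strength + 1  # OpenCV kernel sizes must be odd
    if blur_type == "gaussian":
        img[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (k, k), 0)
    elif blur_type == "median":
        img[y1:y2, x1:x2] = cv2.medianBlur(roi, k)
    elif blur_type == "pixelate":
        # Downscale then upscale with nearest-neighbor to get blocky pixels
        h, w = roi.shape[:2]
        small = cv2.resize(roi, (max(1, w // 16), max(1, h // 16)))
        img[y1:y2, x1:x2] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return img
```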
Optimize your trained model for mobile deployment with INT8 quantization:
# Optimize for all platforms (Android, iOS, cross-platform)
python models/yolov8/optimize_for_mobile.py \
--model runs/train/best.pt \
--optimize all
# Android only (TensorFlow Lite)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize tflite
# iOS only (CoreML)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize coreml
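The underlying exports use the standard ultralytics export API. A minimal sketch (arguments are illustrative; see optimize_for_mobile.py for the actual settings):

```python
from ultralytics import YOLO

model = YOLO("runs/train/best.pt")

# Android: TensorFlow Lite with INT8 quantization (data= supplies calibration images)
model.export(format="tflite", int8=True, data="data/dataset.yaml")

# iOS: CoreML with FP16 weights
model.export(format="coreml", half=True)
```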
Results: Reduce model size by ~75% (6 MB → 1.5 MB) with INT8 quantization!
📖 Full Guide: See MOBILE_OPTIMIZATION.md for detailed instructions on:
- INT8/FP16 quantization
- Platform-specific deployment (Android/iOS)
- Integration examples (Kotlin, Swift, React Native)
- Performance benchmarking
# YOLO to COCO (for Cascade R-CNN)
python models/common/yolo_to_coco.py --input_dir ./data --output_dir ./data_coco
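The key difference between the two label formats is the box encoding: YOLO stores normalized centers, COCO stores absolute top-left corners. A minimal sketch of the per-box conversion yolo_to_coco.py has to perform:

```python
def yolo_to_coco_bbox(cx, cy, w, h, img_w, img_h):
    """Normalized YOLO (cx, cy, w, h) -> absolute COCO [x_min, y_min, width, height]."""
    bw, bh = w * img_w, h * img_h
    return [cx * img_w - bw / 2, cy * img_h - bh / 2, bw, bh]
```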
# Process entire directory
python src/detect.py --model yolov8 --weights runs/train/best.pt --source ./images/ --blur
# YOLOv8 with custom config
python models/yolov8/train.py \
--data ./data/dataset.yaml \
--model yolov8l.pt \
--epochs 300 \
--batch 16 \
--imgsz 800 \
--optimizer AdamW \
--lr0 0.001
Issue: ScreenAI/OmniParser model not found
- Solution: These models may require HuggingFace authentication or may not be publicly available
- Alternative: Use `microsoft/pix2struct-screen2words-large`
Issue: Detectron2 installation fails
- Solution: Install from source: `python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'`
Issue: CUDA out of memory
- Solution: Reduce batch size, use smaller model, or enable gradient checkpointing
Issue: Poor detection results
- Solution:
- Train longer with more data
- Use larger model variant
- Adjust confidence threshold
- Check dataset quality
After preparation, you'll see UI element classes including:
- Navigation: UpperTaskBar, LowerTaskBar, NavigationBar
- Interactive: Button, TextButton, Switch, Slider, CheckBox, RadioButton
- Display: Text, Image, Icon, Modal
- Input: Input, TextInput, SearchBar
If you use this project, please cite:
@misc{ui_detection_privacy,
  title={Mobile UI Element Detection for Privacy Protection},
  author={Your Name},
  year={2024},
  howpublished={\url{https://github.com/mujacica/ui-element-privacy}}
}
Datasets and Models:
- VNIS Dataset: https://github.com/kevinwu/vins
- YOLOv8: https://github.com/ultralytics/ultralytics
- Detectron2: https://github.com/facebookresearch/detectron2
- ScreenAI: https://huggingface.co/google/screenai-1.0-ui
This project is for research and educational purposes. Please respect the licenses of:
- VNIS Dataset
- YOLOv8 (AGPL-3.0)
- Detectron2 (Apache-2.0)
- ScreenAI (Check HuggingFace model card)
- OmniParser (Check HuggingFace model card)
Contributions welcome! Areas for improvement:
- Enhanced response parsing for language models
- Additional blur/anonymization effects
- Fine-tuning guides for each model
- Benchmark suite for model comparison
- Web UI for inference
- Mobile deployment guides
- Video processing support
Star ⭐ this repo if you find it useful!