
Mobile UI Element Detection for Privacy Protection

A Python project for detecting UI elements in mobile applications using YOLOv8 and Cascade R-CNN. It provides bounding-box detection and privacy protection through automatic blurring, and supports the VINS dataset.

🎯 Features

Detection Models

  • YOLOv8: Fast, accurate object detection trained on the VINS dataset (21 UI element classes)
  • Cascade R-CNN: High-accuracy detector with multi-stage refinement
  • Unified Interface: Single command to use either model

Capabilities

  • UI element detection with bounding boxes
  • VINS dataset preparation (Pascal VOC XML → YOLO/COCO format)
  • Privacy protection with automatic blurring (Gaussian, Median, Pixelate)
  • Multi-category support (Android, iPhone, Rico, Wireframes, Uplabs)
  • Customizable element filtering by type
  • Model comparison and evaluation

📦 Installation

# Install dependencies
pip install -r requirements.txt

# Note: For Detectron2 (Cascade R-CNN), you may need to install from source:
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

🚀 Quick Start

Unified Interface (Recommended)

The easiest way to use any model:

# YOLOv8 detection
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg

# Cascade R-CNN detection
python src/detect.py --model cascade_rcnn --weights runs/cascade_rcnn/model_final.pth --num_classes 21 --source image.jpg

# Privacy protection with any model
python src/detect.py --model yolov8 --weights runs/train/best.pt --source image.jpg --blur --blur_type pixelate

Dataset Preparation

# Convert VINS dataset to model-ready format
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data

# Process specific categories only
python models/common/prepare_dataset.py --input_dir ./vins --output_dir ./data --categories Android iphone
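
Under the hood, the VOC-to-YOLO step reads each Pascal VOC XML annotation and writes normalized, center-based boxes. The sketch below illustrates that conversion in isolation; the CLASS_NAMES list and function name are placeholders, not the project's actual code.

# Illustrative sketch of Pascal VOC XML -> YOLO label conversion.
# CLASS_NAMES is a placeholder subset, not the project's real class list.
import xml.etree.ElementTree as ET

CLASS_NAMES = ["Text", "Icon", "Image", "EditText"]

def voc_to_yolo_lines(xml_path):
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in CLASS_NAMES:
            continue
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class_id cx cy w h, all normalized to [0, 1]
        cx = (xmin + xmax) / 2 / img_w
        cy = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{CLASS_NAMES.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines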

📖 Model-Specific Usage

YOLOv8 - Fast, Real-time Detection

Training

python models/yolov8/train.py --data ./data/dataset.yaml --epochs 100 --model yolov8n.pt

# For better accuracy, use larger model
python models/yolov8/train.py --data ./data/dataset.yaml --model yolov8m.pt --epochs 150 --batch 32
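
The same training run can also be launched from Python through the Ultralytics API; a minimal sketch mirroring the CLI arguments above (paths are the same illustrative ones):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # pretrained checkpoint to fine-tune
model.train(
    data="./data/dataset.yaml",     # produced by prepare_dataset.py
    epochs=100,
    imgsz=640,
    batch=16,
    patience=50,                    # early stopping (see Training Tips below)
)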

Detection

python models/yolov8/detect.py \
  --model ./runs/train/ui_detection/weights/best.pt \
  --source /path/to/image.jpg \
  --conf 0.25 --save_txt
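
The equivalent Ultralytics Python call, if you prefer scripting over the CLI (weights path matches the example above):

from ultralytics import YOLO

model = YOLO("./runs/train/ui_detection/weights/best.pt")
results = model.predict("/path/to/image.jpg", conf=0.25, save=True, save_txt=True)

# Each detection exposes xyxy coordinates, confidence, and class id
for box in results[0].boxes:
    print(box.xyxy[0].tolist(), float(box.conf[0]), int(box.cls[0]))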

Privacy Protection

# Use the unified interface (recommended)
python src/blur.py \
  --model yolov8 \
  --weights ./runs/train/ui_detection/weights/best.pt \
  --source /path/to/image.jpg \
  --classes EditText Text Modal \
  --blur_type pixelate --blur_strength 71

Cascade R-CNN - High-Accuracy Detection

Training

# First, convert YOLO dataset to COCO format (see models/common/yolo_to_coco.py)

python models/cascade_rcnn/train.py \
  --data_dir ./data_coco \
  --num_classes 21 \
  --max_iter 10000 \
  --batch_size 2
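
For reference, a Detectron2 Cascade R-CNN training setup typically looks like the sketch below. This is an illustrative outline, not necessarily identical to models/cascade_rcnn/train.py; the dataset name "ui_train" and the annotation paths are assumptions.

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the COCO-format dataset produced by yolo_to_coco.py (paths assumed)
register_coco_instances("ui_train", {}, "./data_coco/annotations/train.json", "./data_coco/train")

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.MASK_ON = False                 # box-only annotations, no segmentation masks
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21      # VINS UI element classes
cfg.DATASETS.TRAIN = ("ui_train",)
cfg.DATASETS.TEST = ()
cfg.SOLVER.IMS_PER_BATCH = 2              # small batch to fit GPU memory
cfg.SOLVER.MAX_ITER = 10000
cfg.OUTPUT_DIR = "./runs/cascade_rcnn"

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()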

Detection

python models/cascade_rcnn/detect.py \
  --model runs/cascade_rcnn/model_final.pth \
  --source image.jpg \
  --num_classes 21 \
  --conf 0.5
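
For scripted inference, Detectron2's DefaultPredictor can be used directly; a sketch under the same assumptions as the training example above:

import cv2

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.MASK_ON = False
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 21
cfg.MODEL.WEIGHTS = "runs/cascade_rcnn/model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # confidence threshold

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("image.jpg"))
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes.tensor, instances.scores, instances.pred_classes)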

Privacy Protection

# Use the unified interface (recommended)
python src/blur.py \
  --model cascade_rcnn \
  --weights runs/cascade_rcnn/model_final.pth \
  --source image.jpg \
  --num_classes 21 \
  --blur_type median

πŸ“ Project Structure

ui-element-privacy/
├── src/                         # 🎯 Unified interfaces for all models
│   ├── blur.py                  # Privacy protection interface
│   └── detect.py                # Detection interface
│
├── models/                      # Model-specific implementations
│   ├── common/                  # Shared utilities
│   │   ├── detection_format.py      # Standardized detection format
│   │   ├── blur_utils.py            # Common blur utilities
│   │   ├── sensitive_classes.py     # Privacy profile system
│   │   └── prepare_dataset.py       # VINS → YOLO/COCO format conversion
│   │
│   ├── yolov8/                  # YOLOv8 implementation
│   │   ├── train.py             # Training script
│   │   └── detect.py            # Model-specific detection
│   │
│   └── cascade_rcnn/            # Cascade R-CNN implementation
│       ├── train.py             # Training script
│       └── detect.py            # Model-specific detection
│
├── config/                      # Configuration files
│   ├── sensitive_classes.yaml   # Privacy profiles
│   ├── ui_elements.yaml         # UI element definitions
│   └── README.md                # Config documentation
│
├── data/                        # Processed dataset
│   ├── images/
│   ├── labels/
│   └── dataset.yaml
│
├── runs/                        # Training outputs
│   ├── train/                   # YOLOv8 training
│   └── cascade_rcnn/            # Cascade R-CNN training
│
├── output/                      # Inference outputs
│   ├── yolov8_privacy_protected/
│   └── cascade_rcnn_privacy_protected/
│
├── requirements.txt             # Dependencies
├── README.md                    # This file
└── .gitignore

🔄 Model Comparison

Performance Comparison

| Feature | YOLOv8 | Cascade R-CNN |
|---|---|---|
| Speed | ⚡ Very Fast (~20-50ms) | 🐌 Slow (~200-500ms) |
| Accuracy | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Highest |
| Training Required | ✅ Yes | ✅ Yes |
| Custom Classes | ✅ Full (21 VINS classes) | ✅ Full (21 VINS classes) |
| GPU Memory | 💚 Low (2-4 GB) | 🔴 High (8-16 GB) |
| Training Time | ⚡ Fast (1-2 hours) | 🐌 Slow (4-8 hours) |
| Best For | Production & Real-time | Maximum Accuracy & Research |

💡 Tips and Best Practices

Model Selection Guide

  • Use YOLOv8 when: You need real-time performance or production deployment, or you have limited GPU resources (2-4 GB)
  • Use Cascade R-CNN when: Maximum accuracy is critical, you are working on research, or you have substantial compute resources (8-16 GB GPU)

Training Tips

YOLOv8:

  • Start with yolov8n.pt for experiments, scale up for production
  • Use --patience 50 for early stopping
  • Monitor training: tensorboard --logdir runs/train

Cascade R-CNN:

  • Requires significant GPU memory (8GB+ recommended)
  • Use smaller batch sizes (2-4)
  • Training takes 2-3x longer than YOLOv8

Privacy Protection Tips

  1. Verify before deployment: Use --draw_boxes to check what's being blurred
  2. Adjust blur strength: Higher values (71-101) for sensitive content
  3. Selective blurring: Use --blur_classes to blur only specific elements
  4. Blur types (see the code sketch after this list):
    • gaussian: Smooth, natural blur
    • median: Good for removing text while preserving edges
    • pixelate: Stylized, retro effect
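
The three blur types map onto standard OpenCV operations. The sketch below shows how a single detected box can be anonymized; it is a self-contained illustration, independent of the project's blur_utils.py.

import cv2

def blur_region(image, box, blur_type="gaussian", strength=51):
    """Blur one (x1, y1, x2, y2) box in place; strength should be an odd kernel size."""
    x1, y1, x2, y2 = map(int, box)
    roi = image[y1:y2, x1:x2]
    if blur_type == "gaussian":
        roi = cv2.GaussianBlur(roi, (strength, strength), 0)
    elif blur_type == "median":
        roi = cv2.medianBlur(roi, strength)
    elif blur_type == "pixelate":
        # Downscale, then upscale with nearest-neighbour interpolation
        h, w = roi.shape[:2]
        small = cv2.resize(roi, (max(1, w // 16), max(1, h // 16)))
        roi = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    image[y1:y2, x1:x2] = roi
    return image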

🔧 Advanced Usage

📱 Mobile Optimization (NEW!)

Optimize your trained model for mobile deployment with INT8 quantization:

# Optimize for all platforms (Android, iOS, cross-platform)
python models/yolov8/optimize_for_mobile.py \
  --model runs/train/best.pt \
  --optimize all

# Android only (TensorFlow Lite)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize tflite

# iOS only (CoreML)
python models/yolov8/optimize_for_mobile.py --model runs/train/best.pt --optimize coreml
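
These commands wrap Ultralytics' export functionality. A minimal Python equivalent is sketched below; note that INT8 TFLite export needs a small calibration set, passed via the data argument (paths are illustrative):

from ultralytics import YOLO

model = YOLO("runs/train/best.pt")

# Android: TensorFlow Lite with INT8 quantization (dataset images used for calibration)
model.export(format="tflite", int8=True, data="./data/dataset.yaml")

# iOS: CoreML export (FP16 when half=True)
model.export(format="coreml", half=True)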

Results: Reduce model size by ~75% (6 MB → 1.5 MB) with INT8 quantization!

📖 Full Guide: See MOBILE_OPTIMIZATION.md for detailed instructions on:

  • INT8/FP16 quantization
  • Platform-specific deployment (Android/iOS)
  • Integration examples (Kotlin, Swift, React Native)
  • Performance benchmarking

Converting Dataset Formats

# YOLO to COCO (for Cascade R-CNN)
python models/common/yolo_to_coco.py --input_dir ./data --output_dir ./data_coco
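
The conversion essentially denormalizes YOLO boxes back to absolute pixels and writes COCO's [x, y, width, height] format with a top-left origin. A generic sketch of the per-box step (not the project's yolo_to_coco.py itself):

def yolo_box_to_coco(yolo_line, img_w, img_h):
    """Convert one 'class cx cy w h' YOLO label line (normalized) to a COCO annotation dict."""
    cls, cx, cy, w, h = yolo_line.split()
    w_px, h_px = float(w) * img_w, float(h) * img_h
    x_px = float(cx) * img_w - w_px / 2   # COCO bbox uses the top-left corner
    y_px = float(cy) * img_h - h_px / 2
    return {
        "category_id": int(cls),
        "bbox": [x_px, y_px, w_px, h_px],
        "area": w_px * h_px,
        "iscrowd": 0,
    }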

Batch Processing

# Process entire directory
python src/detect.py --model yolov8 --weights runs/train/best.pt --source ./images/ --blur

Custom Model Training

# YOLOv8 with custom config
python models/yolov8/train.py \
  --data ./data/dataset.yaml \
  --model yolov8l.pt \
  --epochs 300 \
  --batch 16 \
  --imgsz 800 \
  --optimizer AdamW \
  --lr0 0.001

πŸ› Troubleshooting

Issue: ScreenAI/OmniParser model not found

  • Solution: These models may require HuggingFace authentication or may not be publicly available
  • Alternative: Use google/pix2struct-screen2words-large

Issue: Detectron2 installation fails

  • Solution: Install from source:
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Issue: CUDA out of memory

  • Solution: Reduce batch size, use smaller model, or enable gradient checkpointing

Issue: Poor detection results

  • Solution:
    • Train longer with more data
    • Use larger model variant
    • Adjust confidence threshold
    • Check dataset quality

📚 VINS Dataset Classes

After preparation, you'll see UI element classes including:

  • Navigation: UpperTaskBar, LowerTaskBar, NavigationBar
  • Interactive: Button, TextButton, Switch, Slider, CheckBox, RadioButton
  • Display: Text, Image, Icon, Modal
  • Input: Input, TextInput, SearchBar

🎓 Citation

If you use this project, please cite:

@misc{ui_detection_privacy,
  title={Mobile UI Element Detection for Privacy Protection},
  author={Your Name},
  year={2024},
  howpublished={\url{https://github.com/mujacica/ui-element-privacy}}
}

Datasets and Models:

📄 License

This project is for research and educational purposes. Please respect the licenses of:

  • VINS Dataset
  • YOLOv8 (AGPL-3.0)
  • Detectron2 (Apache-2.0)
  • ScreenAI (Check HuggingFace model card)
  • OmniParser (Check HuggingFace model card)

🤝 Contributing

Contributions welcome! Areas for improvement:

  • Enhanced response parsing for language models
  • Additional blur/anonymization effects
  • Fine-tuning guides for each model
  • Benchmark suite for model comparison
  • Web UI for inference
  • Mobile deployment guides
  • Video processing support

🔗 Related Projects


Star ⭐ this repo if you find it useful!
