Traffic-Analysis

A comprehensive Gradio application that combines Vision-Language Models (VLLMs) with computer vision techniques to analyze traffic scenes and detect license plates with high accuracy.

Overview

This application integrates multiple state-of-the-art AI models to provide:

  1. Traffic Scene Description - Using LLaVA-NeXT for comprehensive scene understanding
  2. License Plate Detection - Using YOLOv11 for accurate plate localization
  3. Text Extraction - Using PaddleOCR with advanced preprocessing for text recognition
  4. Structured Output - JSON format combining all analysis results

Features

Core Capabilities

  • πŸ” Multi-modal Analysis: Combines vision and language understanding
  • 🎯 Accurate Detection: YOLOv11-based license plate detection
  • πŸ“ Robust OCR: PaddleOCR with preprocessing and confidence filtering
  • βš™οΈ Parameter Control: Adjustable thresholds and generation parameters
  • πŸš€ Optimized Performance: Memory-efficient model loading with quantization
  • πŸ“Š Structured Output: Task-compliant JSON format

Adjustable Parameters

  • YOLO Confidence Threshold (0.1-1.0): Controls detection sensitivity
  • OCR Confidence Threshold (0.0-1.0): Filters low-quality text recognition
  • VLLM Temperature (0.1-1.0): Controls creativity/randomness in descriptions
  • VLLM Top-p (0.1-1.0): Controls diversity vs focus in language generation
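
For illustration, a minimal sketch of how these parameters might be exposed as Gradio sliders (the variable names, layout, and the analyze_scene handler are assumptions; the app's actual UI may differ):

import gradio as gr

with gr.Blocks() as demo:
    image_in = gr.Image(type="pil", label="Traffic Scene Image")
    yolo_conf = gr.Slider(0.1, 1.0, value=0.5, step=0.05, label="YOLO Confidence Threshold")
    ocr_conf = gr.Slider(0.0, 1.0, value=0.1, step=0.05, label="OCR Confidence Threshold")
    temperature = gr.Slider(0.1, 1.0, value=0.7, step=0.05, label="VLLM Temperature")
    top_p = gr.Slider(0.1, 1.0, value=0.9, step=0.05, label="VLLM Top-p")
    submit = gr.Button("Submit")
    # submit.click(analyze_scene, inputs=[image_in, yolo_conf, ocr_conf, temperature, top_p], ...)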

Setup Instructions

Environment Requirements

  • Platform: Kaggle
  • Python: 3.8+
  • CUDA: Compatible GPU for optimal performance (Kaggle typically provides a T4 or P100 GPU)

Step 1: Install Dependencies

# Install packages
!pip install gradio ultralytics paddlepaddle paddleocr transformers torch torchvision accelerate bitsandbytes opencv-python pillow numpy

Step 2: Upload Model Files

Upload the custom YOLOv11 model trained for license plate detection:

  • /kaggle/input/license_plate_detect_yolo11/pytorch/default/1/best.pt

Kaggle model link: https://www.kaggle.com/models/suhailaaboubakr/license_plate_detect_yolo11/
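
Once uploaded, the model can be loaded directly from that path. A minimal sketch, assuming the ultralytics package installed in Step 1 (the variable name is illustrative):

from ultralytics import YOLO

# Path of the uploaded Kaggle model input
MODEL_PATH = "/kaggle/input/license_plate_detect_yolo11/pytorch/default/1/best.pt"

# Load the custom license plate detector
plate_detector = YOLO(MODEL_PATH)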

Model Architecture

1. Scene Understanding: LLaVA-NeXT

  • Model: llava-hf/llava-v1.6-mistral-7b-hf
  • Quantization: 4-bit with BitsAndBytesConfig
  • Purpose: Generate comprehensive traffic scene descriptions
  • Optimization: Memory-efficient loading with device mapping
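
A minimal loading sketch, assuming the LLaVA-NeXT classes from transformers and bitsandbytes from Step 1 (the exact quantization settings in the app may differ):

import torch
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration, BitsAndBytesConfig

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"

# 4-bit quantization keeps the 7B model within Kaggle GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)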

2. License Plate Detection: YOLOv11

  • Model: Custom trained best.pt
  • Input Size: Optimized for 640x640 input
  • Purpose: Detect and localize license plates in images
  • Output: Bounding boxes with confidence scores
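
A sketch of running detection and reading boxes with confidences, assuming the plate_detector loaded in Setup Step 2 (the file name and threshold are illustrative):

import cv2

image = cv2.imread("traffic_scene.jpg")

# imgsz matches the 640x640 input size; conf is the adjustable YOLO threshold
results = plate_detector(image, imgsz=640, conf=0.5)

for box in results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
    confidence = float(box.conf[0])
    plate_crop = image[y1:y2, x1:x2]  # region passed on to OCR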

3. Text Recognition: PaddleOCR

  • Language: Optimized for English
  • Features: Angle classification, text detection & recognition
  • Preprocessing: CLAHE enhancement, noise reduction, thresholding
  • Purpose: Extract text from detected license plate regions
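
A sketch of the preprocessing and OCR step, assuming PaddleOCR's classic ocr API and a plate_crop from the detector above (the app's exact preprocessing parameters may differ):

import cv2
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="en")

def read_plate(plate_crop, ocr_conf_threshold=0.1):
    # CLAHE contrast enhancement on the grayscale crop
    gray = cv2.cvtColor(plate_crop, cv2.COLOR_BGR2GRAY)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Run OCR with angle classification; keep text above the confidence threshold
    result = ocr.ocr(enhanced, cls=True)
    texts = []
    for line in result[0] or []:
        text, conf = line[1]
        if conf >= ocr_conf_threshold:
            texts.append(text)
    return " ".join(texts)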

📖 Usage Examples

Basic Usage

  1. Upload Image: Select a traffic scene image
  2. Adjust Parameters (optional):
    • YOLO Confidence: 0.5 (default)
    • OCR Confidence: 0.1 (default)
    • Temperature: 0.7 (default)
    • Top-p: 0.9 (default)
  3. Click Submit: Process the image
  4. Review Results: Scene description, plate details, and JSON output

Sample Tests

  • Test Image 1 (test_image1)
  • Test Image 2 (test_image2)
  • Test Image 3 (test_image3)

πŸŽ›οΈ Parameter Tuning Guide

YOLO Confidence Threshold

  • Low (0.1-0.3): Detects more plates, higher false positive rate
  • Medium (0.4-0.6): Balanced detection, recommended for most cases
  • High (0.7-1.0): Only high-confidence detections, may miss some plates

OCR Confidence Threshold

  • Very Low (0.0-0.1): Accept all OCR results, may include noise
  • Low (0.1-0.3): Accept most readable text, some false readings
  • Medium (0.3-0.6): Good balance of accuracy and recall
  • High (0.6-1.0): Only high-quality text, may miss valid plates

VLLM Temperature

  • Low (0.1-0.3): More focused, factual descriptions
  • Medium (0.4-0.6): Balanced creativity and accuracy
  • High (0.7-1.0): More creative, potentially less accurate

VLLM Top-p

  • Low (0.1-0.5): Conservative vocabulary, more predictable
  • Medium (0.6-0.8): Balanced diversity
  • High (0.9-1.0): Maximum vocabulary diversity

📊 JSON Output Format

The application outputs a structured JSON object following the task specification:

{
  "scene_description": "",
  "total_plates_detected": 1,
  "license_plates": [
    {
      "bbox": [98, 82, 249, 140],
      "detection_confidence": 0.5,
      "plate_text": "",
      "ocr_confidence": 0.9
    }
  ],
  "parameters_used": {
    "yolo_confidence_threshold": 0.5,
    "ocr_confidence_threshold": 0.5,
    "vllm_temperature": 0.7,
    "vllm_top_p": 0.5
  }
}

Prompt Engineering Rationale

Scene Description Prompt Design

The VLLM prompt is designed to draw out as much traffic-relevant detail as possible:

Analyze this traffic scene in detail. Describe:
1. Types of vehicles present (cars, trucks, motorcycles, etc.)
2. Traffic signs, signals, and road markings visible
3. Road conditions and infrastructure
4. Weather and lighting conditions
5. Overall traffic flow and density
6. Any notable safety considerations or hazards

Rationale:

  • Structured approach: Numbered points ensure comprehensive coverage
  • Traffic-focused: Specifically targets transportation elements
  • Safety-oriented: Includes hazard identification
  • Detailed yet concise: Balances thoroughness with readability

Parameter Choices

Default Temperature (0.7):

  • Balances factual accuracy with descriptive richness
  • Avoids overly repetitive descriptions
  • Maintains focus on observable elements

Default Top-p (0.9):

  • Allows diverse vocabulary while maintaining coherence
  • Prevents overly conservative language choices
  • Enables detailed technical descriptions
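
For illustration, a hedged sketch of how these defaults might be passed to generation, assuming the processor and model loaded in the Model Architecture section (the prompt is abbreviated; the full text is shown above):

from PIL import Image

image = Image.open("traffic_scene.jpg")
prompt = "[INST] <image>\nAnalyze this traffic scene in detail. Describe: ... [/INST]"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,  # default temperature
    top_p=0.9,        # default top-p
)
description = processor.decode(output[0], skip_special_tokens=True)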

Performance Optimization

  1. Memory Management:

    • Use model quantization (4-bit enabled by default)
    • Clear GPU cache between runs if needed (see the sketch after this list)
    • Monitor VRAM usage
  2. Speed Optimization:

    • Resize large images before processing
    • Use appropriate batch sizes
    • Enable half-precision when supported
    • Load models once at startup rather than per request
  3. Accuracy Improvement:

    • Use high-quality input images
    • Adjust confidence thresholds based on use case
    • Consider image preprocessing for difficult lighting
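
A minimal sketch of the cache clearing and image resizing mentioned above (illustrative helpers, not the app's exact code):

import gc
import cv2
import torch

def clear_gpu_cache():
    # Free cached GPU memory between runs
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

def resize_if_large(image, max_side=1280):
    # Downscale very large images before inference to save memory and time
    h, w = image.shape[:2]
    scale = max_side / max(h, w)
    if scale < 1.0:
        image = cv2.resize(image, (int(w * scale), int(h * scale)))
    return image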

Limitations

  • Performance depends on image quality and lighting
  • OCR accuracy varies with plate condition and angle
  • Complex scenes may require parameter adjustment
  • GPU memory limits maximum image resolution