VideoAnnotator

A modern, research-focused toolkit for comprehensive video analysis of human interactions. Built with simplified schemas, standards-based pipelines, and seamless annotation tool integration.

✨ Key Features

🏗️ Modern Architecture

YOLO11-powered person tracking and scene detection
Open-source models for face analysis and audio processing
Simplified JSON schemas for maximum interoperability
Test-driven development with 94% success rate

🎯 Research Ready

Annotation tool integration: Direct export to CVAT, LabelStudio, ELAN
Flexible data formats: String/integer IDs, extensible schemas
Batch processing: Efficient multi-video workflows
Reproducible outputs: Version-controlled processing

🚀 Production Scalable

GPU acceleration for compute-intensive pipelines
Configurable processing via YAML configs
Docker support for containerized deployment
Cross-platform Windows/macOS/Linux compatibility

Quick Start

# Clone and setup
git clone https://github.com/InfantLab/VideoAnnotator.git
cd VideoAnnotator
pip install -r requirements.txt

# Process a video
python -m videoannotator process video.mp4

# View results
ls output/video/  # JSON files ready for analysis

📖 Full Documentation | 🧪 Examples | 🔧 Installation Guide python main.py --input video.mp4 --config configs/high_performance.yaml

Batch process multiple videos

python main.py --input videos/ --batch --parallel 4

🧩 Pipeline Architecture

VideoAnnotator provides four core pipelines, each optimized for specific analysis tasks:

🎬 Scene Detection

Technology: PySceneDetect + CLIP classification
Purpose: Boundary detection and environment classification
Output: Scene segments with transition metadata

👥 Person Tracking

Technology: YOLO11 + ByteTrack
Purpose: Multi-person detection and tracking
Output: Normalized bounding boxes with persistent IDs

😊 Face Analysis

Technology: Multiple backends available:
- OpenFace 3.0 (recommended): Comprehensive facial behavior analysis
- LAION Face: CLIP-based face analysis and emotion detection
- OpenCV: Basic face detection with emotion analysis
Purpose: Face detection, landmark extraction, emotion recognition, action units, head pose, gaze estimation
Output: COCO format with facial landmarks, emotions, and behavioral features

🎤 Audio Processing

Technology: Whisper + pyannote.audio
Purpose: Speech recognition and speaker diarization
Output: Transcripts with speaker identification

🎭 OpenFace 3.0 Integration

VideoAnnotator now supports OpenFace 3.0 for comprehensive facial behavior analysis:

Features

68-point facial landmarks (2D and 3D)
Facial Action Units (AUs) intensity and presence detection
Head pose estimation (rotation and translation)
Gaze direction and eye tracking
Face tracking across video frames
COCO format output for annotation tool compatibility

Quick Setup

# 1. Install OpenFace 3.0 dependencies
python scripts/test_openface3.py

# 2. Process video with OpenFace 3.0
python main.py --config configs/openface3.yaml --video_path video.mp4

# 3. Results include comprehensive facial analysis
# - Facial landmarks in COCO keypoints format
# - Action unit intensities
# - Head pose angles
# - Gaze direction vectors

📖 OpenFace 3.0 Integration Guide

📊 Output Formats

All pipelines generate simple JSON arrays compatible with annotation tools:

[
  {
    "type": "person_bbox",
    "video_id": "example",
    "t": 12.34,
    "person_id": 1,
    "bbox": [0.2, 0.3, 0.4, 0.5],
    "confidence": 0.87
  }
]

✅ Key Benefits:

Tool Integration: Direct import to CVAT, LabelStudio, ELAN
Research Friendly: Simple formats for analysis and visualization
Extensible: Models can add custom fields seamlessly

🚀 Usage Examples

Python API

from videoannotator import VideoAnnotator

# Process all pipelines
annotator = VideoAnnotator()
results = annotator.process("video.mp4")

# Specific pipelines only  
results = annotator.process("video.mp4", pipelines=["person_tracking"])

# Custom configuration
annotator = VideoAnnotator(config="configs/high_performance.yaml")
results = annotator.process("video.mp4")

Command Line

# Single video processing
python -m videoannotator process video.mp4

# Batch processing
python -m videoannotator batch videos/ --output results/

# Specific pipeline
python -m videoannotator process video.mp4 --pipeline face_analysis

# Custom config
python -m videoannotator process video.mp4 --config configs/lightweight.yaml

Export to Annotation Tools

from videoannotator.exporters import CVATExporter, LabelStudioExporter

# Export to CVAT
CVATExporter().export(annotations, "cvat_project.json")

# Export to LabelStudio  
LabelStudioExporter().export(annotations, "labelstudio_tasks.json")

📁 Project Structure

VideoAnnotator/
├── src/
│   ├── pipelines/           # Core analysis pipelines
│   ├── schemas/             # JSON schemas & validation  
│   ├── exporters/           # Annotation tool exporters
│   └── utils/               # Shared utilities
├── tests/                   # Comprehensive test suite (94% success)
├── configs/                 # Pipeline configurations
├── examples/                # Usage examples and demos
├── docs/                    # Documentation
└── requirements.txt         # Dependencies

📚 Documentation

Document	Description
Installation Guide	Setup and dependencies
Pipeline Specs	Technical pipeline details
Output Formats	JSON schema documentation
Testing Standards	Test framework and practices
Configuration Guide	YAML configuration options

🧪 Quality Assurance

VideoAnnotator maintains high code quality through comprehensive testing:

# Run full test suite (94% success rate)
python -m pytest tests/ -v

# Test specific pipelines
python -m pytest tests/test_face_pipeline_modern.py -v

# Performance benchmarks
python -m pytest tests/ -m performance -v

# Test coverage analysis
python -m pytest tests/ --cov=src --cov-report=html

📊 Test Results:

✅ 67/71 tests passing (94% success rate)
✅ Zero code duplication after rationalization
✅ Modern test patterns across all pipelines
✅ Performance benchmarks for optimization

🤝 Contributing

Follow Standards: Use existing Testing Standards
Add Tests: Integrate into existing test files in tests/
Update Docs: Keep documentation current with changes
Quality Check: Ensure test suite maintains 90%+ success rate

📄 License & Acknowledgments

License: MIT - see LICENSE for details

Acknowledgments: Built on the shoulders of giants including YOLO, Whisper, PyTorch, and the open-source ML community. Special thanks to research communities advancing computer vision and audio processing.

Name		Name	Last commit message	Last commit date
Latest commit History 87 Commits
.claude		.claude
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
configs		configs
docs		docs
examples		examples
pipelines		pipelines
scripts		scripts
serve		serve
src		src
tests		tests
weights		weights
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
DEMO_COMMANDS_IMPROVED.md		DEMO_COMMANDS_IMPROVED.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
TEST_IMPROVEMENT_PLAN.md		TEST_IMPROVEMENT_PLAN.md
batch_demo.py		batch_demo.py
constraints.txt		constraints.txt
demo.py		demo.py
docker-compose.yml		docker-compose.yml
environment.yml		environment.yml
findimports.py		findimports.py
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
requirements_openface.txt		requirements_openface.txt
validate_apis.py		validate_apis.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

VideoAnnotator

✨ Key Features

🏗️ Modern Architecture

🎯 Research Ready

🚀 Production Scalable

Quick Start

Batch process multiple videos

🧩 Pipeline Architecture

🎬 Scene Detection

👥 Person Tracking

😊 Face Analysis

🎤 Audio Processing

🎭 OpenFace 3.0 Integration

Features

Quick Setup

📊 Output Formats

🚀 Usage Examples

Python API

Command Line

Export to Annotation Tools

📁 Project Structure

📚 Documentation

🧪 Quality Assurance

🤝 Contributing

📄 License & Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

InfantLab/VideoAnnotator

Folders and files

Latest commit

History

Repository files navigation

VideoAnnotator

✨ Key Features

🏗️ Modern Architecture

🎯 Research Ready

🚀 Production Scalable

Quick Start

Batch process multiple videos

🧩 Pipeline Architecture

🎬 Scene Detection

👥 Person Tracking

😊 Face Analysis

🎤 Audio Processing

🎭 OpenFace 3.0 Integration

Features

Quick Setup

📊 Output Formats

🚀 Usage Examples

Python API

Command Line

Export to Annotation Tools

📁 Project Structure

📚 Documentation

🧪 Quality Assurance

🤝 Contributing

📄 License & Acknowledgments

About

Resources

License

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages