An intelligent video processing tool that analyzes CCTV footage with AI, detecting people and generating descriptions of their activities. It also creates hierarchical HTML viewers for browsing the footage.
- Two-Stage Processing: First uses OpenCV for fast person detection, then applies LLM analysis only to videos containing people
- Smart Filtering: Skips videos with no human activity to save processing time
- AI Descriptions: Generates natural language descriptions of what people are doing in the footage
- HTML Generation: Creates browsable HTML interfaces organized by camera and date
- Hierarchical Structure: Supports camera/date/video folder structures (designed for MotionEye)
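The two-stage idea from the list above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `process_videos`, `has_person`, and `describe` are hypothetical names, with the detector and the LLM passed in as plain callables so the control flow is visible without OpenCV or a model installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

NO_PEOPLE = "No people detected"


def process_videos(
    videos: Iterable[Path],
    has_person: Callable[[Path], bool],   # stage 1: cheap detector (e.g. OpenCV)
    describe: Callable[[Path], str],      # stage 2: expensive LLM call
) -> Dict[Path, str]:
    """Run the cheap detector on every video; call the LLM only on hits."""
    results: Dict[Path, str] = {}
    for video in videos:
        if has_person(video):
            results[video] = describe(video)
        else:
            # Skip the LLM entirely for footage with no human activity.
            results[video] = NO_PEOPLE
    return results
```

Because the expensive call sits behind the detector, the LLM cost scales with the number of videos that contain people rather than with the total number of videos.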
pip install -r requirements.txt
Note: PyTorch with CUDA support is recommended for faster processing. The requirements file includes the CUDA 12.8 builds of PyTorch.
python cctv_llm_processor.py /path/to/video/directory
python cctv_llm_processor.py [OPTIONS] DIRECTORY
Options:
  -r, --recursive         Process subdirectories recursively
  -o, --overwrite         Overwrite existing .txt description files
  --description-prompt    Custom prompt for video descriptions
  -h, --help              Show help message
# Process MotionEye footage recursively
python cctv_llm_processor.py -r /root/motioneye/
# Use custom description prompt
python cctv_llm_processor.py --description-prompt "What security event is happening?" /path/to/videos/
# Overwrite existing descriptions
python cctv_llm_processor.py -o -r /path/to/videos/ --description-prompt "new prompt for ALL videos"
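The options and examples above correspond to a parser roughly like the one below. This is a sketch built with the standard `argparse` module. The function name `build_parser` and the help strings are assumptions, and the real script may define its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="cctv_llm_processor.py",
        description="Detect people in CCTV footage and describe their activity.",
    )
    parser.add_argument("directory", metavar="DIRECTORY",
                        help="Directory containing the video files")
    parser.add_argument("-r", "--recursive", action="store_true",
                        help="Process subdirectories recursively")
    parser.add_argument("-o", "--overwrite", action="store_true",
                        help="Overwrite existing .txt description files")
    parser.add_argument("--description-prompt", default=None,
                        help="Custom prompt for video descriptions")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

`argparse` accepts optional flags before or after the positional `DIRECTORY`, which is why the last example above can place `--description-prompt` at the end.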
Designed for MotionEye-style hierarchies but works with any similar structure:
/root/camera/date/videos/
├── camera1/
│ ├── 2025-08-01/
│ │ ├── video1.mp4
│ │ ├── video1.txt # Generated descriptions
│ │ └── videos_with_people.html # Day viewer
│ ├── 2025-08-02/
│ └── index.html # Camera index
├── camera2/
└── index.html # Root index
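A layout like the tree above can be walked with `pathlib`. The sketch below groups video files by camera and date. `group_videos` and the `VIDEO_SUFFIXES` set are hypothetical, and the tool itself may recognise a different set of extensions.

```python
from collections import defaultdict
from pathlib import Path

# Assumption: which extensions count as video files.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mkv"}


def group_videos(root):
    """Return {camera: {date: [video paths]}} for a root/camera/date/ layout."""
    grouped = defaultdict(lambda: defaultdict(list))
    # "*/*/*" matches exactly three levels: camera / date / file.
    for path in sorted(Path(root).glob("*/*/*")):
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES:
            date_dir = path.parent
            grouped[date_dir.parent.name][date_dir.name].append(path)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {camera: dict(days) for camera, days in grouped.items()}
```

Sidecar `.txt` files and generated `.html` files are ignored because their suffixes are not in `VIDEO_SUFFIXES`.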
- Generated alongside each video file
- Contains the AI-generated description, or "No people detected" when the OpenCV stage finds no one
- Only videos with people get LLM analysis
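The sidecar-file convention described above might look like this in code. Both helpers are hypothetical. The sketch also assumes that, without `-o`, a video whose `.txt` file already exists is skipped, which the `--overwrite` option implies but this README does not state outright.

```python
from pathlib import Path


def needs_processing(video: Path, overwrite: bool = False) -> bool:
    """True if the video has no description yet, or overwriting is requested."""
    return overwrite or not video.with_suffix(".txt").exists()


def write_description(video: Path, text: str) -> Path:
    """Write the description next to the video, e.g. video1.mp4 -> video1.txt."""
    out = video.with_suffix(".txt")
    out.write_text(text + "\n", encoding="utf-8")
    return out
```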
- Day HTML: Lists all videos with people for a specific camera/date
- Camera Index: Overview of all days with activity for each camera
- Root Index: Top-level view of all cameras and their activity
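As an illustration of the day-level page, the sketch below builds a simple HTML list from the `.txt` files in one date folder. It is not the tool's real template. `build_day_html` is a hypothetical name, and the code assumes every description belongs to an `.mp4` file with the same stem.

```python
import html
from pathlib import Path
from urllib.parse import quote

NO_PEOPLE = "No people detected"


def build_day_html(day_dir) -> str:
    """Return an HTML page listing the videos in day_dir that contain people."""
    day_dir = Path(day_dir)
    items = []
    for txt in sorted(day_dir.glob("*.txt")):
        description = txt.read_text(encoding="utf-8").strip()
        if not description or description == NO_PEOPLE:
            continue  # only videos with people are listed
        video_name = txt.with_suffix(".mp4").name  # assumption: .mp4 videos
        items.append(
            f'<li><a href="{quote(video_name)}">{html.escape(video_name)}</a>: '
            f"{html.escape(description)}</li>"
        )
    title = html.escape(day_dir.name)
    body = "\n".join(items)
    return f"<html><body><h1>{title}</h1><ul>\n{body}\n</ul></body></html>"
```

Escaping the description with `html.escape` matters here because the text comes from a language model and may contain characters such as `<` or `&`.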
Current Model: LanguageBind/Video-LLaVA-7B-hf
- The AI model may occasionally hallucinate or misinterpret scenes
- Verify descriptions manually before relying on them in critical security applications
- The model can be easily swapped out as better alternatives become available
- Consider experimenting with different prompts for improved accuracy
- Fast Pre-filtering: OpenCV person detection runs quickly on all videos
- Selective LLM Processing: Only videos with detected people are sent to the LLM
- CUDA Acceleration: GPU processing significantly speeds up AI analysis
- Progress Tracking: Real-time progress bars show processing status
python cctv_llm_processor.py --description-prompt "Describe the security situation in detail" /path/to/videos/
The VideoProcessor class can be easily modified to use different models as the state of the art improves.
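One way to keep the model swappable is to hide it behind a small interface. The sketch below is an assumption about how that could be structured, not a description of the actual `VideoProcessor` class. `VideoDescriber`, `EchoDescriber`, and `describe_all` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol


class VideoDescriber(Protocol):
    """Anything that can turn a video plus a prompt into a description."""

    def describe(self, video: Path, prompt: str) -> str: ...


class EchoDescriber:
    """Stand-in backend for testing the pipeline without a GPU or model."""

    def describe(self, video: Path, prompt: str) -> str:
        return f"[{video.name}] {prompt}"


def describe_all(videos: Iterable[Path], backend: VideoDescriber,
                 prompt: str) -> Dict[str, str]:
    """Describe each video with whichever backend is supplied."""
    return {video.name: backend.describe(video, prompt) for video in videos}
```

A Video-LLaVA backend, or any later model, would then only need to provide a `describe` method with the same signature.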
- Python 3.8+
- CUDA-capable GPU (recommended)
- Sufficient disk space for model downloads (~14 GB for Video-LLaVA-7B)
- CUDA Issues: Ensure the installed PyTorch build is compatible with your CUDA driver
- Memory Errors: Reduce batch size or use smaller model variants
- Slow Processing: Verify CUDA is working; CPU-only processing is much slower
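To check whether CUDA is actually in use, a quick diagnostic along these lines can help. It is a standalone sketch, not part of the tool. `cuda_report` is a hypothetical helper built on standard PyTorch calls, and it falls back gracefully when PyTorch is not installed.

```python
def cuda_report() -> str:
    """Summarise whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA not available, running on CPU"
    return (
        f"PyTorch {torch.__version__} built for CUDA {torch.version.cuda}, "
        f"device: {torch.cuda.get_device_name(0)}"
    )


if __name__ == "__main__":
    print(cuda_report())
```

If the report says CUDA is not available on a machine with an NVIDIA GPU, a CPU-only PyTorch build or a driver mismatch is a likely cause.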