A proof-of-concept (POC) OSINT (Open Source Intelligence) video analysis system built with LangGraph. This implementation demonstrates how deep research agents can be used for automated video analysis, including face detection, object recognition, image matching, sentiment analysis, and cross-video correlation for intelligence gathering use cases.
This project serves as a demonstration of:
- Multi-agent video analysis workflows using LangGraph's agent orchestration
- Specialized sub-agents for domain-specific analysis (faces, objects, content)
- Integration patterns for combining computer vision, AI, and intelligence analysis
- OSINT methodologies applied to video content through automated agents
- Scalable architecture for processing and analyzing large video collections
The codebase is designed to showcase techniques and patterns rather than provide a production-ready system.
What This POC Demonstrates:
- Multi-agent orchestration using LangGraph for complex video analysis workflows
- Integration patterns between AI services (Cloudglue), computer vision (OpenCV), and deep learning models
- Structured approach to OSINT video analysis with confidence scoring and evidence collection
- Extensible architecture for adding new analysis capabilities and specialized agents
What a Production System Would Additionally Need:
- Robust error handling and recovery mechanisms
- Scalable video processing infrastructure
- User authentication and access controls
- Advanced caching and optimization strategies
- Comprehensive logging and monitoring
- Data privacy and retention policies
- Performance optimization for large-scale deployments
This repo also does not cover the scraping infrastructure needed to collect videos at scale from target public sources. A POC for gathering videos from social platforms such as TikTok (as shown in the Recon Village @ DEF CON 33 talk) is provided separately: Gumloop Social Media Intelligence Listener / TikTok Scraper Template
- Face Detection & Matching: Identify and match faces against reference databases
- Object Detection: Zero-shot detection of any specified objects using state-of-the-art models
- Image Matching: RANSAC-based matching for buildings, logos, and visual elements
- Content Analysis: Sentiment analysis, temporal information extraction, and contextual intelligence
- Cross-Video Analysis: Compare and correlate findings across multiple videos
- Memory System: Persistent storage of investigation findings and notes
- Temporal Analysis: Extract time-of-day, weather, seasonal, and location indicators
- Evidence Management: Automatic confidence scoring and evidence documentation
- Specialized Sub-Agents: Domain experts for face analysis, visual analysis, and content analysis
- Coordinated Investigation: Systematic approach following OSINT best practices
- Comprehensive Reporting: Detailed intelligence reports with confidence assessments
- Python 3.12+
- FFmpeg (for video processing)
- Cloudglue API account
- Anthropic API key (for Claude)
- Clone the repository

  ```bash
  git clone https://github.com/your-username/autonomous-video-hunter.git
  cd autonomous-video-hunter
  ```

- Install dependencies

  ```bash
  pip install -r requirements-core.txt
  ```

- Install system dependencies

  macOS:

  ```bash
  brew install ffmpeg
  ```

  Ubuntu/Debian:

  ```bash
  sudo apt update
  sudo apt install ffmpeg
  ```

- Set up environment variables

  ```bash
  cp .env.sample .env
  ```

  Edit `.env` and add your API keys:

  ```
  CLOUDGLUE_API_KEY=your_cloudglue_api_key_here
  ANTHROPIC_API_KEY=your_anthropic_api_key_here
  VIDEO_CONTEXT_DB_PATH=db.jsonl
  ```
- Cloudglue API: For video understanding and processing
  - Sign up at cloudglue.dev
  - Used for dense multimodal video description & transcription, scene analysis, and content extraction
- Anthropic API: For the underlying Claude LLM
  - Get your key from console.anthropic.com
  - Powers the agent reasoning and analysis
- LangSmith API (Optional): For monitoring and tracing
  - Get your key from smith.langchain.com
  - Enables detailed execution tracing and debugging
The system uses a JSONL file to store processed video context data. Each line contains a complete video analysis record including:
- Video metadata and Cloudglue URI
- Extracted frame thumbnails with timestamps
- AI-generated descriptions and summaries
- Boolean flags for content features (faces, logos, speech, etc.)
- Investigation memories and notes
Configure the database path with:

```bash
export VIDEO_CONTEXT_DB_PATH="/path/to/your/video-database.jsonl"
```

If not set, it defaults to `db.jsonl` in the current directory.
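For quick inspection or scripting, the JSONL database can be read with the standard library alone. A minimal sketch, assuming records follow the schema documented later in this README (fields such as `has_face`, `local_video_path`, and `duration_seconds`):

```python
import json
import os

DB_PATH = os.environ.get("VIDEO_CONTEXT_DB_PATH", "db.jsonl")

def load_video_contexts(db_path: str = DB_PATH) -> list[dict]:
    """Load every video analysis record from the JSONL context database."""
    records = []
    with open(db_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

# Example: list videos flagged as containing faces
if __name__ == "__main__":
    for record in load_video_contexts():
        if record.get("has_face"):
            print(record.get("local_video_path"), record.get("duration_seconds"), "s")
```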
Creating a Video Database:

```python
from video_processor import VideoProcessor

processor = VideoProcessor(
    media_dir="./media",
    db_path="./my-videos.jsonl",
    api_key="your-cloudglue-key"
)

# Process a video and save to database
result = processor.process("/path/to/video.mp4", save_to_db=True)
```
Start the LangGraph development server:

```bash
langgraph dev
```

This will:
- Start the development server on http://localhost:8123
- Provide a web interface for interacting with the video analysis agent
- Enable hot reloading during development
- Offer debugging and monitoring capabilities
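The dev server reads its graph registrations from `langgraph.json` (listed in the project structure below). A minimal sketch of what that file typically contains; the graph name `agent` and the `video_analysis_agent.py:graph` entrypoint are assumptions, not confirmed from this repo:

```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./video_analysis_agent.py:graph"
  },
  "env": ".env"
}
```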
The system is designed around investigation workflows:
- Start an Investigation: Define your investigation question or target
- Load Video Context: Videos should be pre-processed and available in the context database
- Run Analysis: The agent will coordinate specialized sub-agents for comprehensive analysis
- Review Results: Get detailed intelligence reports with confidence assessments
```python
# Example of what the agent can investigate:
# - "Analyze this protest video for key participants"
# - "Find all Starbucks logos across these surveillance videos"
# - "Identify occurrences of this person <path to face image> in these social media videos"
# - "Extract temporal and location clues from these outdoor videos"
```
Before analysis, videos need to be processed:
- Video Upload: Upload video to Cloudglue
- Frame Extraction: Extract uniform thumbnail samples
- Content Analysis: Run Cloudglue analysis for transcription and scene understanding
- Database Storage: Store results in JSONL context database
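Batch pre-processing a directory of collected videos can be scripted with the `VideoProcessor` shown earlier. A minimal sketch, assuming the constructor and `process()` call from the example above and a folder of `.mp4` files:

```python
import os
from pathlib import Path

from video_processor import VideoProcessor

processor = VideoProcessor(
    media_dir="./media",
    db_path=os.environ.get("VIDEO_CONTEXT_DB_PATH", "db.jsonl"),
    api_key=os.environ["CLOUDGLUE_API_KEY"],
)

# Upload, extract frames, analyze, and store each video in the context database
for video_path in sorted(Path("./media").glob("*.mp4")):
    try:
        processor.process(str(video_path), save_to_db=True)
        print(f"Processed {video_path}")
    except Exception as exc:  # keep going if a single video fails
        print(f"Failed on {video_path}: {exc}")
```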
- `detect_faces_in_video()`: Match faces against reference databases
  - Confidence scoring and temporal tracking
  - Automatic face cropping and storage
- `detect_objects_in_video()`: Zero-shot object detection
- `match_images_in_video()`: RANSAC-based image matching
  - Building, logo, and landmark identification
- `analyze_video_sentiment()`: Emotional tone analysis
- `extract_temporal_info()`: Time, weather, and location clues
- Cross-video comparison and correlation
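As an illustration of the RANSAC matching idea behind `match_images_in_video()`, here is a minimal, self-contained sketch using OpenCV ORB features and `cv2.findHomography`. It is not the repo's implementation (see `image_matcher.py`), and the thresholds are assumptions:

```python
import cv2
import numpy as np

def reference_appears_in_frame(reference_path: str, frame_path: str, min_inliers: int = 12) -> bool:
    """Check whether a reference image (logo, building, etc.) appears in a video frame."""
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)
    frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
    if ref is None or frame is None:
        raise FileNotFoundError("Could not read one of the input images")

    # Detect ORB keypoints and descriptors in both images
    orb = cv2.ORB_create(nfeatures=2000)
    kp_ref, des_ref = orb.detectAndCompute(ref, None)
    kp_frame, des_frame = orb.detectAndCompute(frame, None)
    if des_ref is None or des_frame is None:
        return False

    # Brute-force Hamming matching with Lowe's ratio test to keep distinctive matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(des_ref, des_frame, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    if len(good) < min_inliers:
        return False

    # Fit a homography with RANSAC; the inlier count measures geometric consistency
    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_frame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    homography, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography is not None and int(inlier_mask.sum()) >= min_inliers
```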
```
autonomous-video-hunter/
├── video_analysis_agent.py   # Main LangGraph agent definition
├── video_analysis_tools.py   # Core analysis tools and functions
├── video_context.py          # Video context database management
├── video_processor.py        # Video processing pipeline
├── video_understander.py     # Cloudglue integration wrapper
├── video_helper.py           # Video download and frame extraction
├── face_matcher.py           # Face detection and matching
├── image_matcher.py          # RANSAC-based image matching
├── zeroshot_detect.py        # Zero-shot object detection
├── requirements.txt          # Python dependencies
├── langgraph.json            # LangGraph configuration
└── .env.sample               # Environment variables template
```
- Create new analysis functions in `video_analysis_tools.py`
- Add tools to the agent in `video_analysis_agent.py`
- Update sub-agents if needed for specialized analysis
- Test with `langgraph dev` (a sketch of a new tool follows below)
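A minimal sketch of what a new analysis tool could look like, using the `langchain_core.tools.tool` decorator; the function name, parameters, and return shape are illustrative assumptions rather than the repo's actual signatures:

```python
from langchain_core.tools import tool

@tool
def count_vehicles_in_video(video_path: str, vehicle_types: list[str]) -> dict:
    """Count occurrences of the given vehicle types across extracted video frames.

    Args:
        video_path: Path to a locally available video file.
        vehicle_types: Labels to detect, e.g. ["car", "motorcycle", "van"].
    """
    # Placeholder logic: a real implementation would reuse the zero-shot
    # detector in zeroshot_detect.py and aggregate detections per label.
    return {label: 0 for label in vehicle_types}
```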
The system uses specialized sub-agents:
- face-analysis-agent: Face detection and people identification
- visual-analysis-agent: Object, logo, and visual scene analysis
- content-analysis-agent: Sentiment, temporal, and contextual analysis
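One way a domain expert like the visual-analysis-agent could be assembled is with LangGraph's prebuilt ReAct agent. A minimal sketch, assuming the tools are importable from `video_analysis_tools.py` and using an assumed Claude model ID and system prompt (the repo's actual wiring lives in `video_analysis_agent.py`):

```python
from langchain_anthropic import ChatAnthropic
from langgraph.prebuilt import create_react_agent

# Assumption: these tools are importable from video_analysis_tools.py
from video_analysis_tools import detect_objects_in_video, match_images_in_video

model = ChatAnthropic(model="claude-sonnet-4-20250514")  # assumed model ID

visual_analysis_agent = create_react_agent(
    model=model,
    tools=[detect_objects_in_video, match_images_in_video],
    prompt=(
        "You are a visual analysis expert. Use your tools to find objects, "
        "logos, and visual landmarks in the provided videos, and report "
        "findings with confidence scores and frame timestamps."
    ),
)
```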
Video contexts are stored as JSONL with this schema:

```json
{
  "local_video_path": "/path/to/video.mp4",
  "cloudglue_uri": "cg://...",
  "local_frames": [...],
  "description": "Markdown description",
  "has_logo": true,
  "logos": ["Starbucks"],
  "has_face": true,
  "has_speech": true,
  "is_outdoors": false,
  "has_text_on_screen": true,
  "duration_seconds": 30.5
}
```
- API Keys: Never commit API keys to version control
- Video Data: Consider data retention policies for processed videos
- Privacy: Be aware of privacy implications when analyzing videos with people
- Compliance: Ensure compliance with local laws regarding video analysis
This tool is designed for legitimate intelligence gathering purposes. Users should:
- Respect privacy and consent
- Follow applicable laws and regulations
- Use responsibly for security, journalism, or research purposes
- Consider the ethical implications of automated video analysis
- FFmpeg not found
  - Install FFmpeg using your system package manager
  - Ensure it's in your PATH
- Cloudglue API errors
  - Verify your API key is correct
  - Check your account has sufficient credits
  - Ensure network connectivity
- Face detection issues
  - Install required face recognition models
  - Check reference image quality
  - Verify image paths are correct
- Memory issues with large videos
  - Consider video compression before processing
  - Adjust frame extraction parameters
  - Monitor system resources
Use LangSmith for detailed execution tracing:

```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=your_langsmith_key
langgraph dev
```
- Built with LangGraph
- Video analysis powered by Cloudglue
- Face recognition using DeepFace
- Object detection using Transformers
- Image matching using OpenCV RANSAC algorithms