A comprehensive audio-visual bird detection and classification system designed for academic research. This pipeline processes video recordings from stationary cameras to detect and identify bird species using parallel audio and video analysis.
- Parallel Processing: Simultaneous audio and video stream analysis
- Multi-modal Detection: Audio-only, video-only, and combined detections
- Hierarchical Confidence Scoring: Confidence computed from the detection type (audio, video, or both) and the species classification result (see the illustrative sketch below)
- Species Identification: Fine-grained species classification using TransFG
- Interactive Web Interface: Real-time visualization and analysis tools
- Audio Detection: BirdNET integration for bird call detection
- Video Detection: YOLOv10 for bird object detection
- Species Classification: TransFG for fine-grained species identification
- Event Integration: Temporal correlation of audio and video detections
- Export Capabilities: JSON and CSV export with filtering options
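The exact scoring rules live in the pipeline code; the sketch below is an illustration only, using assumed per-type weights and a noisy-OR fusion rather than the pipeline's actual formula:

```python
# Illustration only: the weights and noisy-OR fusion below are assumptions,
# not the pipeline's actual formula.
def hierarchical_confidence(detection_type: str,
                            initial_confidence: float,
                            species_confidence: float | None = None) -> float:
    """Score an event from its detection type and optional species result."""
    # Assumed weights: combined audio+video detections are trusted most.
    type_weight = {"audio_only": 0.8, "video_only": 0.9, "audio_video": 1.0}
    score = initial_confidence * type_weight[detection_type]
    if species_confidence is not None:
        # Noisy-OR fusion: a confident species ID pushes the score toward 1.0.
        score = 1.0 - (1.0 - score) * (1.0 - 0.5 * species_confidence)
    return round(score, 3)

print(hierarchical_confidence("audio_video", 0.7, species_confidence=0.92))  # 0.838
```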
Input Video (MP4/AVI/MOV)
├── Audio Stream → BirdNET → Audio Detections
├── Video Stream → YOLOv10 → Video Detections
└── Integration → Event Correlation → Species Classification → Results
- Media Preprocessing: Extract audio and video streams using FFmpeg
- Parallel Detection:
  - Audio: Segment-based analysis with mel-spectrograms (see the sketch after this list)
  - Video: Frame-based object detection
- Event Integration: Temporal correlation with configurable time windows
- Species Classification: Fine-grained classification for video detections
- Result Export: Structured data output with confidence metrics
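The audio path's segment-based mel-spectrogram analysis can be illustrated with librosa (a minimal sketch: the extracted audio path, sample rate, and spectrogram settings here are placeholders, not the pipeline's actual values):

```python
import librosa
import numpy as np

SEGMENT_LENGTH = 3.0  # seconds, matching audio_segment_length in config.yaml

# Load the audio track extracted from the video (path is hypothetical).
audio, sr = librosa.load("results/audio.wav", sr=48000, mono=True)

samples_per_segment = int(SEGMENT_LENGTH * sr)
for start in range(0, len(audio), samples_per_segment):
    segment = audio[start:start + samples_per_segment]
    # Mel-spectrogram in dB, the typical input representation for
    # BirdNET-style audio models.
    mel = librosa.feature.melspectrogram(y=segment, sr=sr, n_mels=128)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    print(f"{start / sr:6.1f}s  mel shape: {mel_db.shape}")
```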
- Python 3.11+
- FFmpeg
- CUDA (optional, for GPU acceleration)
# Clone the repository
git clone <repository-url>
cd tanbo_tori
# Create virtual environment
python -m venv claude-env
source claude-env/bin/activate # Linux/Mac
# or
claude-env\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Clone the repository
git clone <repository-url>
cd tanbo_tori
# Build and run with Docker Compose
docker-compose up -d
# Access the web interface at http://localhost
export BIRD_MONITOR_ENVIRONMENT=production
export BIRD_MONITOR_LOG_LEVEL=INFO
export BIRD_MONITOR_GPU_ENABLED=true
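These variables are presumably picked up by the configuration system; a minimal sketch of how they can be read with pydantic-settings (installed in the minimal setup), where the class name is an assumption:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class MonitorSettings(BaseSettings):
    """Reads BIRD_MONITOR_* variables from the environment."""
    model_config = SettingsConfigDict(env_prefix="BIRD_MONITOR_")

    environment: str = "development"
    log_level: str = "INFO"
    gpu_enabled: bool = False

settings = MonitorSettings()
print(settings.environment, settings.log_level, settings.gpu_enabled)
```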
Edit `config.yaml` to customize processing parameters:
# Detection thresholds
audio_confidence_threshold: 0.3
video_confidence_threshold: 0.5
species_confidence_threshold: 0.6
# Processing settings
audio_segment_length: 3.0
temporal_correlation_window: 2.0
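`temporal_correlation_window` sets how many seconds apart an audio and a video detection may occur and still merge into one audio_video event. A minimal sketch of that matching step (simplified greedy pairing; the pipeline's real integration logic may differ):

```python
def correlate(audio_dets, video_dets, window=2.0):
    """Pair audio and video detections whose times fall within `window` seconds.

    Each detection is a dict with at least a 'time' key (seconds).
    Returns (paired, unmatched_audio, unmatched_video).
    """
    paired, used_video = [], set()
    for a in audio_dets:
        for i, v in enumerate(video_dets):
            if i not in used_video and abs(a["time"] - v["time"]) <= window:
                paired.append((a, v))  # becomes an audio_video event
                used_video.add(i)
                break
    unmatched_audio = [a for a in audio_dets if all(a is not p[0] for p in paired)]
    unmatched_video = [v for i, v in enumerate(video_dets) if i not in used_video]
    return paired, unmatched_audio, unmatched_video

# Example: one combined event, one audio-only, one video-only.
audio = [{"time": 10.5}, {"time": 40.0}]
video = [{"time": 11.2}, {"time": 25.0}]
print(correlate(audio, video, window=2.0))
```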
# Process a single video
python -m src.cli process video.mp4 --output results/
# Process with custom thresholds
python -m src.cli process video.mp4 \
--audio-threshold 0.4 \
--video-threshold 0.6 \
--species-threshold 0.7
# Batch processing
python -m src.cli batch input_videos/ output_results/ --pattern "*.mp4"
# Check system status
python -m src.cli status
- Start the web application: `python -m src.web.app`
- Open your browser to http://localhost:8000
- Upload a video file and configure detection parameters
- Monitor processing progress and view results
from src.core.pipeline import BirdMonitoringPipeline
from src.models.data_models import ProcessingConfig
# Initialize pipeline
pipeline = BirdMonitoringPipeline()
# Configure processing
config = ProcessingConfig(
    input_video_path="path/to/video.mp4",
    output_dir="results/",
    audio_confidence_threshold=0.3,
    video_confidence_threshold=0.5
)
# Process video
result = pipeline.process_video("video.mp4", "output/", config)
# Access results
print(f"Total events: {result.total_events}")
print(f"Species found: {result.unique_species}")
For testing and development without installing heavy ML dependencies (TensorFlow, OpenCV), use the minimal version:
TensorFlow doesn't fully support Python 3.13 yet. For best compatibility, use Python 3.11 or 3.12.
# Clone the repository
git clone https://github.com/atsuki-ichikawa/bird-monitoring-pipeline.git
cd bird-monitoring-pipeline
# Install minimal dependencies (works with Python 3.13)
pip install -r requirements-minimal.txt
pip install pydantic-settings
# Test with minimal version
python -m src.web.app_minimal
# Access: http://localhost:8000
# For Apple Silicon Macs
pip install tensorflow-macos==2.13.0
pip install tensorflow-metal==1.0.1
pip install opencv-python==4.8.1.78
# For Intel Macs
pip install tensorflow==2.13.0
pip install opencv-python==4.8.1.78
# Common dependencies
pip install torch torchvision torchaudio
pip install ultralytics librosa scikit-learn
# Run automated macOS setup
python setup_macos.py
# This will:
# - Check system requirements
# - Install Homebrew dependencies (ffmpeg, etc.)
# - Create virtual environment
# - Install appropriate ML dependencies for your Mac
# - Create launch scripts
# Clone the repository
git clone https://github.com/atsuki-ichikawa/bird-monitoring-pipeline.git
cd bird-monitoring-pipeline
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install minimal dependencies
pip install -r requirements-minimal.txt
# Check system status (minimal version)
python -m src.cli_minimal status
# Process a test video (mock processing)
python -m src.cli_minimal process test_video.mp4 --output results/
# Start web interface (minimal version)
python -m src.cli_minimal serve --port 8000
# Run minimal tests
pytest tests/test_minimal.py -v
# Method 1: Direct module execution
python -m src.web.app_minimal
# Method 2: Using uvicorn directly (alternative)
uvicorn src.web.app_minimal:app --host 0.0.0.0 --port 8000 --reload
# Method 3: Using CLI minimal serve command
python -m src.cli_minimal serve --port 8000
# Access at http://localhost:8000
# - API documentation: http://localhost:8000/docs
# - Health check: http://localhost:8000/health
# - System status: http://localhost:8000/api/status
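To verify these endpoints from a script (stdlib only; assumes the server is running locally and returns JSON, as FastAPI apps typically do):

```python
import json
import urllib.request

for path in ("/health", "/api/status"):
    with urllib.request.urlopen(f"http://localhost:8000{path}") as resp:
        print(path, resp.status, json.loads(resp.read()))
```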
- ✅ Data Models: Complete Pydantic schemas with validation
- ✅ Configuration System: Full configuration management
- ✅ Web API: All REST endpoints with mock responses
- ✅ CLI Interface: All commands with mock processing
- ✅ Testing Suite: Comprehensive unit tests
- ⚠️ Mock Processing: No actual ML inference (returns test data)
# Install full dependencies including ML libraries
pip install -r requirements.txt
# Use full versions
python -m src.cli process video.mp4 # Full CLI
python -m src.web.app # Full web app
- Download BirdNET model files
- Place in `models/birdnet/`
- Update model path in configuration
# Download YOLOv10 weights
wget https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10n.pt
mv yolov10n.pt models/yolo/
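To confirm the weights load before a full run, a quick check with the `ultralytics` package from the requirements (path as placed above):

```python
from ultralytics import YOLO

# Load the downloaded weights; the class names confirm the model initialized.
model = YOLO("models/yolo/yolov10n.pt")
print(model.names)  # COCO class names; 'bird' should be among them
```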
- Download TransFG model weights
- Place in `models/transfg/`
- Include species list JSON file
{
  "event_id": "evt_001",
  "start_time": 10.5,
  "end_time": 13.5,
  "detection_type": "audio_video",
  "initial_confidence": 0.7,
  "final_confidence": 0.916,
  "species_classification": {
    "species_name": "Eurasian Tree Sparrow",
    "confidence": 0.92
  },
  "bounding_box": {
    "timestamp": 11.2,
    "x": 150, "y": 200,
    "width": 50, "height": 50
  }
}
- `detection_results.json`: Complete pipeline results
- `audio_detections.json`: Audio-only detection data
- `video_detections.json`: Video-only detection data
- `species_classifications.json`: Species identification results
- `statistics.json`: Processing and quality metrics
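For downstream analysis, the outputs are plain JSON; the sketch below filters confident combined detections and writes them to CSV (field names follow the event example above, and the top-level `events` key is an assumption):

```python
import csv
import json

with open("results/detection_results.json") as f:
    results = json.load(f)

# Keep only confident combined detections (threshold chosen for illustration).
events = [e for e in results.get("events", [])
          if e["detection_type"] == "audio_video" and e["final_confidence"] >= 0.8]

with open("results/events.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["event_id", "start_time", "end_time", "species", "confidence"])
    for e in events:
        species = (e.get("species_classification") or {}).get("species_name", "")
        writer.writerow([e["event_id"], e["start_time"], e["end_time"],
                         species, e["final_confidence"]])
```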
- Synchronized video playback with detection timeline
- Interactive bounding box overlays
- Jump-to-detection functionality
- Color-coded detection events
- Confidence-based visual indicators
- Click-to-navigate interface
- Filter by species, confidence, detection type
- Time-range filtering
- Real-time result updates
- JSON: Complete structured data
- CSV: Tabular format for analysis
- Custom filtered exports
- Minimum: 8GB RAM, 4-core CPU
- Recommended: 16GB RAM, 8-core CPU, GPU with 6GB VRAM
- Storage: 100GB+ for models and temporary files
- CPU: ~2-3x real-time processing
- GPU: ~5-8x real-time processing
- Memory: ~1GB per hour of video
# Docker Compose scaling
docker-compose up --scale celery-worker=4
tanbo_tori/
├── src/
│ ├── core/ # Core pipeline components
│ ├── models/ # Data models and schemas
│ ├── utils/ # Utilities and configuration
│ ├── web/ # Web application
│ └── cli.py # Command-line interface
├── tests/ # Unit and integration tests
├── docker/ # Docker configuration
├── models/ # Pre-trained model storage
└── data/ # Input/output data
# Unit tests
pytest tests/unit/
# Integration tests
pytest tests/integration/
# All tests with coverage
pytest --cov=src tests/
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
curl http://localhost:8000/health
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000
- Application logs: `logs/bird_monitor_*.log`
- Web access logs: `logs/nginx/access.log`
- Error logs: `logs/nginx/error.log`
FFmpeg not found
# Ubuntu/Debian
sudo apt-get install ffmpeg
# macOS
brew install ffmpeg
# Docker: Already included in image
GPU not detected
# Check CUDA installation
nvidia-smi
# Verify PyTorch CUDA support
python -c "import torch; print(torch.cuda.is_available())"
Memory errors during processing
# Reduce batch size in config.yaml
batch_size: 16 # Default: 32
video_frame_skip: 2 # Process every 2nd frame
Model loading failures
- Verify model files exist and are readable
- Check model paths in configuration
- Ensure sufficient disk space
- Enable GPU acceleration if available
- Increase `max_workers` for CPU processing
- Use frame skipping for faster processing
- Monitor system resources during processing
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this pipeline in academic research, please cite:
@software{bird_monitoring_pipeline,
  title={Bird Monitoring Pipeline: Audio-Visual Detection and Classification},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/tanbo_tori}
}
- BirdNET: For audio-based bird detection
- YOLOv10: For real-time object detection
- TransFG: For fine-grained visual classification
- FastAPI: For web application framework
- FFmpeg: For media processing capabilities
For questions, issues, or contributions:
- Create an issue on GitHub
- Contact: your.email@domain.com
- Documentation: Project Wiki