ICTV-git transforms the International Committee on Taxonomy of Viruses (ICTV) classification system into a transparent, versioned, and community-driven platform using git version control principles. It aims to address the reproducibility crisis in virology research by enabling researchers to track taxonomic changes, migrate datasets between versions, and cite specific taxonomy versions.
- Phase 1 - Family Size Analysis: Evidence-based guidelines for viral family management
- Phase 2 - Temporal Evolution: Multi-rank growth patterns and taxonomic stability analysis
- Phase 3 - Discovery Method Evolution: Technology-driven discovery paradigm shifts
- Real Data Framework: Strict policy ensuring all analyses use only documented ICTV statistics
- 12 Publication-Ready Visualizations: Comprehensive plots documenting viral taxonomy evolution
Current viral taxonomy management suffers from:
- Breaking changes without migration paths - The Caudovirales reclassification eliminated 50+ years of ecological data associations
- Version incompatibility - 18 MSL releases since 2005, each incompatible with the others
- Lost institutional knowledge - When families split, historical reasoning disappears
- Reproducibility crisis - Papers published months apart use incompatible taxonomies
- Optimal Size Guidelines: 50-300 species per family based on 20-year patterns
- Caudovirales Case Study: 1,847 species reorganization from 3 to 15 families
- Growth Metrics: 14.8x species increase with 15.2% annual growth
- Crisis Thresholds: Families >1,000 species require immediate action
- Multi-Rank Evolution: Species (14.8x), Genera (15.5x), Families (4.5x) growth
- 5 Acceleration Periods: Technology-driven growth spikes (up to 79.7% in 2017)
- Stability Analysis: Families most stable (CV=0.425), species least stable (CV=0.512)
- Technology Eras: Pre-NGS → NGS → Metagenomics → AI (2005-2024)
- 4 Discovery Eras: Culture → Molecular → Metagenomics → AI-assisted
- 34x Discovery Rate Increase: From 125 to 3,297 species/year
- Paradigm Shift: 90% pathogen-focused → 70% environmental-focused
- 200x Cost Reduction: $10,000 → $50 per genome enabling mass discovery
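The stability figures quoted in the findings above (CV=0.425 for families, CV=0.512 for species) are coefficients of variation. The source does not say which series the CV is taken over, so the following is only a minimal sketch, assuming it is computed over year-over-year growth rates and uses the population standard deviation. The function names and the input series are placeholders for illustration, not ICTV data.

```python
from statistics import mean, pstdev


def growth_rates(counts):
    """Year-over-year fractional growth for a sequence of counts."""
    return [(b - a) / a for a, b in zip(counts, counts[1:])]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


# Placeholder counts for illustration only (NOT ICTV statistics).
example_counts = [100, 110, 132, 145]
rates = growth_rates(example_counts)
print(f"CV of growth rates: {coefficient_of_variation(rates):.3f}")
```

A lower CV means the growth rate varied less relative to its average, which is the sense in which one rank can be called more stable than another.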
# Clone the repository
git clone https://github.com/shandley/ICTV-git.git
cd ICTV-git
# Create virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install matplotlib pandas numpy pyyaml
# Phase 1: Family Size Analysis
python research/family_size_analysis/basic_analysis.py
python research/family_size_analysis/create_matplotlib_plots.py
# Phase 2: Temporal Evolution Analysis
python research/temporal_evolution_analysis/temporal_analysis.py
python research/temporal_evolution_analysis/create_temporal_plots.py
# Phase 3: Discovery Method Evolution
python research/discovery_method_evolution/discovery_method_analysis.py
python research/discovery_method_evolution/create_discovery_plots.py
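For convenience, the six commands above can also be run in order from a single driver. This is a hedged sketch and not part of the repository: `PHASE_SCRIPTS`, `build_commands`, and `run_all` are hypothetical names, and the script assumes it is launched from the repository root with the virtual environment active.

```python
import subprocess
import sys

# Script paths copied from the commands above, in phase order.
PHASE_SCRIPTS = [
    "research/family_size_analysis/basic_analysis.py",
    "research/family_size_analysis/create_matplotlib_plots.py",
    "research/temporal_evolution_analysis/temporal_analysis.py",
    "research/temporal_evolution_analysis/create_temporal_plots.py",
    "research/discovery_method_evolution/discovery_method_analysis.py",
    "research/discovery_method_evolution/create_discovery_plots.py",
]


def build_commands():
    """Return one [interpreter, script] command per phase script."""
    return [[sys.executable, script] for script in PHASE_SCRIPTS]


def run_all(repo_root=".", dry_run=False):
    """Run every script in order, stopping at the first failure."""
    for command in build_commands():
        print("Running:", " ".join(command))
        if not dry_run:
            # cwd keeps the relative script paths valid;
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(command, cwd=repo_root, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)  # switch to dry_run=False to execute the scripts
```

Using `sys.executable` keeps the child processes on the same interpreter (and therefore the same installed packages) as the driver itself.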
- Analysis Results: JSON files in each phase's results/ directory
- Publication Plots: 12 total plots (4 per phase) as PNG/PDF files
- Research Reports: Comprehensive findings documents for each phase
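To see what each phase produced, the results directories can be scanned from Python. The sketch below assumes each phase keeps its outputs directly under `research/<phase>/results/`, as the project layout suggests; `list_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def list_outputs(repo_root="."):
    """Group files found under research/*/results/ by file extension."""
    outputs = {}
    for path in sorted(Path(repo_root).glob("research/*/results/*")):
        if path.is_file():
            outputs.setdefault(path.suffix.lower(), []).append(path)
    return outputs


if __name__ == "__main__":
    for suffix, paths in list_outputs().items():
        print(f"{suffix or '(no extension)'}: {len(paths)} file(s)")
        for path in paths:
            print("  ", path)
```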
Our analysis of 20 years of ICTV Master Species Lists revealed:
- 14.8x Growth: Viral species increased from 1,950 (MSL23, 2005) to 28,911 (MSL40, 2024)
- 15.2% Annual Growth: Consistent exponential growth driven by technology advances
- Technology Acceleration: Major growth spurts during sequencing cost reduction (2012), metagenomics revolution (2017), and COVID-19 response (2021)
- Caudovirales Dissolution: Largest reorganization in ICTV history - 1,847 species split from 3 families into 15 families (2021)
- Family Size Crisis: Evidence-based framework showing families >1,000 species require immediate reorganization
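The two headline growth figures can be checked against each other with a short calculation. This sketch assumes the 15.2% figure is a compound annual growth rate over the 19 yearly intervals between 2005 and 2024; under that assumption the two endpoint counts quoted above reproduce both numbers.

```python
start_species, start_year = 1_950, 2005    # MSL23
end_species, end_year = 28_911, 2024       # MSL40

fold_change = end_species / start_species
years = end_year - start_year

# Compound annual growth rate: the constant yearly rate that would
# turn the starting count into the ending count over `years` years.
annual_growth = fold_change ** (1 / years) - 1

print(f"Fold change: {fold_change:.1f}x")                # 14.8x
print(f"Compound annual growth: {annual_growth:.1%}")    # 15.2%
```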
- Evidence-based Guidelines: Optimal family sizes (50-300 species) based on real ICTV patterns
- Crisis Prevention: Early warning system for families approaching instability (>1,000 species)
- Reorganization Planning: Learn from Caudovirales dissolution to plan future taxonomy changes
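The guideline thresholds above translate directly into a simple size check. The following is a minimal sketch, not the project's implementation: the function name and the "small" and "watch" labels are hypothetical, and treating 50 and 300 as inclusive bounds is one reading of the "50-300" range.

```python
def classify_family_size(species_count):
    """Place a family in a size band using the thresholds quoted above."""
    if species_count < 0:
        raise ValueError("species_count must be non-negative")
    if species_count > 1000:
        return "crisis"    # >1,000 species: reorganization recommended
    if species_count > 300:
        return "watch"     # hypothetical label: above optimal, below crisis
    if species_count >= 50:
        return "optimal"   # 50-300 species
    return "small"         # hypothetical label: below the optimal range


for count in (12, 50, 300, 301, 1000, 1001):
    print(count, classify_family_size(count))
```

An early warning system of the kind described could report every family whose label is "watch" or "crisis" each time a new Master Species List is released.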
# Example: Load and analyze real ICTV growth data
import json

with open('research/family_size_analysis/results/family_size_analysis_basic.json', 'r') as f:
    data = json.load(f)

growth_data = data['growth_analysis']['growth_data']
for year_data in growth_data:
    print(f"{year_data['year']}: {year_data['species_count']:,} species")
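The example above fails with a bare traceback if the results file is missing or laid out differently. A slightly more defensive variant might look like the sketch below. It assumes the same JSON keys as the example (`growth_analysis`, `growth_data`, `year`, `species_count`); `load_growth_data` and `yearly_change` are hypothetical helper names, and the demo uses only the two endpoint counts quoted in this README.

```python
import json
from pathlib import Path

RESULTS_FILE = Path(
    "research/family_size_analysis/results/family_size_analysis_basic.json"
)


def load_growth_data(path=RESULTS_FILE):
    """Return the growth records, with clearer errors than a bare open()."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; the Phase 1 analysis may need to be run first"
        )
    with path.open("r", encoding="utf-8") as f:
        data = json.load(f)
    try:
        return data["growth_analysis"]["growth_data"]
    except KeyError as exc:
        raise KeyError(f"unexpected JSON layout, missing key {exc}") from exc


def yearly_change(records):
    """Percent change in species_count between consecutive records."""
    ordered = sorted(records, key=lambda r: r["year"])
    return [
        (
            curr["year"],
            100 * (curr["species_count"] - prev["species_count"])
            / prev["species_count"],
        )
        for prev, curr in zip(ordered, ordered[1:])
    ]


# Demo with the two endpoint counts quoted in this README.
sample = [
    {"year": 2005, "species_count": 1950},
    {"year": 2024, "species_count": 28911},
]
for year, pct in yearly_change(sample):
    print(f"{year}: {pct:+.1f}% vs previous record")
```

With the real file in place, `yearly_change(load_growth_data())` would give the change between each pair of consecutive records.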
- Growth Trajectory: 20-year exponential species growth with technology milestones
- Acceleration Analysis: Technology-driven discovery periods with real growth rates
- Caudovirales Timeline: Before/after visualization of largest viral taxonomy reorganization
- Management Framework: Evidence-based family size guidelines with crisis zone identification
- Real Data Only: All analyses use exclusively documented ICTV Master Species List statistics
- Zero Mock Data: Comprehensive elimination of any simulated or synthetic data
- Source Verification: Every finding traceable to official ICTV publications
- Validation Pipeline: Multi-stage verification ensuring research integrity
- Phase 1: Family Size Analysis - COMPLETE
- Phase 2: Temporal Evolution Analysis - COMPLETE
- Phase 3: Discovery Method Evolution - COMPLETE
- Next Phase: Full MSL parsing for git repository creation
- Future Work: Migration tools, semantic diffs, and community platform
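To make the "semantic diffs" item concrete, here is a rough sketch of what comparing two taxonomy releases could involve. It is not existing project code: it assumes each release has already been reduced to a plain species-to-family mapping, `diff_taxonomy` is a hypothetical name, and the species and family names are placeholders.

```python
def diff_taxonomy(old, new):
    """Compare two {species: family} mappings from different releases."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    moved = {
        species: (old[species], new[species])
        for species in sorted(set(old) & set(new))
        if old[species] != new[species]
    }
    return {"added": added, "removed": removed, "moved": moved}


# Placeholder names for illustration only (not real taxa).
release_a = {"Species A": "Family X", "Species B": "Family X"}
release_b = {"Species A": "Family Y", "Species C": "Family Y"}

print(diff_taxonomy(release_a, release_b))
```

A migration tool could then use the "moved" entries to remap annotations in an existing dataset from the old family to the new one.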
ICTV-git/
├── research/                                        # Research analysis modules
│   ├── family_size_analysis/                        # Phase 1: Family size analysis
│   │   ├── basic_analysis.py                        # Core analysis with real ICTV data
│   │   ├── create_matplotlib_plots.py               # Publication visualizations
│   │   ├── results/                                 # 4 plots + analysis data
│   │   └── REAL_DATA_FINDINGS.md                    # Comprehensive findings
│   ├── temporal_evolution_analysis/                 # Phase 2: Temporal patterns
│   │   ├── temporal_analysis.py                     # Multi-rank evolution analysis
│   │   ├── create_temporal_plots.py                 # Growth and stability plots
│   │   ├── results/                                 # 4 plots + temporal data
│   │   └── TEMPORAL_EVOLUTION_FINDINGS.md           # Evolution findings
│   ├── discovery_method_evolution/                  # Phase 3: Discovery methods
│   │   ├── discovery_method_analysis.py             # Method contribution analysis
│   │   ├── create_discovery_plots.py                # Technology impact plots
│   │   ├── results/                                 # 4 plots + method data
│   │   └── DISCOVERY_METHOD_EVOLUTION_FINDINGS.md   # Method findings
│   └── MOCK_DATA_ARCHIVE/                           # Archived mock data (not used)
├── manuscript_findings_v3.json                      # Consolidated research findings
├── CLAUDE.md                                        # Project development guidelines
└── README.md                                        # This file
- Phase 1: Family Size Findings
- Phase 2: Temporal Evolution Findings
- Phase 3: Discovery Method Findings
- Consolidated: Manuscript Findings
- Development Guidelines - Project instructions and data policies
- Research Implementation Plan - Analysis roadmap
We welcome contributions to expand this research! Current priorities:
- MSL Data Parsing: Help build parsers for all 18 MSL Excel files (2005-2024)
- Git Conversion Tools: Create tools to convert parsed taxonomy into git repositories
- Additional Analyses: Apply methodology to other viral taxonomy questions
- Visualization Improvements: Enhance publication-quality plots and add new analysis types
- Data Validation: Help verify and cross-check ICTV statistics across sources
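As a starting point for the git conversion priority, one possible layout is a file per species, nested by rank, so that a reclassification shows up in `git diff` as a file move. The sketch below is only an illustration under several assumptions: the rank list, the use of JSON as the on-disk format, and the names `species_path` and `write_species` are all hypothetical, and the record uses placeholder names.

```python
import json
from pathlib import Path

# Assumed rank order; only ranks present in a record are used.
RANKS = ["realm", "kingdom", "phylum", "class", "order", "family", "genus"]


def species_path(record, root="taxonomy"):
    """Build <root>/<rank dirs...>/<species>.json from a parsed record."""
    parts = [record[rank] for rank in RANKS if record.get(rank)]
    name = record["species"].replace("/", "-").replace(" ", "_")
    return Path(root).joinpath(*parts, name + ".json")


def write_species(record, root="taxonomy"):
    """Write one species record to its place in the directory tree."""
    path = species_path(record, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the output stable so diffs between releases stay small.
    path.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n")
    return path


# Placeholder record for illustration only (not a real taxon).
example = {"family": "Family X", "genus": "Genus Y", "species": "Species Z"}
print(species_path(example))
```

Writing every record from one MSL release into the tree and committing the result would give one commit per release, which is what makes version-specific citation possible.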
If you use ICTV-git analyses in your research, please cite:
@software{ictv-git,
author = {Handley, Scott},
title = {ICTV-git: Comprehensive Analysis of Viral Taxonomy Evolution},
year = {2024},
publisher = {GitHub},
journal = {GitHub repository},
url = {https://github.com/shandley/ICTV-git},
note = {Three-phase analysis of ICTV data (2005-2024): family size dynamics, temporal evolution, and discovery method impacts}
}
A research manuscript examining 20 years of viral taxonomy evolution through git-based analysis is in preparation.
- ICTV Official Site - International Committee on Taxonomy of Viruses
- Master Species Lists - Source data for all analyses
- Project Issues - Report bugs or request features
This project is licensed under the MIT License - see LICENSE for details. ICTV data is used under the Creative Commons license specified by ICTV.
- Project Lead: Scott Handley
- GitHub Issues: https://github.com/shandley/ICTV-git/issues
- Email: handley.scott@gmail.com
- International Committee on Taxonomy of Viruses (ICTV) for providing open access to MSL data
- The virology community for feedback and use cases
- Git and open source community for inspiration
Transforming viral taxonomy from static documents to dynamic, versioned data