Machine learning system for geomagnetic storm prediction, built on 29 years of space weather data
SOLARIS-X is an ensemble machine learning system designed for real-time geomagnetic storm prediction. Built on 29 years of space weather data (1996-2025) and 79 physics-informed features, its meta-ensemble reaches an AUC of 0.9646 (96.5%) with 67.8% storm recall.
- 67.8% Storm Recall - Catches 2 out of 3 geomagnetic storms
- <100ms Inference - Real-time prediction capability
- Physics-Informed - Features based on solar wind & magnetospheric physics
- Production-Ready - Complete MLOps pipeline with deployment architecture
| Model | AUC | Precision | Recall | F1-Score | Best For |
|---|---|---|---|---|---|
| Meta-Ensemble | 0.9646 | 0.4862 | 0.6784 | 0.5664 | Storm Detection |
| LightGBM Baseline | 0.9671 | 0.6655 | 0.4904 | 0.5647 | High Precision |
Meta-Ensemble selected for production: it trades precision (0.49 vs. 0.67) for higher recall (0.68 vs. 0.49), and catching storms is the critical requirement for space weather operations.
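
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall values. The helper name `f1_from_pr` is illustrative and not part of the repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Meta-Ensemble": (0.4862, 0.6784, 0.5664),
    "LightGBM Baseline": (0.6655, 0.4904, 0.5647),
}

for name, (precision, recall, reported_f1) in reported.items():
    computed = f1_from_pr(precision, recall)
    print(f"{name}: computed F1 = {computed:.4f}, reported F1 = {reported_f1:.4f}")
```

Both rows reproduce the reported F1 to four decimal places. The two models have nearly identical F1 and differ mainly in how they balance precision against recall.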
- Advanced Feature Engineering
  - 29-year dataset (257,232 samples)
  - 79 physics-based predictors
  - Temporal lag features (1-24 hours)
  - Solar cycle & seasonal patterns
- Ensemble Architecture
  - LightGBM: Gradient boosting baseline
  - Bidirectional GRU: Temporal sequence modeling
  - Meta-Learner: Combines predictions optimally
- Production Pipeline
  - Memory-optimized data processing
  - Robust scaling & preprocessing
  - Model persistence & versioning
  - Real-time inference capability
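
The temporal lag features listed above can be sketched with pandas `shift`. This is a minimal illustration on synthetic data, assuming one row per hour in chronological order. The helper `add_lag_features`, the chosen lags, and the `<column>_lag_<N>h` naming are assumptions for illustration, not the project's actual feature code.

```python
import numpy as np
import pandas as pd


def add_lag_features(df: pd.DataFrame, columns, lags_hours) -> pd.DataFrame:
    """Append lagged copies of `columns`, assuming hourly rows in time order."""
    out = df.copy()
    for col in columns:
        for lag in lags_hours:
            # A positive shift moves past values forward, so each row
            # only sees information from `lag` hours earlier.
            out[f"{col}_lag_{lag}h"] = out[col].shift(lag)
    return out


# Synthetic hourly series for illustration only (not real OMNI values).
index = pd.date_range("2024-01-01", periods=48, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
df = pd.DataFrame({"IMF_Magnitude": rng.uniform(2, 15, size=48)}, index=index)

lagged = add_lag_features(df, ["IMF_Magnitude"], lags_hours=[1, 6, 12, 24])

# The first max(lags_hours) rows have no complete history, so drop them.
lagged = lagged.dropna()
print(lagged.columns.tolist())
print(len(lagged))
```

Because only positive shifts are used, every lagged value comes from the past, so the features do not leak future information.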
- NASA OMNI Database: Solar wind parameters, IMF data
- NOAA SWPC: Geomagnetic indices (Kp, AE, Dst)
- Temporal Coverage: 1996-2025 (29 years)
- Data Quality: 98.3% completeness after cleaning
- Kp-index - Geomagnetic activity indicator
- IMF Magnitude - Interplanetary magnetic field strength
- Plasma Beta - Solar wind plasma parameter
- AE Index - Auroral electrojet activity
- Solar Cycle Phase - Long-term solar variability
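
For readers unfamiliar with plasma beta, it is the ratio of thermal pressure to magnetic pressure. The sketch below computes a proton-only version, beta = n k_B T / (B^2 / (2 mu_0)), from typical solar wind units. The exact definition behind the project's `Plasma_Beta` feature is not stated here and may differ, for example by including electron or alpha-particle contributions. The function name and sample values are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J/K
MU_0 = 4 * math.pi * 1e-7    # vacuum permeability, H/m (approximate)


def proton_plasma_beta(density_cm3: float, temperature_k: float, b_nt: float) -> float:
    """Ratio of proton thermal pressure to magnetic pressure.

    density_cm3: proton number density in cm^-3
    temperature_k: proton temperature in kelvin
    b_nt: magnetic field magnitude in nanotesla
    """
    density_m3 = density_cm3 * 1e6           # cm^-3 -> m^-3
    b_tesla = b_nt * 1e-9                    # nT -> T
    thermal_pressure = density_m3 * K_BOLTZMANN * temperature_k
    magnetic_pressure = b_tesla ** 2 / (2 * MU_0)
    return thermal_pressure / magnetic_pressure


# Illustrative solar wind values, not taken from the dataset.
beta = proton_plasma_beta(density_cm3=5.0, temperature_k=1.0e5, b_nt=5.0)
print(f"Proton plasma beta: {beta:.3f}")
```

Values below 1 indicate that magnetic pressure dominates. Values above 1 indicate that thermal pressure dominates.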

Clone the repository:

    git clone https://github.com/yourusername/SOLARIS-X.git
    cd SOLARIS-X

Create a virtual environment:

    python -m venv venv

Activate the virtual environment:

    # Windows:
    venv\Scripts\activate
    # Linux/Mac:
    source venv/bin/activate

Install dependencies:

    pip install -r requirements.txt

Run the complete training pipeline:

    python scripts/training/complete_pipeline.py

Train individual models:

    python scripts/training/test_lightgbm.py
    python scripts/training/test_neural_network.py

    import joblib
    import pandas as pd
    import numpy as np

    # Example space weather features
    features = {
        'Kp_index': 3.5,
        'IMF_Magnitude_lag_12h': 7.2,
        'Plasma_Beta': 0.8,
        'AE_index': 150.0,
        'Solar_Cycle_Phase': 0.6,
        # ... additional features required
    }

    print("SOLARIS-X Space Weather Prediction System")
    print("Geomagnetic Storm Prediction: Ready for deployment")
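
The example above stops before a model is called. One step that usually comes first is turning the feature dictionary into a single-row DataFrame whose columns match the order used in training. The sketch below shows one way to do that and to fail early when features are missing. `build_feature_row` and `expected_columns` are hypothetical names, and the five columns stand in for the full feature set the model would require.

```python
import pandas as pd


def build_feature_row(features: dict, expected_columns: list) -> pd.DataFrame:
    """Return a one-row DataFrame with columns in the expected order.

    Raises ValueError if any expected feature is missing, rather than
    silently passing an incomplete row to a model.
    """
    missing = [col for col in expected_columns if col not in features]
    if missing:
        raise ValueError(f"Missing features: {missing}")
    values = [[features[col] for col in expected_columns]]
    return pd.DataFrame(values, columns=expected_columns)


# Hypothetical column order; a real model would define all of its features.
expected_columns = [
    "Kp_index",
    "IMF_Magnitude_lag_12h",
    "Plasma_Beta",
    "AE_index",
    "Solar_Cycle_Phase",
]

features = {
    "Solar_Cycle_Phase": 0.6,
    "AE_index": 150.0,
    "Kp_index": 3.5,
    "Plasma_Beta": 0.8,
    "IMF_Magnitude_lag_12h": 7.2,
}

row = build_feature_row(features, expected_columns)
print(row)
```

The resulting row could then be passed to a trained model loaded with `joblib.load`. The model file name and its prediction interface depend on what the training pipeline saves.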

Data Pipeline
- `data/processed/features/` - Engineered feature datasets (excluded)
- `data/raw/omni/` - Original OMNI database files (excluded)

Training System
- `scripts/training/models/` - Individual model trainers (4 advanced models)
- `scripts/training/utils/` - Training utilities and base classes
- `scripts/training/complete_pipeline.py` - Main orchestrator

Model Management
- `models/checkpoints/` - Training checkpoints and metadata
- `models/trained/` - Production models (excluded)

Results & Analysis
- `results/plots/` - Performance visualizations (10 charts)

Configuration
- `requirements.txt` - Python dependencies
- `.gitignore` - Repository optimization
- `README.md` - Documentation

- 25+ Python modules with professional architecture
- 4 advanced ML models including meta-ensemble system
- Complete MLOps pipeline with automated evaluation
- Production-ready deployment configuration
- Modular Design: Separated model trainers and utilities
- Production Ready: Complete MLOps pipeline structure
- Optimized Storage: Large files excluded via .gitignore
- Comprehensive Evaluation: Visualization and metrics tracking
- Professional Organization: Clear separation of concerns
Note: Files marked as "(excluded)" are not tracked in git due to size constraints but are generated during training.
- Temporal Validation: Proper chronological train/validation/test splits
- Physics-Informed: Features based on magnetospheric coupling theory
- Imbalanced Learning: Specialized techniques for rare storm events
- Ensemble Methods: Meta-learning for optimal prediction combination
- No Data Leakage: Strict temporal separation of datasets
- Multiple Metrics: AUC, F1, Precision, Recall for comprehensive assessment
- Cross-Validation: Robust performance estimation
- Uncertainty Quantification: Prediction confidence intervals
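
The chronological splitting and temporal separation described above can be sketched as follows. This is a minimal illustration on synthetic hourly data. The helper `chronological_split` and the cutoff dates are assumptions, not the project's actual split boundaries.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_end, val_end):
    """Split a time-indexed DataFrame into train/validation/test by date.

    Rows before `train_end` go to train, rows from `train_end` up to
    (but not including) `val_end` go to validation, and the rest to test.
    """
    df = df.sort_index()
    train = df[df.index < train_end]
    val = df[(df.index >= train_end) & (df.index < val_end)]
    test = df[df.index >= val_end]
    return train, val, test


# Synthetic hourly data for illustration only.
index = pd.date_range("2020-01-01", periods=1000, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Kp_index": range(1000)}, index=index)

train, val, test = chronological_split(
    df,
    train_end=pd.Timestamp("2020-01-22"),  # hypothetical cutoff
    val_end=pd.Timestamp("2020-02-01"),    # hypothetical cutoff
)

print(len(train), len(val), len(test))
# Every training timestamp precedes every validation timestamp, and so on.
print(train.index.max() < val.index.min() < test.index.min())
```

Unlike a random shuffle, this keeps all evaluation data strictly later than the training data. That matters when features include lagged values of the same series.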
- Space Weather Centers: NOAA, ESA integration
- Satellite Operations: ISS, commercial satellite protection
- Power Grid Safety: Geomagnetic storm early warning
- Aviation: High-altitude flight safety alerts
- Space Physics: Magnetospheric dynamics research
- Climate Science: Space weather impact studies
- Machine Learning: Rare event prediction techniques
- Data Science: Time series ensemble methods
- Python: 3.8+
- RAM: 8GB minimum, 16GB recommended
- CPU: Multi-core processor (12+ cores optimal)
- Storage: 5GB for full dataset and models
lightgbm==4.6.0
tensorflow-cpu==2.20.0
scikit-learn==1.3.0
pandas==2.0.3
numpy==1.24.3
matplotlib==3.7.1
seaborn==0.12.2
Sumanth - Space Weather & Machine Learning Research
- GitHub: @Sumanth1410-git
- Contact: sumanthp141005@gmail.com
- 96.5% AUC - Space weather prediction performance (meta-ensemble AUC 0.9646)
- 67.8% Storm Recall - Rare event detection
- Production-Ready - Complete MLOps pipeline
- Physics-Informed - Features grounded in solar wind and magnetospheric physics
- 29-Year Dataset - Comprehensive historical coverage
This project is licensed under the MIT License.
- NASA OMNI Database - Space weather data provision
- NOAA Space Weather Prediction Center - Operational data access
- Space Weather Community - Research inspiration and validation
- Open Source Contributors - Tool and library development
Built with ❤️ for the space weather research community
⭐ Star this repository if it helps your research!