NetRail-Incident-Analysis is a prototype refactoring of an older rail incident-delay model. It currently ingests raw feeds (e.g. weather, timetables, delay incidents) and produces clean, per-segment, per-hour feature datasets for modelling and analysis.
Future versions will add incident and delay modelling subpackages.
src/
├── rail_data/
│ ├── io/ # Raw data ingestion, caching, and parsing
│ ├── features/ # Feature engineering on cached data
│ └── models/ # Model training utilities (rough draft!)
- Track segments are identified by `ELR_MIL` codes (Engineer’s Line Reference + milepoint bin).
- All datasets are at hourly resolution, partitioned by segment and time.
- The pipeline works in three stages:
  - Data ingestion (`io/`): fetches and normalises raw feeds (weather, train schedules, delay logs, holidays, shapefiles…).
  - Feature engineering (`features/`): builds timebases, aggregates weather, counts trains/incidents, and outputs partitioned Parquet datasets.
  - Modelling (`models/`): combines feature tables and fits statistical models for incident data.
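To make the segment and hourly conventions above concrete, here is a minimal standard-library sketch of keying incidents by segment and hour. It is not the package's implementation. The helper names (`segment_code`, `hourly_timebase`, `count_incidents`), the `ELR_NNN` string layout, the one-mile default bin width, and the sample records are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def segment_code(elr, mileage, bin_miles=1.0):
    """Build a hypothetical ELR_MIL key from an ELR and a mileage.

    Assumption: the key is the ELR plus a zero-padded bin index.
    """
    bin_index = int(mileage // bin_miles)
    return f"{elr.upper()}_{bin_index:03d}"


def hourly_timebase(start, end):
    """List every hour from start 00:00 to end 23:00 (end date inclusive)."""
    first = datetime.fromisoformat(start)
    last = datetime.fromisoformat(end) + timedelta(hours=23)
    n_hours = int((last - first) / timedelta(hours=1)) + 1
    return [first + timedelta(hours=i) for i in range(n_hours)]


def count_incidents(incidents):
    """Count incidents per (segment code, hour) cell."""
    counts = Counter()
    for elr, mileage, timestamp in incidents:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(segment_code(elr, mileage), hour)] += 1
    return counts


# Made-up sample records: (ELR, mileage, timestamp).
incidents = [
    ("ECM1", 12.4, datetime(2024, 1, 1, 8, 15)),
    ("ECM1", 12.9, datetime(2024, 1, 1, 8, 50)),
    ("ECM1", 13.1, datetime(2024, 1, 1, 9, 5)),
]
counts = count_incidents(incidents)
timebase = hourly_timebase("2024-01-01", "2024-01-02")
```

Joining `counts` onto `timebase` for each segment would give a dense table with zero-filled hours, which is roughly the shape the feature stage is described as producing.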
import rail_data
rail_data.io.get_datasets("2024-01-01", "2024-12-31")
rail_data.features.create_datasets("2024-01-01", "2024-12-31")
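Because feature engineering works on cached data, ingestion has to run first, and both calls should cover the same date range. The wrapper below is a hedged sketch of that ordering. `run_pipeline` is a hypothetical helper, not part of `rail_data`. It takes the two stage functions as arguments so that it runs here with stand-ins. Treating the end date as inclusive is also an assumption.

```python
from datetime import date


def run_pipeline(start, end, ingest, build_features):
    """Run ingestion, then feature engineering, over one date range.

    Returns the number of days covered (assumes an inclusive end date).
    """
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    if end_day < start_day:
        raise ValueError(f"end {end} is before start {start}")
    ingest(start, end)          # stage 1: fetch and cache raw feeds
    build_features(start, end)  # stage 2: build features from the cache
    return (end_day - start_day).days + 1


# Stand-in stage functions that only record the order of calls.
calls = []
n_days = run_pipeline(
    "2024-01-01",
    "2024-12-31",
    ingest=lambda s, e: calls.append(("io", s, e)),
    build_features=lambda s, e: calls.append(("features", s, e)),
)
```

In real use, the two stand-ins would be replaced by `rail_data.io.get_datasets` and `rail_data.features.create_datasets`.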