NetRail Incident Analysis is a Python toolkit that ingests raw Network Rail data, such as timetables, delay logs, geospatial files, and weather records, and extracts features.

Katielocks/NetRailPipeline

NetRail-Incident-Analysis Package

NetRail-Incident-Analysis is a prototype refactoring of an older rail incident-delay model. It currently ingests raw feeds (e.g. weather, timetables, delay incidents) and produces clean, per-segment, per-hour feature datasets for modelling and analysis.

In future versions, this will include incident and delay modelling subpackages.


Contents

src/
├── rail_data/
│   ├── io/        # Raw data ingestion, caching, and parsing
│   ├── features/  # Feature engineering on cached data
│   └── models/    # Model training utilities (rough draft!)

Key Concepts

  • Track segments are identified by ELR_MIL codes (Engineer’s Line Reference + milepoint bin).
  • All datasets have hourly resolution and are partitioned by segment and time.
  • The pipeline works in three stages:
    1. Data ingestion (io/): fetches and normalises raw feeds (weather, train schedules, delay logs, holidays, shapefiles…).
    2. Feature engineering (features/): builds timebases, aggregates weather, counts trains/incidents, and outputs partitioned Parquet datasets.
    3. Modelling (models/): combines feature tables and fits statistical models for incident data.
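
To make the per-segment, per-hour idea concrete, here is a minimal standard-library sketch of how incidents might be counted by segment and hour. It is not the package's implementation: the function names, the `ELR_bin` key format, the one-mile bin size, and the sample events are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def segment_key(elr: str, miles: float, bin_miles: float = 1.0) -> str:
    """Hypothetical ELR_MIL key: the ELR plus the index of the milepoint bin.

    The real key format and bin size used by the package are not documented
    here, so both are assumptions.
    """
    return f"{elr}_{int(miles // bin_miles)}"


def hour_floor(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def count_per_segment_hour(events):
    """Count events per (segment, hour) pair.

    `events` is an iterable of (elr, miles, timestamp) tuples.
    """
    return Counter(
        (segment_key(elr, miles), hour_floor(ts)) for elr, miles, ts in events
    )


# Made-up sample incidents, for illustration only.
events = [
    ("ECM1", 12.7, datetime(2024, 1, 1, 8, 15)),
    ("ECM1", 12.2, datetime(2024, 1, 1, 8, 50)),
    ("ECM1", 13.1, datetime(2024, 1, 1, 9, 5)),
]
counts = count_per_segment_hour(events)
```

Each key in `counts` corresponds to one row of a per-segment, per-hour table; the first two sample incidents fall into the same segment and hour and are counted together.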

Example Workflow

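import rail_data

# Stage 1: fetch and normalise the raw feeds for the date range
rail_data.io.get_datasets("2024-01-01", "2024-12-31")

# Stage 2: build the per-segment, per-hour feature datasets
rail_data.features.create_datasets("2024-01-01", "2024-12-31")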
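
Since the feature stage builds hourly timebases, the date strings in the workflow above can be read as bounding a sequence of hourly timestamps. The sketch below shows one way such a timebase could be enumerated with the standard library. `hourly_timebase` is a hypothetical helper, not part of `rail_data`, and treating the end date as inclusive is an assumption.

```python
from datetime import datetime, timedelta


def hourly_timebase(start: str, end: str) -> list[datetime]:
    """Return every hour from the start date through the end date.

    Assumes ISO date strings and an inclusive end date; the package's own
    convention may differ.
    """
    first = datetime.fromisoformat(start)
    # Exclusive upper bound: midnight after the (inclusive) end date.
    stop = datetime.fromisoformat(end) + timedelta(days=1)
    n_hours = int((stop - first) / timedelta(hours=1))
    return [first + timedelta(hours=h) for h in range(n_hours)]


timebase = hourly_timebase("2024-01-01", "2024-12-31")
```

Crossing a timebase like this with the set of segment codes gives the full grid of rows onto which weather, train counts, and incident counts can be joined.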
