Proactive Failure Prediction in Train Air Production Units Using XGBoost for Enhanced Safety

Overview

This project aims to proactively predict failures in train air production units using the MetroPT3 dataset and XGBoost, improving operational safety and maintenance planning. It involves preprocessing large-scale sensor data, training several machine learning models, and interpreting results with SHAP (SHapley Additive exPlanations).

Objectives

Anticipate failures in train air production units by detecting pre-failure states.
Use XGBoost as the primary prediction model and compare it against Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors (KNN), and Logistic Regression (LR).
Apply interpretable analysis using SHAP to understand model decisions.
Allow direct reuse of pre-trained models without the need to rerun training.
Provide ready-to-use results with preprocessed data and precomputed SHAP analyses.
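
One way to frame pre-failure detection as a binary classification target is to label every sample that falls within a fixed horizon before a known failure as 1 and all other samples as 0. The sketch below illustrates that idea only; the function name label_pre_failure, the horizon length, and the example failure timestamp are illustrative assumptions, not values taken from this project.

```python
import pandas as pd


def label_pre_failure(timestamps, failure_starts, horizon="2h"):
    """Return 1 for samples within `horizon` before any failure start, else 0.

    `failure_starts` and `horizon` are hypothetical inputs; the real project
    may define its pre-failure windows differently.
    """
    ts = pd.Series(pd.to_datetime(timestamps)).reset_index(drop=True)
    labels = pd.Series(0, index=ts.index, dtype="int64")
    window = pd.Timedelta(horizon)
    for start in pd.to_datetime(list(failure_starts)):
        # Samples in [start - horizon, start) count as pre-failure.
        in_window = (ts >= start - window) & (ts < start)
        labels[in_window] = 1
    return labels


# Made-up hourly timestamps purely for illustration (not real failure records).
samples = pd.date_range("2020-03-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
print(label_pre_failure(samples, ["2020-03-01 04:00"], horizon="2h").tolist())
# prints [0, 0, 1, 1, 0, 0]
```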

Dataset

Source: Kaggle - MetroPT3 Dataset

Period: February – August 2020
Sampling Rate: 1 Hz
Total Records: 15,169,480
Features: 15 (7 analog + 8 digital)

Note

Before running the project, download the dataset from Kaggle and place it in the data/raw/ directory.
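
As a rough illustration of the loading step, the sketch below reads the raw CSV from data/raw/ with pandas and indexes it by time. The file name is the one listed in the project structure of this README; the timestamp column name and the helper load_raw are assumptions that should be checked against the downloaded file.

```python
from pathlib import Path

import pandas as pd

# File name as listed under data/raw/ in this README.
RAW_PATH = Path("data/raw/MetroPT3(AirCompressor).csv")


def load_raw(path=RAW_PATH, time_col="timestamp"):
    """Load the raw sensor CSV and return it sorted and indexed by time.

    `time_col` is an assumed column name; adjust it if the file differs.
    """
    df = pd.read_csv(path, parse_dates=[time_col])
    return df.sort_values(time_col).set_index(time_col)


# Only runs when the dataset has actually been downloaded.
if RAW_PATH.exists():
    raw = load_raw()
    print(raw.shape)
```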

Project Structure

Proactive-Failure-Prediction-in-Train-Air-Production-Units-Using-XGBoost-for-Enhanced-Safety/
├── data/                            # All data files used in the project
│   ├── raw/                         # Raw dataset (e.g., MetroPT3(AirCompressor).csv)
│   ├── processed/                   # Cleaned and preprocessed data (e.g., X_train.csv, X_test.csv, etc.)
│   ├── predictions/                 # Model predictions stored as CSVs
│   └── shap/                        # SHAP values for model interpretability
│
├── models/                          # Trained models saved as .pkl files (e.g., xgboost_model.pkl)
│
├── notebooks/                       # Jupyter Notebooks for each step of the pipeline
│   ├── 01_data_exploration.ipynb        # Exploratory Data Analysis (EDA)
│   ├── 02_data_preprocessing.ipynb      # Preprocessing, feature engineering (3-day rolling averages), normalization
│   ├── 03_model_training.ipynb          # Training multiple models: XGBoost, KNN, Logistic Regression, etc.
│   ├── 04_results.ipynb                 # Evaluation of models (accuracy, precision, recall, F1, ROC curves)
│   └── 05_model_interpretation.ipynb    # SHAP-based interpretation of the XGBoost model
│
├── src/                             # Modular Python scripts for reusable logic
│   ├── data/
│   │   └── preprocess.py            # Functions for data cleaning, transformation, and splitting
│   ├── models/
│   │   ├── train.py                 # Functions for training different models
│   │   └── evaluate.py              # Functions to calculate and visualize performance metrics
│   └── utils/
│       └── helpers.py               # Utility functions for:
│                                   # - Plotting ROC curves
│                                   # - Generating and saving classification reports
│
├── requirements.txt                # Required Python libraries (pandas, scikit-learn, xgboost, shap, etc.)
└── README.md                       # Project documentation (this file)
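
The feature engineering noted for 02_data_preprocessing.ipynb (3-day rolling averages) could look roughly like the following. This is a sketch under assumptions: the helper name add_rolling_means is hypothetical, the data is assumed to have a sorted DatetimeIndex, and the _avg_3day suffix mirrors the feature names reported in the SHAP section (for example DV_pressure_avg_3day).

```python
import pandas as pd


def add_rolling_means(df, columns, window="3D", suffix="_avg_3day"):
    """Append time-based rolling means for the given sensor columns.

    Requires a sorted DatetimeIndex. Each window covers the `window`
    period ending at the current sample, so no future data is used.
    """
    out = df.copy()
    for col in columns:
        out[col + suffix] = df[col].rolling(window).mean()
    return out


# Tiny synthetic example with one reading per day.
idx = pd.date_range("2020-02-01", periods=5, freq="D")
demo = pd.DataFrame({"Oil_temperature": [60.0, 62.0, 64.0, 66.0, 68.0]}, index=idx)
print(add_rolling_means(demo, ["Oil_temperature"]))
```

A trailing window is used here so that each feature value depends only on past readings, which matters when the goal is to predict failures ahead of time.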

Getting Started

Prerequisites

Python 3.8 or higher
Jupyter Notebook
Git

Installation

1. Clone the repository

git clone https://github.com/YahiaouiLydia/Proactive-Failure-Prediction-in-Train-Air-Production-Units-Using-XGBoost-for-Enhanced-Safety.git
cd Proactive-Failure-Prediction-in-Train-Air-Production-Units-Using-XGBoost-for-Enhanced-Safety

2. (Optional) Create a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

Tip

It's recommended to run the project inside a virtual environment to manage dependencies.

How to Run the Project

Run the Jupyter notebooks in the following order:

1. Data Exploration
   - notebooks/01_data_exploration.ipynb
   - Visualize and understand the raw sensor data
2. Data Preprocessing
   - notebooks/02_data_preprocessing.ipynb
   - Clean and process the dataset, generate engineered features, and create train/test splits (saved in data/processed/)
3. Model Training
   - notebooks/03_model_training.ipynb
   - Train XGBoost and other ML models and save them to models/
4. Model Evaluation
   - notebooks/04_results.ipynb
   - Evaluate models using accuracy, precision, recall, F1, ROC curves
   - Save predictions to data/predictions/
5. Model Interpretation
   - notebooks/05_model_interpretation.ipynb
   - Analyze model behavior using SHAP and store visualizations in data/shap/

Caution

Preprocessing and training might be resource-intensive due to the dataset size.

Tip

To save time, use the existing trained models and SHAP analysis provided in the repository.
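
Reusing a saved model typically amounts to unpickling it and calling predict on preprocessed features. The sketch below assumes the example file names shown in the project structure (models/xgboost_model.pkl and data/processed/X_test.csv); the helper load_model is hypothetical. Pickle files should only be loaded from trusted sources, and unpickling generally works best with the same library versions that were used when saving.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Unpickle a previously saved model object."""
    with open(path, "rb") as f:
        return pickle.load(f)


MODEL_PATH = Path("models/xgboost_model.pkl")
TEST_PATH = Path("data/processed/X_test.csv")

# Only runs when the repository's saved artifacts are present.
if MODEL_PATH.exists() and TEST_PATH.exists():
    import pandas as pd

    model = load_model(MODEL_PATH)
    X_test = pd.read_csv(TEST_PATH)
    print(model.predict(X_test)[:10])
```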

Results

Model Performance (XGBoost Example)

Accuracy: 99.95%
Precision: 99.95%
Recall: 99.95%
F1 Score: 99.95%
ROC AUC: 1.00

The evaluation notebook (notebooks/04_results.ipynb) saves performance plots and classification reports as PNG files.
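
Metrics of this kind can be computed from true labels, predicted labels, and predicted probabilities with scikit-learn. The sketch below is an assumption about how such a summary might be produced; in particular, the weighted averaging and the helper name summarize are illustrative choices, not confirmed details of the evaluation notebook.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def summarize(y_true, y_pred, y_score):
    """Compute accuracy, precision, recall, F1, and ROC AUC for a binary task.

    `y_score` is the predicted probability of the positive class (1).
    """
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted"),
        "recall": recall_score(y_true, y_pred, average="weighted"),
        "f1": f1_score(y_true, y_pred, average="weighted"),
        "roc_auc": roc_auc_score(y_true, y_score),
    }


# Toy labels and scores, for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in y_score]
print(summarize(y_true, y_pred, y_score))
```

With heavily imbalanced classes, per-class precision and recall for the pre-failure class are usually more informative than weighted averages or overall accuracy.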

SHAP Interpretation

Key Insights

Top Features:
- DV_pressure_avg_3day: High relevance to pre-failure state
- Oil_temperature_avg_3day: Elevated temperatures linked to failures
Less Important:
- LPS, Pressure_switch

SHAP Summary Plot

X-axis: Mean absolute SHAP value (feature impact)
Y-axis: Sorted features
Color Legend:
- Red = Pre-Failure (1)
- Green = No Failure (0)
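
The summary plot described above ranks features by mean absolute SHAP value. Given a matrix of SHAP values (for example from shap.TreeExplainer(model).shap_values(X)), that ranking can be computed directly. The helper name rank_by_mean_abs_shap, the placeholder feature names, and the toy numbers below are illustrative assumptions, not outputs of this project.

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least impactful.

    `shap_values` has shape (n_samples, n_features).
    """
    values = np.asarray(shap_values, dtype=float)
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix (made-up numbers) for three placeholder features.
toy = [[0.5, -0.1, 0.0], [-0.7, 0.2, 0.1]]
names = ["feature_a", "feature_b", "feature_c"]
for name, score in rank_by_mean_abs_shap(toy, names):
    print(f"{name}: {score:.2f}")
```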
