The provided code addresses the problem of predictive maintenance using machine learning techniques. Predictive maintenance is essential for industries to avoid unexpected machine failures, minimize downtime, optimize maintenance schedules, and reduce the associated costs. By leveraging historical operational data and machine learning algorithms, it is possible to predict when maintenance is required before a breakdown occurs. The objective here is to predict the type of failure a machine might experience based on its operational data; the code performs data preprocessing, model creation, model evaluation, and prediction for custom input.
- The code first reads in two datasets, to demonstrate that the modelling pipeline works on two separate datasets rather than being tuned to a single one.
- Data preprocessing steps include:
- Handling missing values
- Encoding categorical variables
- Scaling numerical features
- Removing outliers
- Various classification models are trained on the datasets to predict different failure types. These models include:
- Logistic Regression
- Random Forest
- AdaBoost
- Bagging
- SVM (Support Vector Machine)
- For each model, the code evaluates its performance using accuracy scores.
- The code selects the best-performing model for each dataset based on mean accuracy scores.
- After selecting the best model for Dataset 1, the code retrains it on the entire dataset.
- It then predicts failure types for a custom user input supplied as a dictionary of feature values (a sketch of this end-to-end workflow follows this list).
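A minimal sketch of this workflow is given below, assuming scikit-learn and pandas. The file name, the feature handling, and the `custom_input` placeholder are illustrative assumptions; only the model list and the failure-type labels (Machine failure, TWF, HDF, PWF, OSF, RNF) come from the description above.

```python
# Hypothetical sketch of the preprocessing / training / selection pipeline.
# Feature columns, file names, and custom_input are assumptions, not the original code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

FAILURE_TYPES = ["Machine failure", "TWF", "HDF", "PWF", "OSF", "RNF"]

def preprocess(df):
    """Handle missing values, encode categoricals, remove outliers, scale features."""
    df = df.dropna()                                         # handle missing values
    for col in df.select_dtypes(include="object").columns:   # encode categorical variables
        df[col] = LabelEncoder().fit_transform(df[col])
    features = [c for c in df.columns if c not in FAILURE_TYPES]
    # Drop rows more than 3 standard deviations from the mean (one common convention).
    z = (df[features] - df[features].mean()) / df[features].std()
    df = df[(z.abs() < 3).all(axis=1)]
    df[features] = StandardScaler().fit_transform(df[features])  # scale numerical features
    return df, features

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
    "SVM": SVC(),
}

def evaluate(df, features):
    """Train each model on each failure type and return its mean accuracy in percent."""
    mean_acc = {}
    for name, model in models.items():
        scores = []
        for target in FAILURE_TYPES:
            X_train, X_test, y_train, y_test = train_test_split(
                df[features], df[target], test_size=0.2, random_state=42)
            model.fit(X_train, y_train)
            scores.append(accuracy_score(y_test, model.predict(X_test)))
        mean_acc[name] = 100 * sum(scores) / len(scores)
    return mean_acc

df1 = pd.read_csv("dataset1.csv")           # illustrative file name
df1, features = preprocess(df1)
mean_acc = evaluate(df1, features)
best_name = max(mean_acc, key=mean_acc.get)

# Retrain the best model on the whole of Dataset 1 and predict for a custom input.
custom_input = {f: 0.0 for f in features}   # placeholder feature values
best = models[best_name]
predictions = {}
for target in FAILURE_TYPES:
    best.fit(df1[features], df1[target])
    predictions[target] = best.predict(pd.DataFrame([custom_input]))[0]
print(best_name, predictions)
```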
The results include data visualizations, model evaluation metrics, and insights into model performance.
- Scatter plots are generated for each failure type in both datasets to visualize the relationships between features and failure types.
- Heat maps are generated for each failure type in both datasets to visualize the correlations between features and the failure labels.
- Accuracy scores are calculated for each model and each failure type, and a bar graph compares the mean accuracy of the different models in both datasets, providing insight into their relative performance.
- All scatter plots, heat maps, and accuracy graphs are included in this section for easy reference; a plotting sketch is given after this list.
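The plots described above could be produced along the following lines. This is a minimal sketch using matplotlib and seaborn; `feat_x`, `feat_y`, and the `mean_acc` dictionary are placeholders rather than variables from the original code.

```python
# Hypothetical plotting sketch; 'feat_x', 'feat_y', and 'mean_acc' are placeholders.
import matplotlib.pyplot as plt
import seaborn as sns

def plot_failure_type(df, target, feat_x, feat_y):
    """Scatter plot of two features coloured by one failure-type label, plus a heat map."""
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    axes[0].scatter(df[feat_x], df[feat_y], c=df[target], cmap="coolwarm", s=10)
    axes[0].set(xlabel=feat_x, ylabel=feat_y, title=f"{target}: {feat_x} vs {feat_y}")
    # Heat map of feature correlations, including the failure-type column.
    sns.heatmap(df.corr(numeric_only=True), ax=axes[1], cmap="viridis")
    axes[1].set_title(f"Correlation heat map ({target})")
    plt.tight_layout()
    plt.show()

def plot_mean_accuracies(mean_acc):
    """Bar graph comparing the mean accuracy (%) of the trained models."""
    plt.bar(list(mean_acc), list(mean_acc.values()))
    plt.ylim(95, 100)                  # all mean accuracies are close to 100%
    plt.ylabel("Mean accuracy (%)")
    plt.xticks(rotation=30, ha="right")
    plt.title("Mean accuracy by model")
    plt.show()
```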
- Logistic Regression:
- Dataset 1:
- Machine failure Accuracy: 0.9706652697747512
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9910948140387638
- PWF Accuracy: 0.9968569931901519
- OSF Accuracy: 0.9931901519119958
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Logistic Regression: 99.10948140387637%
- Dataset 2:
- Machine failure Accuracy: 0.9696176008381352
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9910948140387638
- PWF Accuracy: 0.9968569931901519
- OSF Accuracy: 0.9910948140387638
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Logistic Regression: 99.05709795704557%
- Random Forest:
- Dataset 1:
- Machine failure Accuracy: 0.9879518072289156
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9984284965950759
- PWF Accuracy: 0.9973808276584599
- OSF Accuracy: 0.9937139863803038
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Random Forest: 99.53727955299458%
- Dataset 2:
- Machine failure Accuracy: 0.9858564693556836
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9994761655316919
- PWF Accuracy: 0.9973808276584599
- OSF Accuracy: 0.9937139863803038
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Random Forest: 99.51981840405098%
- AdaBoost:
- Dataset 1:
- Machine failure Accuracy: 0.9717129387113672
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9994761655316919
- PWF Accuracy: 0.9984284965950759
- OSF Accuracy: 0.9968569931901519
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for AdaBoost: 99.35393748908679%
- Dataset 2:
- Machine failure Accuracy: 0.9722367731796753
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 1.0
- PWF Accuracy: 0.9984284965950759
- OSF Accuracy: 0.9937139863803038
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for AdaBoost: 99.31901519119958%
- Bagging:
- Dataset 1:
- Machine failure Accuracy: 0.9884756416972237
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 1.0
- PWF Accuracy: 0.998952331063384
- OSF Accuracy: 0.9952854897852279
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Bagging: 99.6245852977126%
- Dataset 2:
- Machine failure Accuracy: 0.9879518072289156
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 1.0
- PWF Accuracy: 0.998952331063384
- OSF Accuracy: 0.9942378208486118
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for Bagging: 99.59839357429719%
- SVM:
- Dataset 1:
- Machine failure Accuracy: 0.9685699319015191
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9905709795704557
- PWF Accuracy: 0.997904662126768
- OSF Accuracy: 0.9931901519119958
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for SVM: 99.08328968046096%
- Dataset 2:
- Machine failure Accuracy: 0.9675222629649031
- TWF Accuracy: 0.9968569931901519
- HDF Accuracy: 0.9910948140387638
- PWF Accuracy: 0.997904662126768
- OSF Accuracy: 0.9905709795704557
- RNF Accuracy: 0.997904662126768
- Mean Accuracy for SVM: 99.03090623363018%
Bagging, or Bootstrap Aggregating, combines multiple models (typically decision trees) trained on different subsets of the training data, thereby reducing variance and improving stability. Here's why Bagging might result in the best performance:
- Reduction of Overfitting: By training multiple models on different subsets of data, Bagging reduces overfitting and variance, leading to more robust predictions.
- Increased Generalization: Combining predictions from multiple models reduces the variance of the final prediction and can lead to better generalization on unseen data.
- Better Handling of Complex Relationships: Bagging can capture complex relationships between features and target variables by aggregating predictions from multiple models, each trained on different aspects of the data.
- Robustness to Noisy Data: Bagging is less sensitive to noisy data compared to individual models, as it averages out predictions across multiple models.
Overall, Bagging's ability to reduce overfitting, improve generalization, and handle complex relationships makes it a powerful ensemble learning technique for predictive maintenance tasks.
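As a concrete illustration of the idea, here is a minimal bagging sketch using scikit-learn's `BaggingClassifier`, whose default base estimator is a decision tree. The synthetic, imbalanced data and the hyperparameters are illustrative, not taken from the original code.

```python
# Minimal bagging sketch; synthetic, imbalanced data stands in for the failure labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=6, weights=[0.97, 0.03],
                           random_state=0)

# Each base tree is fit on a bootstrap sample of the training data; the ensemble
# predicts by majority vote, which averages out the variance of any single tree.
bagging = BaggingClassifier(
    n_estimators=100,   # number of bootstrapped base models to aggregate
    max_samples=0.8,    # fraction of rows drawn (with replacement) for each model
    bootstrap=True,
    random_state=0,
)
print("Cross-validated accuracy:", cross_val_score(bagging, X, y, cv=5).mean())
```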
In conclusion, the code successfully addresses the problem of predictive maintenance by leveraging machine learning techniques. By preprocessing the data, creating models, and evaluating their performance, it provides insights into machine failure prediction. The inclusion of visualizations and accuracy metrics enhances the understanding of model behavior and performance.