Specific analysis of the data collected by the sensors for the development of a Machine Learning application, which exploits the correlations discovered in the data set to provide a sufficiently valid and correct prediction regarding a probable failure of a machine.

michele-abruzzese/predictive_maintenance

📊 predictive_maintenance 🛠️

A project for the Artificial Intelligence course, part of the Master's Degree in Computer Science at the University of Bologna.


This project analyses data collected by sensors on industrial machines in order to build a Machine Learning application. The application exploits the correlations found between the features of the dataset to predict, with reasonable accuracy, whether a machine is likely to fail.


Dataset used: “AI4I Predictive Maintenance Dataset”.
Link: https://archive.ics.uci.edu/dataset/601/ai4i+2020+predictive+maintenance+dataset
The version of the dataset we used is available in this GitHub repository as "predictive_maintenance.csv".


🎯 Objectives:

  1. Refinement and pre-processing of the dataset used in order to keep the useful data for our purpose and discard everything else;
  2. Exploratory Data Analysis (EDA), an in-depth study of the dataset aimed at discovering its main characteristics and searching for possible patterns, using statistical analysis tools;
  3. Application of several classification algorithms to determine which one predicts failures most accurately.

Example of dataset samples after pre-processing

For information on the characteristics of the dataset, please refer to the dataset link above.

[Image: example dataset samples]
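The pre-processing step can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: the column names (`UDI`, `Product ID`, `Type`, `Target`, `Failure Type`), the choice of columns to drop, the ordinal encoding of `Type` and the helper `preprocess` are all assumptions, and the tiny DataFrame stands in for "predictive_maintenance.csv".

```python
import pandas as pd

# Tiny stand-in for predictive_maintenance.csv; column names are assumed.
raw = pd.DataFrame({
    "UDI": [1, 2, 3, 4],
    "Product ID": ["M14860", "L47181", "L47182", "H29424"],
    "Type": ["M", "L", "L", "H"],
    "Air temperature [K]": [298.1, 298.2, 298.1, 298.9],
    "Torque [Nm]": [42.8, 46.3, 49.4, 4.6],
    "Target": [0, 0, 0, 1],
    "Failure Type": ["No Failure", "No Failure", "No Failure", "Power Failure"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier columns and encode the product type as an integer."""
    # Identifiers carry no predictive signal (assumed to be what gets discarded).
    out = df.drop(columns=["UDI", "Product ID"])
    # Assumption: L/M/H are ordered quality variants, so an ordinal code is used.
    out["Type"] = out["Type"].map({"L": 0, "M": 1, "H": 2})
    return out


clean = preprocess(raw)
print(clean.head())
```

With the real file, `raw` would instead come from `pd.read_csv("predictive_maintenance.csv")`, provided the columns match the assumed names.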

EDA (Exploratory Data Analysis)

The output shows the type of failure and its absolute frequency. A quick glance at the table below immediately shows that the number of failures detected is extremely low compared to the cases labelled as "No Failure".

[Image: frequency of each failure type]
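A frequency table of this kind can be produced with pandas `value_counts`. The sketch below uses a made-up label column (the name `Failure Type` and the values are assumptions for illustration, not the project's data).

```python
import pandas as pd

# Made-up labels; the real column is assumed to be called "Failure Type".
labels = pd.Series(
    ["No Failure"] * 8 + ["Heat Dissipation Failure", "Power Failure"],
    name="Failure Type",
)

counts = labels.value_counts()  # absolute frequency per label
share = labels.value_counts(normalize=True).mul(100).round(1)  # percentage

summary = pd.DataFrame({"count": counts, "percent": share})
print(summary)
```

Showing the percentage next to the absolute count makes the gap between "No Failure" and the failure classes easier to read at a glance.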

Correlation matrices

The first picture shows the correlation between the features, the second the correlation between the types of failure. For example, there is a strong positive correlation of 0.88 between air temperature and process temperature, and a strong negative correlation of -0.88 between rotation speed and torque.

[Image: correlation matrix of the features] [Image: correlation matrix of the failure types]
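A Pearson correlation matrix like the ones described here can be computed with `DataFrame.corr`. The data below is synthetic and generated only so that the two pairs of columns are strongly correlated; the column names and all numeric parameters are assumptions, not values taken from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for four sensor readings (all parameters are invented).
air = rng.normal(300.0, 2.0, n)
process = air + 10.0 + rng.normal(0.0, 1.0, n)  # tracks air temperature
speed = rng.normal(1500.0, 150.0, n)
torque = 100.0 - 0.04 * speed + rng.normal(0.0, 3.0, n)  # falls as speed rises

df = pd.DataFrame({
    "Air temperature [K]": air,
    "Process temperature [K]": process,
    "Rotational speed [rpm]": speed,
    "Torque [Nm]": torque,
})

corr = df.corr(method="pearson")
print(corr.round(2))
```

The resulting matrix can then be passed to a heatmap function of a plotting library to obtain a picture similar to the ones in this section.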

Data visualization

The dataset is strongly imbalanced: machine failures account for only 3.39% of the records.

[Image: class imbalance] [Image: failures]
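To see why this imbalance matters when judging a classifier, consider a trivial model that always predicts "no failure". The sketch below assumes an illustrative set of 10,000 records with 339 failures (which gives the 3.39% share); `accuracy_and_recall` is a hypothetical helper written for this example.

```python
def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall of the positive class) for binary labels."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    positives = sum(y_true)
    accuracy = correct / len(y_true)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# Assumed sizes: 339 failures out of 10,000 records, i.e. 3.39%.
y_true = [1] * 339 + [0] * 9661
y_pred = [0] * len(y_true)  # baseline that never predicts a failure

accuracy, recall = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={accuracy:.4f} recall={recall:.4f}")
# prints: accuracy=0.9661 recall=0.0000
```

Such a baseline is right about 96.6% of the time while detecting no failure at all, so accuracy alone says little on data this skewed.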

Classification models used

  • Logistic regression
  • KNN
  • Support Vector Machine
  • Random Forest
  • AdaBoost
  • XGBoost
  • Naive Bayes
  • Decision Tree Classifier
  • Multi-Layer Perceptron

Comparison of different configurations and solutions

The metrics used to evaluate the models are accuracy and total running time, although the latter turns out to be of little use for ranking them. As can be seen from the image below, all the algorithms achieve high accuracy, between 96% and 98%. What varies considerably is the execution time, ranging from thousandths of a second for Naive Bayes to almost 10 seconds for the Multi-Layer Perceptron. In retrospect, recall would have been a more informative metric than running time (and perhaps than accuracy), since it measures how many of the actual failures the classifier detects.

[Image: accuracy and execution time of each model]
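A comparison of this kind can be sketched as a loop that times each model and records its scores. This is an illustration under stated assumptions, not the project's code: the data is synthetic (`make_classification` with roughly 5% positives), only three of the listed models are included to keep the example within scikit-learn, and the hyperparameters are arbitrary.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the pre-processed dataset.
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=4,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    elapsed = time.perf_counter() - start  # fit + predict time in seconds
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "seconds": elapsed,
    }

for name, r in results.items():
    print(f"{name:20s} acc={r['accuracy']:.3f} "
          f"recall={r['recall']:.3f} time={r['seconds']:.4f}s")
```

Recording recall alongside accuracy costs one extra line per model and makes the ranking less sensitive to the class imbalance.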

The 'XGB Classifier' algorithm returned an accuracy of 98% and an execution time of 0.4 to 0.5 s. For a closer look, the scores XGB obtained for each target class (precision, recall, F1-score and support) are shown below.

[Image: XGB classification report]
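Per-class precision, recall, F1-score and support can be obtained with scikit-learn's `classification_report`. The labels and predictions below are invented for illustration and are not the XGB results; the class names are assumptions.

```python
from sklearn.metrics import classification_report

# Invented ground truth and predictions (0 = no failure, 1 = failure).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

names = ["No Failure", "Failure"]

# Human-readable table.
print(classification_report(y_true, y_pred, target_names=names, zero_division=0))

# Same scores as a nested dict, convenient for further processing.
report = classification_report(
    y_true, y_pred, target_names=names, output_dict=True, zero_division=0
)
print(report["Failure"]["recall"])
```

In this toy example two of the four failures are detected, so the recall of the "Failure" class is 0.5 even though 7 of the 10 predictions are correct.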
