This project uses machine learning techniques to predict carbon monoxide (CO) levels based on air quality attributes. By analyzing data from gas sensor arrays, the project aims to improve air quality forecasting and help protect public health and the environment.
- Course Title: Artificial Intelligence
- Course Code: CSE 422
- Section: 14
- Group: 07
- Date: 6th January 2025
- Arnab Das - 19301029
- Meshkatul Arefin - 24341079
- Introduction
- Dataset Description
- Data Preprocessing
- Machine Learning Models
- Results & Analysis
- Conclusion
- References
Carbon monoxide is a dangerous pollutant, and high concentrations can cause serious health issues. Forecasting CO levels helps protect the environment and public health. This project uses machine learning to predict CO levels from the Air Quality UCI dataset.
- Source: Air Quality UCI Dataset
- Size: 9,357 rows, 16 columns
- CO_GT: Carbon monoxide levels (mg/m³)
- PT08_S1_CO: Sensor data for CO
- NMHC_GT: Non-methane hydrocarbon levels (µg/m³)
- NO2_GT: Nitrogen dioxide levels (µg/m³)
- T: Temperature (°C)
- RH: Relative humidity (%)
- Replaced the missing-value marker `-200` with NaN.
- Removed column `NMHC_GT` due to excessive missing values.
- Filled the remaining NaN values with column means.
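The three cleaning steps above can be sketched in plain Python. This is an illustration only, not the project's own code: the helper `clean_columns`, the 50% cutoff in `max_missing_fraction`, and the toy values are all assumptions, since the write-up only says that `NMHC_GT` had "excessive" missing values.

```python
import math

MISSING_SENTINEL = -200  # marker for missing readings, as described above


def clean_columns(columns, max_missing_fraction=0.5):
    """Replace the sentinel with NaN, drop sparse columns, mean-fill the rest.

    `max_missing_fraction` is an assumed cutoff chosen for this sketch.
    """
    cleaned = {}
    for name, values in columns.items():
        # Step 1: turn the sentinel into NaN.
        with_nan = [math.nan if v == MISSING_SENTINEL else float(v) for v in values]
        present = [v for v in with_nan if not math.isnan(v)]
        missing_fraction = 1 - len(present) / len(with_nan)
        # Step 2: drop columns that are empty or mostly missing.
        if not present or missing_fraction > max_missing_fraction:
            continue
        # Step 3: fill the remaining NaNs with the column mean.
        mean = sum(present) / len(present)
        cleaned[name] = [mean if math.isnan(v) else v for v in with_nan]
    return cleaned


# Toy values for illustration only, not rows from the real dataset.
toy = {
    "CO_GT": [2.0, -200, 4.0, 3.0],
    "NMHC_GT": [-200, -200, -200, 150.0],
    "T": [13.0, 12.0, -200, 11.0],
}
result = clean_columns(toy)
print(result)
```

In this toy input, `NMHC_GT` is 75% missing and is dropped, while the single gaps in `CO_GT` and `T` are filled with the mean of the observed values in each column.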
- Applied MinMaxScaler to normalize features to a range of [0, 1].
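To show what min-max normalization computes, here is a minimal plain-Python sketch of the formula `(x - min) / (max - min)` applied to one feature column. The function name `min_max_scale` and the sample values are hypothetical, and mapping a constant column to 0.0 is a choice made for this sketch.

```python
def min_max_scale(values):
    """Rescale one feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no spread; map it to 0.0 in this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Illustrative values only.
scaled = min_max_scale([10.0, 15.0, 20.0, 30.0])
print(scaled)
```

The smallest value maps to 0 and the largest to 1, so features measured on different scales (for example temperature and relative humidity) contribute comparably to distance-based models such as KNN.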
- Training data: 70%
- Testing data: 30%
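A 70/30 split like the one above can be sketched as a shuffled index split. Whether the project shuffled the rows, and which random seed it used, is not stated, so the shuffling, `seed=42`, and the helper name `split_rows` are assumptions.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle row indices and hold out `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Ten placeholder rows stand in for the real dataset.
train, test = split_rows(list(range(10)))
print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, so the test set stays unseen during training.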
- Decision Tree
  - Training Accuracy: 1.0
  - Testing Accuracy: 1.0
  - Perfect scores on both splits raised overfitting concerns.
- Random Forest
  - Training Accuracy: 1.0
  - Testing Accuracy: 0.9968
  - Best performance, minimal overfitting.
- K-Nearest Neighbor (KNN)
  - Training Accuracy: 0.94
  - Testing Accuracy: 0.8857
  - Underperformed compared to tree-based models.
- Random Forest was the most balanced model with high accuracy and minimal overfitting.
- Decision Tree performed perfectly but raised concerns of overfitting.
- KNN struggled to generalize and achieved lower scores.
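For readers unfamiliar with KNN, the idea can be sketched in a few lines of plain Python. This is not the project's model: it assumes the CO target was binned into classes (which the accuracy, precision, and recall figures suggest), and `k=3`, Euclidean distance, and the toy points and labels are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to `query`."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy, already-scaled feature pairs with hypothetical class labels.
train_X = [(0.1, 0.2), (0.15, 0.25), (0.2, 0.1), (0.8, 0.9), (0.85, 0.8), (0.9, 0.95)]
train_y = ["low", "low", "low", "high", "high", "high"]

low_guess = knn_predict(train_X, train_y, (0.12, 0.2))
high_guess = knn_predict(train_X, train_y, (0.8, 0.85))
print(low_guess, high_guess)
```

Because each prediction depends directly on distances between feature vectors, KNN is sensitive to feature scaling and to noisy neighbours, which is one plausible reason it trails the tree-based models here.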
Model | Precision | Recall |
---|---|---|
Decision Tree | 1.0000 | 1.0000 |
Random Forest | 0.9968 | 0.9968 |
K-Nearest Neighbor | 0.8857 | 0.8857 |
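In each row of the table, precision and recall equal the testing accuracy. That is what micro-averaging produces for single-label classification, although the averaging method the project actually used is not stated and is an assumption here. A minimal sketch, with hypothetical labels:

```python
def micro_precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall: pool TP, FP, FN over all classes."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correctly predicted as this class
            elif p == label:
                fp += 1  # predicted as this class, but it is another
            elif t == label:
                fn += 1  # belongs to this class, but predicted as another
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for illustration only.
y_true = ["low", "low", "high", "mid", "high"]
y_pred = ["low", "mid", "high", "mid", "low"]
precision, recall = micro_precision_recall(y_true, y_pred)
print(precision, recall)
```

Every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so both metrics reduce to the fraction of correct predictions, here 3 out of 5.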
On the Air Quality UCI dataset, the tree-based models (Random Forest and Decision Tree) predicted CO levels far more accurately than KNN. Random Forest is the recommended model because it balances accuracy and generalization, whereas the Decision Tree's perfect scores raise overfitting concerns. These results can help authorities monitor and improve air quality.
- U.S. Environmental Protection Agency (2022): Carbon monoxide pollution in outdoor air
- Dataset: Air Quality UCI Dataset