This project aims to predict rainfall using machine learning and deep learning models. It includes data analysis, preprocessing, and the application of algorithms like Logistic Regression, SVM, Random Forest, and deep learning models like LSTM for the Kaggle Rain Prediction Challenge.

louis-alexandre-laguet/Rain-Prediction-DL-ML

Rain Prediction - Machine Learning & Deep Learning

This project was carried out as part of the AI and Applications course in the 2024-2025 academic year. The goal was to predict rainfall for the following day based on weather data from Algeria, using machine learning and deep learning models.

Project Description

The project aimed to predict whether it would rain the next day (RainTomorrow) using weather data collected between January 2010 and December 2014. The target variable is binary: "Yes" if more than 1 mm of rain falls the following day, and "No" otherwise. We applied both traditional machine learning algorithms and deep learning techniques to tackle this problem.
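
To make the target definition concrete, here is a minimal sketch of how a RainTomorrow label could be derived from a series of daily rainfall totals. The dataset most likely ships with this column already; the function name `make_rain_tomorrow` and the sample values are hypothetical and only illustrate the 1 mm rule.

```python
def make_rain_tomorrow(daily_rain_mm, threshold_mm=1.0):
    """Label each day with whether the NEXT day's rainfall exceeds the threshold.

    The last day has no following observation, so it is labelled None.
    """
    labels = []
    for i in range(len(daily_rain_mm)):
        if i + 1 >= len(daily_rain_mm) or daily_rain_mm[i + 1] is None:
            labels.append(None)
        else:
            labels.append("Yes" if daily_rain_mm[i + 1] > threshold_mm else "No")
    return labels


# Hypothetical rainfall totals in millimetres for five consecutive days.
rain = [0.0, 2.4, 1.0, 0.2, 5.1]
print(make_rain_tomorrow(rain))  # ['Yes', 'No', 'No', 'Yes', None]
```

Note that a day with exactly 1.0 mm counts as "No", since the rule is strictly "greater than 1 mm".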

Approach and Models

Data Analysis and Preprocessing

The project began with an extensive analysis of the dataset, which included:

  • Handling missing values.
  • Identifying features strongly correlated with the target variable.
  • Transforming categorical variables into numerical features.
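
The three preprocessing steps above can be sketched in plain Python. This is an illustration under assumptions, not the code from the notebooks, which would more typically rely on pandas and scikit-learn: the column names (`humidity`, `wind_dir`), the choice of median imputation, and the one-hot encoding are all hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy columns; rain_tomorrow is already encoded as 0/1.
humidity = [80, None, 45, 90, 30, 85]
wind_dir = ["N", "S", "N", "W", "S", "N"]
rain_tomorrow = [1, 1, 0, 1, 0, 1]

humidity_filled = impute_median(humidity)
wind_encoded = one_hot(wind_dir)

print(humidity_filled)    # [80, 80, 45, 90, 30, 85]
print(wind_encoded["N"])  # [1, 0, 1, 0, 0, 1]
print(round(pearson(humidity_filled, rain_tomorrow), 3))
```

Ranking features by the absolute value of this correlation is one simple way to spot those most strongly linked to the target.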

Machine Learning

Several machine learning models were implemented and evaluated, including:

  • Logistic Regression
  • SVM (Support Vector Machine)
  • Random Forest
  • K-Nearest Neighbors (K-NN)
  • Decision Trees

Deep Learning

We explored various deep learning architectures to improve the prediction:

  • Simple Fully Connected Networks
  • Convolutional Models
  • LSTM (Long Short-Term Memory) Models for capturing temporal dependencies in the data.
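
For an LSTM to capture temporal dependencies, the daily records have to be grouped into sequences of consecutive days. The sketch below shows one common way to do this with a sliding window. The window length and the function name `make_windows` are assumptions; the README does not state how the notebooks build their sequences.

```python
def make_windows(features, labels, window):
    """Group `window` consecutive days into one sequence.

    Each sequence is paired with the label of its last day, so the model
    only sees information available up to the day of the prediction.
    """
    sequences, targets = [], []
    for end in range(window, len(features) + 1):
        sequences.append(features[end - window:end])
        targets.append(labels[end - 1])
    return sequences, targets


# Hypothetical data: five days, one feature per day, labels encoded as 0/1.
days = [[0.1], [0.2], [0.3], [0.4], [0.5]]
rain_tomorrow = [0, 0, 1, 0, 1]

X, y = make_windows(days, rain_tomorrow, window=3)
print(len(X), y)  # 3 [1, 0, 1]
```

The result has the layout (sequences, days per sequence, features per day), which is the input shape `torch.nn.LSTM` expects when created with `batch_first=True`. If the data covers several weather stations, windows would need to be built per station so that a sequence never mixes locations.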

Model Evaluation

Models were evaluated with accuracy, precision, recall, and F1 score. Because rainy days are the minority class, accuracy alone can look good even when most of them are missed, so we focused particularly on recall for that class.
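
The following sketch computes these four metrics by hand for the rainy class. It is illustrative only: in practice scikit-learn's metric functions do the same job, and the toy labels below are made up. It shows why accuracy is a poor guide on imbalanced data: a model that never predicts rain still scores 80% accuracy here.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, plus precision, recall and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Ten hypothetical days, two of them rainy (class 1).
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
always_dry = [0] * 10  # a model that never predicts rain

print(binary_metrics(y_true, always_dry))
# {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```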

Results and Insights

Machine Learning

Machine learning models such as Logistic Regression, SVM, and Random Forest demonstrated solid performance in predicting the absence of rain (class 0). However, they struggled to correctly identify rainy days (class 1), as evidenced by the low recall and F1 scores for this class. The class imbalance had a significant impact on performance.

Deep Learning

Deep learning models, especially LSTM networks, performed better in capturing temporal dependencies, improving recall for rainy days (class 1) compared to machine learning models. However, this came at the cost of reduced overall accuracy, highlighting the trade-off between recall and precision for imbalanced datasets.
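
One common place where this trade-off shows up is the decision threshold applied to a model's predicted rain probability. The sketch below uses made-up probabilities and is not taken from the notebooks; the README does not say whether thresholds were tuned. Lowering the threshold catches every rainy day but introduces false alarms.

```python
def precision_recall_at(probs, y_true, threshold):
    """Precision and recall for class 1 when predicting rain if prob >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predicted rain probabilities and true labels for eight days.
probs = [0.9, 0.8, 0.4, 0.35, 0.3, 0.25, 0.1, 0.05]
y_true = [1, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.3):
    precision, recall = precision_recall_at(probs, y_true, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With these values, a threshold of 0.5 misses one of the three rainy days but raises no false alarm, while a threshold of 0.3 finds all three at the price of two false alarms.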

Conclusion

In summary, machine learning models performed better for the majority class (no rain), while deep learning models, particularly LSTMs, showed better performance in predicting rainy days. However, both approaches faced challenges with the class imbalance in the dataset.

To improve results, further exploration of the following strategies could be valuable:

  • Data augmentation to balance the classes.
  • Generative Adversarial Networks (GANs) for synthetic data generation.
  • Hyperparameter optimization with tools like Optuna to fine-tune models for the specific dataset.
  • Data imputation based on seasonality to better handle missing data, especially for features like sunlight, which follow a consistent pattern from year to year.
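
As an illustration of the last idea, a seasonal imputation could fill each missing value with the average observed for the same calendar month. This is a sketch of a possible future improvement, not something implemented in the notebooks. The record layout and the field name `sunlight` are hypothetical.

```python
from collections import defaultdict


def impute_by_month(records, field):
    """Fill missing `field` values with the mean of that field for the same month."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        if r[field] is not None:
            totals[r["month"]] += r[field]
            counts[r["month"]] += 1

    filled = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r[field] is None and counts[r["month"]]:
            r[field] = totals[r["month"]] / counts[r["month"]]
        filled.append(r)
    return filled


# Hypothetical daily records: hours of sunlight, with some values missing.
records = [
    {"month": 1, "sunlight": 5.0},
    {"month": 1, "sunlight": 7.0},
    {"month": 1, "sunlight": None},
    {"month": 7, "sunlight": 12.0},
    {"month": 7, "sunlight": None},
]

for row in impute_by_month(records, "sunlight"):
    print(row)
```

A month with no observed value at all is left as missing rather than guessed.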

While the current models provide a solid foundation for predicting rainfall, improving the detection of rainy days remains a key challenge, requiring more advanced methods and potentially more balanced datasets.

Prerequisites

  • Python installed on your machine.
  • Libraries such as scikit-learn, PyTorch, and PyTorch Lightning should be installed.
  • Use Jupyter or VS Code to run the notebooks.
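
Before opening the notebooks, a quick check like the one below can confirm that the libraries are importable. The import names are assumptions based on how these packages are usually distributed (`sklearn`, `torch`, `pytorch_lightning`). Recent Lightning releases can also be installed under the name `lightning`, so adjust the mapping if needed.

```python
from importlib.util import find_spec

# Display name -> top-level module name (assumed; adjust to your environment).
REQUIRED = {
    "scikit-learn": "sklearn",
    "PyTorch": "torch",
    "PyTorch Lightning": "pytorch_lightning",
}


def check_dependencies(required=REQUIRED):
    """Return a dict mapping each library name to True if it can be imported."""
    return {name: find_spec(module) is not None for name, module in required.items()}


for name, ok in check_dependencies().items():
    print(f"{name}: {'found' if ok else 'missing'}")
```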

Project Structure

  1. rain-prediction-machine-learning.ipynb → Implements machine learning models such as Logistic Regression, SVM, Random Forest, K-NN, and Decision Trees.

  2. rain-prediction-deep-learning.ipynb → Implements deep learning models, including SimpleModel, ConvModel, and LSTMModel, using PyTorch Lightning.

Results

  • The LSTM model showed significant improvements in recall for predicting rainy days compared to other models.
