This mini-project is a simple machine learning model to predict rainfall. Using a Kaggle dataset, I performed:
- Data Cleaning: Addressed missing values and ensured data consistency.
- Feature Engineering: Created and selected meaningful features to improve model performance.
- Model Training and Evaluation: Leveraged scikit-learn to train and test multiple machine learning models.
This mini-project highlights my ability to independently process raw datasets, apply machine learning concepts, and derive actionable insights from the data.
Source: Kaggle Dataset
- Dataset contains Date, Location, Temperature, Humidity, Cloud Cover, Rain Tomorrow, Precipitation, Wind Speed
- Target variable: Rainfall Tomorrow
- Load Dataset:
  - Imported data using pandas and performed an exploratory data analysis (EDA)
- Data Preprocessing:
  - Handled missing values
  - Encoded categorical variables
  - Standardized numerical features
- Feature Engineering:
  - Created new features to capture patterns in the data
- Model Training:
  - Split data into training and testing sets
  - Trained Logistic Regression (standard and SGD-based) and Decision Tree classification models using Scikit-learn
  - Utilized class balancing to improve F1-scores
- Evaluation:
  - Evaluated models using F1-score
  - Current Model: Logistic Regression
  - Key Metric: F1 of 0.91
- Gained hands-on experience in feature engineering and data preprocessing, and built an understanding of logistic regression and decision tree classification.
- Strengthened understanding of machine learning workflows.
- Explored practical challenges in handling real-world datasets.
- Experiment with advanced models like Random Forests or Gradient Boosting.
- Fine-tune hyperparameters for better accuracy.
- Deploy the model using Streamlit or Flask for user interaction.
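
As a rough sketch of what the first two items might look like, the example below tunes a Random Forest with a small grid search scored by F1. It is an illustration under assumptions: the data comes from `make_classification` rather than the rainfall dataset, and the parameter grid, fold count, and class weighting are placeholder choices, not tested recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced binary data standing in for the rainfall features.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4,
    weights=[0.75, 0.25], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search space; a real grid would depend on the dataset.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6, None]}

search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="f1",  # optimize the same metric used for evaluation
    cv=3,
)
search.fit(X_train, y_train)

test_f1 = f1_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print(f"cross-validated F1: {search.best_score_:.2f}, test F1: {test_f1:.2f}")
```

Scoring the search with `"f1"` rather than the default accuracy keeps tuning aligned with the metric the project already reports.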
This project is open-source under the MIT License.