
🏡 House Price Prediction Challenge

This repository contains a complete end-to-end machine learning workflow for a real-world house price prediction problem, using a dataset from Kaggle. The project covers data wrangling, exploratory data analysis, feature engineering, model building, hyperparameter tuning, and model evaluation.


Kilugha737/data-science-project_house-price-prediction

data-science-project_house-price-prediction


📊 Project Overview

The goal is to predict house prices (in Lakhs) from property-related features such as square footage, number of rooms, and location. A Random Forest Regressor was trained inside a preprocessing pipeline, and the best hyperparameter combination was selected with GridSearchCV.
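As a rough illustration of that setup, here is a minimal sketch of a Random Forest pipeline tuned with GridSearchCV. It is not the repository's actual code: the data is synthetic, the `CITY` and `PRICE_LAKHS` column names are hypothetical, and the parameter grid is kept deliberately tiny so the example runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Kaggle data (assumption: CITY and PRICE_LAKHS
# are made-up names, and the price formula is invented for the demo).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "SQUARE_FT": rng.uniform(400, 3000, n),
    "LONGITUDE": rng.uniform(72.0, 88.0, n),
    "LATITUDE": rng.uniform(8.0, 30.0, n),
    "CITY": rng.choice(["A", "B", "C"], n),
})
df["PRICE_LAKHS"] = 0.05 * df["SQUARE_FT"] + rng.normal(0, 5, n)
df.loc[::25, "SQUARE_FT"] = np.nan  # inject missing values to exercise imputation

numeric = ["SQUARE_FT", "LONGITUDE", "LATITUDE"]
categorical = ["CITY"]

# Numeric columns: impute then scale. Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df[numeric + categorical], df["PRICE_LAKHS"], test_size=0.2, random_state=42
)

# Pipeline step names prefix the hyperparameter names in the grid.
param_grid = {
    "regressor__n_estimators": [20, 40],
    "regressor__max_depth": [4, None],
}
search = GridSearchCV(model, param_grid, cv=3, scoring="neg_root_mean_squared_error")
search.fit(X_train, y_train)

best_cv_rmse = -search.best_score_  # the scorer returns negated RMSE
test_pred = search.predict(X_test)
test_rmse = mean_squared_error(y_test, test_pred) ** 0.5
test_r2 = r2_score(y_test, test_pred)
print(search.best_params_, round(best_cv_rmse, 2), round(test_rmse, 2), round(test_r2, 2))
```

Wrapping preprocessing and the regressor in one pipeline means each cross-validation fold fits the imputer and scaler on its own training split only, which avoids leaking validation data into the preprocessing step.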

🔍 Key Highlights

📥 Dataset downloaded using kagglehub

🧹 Custom wrangle() function for preprocessing and cleaning

🧠 Baseline model with DummyRegressor

⚙️ Full pipeline: imputation → scaling → encoding → regression

🧪 Cross-validation and hyperparameter tuning with GridSearchCV

📈 Evaluation using RMSE and R² metrics

🗺️ Feature importance analysis and visualization

📦 Model serialization using joblib

🧾 Reusable prediction function for new data
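The baseline, serialization, and prediction steps listed above could look roughly like the sketch below. The training rows are made up, and `make_prediction`, `PRICE_LAKHS`, and the file name are hypothetical names chosen for illustration; the repository's own function may differ.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up training set (assumption: not taken from the real dataset).
train = pd.DataFrame({
    "SQUARE_FT": [500, 800, 1200, 1500, 2000, 2600],
    "LATITUDE": [12.9, 19.0, 28.6, 13.0, 22.5, 18.5],
    "LONGITUDE": [77.6, 72.8, 77.2, 80.2, 88.3, 73.8],
    "PRICE_LAKHS": [25.0, 40.0, 62.0, 70.0, 95.0, 130.0],
})
features = ["SQUARE_FT", "LATITUDE", "LONGITUDE"]
X, y = train[features], train["PRICE_LAKHS"]

# Baseline: always predict the mean price. Any real model should beat this RMSE.
baseline = DummyRegressor(strategy="mean").fit(X, y)
baseline_rmse = mean_squared_error(y, baseline.predict(X)) ** 0.5

model = Pipeline([
    ("scale", StandardScaler()),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
]).fit(X, y)


def make_prediction(model_path, records, feature_names):
    """Load a saved pipeline and predict prices for a list of feature dicts."""
    loaded = joblib.load(model_path)
    frame = pd.DataFrame(records)[feature_names]  # enforce training column order
    return pd.Series(loaded.predict(frame), name="predicted_price_lakhs")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "house_price_model.joblib")
    joblib.dump(model, path)  # persist the whole pipeline, not just the regressor
    predictions = make_prediction(
        path,
        [{"SQUARE_FT": 1000, "LATITUDE": 19.0, "LONGITUDE": 72.9}],
        features,
    )

print(round(baseline_rmse, 2), predictions.round(2).tolist())
```

Saving the full pipeline rather than the bare regressor keeps the preprocessing attached to the model, so new data only needs the raw feature columns.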

🛠️ Technologies Used

Python 3

Pandas, NumPy, Seaborn, Matplotlib, Plotly

Scikit-learn (Pipeline, RandomForestRegressor, GridSearchCV)

SciPy (for statistical tests)

Joblib (for model persistence)

📈 Results

Best cross-validated RMSE: ~19.36

Test RMSE: ~19.54

Test R²: ~0.76

Most important features: SQUARE_FT, LONGITUDE, LATITUDE, and location-related encodings
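For readers less familiar with the two metrics, the sketch below computes RMSE and R² by hand with NumPy. The price values are invented for illustration and are unrelated to the results reported above.

```python
import numpy as np

# Made-up true and predicted prices (assumption: illustration only).
y_true = np.array([40.0, 55.0, 70.0, 120.0, 35.0])
y_pred = np.array([45.0, 50.0, 80.0, 100.0, 40.0])


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


model_rmse = rmse(y_true, y_pred)
model_r2 = r2(y_true, y_pred)

# A mean-only predictor (what DummyRegressor does by default) scores R² = 0
# on the data it was fitted to, which is the reference point for any model.
mean_pred = np.full_like(y_true, y_true.mean())
baseline_r2 = r2(y_true, mean_pred)

print(f"RMSE={model_rmse:.2f}  R2={model_r2:.3f}  baseline R2={baseline_r2:.3f}")
```

Read this way, an R² of about 0.76 means the model accounts for roughly three quarters of the variance in price that a mean-only baseline leaves unexplained.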

📌 Future Work

Try advanced models (e.g., XGBoost, LightGBM)

Deploy the model as an API (e.g., with FastAPI or Flask)

Integrate with Streamlit for an interactive dashboard

📬 Contact

Feel free to open an issue or reach out if you have suggestions or questions!
