House Price Prediction Challenge
This repository contains a complete end-to-end machine learning workflow to solve a real-world House Price Prediction problem using a dataset from Kaggle. The project focuses on data wrangling, exploratory data analysis, feature engineering, model building, hyperparameter tuning, and model evaluation.
Project Overview
The goal is to predict house prices (in lakhs; 1 lakh = 100,000) from property-related features such as square footage, number of rooms, and location. A Random Forest Regressor was trained inside a scikit-learn pipeline, and the best hyperparameter combination was selected with GridSearchCV.
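As a rough illustration of the workflow described above, here is a minimal sketch of a preprocessing pipeline tuned with GridSearchCV. It runs on synthetic data rather than the Kaggle dataset. The `POSTED_BY` column, the `PRICE_LAKHS` target name, and the parameter grid are assumptions made for the example, not the values used in this project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; the real project loads the Kaggle dataset instead.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "SQUARE_FT": rng.uniform(400, 3000, n),
    "LONGITUDE": rng.uniform(68, 97, n),
    "LATITUDE": rng.uniform(8, 37, n),
    "POSTED_BY": rng.choice(["Owner", "Dealer", "Builder"], n),  # hypothetical column
})
# Hypothetical target: price grows with area, plus noise.
df["PRICE_LAKHS"] = 0.03 * df["SQUARE_FT"] + rng.normal(0, 5, n)

numeric_cols = ["SQUARE_FT", "LONGITUDE", "LATITUDE"]
categorical_cols = ["POSTED_BY"]

# Numeric columns: impute then scale. Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(random_state=42)),
])

# Assumed grid, kept small so the example runs quickly.
param_grid = {
    "regressor__n_estimators": [25, 50],
    "regressor__max_depth": [5, None],
}

X = df.drop(columns="PRICE_LAKHS")
y = df["PRICE_LAKHS"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

search = GridSearchCV(
    model, param_grid, cv=3, scoring="neg_root_mean_squared_error"
)
search.fit(X_train, y_train)

# The scorer returns negative RMSE, so flip the sign.
best_cv_rmse = -search.best_score_
test_rmse = float(np.sqrt(np.mean((y_test - search.predict(X_test)) ** 2)))
print(search.best_params_, best_cv_rmse, test_rmse)
```

Keeping the preprocessing inside the pipeline means each cross-validation fold fits the imputer and scaler on its own training split only, which avoids leaking information from the validation fold.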
Key Highlights
Dataset downloaded using kagglehub
Custom wrangle() function for preprocessing and cleaning
Baseline model with DummyRegressor
Full pipeline: imputation → scaling → encoding → regression
Cross-validation and hyperparameter tuning with GridSearchCV
Evaluation using RMSE and R² metrics
Feature importance analysis and visualization
Model serialization using joblib
Reusable prediction function for new data
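The baseline, serialization, and prediction steps listed above could look roughly like the sketch below. It uses synthetic data and a single feature to stay short. The file name `house_price_model.joblib` and the helper `make_predictions` are hypothetical names chosen for illustration, not the ones used in this repository.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data; the real project uses the wrangled Kaggle dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame({"SQUARE_FT": rng.uniform(400, 3000, 300)})
y = 0.03 * X["SQUARE_FT"] + rng.normal(0, 5, 300)
X_train, X_test = X.iloc[:240], X.iloc[240:]
y_train, y_test = y.iloc[:240], y.iloc[240:]

# Baseline: always predict the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_rmse = np.sqrt(mean_squared_error(y_test, baseline.predict(X_test)))

# Candidate model to compare against the baseline.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
model_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))

# Persist the fitted model (hypothetical file name, temporary directory).
model_path = Path(tempfile.mkdtemp()) / "house_price_model.joblib"
joblib.dump(model, model_path)


def make_predictions(new_data: pd.DataFrame, path: Path) -> pd.Series:
    """Hypothetical helper: load a saved model and predict on new rows."""
    loaded = joblib.load(path)
    return pd.Series(
        loaded.predict(new_data),
        index=new_data.index,
        name="predicted_price_lakhs",
    )


preds = make_predictions(X_test, model_path)
print(f"baseline RMSE: {baseline_rmse:.2f}, model RMSE: {model_rmse:.2f}")
```

A model is only worth keeping if its error is clearly below the baseline's, which is why the DummyRegressor score is computed first.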
Technologies Used
Python 3
Pandas, NumPy, Seaborn, Matplotlib, Plotly
Scikit-learn (Pipeline, RandomForestRegressor, GridSearchCV)
SciPy (for statistical tests)
Joblib (for model persistence)
Results
Best cross-validated RMSE: ~19.36
Test RMSE: ~19.54
Test R²: ~0.76
Most important features: SQUARE_FT, LONGITUDE, LATITUDE, and location-related encodings
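For readers unfamiliar with the two metrics reported above, the sketch below computes RMSE and R² from their standard definitions with NumPy. The four price values are made up for the example and are unrelated to the project's results.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


# Made-up prices in lakhs, for illustration only.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 73.0, 77.0]

print(rmse(y_true, y_pred))  # sqrt(6.5), about 2.55
print(r2(y_true, y_pred))    # 1 - 26/500 = 0.948
```

RMSE is in the same unit as the target (lakhs here), while R² is unitless and describes the share of variance in the target that the model explains.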
Future Work
Try advanced models (e.g., XGBoost, LightGBM)
Deploy the model as an API (e.g., with FastAPI or Flask)
Integrate with Streamlit for an interactive dashboard
Contact
Feel free to open an issue or reach out if you have suggestions or questions!