This project aims to predict housing prices using machine learning models and analyze the importance of housing features. The dataset contains structured information about real estate properties including area, number of rooms, location factors, and furnishing status.
housing-price-prediction/
│
├── data/
│   ├── Housing.csv                    # Original dataset
│   └── housing_cleaned.csv            # Cleaned dataset with additional features
│
├── notebooks/
│   ├── 01_data_preparation.ipynb      # Data cleaning, encoding, outlier detection
│   └── 02_model_training.ipynb        # Model training, evaluation, feature importance
│
├── requirements.txt                   # Required Python libraries
└── README.md
The dataset includes:
- Numeric features: `area`, `bedrooms`, `bathrooms`, `stories`
- Yes/no features: `mainroad`, `guestroom`, `basement`, etc.
- `furnishingstatus`: encoded in multiple formats (ordinal and one-hot)
- Engineered features: `area_per_bedroom`, `area_per_bathroom`, `is_fully_equipped`
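The engineered features and the two encodings of `furnishingstatus` could be produced roughly as follows. This is a minimal pandas sketch on toy rows, not the notebook's code: the `airconditioning` column, the ordering `unfurnished < semi-furnished < furnished`, and the rule used for `is_fully_equipped` (furnished and air-conditioned) are assumptions made for illustration.

```python
import pandas as pd

# Toy rows; column names follow this README, values are made up.
df = pd.DataFrame({
    "area": [7420, 8960, 3500],
    "bedrooms": [4, 4, 2],
    "bathrooms": [2, 4, 1],
    "airconditioning": ["yes", "yes", "no"],  # assumed extra yes/no column
    "furnishingstatus": ["furnished", "semi-furnished", "unfurnished"],
})

# Ratio features named in the README.
df["area_per_bedroom"] = df["area"] / df["bedrooms"]
df["area_per_bathroom"] = df["area"] / df["bathrooms"]

# Hypothetical rule: "fully equipped" means furnished AND air-conditioned.
df["is_fully_equipped"] = (
    (df["furnishingstatus"] == "furnished") & (df["airconditioning"] == "yes")
)

# Ordinal encoding (assumed order), typically fed to tree-based models.
furnish_order = {"unfurnished": 0, "semi-furnished": 1, "furnished": 2}
df["furnishing_ordinal"] = df["furnishingstatus"].map(furnish_order)

# One-hot encoding, typically fed to linear models.
one_hot = pd.get_dummies(df["furnishingstatus"], prefix="furnishing")
df = pd.concat([df, one_hot], axis=1)
```

Keeping both encodings side by side lets each model family use the representation listed for it in the model table.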
Data preparation steps (`01_data_preparation.ipynb`):
- Converted object columns to numeric/boolean
- Applied one-hot encoding (for linear models) and ordinal encoding (for tree-based models)
- Detected and removed outliers using Isolation Forest
- Created engineered features
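A minimal sketch of the yes/no conversion and the Isolation Forest step, assuming scikit-learn is available. The data are synthetic, and the choice of columns (`area`, `price`) and `contamination=0.02` are illustrative assumptions rather than the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in for Housing.csv: typical rows plus three extreme ones.
df = pd.DataFrame({
    "area": np.append(rng.normal(5000, 1000, n), [40000, 45000, 50000]),
    "price": np.append(rng.normal(4.5e6, 1e6, n), [6e7, 7e7, 8e7]),
    "mainroad": rng.choice(["yes", "no"], n + 3),
})

# Convert a yes/no object column to boolean.
df["mainroad"] = df["mainroad"].eq("yes")

# fit_predict returns 1 for inliers and -1 for outliers.
# contamination=0.02 (expected outlier share) is an assumed value.
iso = IsolationForest(contamination=0.02, random_state=42)
labels = iso.fit_predict(df[["area", "price"]])

# Keep only the inliers.
df_clean = df[labels == 1].reset_index(drop=True)
```

Isolation Forest scores points by how few random splits are needed to isolate them, so rows far from the bulk of the data (like the three synthetic extremes) are flagged first.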
Models trained and evaluated in `02_model_training.ipynb`:
| Model | Encoding Used |
|---|---|
| Linear Regression | One-Hot |
| Ridge Regression | One-Hot |
| Random Forest Regressor | Ordinal |
| Gradient Boosting Regressor | Ordinal |
| XGBoost Regressor | Ordinal |
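One way to pair each model with the encoding listed in the table is a scikit-learn `Pipeline` per model. The sketch below uses synthetic data and hypothetical helper names (`one_hot_encoder`, `ordinal_encoder`) and scores each model with R² on a held-out split. XGBoost is left out so the example only needs scikit-learn; `xgboost.XGBRegressor` could be added to the ordinal group the same way if that package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.default_rng(0)
n = 300
levels = ["unfurnished", "semi-furnished", "furnished"]  # assumed order
X = pd.DataFrame({
    "area": rng.uniform(2000, 10000, n),
    "bedrooms": rng.integers(1, 6, n),
    "furnishingstatus": rng.choice(levels, n),
})
# Synthetic target: price rises with area and furnishing level, plus noise.
furnish_rank = X["furnishingstatus"].map({lvl: i for i, lvl in enumerate(levels)})
y = 500 * X["area"] + 2e5 * furnish_rank + rng.normal(0, 2e5, n)

cat_cols = ["furnishingstatus"]

def one_hot_encoder():
    # One-hot for linear models; drop="first" avoids the dummy-variable trap.
    return ColumnTransformer(
        [("cat", OneHotEncoder(drop="first"), cat_cols)], remainder="passthrough")

def ordinal_encoder():
    # Ordinal codes for tree models; other columns pass through unchanged.
    return ColumnTransformer(
        [("cat", OrdinalEncoder(categories=[levels]), cat_cols)], remainder="passthrough")

# A fresh encoder per pipeline so no transformer object is shared.
models = {
    "Linear Regression": make_pipeline(one_hot_encoder(), LinearRegression()),
    "Ridge Regression": make_pipeline(one_hot_encoder(), Ridge(alpha=1.0)),
    "Random Forest": make_pipeline(ordinal_encoder(), RandomForestRegressor(random_state=0)),
    "Gradient Boosting": make_pipeline(ordinal_encoder(), GradientBoostingRegressor(random_state=0)),
}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Pipeline.score returns R² on the held-out split.
scores = {name: m.fit(X_train, y_train).score(X_test, y_test) for name, m in models.items()}
```

After fitting, `models["Random Forest"][-1].feature_importances_` and `models["Linear Regression"][-1].coef_` expose the importances and coefficients mentioned below.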
All models were compared using:
- RMSE (root mean squared error)
- MAE (mean absolute error)
- R² score

Model behavior was inspected with:
- Actual vs. predicted scatter plots
- Coefficients (linear models)
- Feature importances (tree models)
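For reference, the three metrics can be written out directly. This hypothetical `regression_metrics` helper uses only the standard library; in practice scikit-learn's `mean_squared_error` (take the square root for RMSE), `mean_absolute_error` and `r2_score` compute the same quantities.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R²) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # penalizes large errors more
    mae = sum(abs(e) for e in errors) / n         # average absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variance
    r2 = 1 - ss_res / ss_tot                      # share of variance explained
    return rmse, mae, r2

rmse, mae, r2 = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 10.0])
```

Here the errors are 0.5, 0, -0.5 and -1, giving MAE = 0.5, RMSE = sqrt(0.375) ≈ 0.61 and R² = 1 - 1.5/20 = 0.925.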
Possible next steps:
- Fine-tune hyperparameters
- Use SHAP values for explainability
- Explore ensemble and stacking models
- Deploy a trained model as an API
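As a starting point for the stacking idea, scikit-learn's `StackingRegressor` can combine several of the models above, fitting a final estimator on their out-of-fold predictions. This is a sketch on synthetic `make_regression` data; the choice of base models, the `Ridge` meta-model and `cv=5` are assumptions, not tuned settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data standing in for the housing features.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=1.0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),  # meta-model trained on out-of-fold predictions
    cv=5,
)

# Outer 5-fold cross-validation of the whole stack, scored with R².
r2_scores = cross_val_score(stack, X, y, cv=5, scoring="r2")
```

Comparing `r2_scores.mean()` with the cross-validated score of each base model on its own would show whether stacking adds anything on the real data.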