A Big Data Analytics project that predicts residential property prices in Indian cities using various machine learning algorithms.
This project analyzes a large dataset of residential properties from major Indian cities and implements different regression models to predict property prices based on various features.
- Random Forest Regression
- Linear Regression
- Decision Tree Regression
- Gradient Boosted Tree Regression
- Comprehensive data preprocessing and cleaning
- Feature selection using Chi-Square test
- Outlier detection and handling
- Model performance comparison and visualization
- Property price prediction for new inputs
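
The README does not say which outlier rule the project applies. As one common possibility, the sketch below clips values that fall outside the interquartile-range (IQR) fences. The function names `iqr_bounds` and `clip_outliers`, the multiplier `k=1.5`, and the sample prices are all illustrative assumptions, and the code is plain Python rather than the project's PySpark pipeline.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) IQR fences for a list of numbers.

    Assumption: the classic Tukey rule with multiplier k; the project
    itself may use a different outlier criterion.
    """
    # method="inclusive" treats the data as the full population and
    # interpolates linearly between order statistics.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip every value to the IQR fences instead of dropping it."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical prices (arbitrary units) with one extreme listing.
prices = [10, 12, 11, 13, 12, 100]
cleaned = clip_outliers(prices)
```

Clipping keeps the row count unchanged, which can matter when other columns of the same record are still useful. Dropping the rows instead is an equally reasonable choice.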
- RMSE (Root Mean Square Error)
- MSE (Mean Square Error)
- MAE (Mean Absolute Error)
- R² Score
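
For reference, the four metrics above follow directly from their standard definitions. The sketch below is a minimal plain-Python version for illustration only. The helper name `regression_metrics` and the sample values are assumptions, not part of the project's PySpark code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MSE, MAE and R² for two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean of squared residuals, and its square root.
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # Mean of absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # R² = 1 - SS_res / SS_tot; undefined when the target is constant.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mse": mse, "mae": mae, "r2": r2}


# Hypothetical targets and predictions, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

RMSE, MSE and MAE depend on the scale of the target, so they are only comparable between models evaluated on the same (identically scaled) price column. R² is scale-free.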
The Gradient Boosted Tree (GBT) Regression showed the best performance with:
- R² Score: 0.74
- Lowest RMSE: 0.032
- Lowest MAE: 0.009
- PySpark ML
- Pandas
- Matplotlib
- Seaborn
- Plotly
"Real Estate Price Prediction Using PySpark MLlib"
- Published in: 2024 IEEE International Conference on Smart Power Control and Renewable Energy (ICSPCRE)
- DOI: https://doi.org/10.1109/ICSPCRE62303.2024.10675142
Name | GitHub |
---|---|
Prajwal | @pk24100 |
This project is licensed under the Creative Commons BY-NC 4.0 license. See the LICENSE.txt file for details.
Contributions are welcome! Please feel free to submit a Pull Request.