pk24100/real-estate-price-prediction-using-pyspark

🏠 Real Estate Price Prediction using PySpark ML

A Big Data Analytics project that predicts residential property prices in Indian cities using various machine learning algorithms.

🎯 Overview

This project analyzes a large dataset of residential properties from major Indian cities and implements different regression models to predict property prices based on various features.

🔧 Models Implemented

  • 🌲 Random Forest Regression
  • 📈 Linear Regression
  • 🌳 Decision Tree Regression
  • 🚀 Gradient Boosted Tree Regression
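
The four regressors listed above all exist in `pyspark.ml.regression`. The following is a minimal sketch of how they might be instantiated side by side; it is not taken from this repository. The function name `build_regressors`, the column names `features` and `price`, and the seed value are assumptions made for illustration.

```python
# Hypothetical helper, not the project's actual code.
# Display names for the four regressors listed in this README.
MODEL_NAMES = (
    "Random Forest",
    "Linear Regression",
    "Decision Tree",
    "Gradient Boosted Tree",
)


def build_regressors(features_col="features", label_col="price", seed=42):
    """Return a dict mapping each display name to an unfitted PySpark estimator.

    The column names and the seed are assumed defaults. PySpark is imported
    inside the function, so the module itself loads without Spark installed.
    """
    from pyspark.ml.regression import (
        DecisionTreeRegressor,
        GBTRegressor,
        LinearRegression,
        RandomForestRegressor,
    )

    estimators = (
        RandomForestRegressor(featuresCol=features_col, labelCol=label_col, seed=seed),
        LinearRegression(featuresCol=features_col, labelCol=label_col),
        DecisionTreeRegressor(featuresCol=features_col, labelCol=label_col, seed=seed),
        GBTRegressor(featuresCol=features_col, labelCol=label_col, seed=seed),
    )
    return dict(zip(MODEL_NAMES, estimators))
```

Calling `build_regressors()` requires an active `SparkSession`. Each returned estimator can then be fitted with `.fit(train_df)` on a DataFrame that holds an assembled feature vector column and a numeric label column.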

✨ Key Features

  • Comprehensive data preprocessing and cleaning
  • Feature selection using Chi-Square test
  • Outlier detection and handling
  • Model performance comparison and visualization
  • Property price prediction for new inputs
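
To illustrate the Chi-Square test mentioned in the feature list, here is a minimal pure-Python sketch of the Pearson chi-square statistic between a categorical feature and a categorical (for example, binned) target. It is an illustration of the statistic only, not the project's implementation; the function name `chi_square_statistic` and the idea of binning the price are assumptions.

```python
from collections import Counter


def chi_square_statistic(feature_values, label_values):
    """Return (statistic, degrees_of_freedom) for two categorical sequences.

    Hypothetical helper: computes the Pearson chi-square statistic from the
    contingency table of the two sequences.
    """
    n = len(feature_values)
    if n == 0 or n != len(label_values):
        raise ValueError("inputs must be non-empty and of equal length")

    observed = Counter(zip(feature_values, label_values))  # cell counts
    feature_totals = Counter(feature_values)               # row totals
    label_totals = Counter(label_values)                   # column totals

    statistic = 0.0
    for f, f_total in feature_totals.items():
        for lab, lab_total in label_totals.items():
            # Expected count if the feature and the label were independent.
            expected = f_total * lab_total / n
            statistic += (observed[(f, lab)] - expected) ** 2 / expected

    dof = (len(feature_totals) - 1) * (len(label_totals) - 1)
    return statistic, dof
```

A larger statistic (for a given number of degrees of freedom) suggests a stronger association between the feature and the target, which is the basis for ranking features.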

📊 Performance Metrics

  • RMSE (Root Mean Square Error)
  • MSE (Mean Square Error)
  • MAE (Mean Absolute Error)
  • R² Score
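
The four metrics above follow their standard definitions. The sketch below computes them in plain Python for a pair of actual and predicted value lists; the function name `regression_metrics` is a hypothetical helper and the project itself may compute these differently (for example, through PySpark's `RegressionEvaluator`).

```python
import math


def regression_metrics(actual, predicted):
    """Return a dict with MSE, RMSE, MAE and R² for two equal-length lists."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1 - ss_res / ss_tot,
    }
```

Lower RMSE, MSE and MAE indicate smaller prediction errors, while an R² closer to 1 indicates that the model explains more of the variance in the target.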

🏆 Best Performing Model

Gradient Boosted Tree (GBT) regression performed best among the four models, with:

  • R² Score: 0.74
  • Lowest RMSE: 0.032
  • Lowest MAE: 0.009

🛠️ Technologies Used

  • PySpark ML
  • Pandas
  • Matplotlib
  • Seaborn
  • Plotly

Research Paper of the Project

📑 "Real Estate Price Prediction Using PySpark MLlib"

👥 Collaborators

Name GitHub
Prajwal @pk24100

📝 License

This project is licensed under the Creative Commons BY-NC 4.0 license. See the LICENSE.txt file for details.

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
