MahdiOsali/House-Price-Prediction

House Price Prediction in Tehran 🏠

This project uses linear regression to predict house prices in Tehran based on apartment features such as area, number of rooms, parking availability, warehouse availability, and address.

Dataset 📊

The dataset, sourced from "Apartments_information.csv", contains information about apartments in Tehran with the following columns:

🏠 Area: Apartment area in square meters (initially a string, converted to float).
🛏️ Room: Number of rooms (integer).
🚗 Parking: Availability of parking (True/False).
📦 Warehouse: Availability of a warehouse (True/False).
🛗 Elevator: Availability of an elevator (True/False).
🏙️ Address: Apartment address (categorical, later encoded).
💲 Price(USD): Price in USD (float).

Initial size: 3479 rows; after preprocessing: 3452 rows.

Preprocessing ⚙️

- Loaded the dataset from "Apartments_information.csv".
- Dropped rows with missing values (23 rows dropped initially, then 4 more after the Area conversion).
- Converted Area from string to float, replacing commas with dots.
- Applied log transformations to Area and Price (stored as Area_log and Price_log) to reduce skewness.
- Applied target encoding to Address inside each cross-validation fold, using the mean Price_log per address computed from the training data only.
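
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess` is hypothetical, the natural logarithm is an assumption (the base is not stated here), and coercing unparseable Area values to NaN is one plausible way to account for the extra rows dropped after conversion.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the raw apartment table and add log-transformed columns."""
    df = df.dropna().copy()  # drop rows with missing values
    # Area arrives as a string: swap commas for dots, then parse.
    # Entries that still fail to parse become NaN and are dropped below.
    area = df["Area"].astype(str).str.replace(",", ".", regex=False)
    df["Area"] = pd.to_numeric(area, errors="coerce")
    df = df.dropna(subset=["Area"])
    # Assumption: natural log; the base is not stated in this README.
    df["Area_log"] = np.log(df["Area"])
    df["Price_log"] = np.log(df["Price(USD)"])
    return df.reset_index(drop=True)


# Usage with the file named above:
# df = preprocess(pd.read_csv("Apartments_information.csv"))
```

Target encoding of Address is deliberately left out of this function, since it has to be fitted on the training part of each fold rather than on the whole table.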

Model 🧠

Algorithm: Linear Regression (sklearn.linear_model.LinearRegression)
Features:
🛏️ Room
📏 Area
🚗 Parking
📦 Warehouse
🏙️ Address_encoded

Target: Price_log (the log-transformed price; the scores below are reported on the log scale)

Evaluation: 5-fold cross-validation (KFold, shuffle=True, random_state=42)
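
A sketch of how this evaluation might look with the encoding fitted inside each fold. The helper name `cross_validate`, the `FEATURES` constant, and the fallback to the training mean for addresses not seen in a training fold are assumptions for illustration; the notebook may handle these details differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

FEATURES = ["Room", "Area", "Parking", "Warehouse", "Address_encoded"]


def cross_validate(df: pd.DataFrame, target: str = "Price_log"):
    """Return per-fold R² and MSE with fold-wise target encoding of Address."""
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    r2_scores, mse_scores = [], []
    for train_idx, test_idx in kf.split(df):
        train = df.iloc[train_idx].copy()
        test = df.iloc[test_idx].copy()
        # Mean target per address, computed on the training fold only,
        # so no information from the test fold leaks into the encoding.
        address_means = train.groupby("Address")[target].mean()
        # Assumption: addresses unseen in training get the training mean.
        fallback = train[target].mean()
        train["Address_encoded"] = train["Address"].map(address_means)
        test["Address_encoded"] = test["Address"].map(address_means).fillna(fallback)
        model = LinearRegression()
        model.fit(train[FEATURES].astype(float), train[target])
        pred = model.predict(test[FEATURES].astype(float))
        r2_scores.append(r2_score(test[target], pred))
        mse_scores.append(mean_squared_error(test[target], pred))
    return r2_scores, mse_scores
```

Computing the address means on the full dataset before splitting would let each test row's own price influence its encoded feature, which tends to inflate the reported scores.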

Results 📈

R² Scores:

Fold 1: 0.854
Fold 2: 0.875
Fold 3: 0.837
Fold 4: 0.857
Fold 5: 0.856
Average R²: 0.856

MSE Scores:

Fold 1: 0.164
Fold 2: 0.142
Fold 3: 0.183
Fold 4: 0.172
Fold 5: 0.169
Average MSE: 0.166
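
The averages can be rechecked from the fold scores, and the log-scale MSE can be turned into a more tangible error figure. The sketch below assumes the natural logarithm was used for Price_log; with a different base the multiplicative factor would change.

```python
import math
from statistics import mean

r2_folds = [0.854, 0.875, 0.837, 0.857, 0.856]
mse_folds = [0.164, 0.142, 0.183, 0.172, 0.169]

avg_r2 = mean(r2_folds)
avg_mse = mean(mse_folds)

# Root of the MSE gives a typical error in log units.
rmse_log = math.sqrt(avg_mse)
# Assumption: natural log, so exp() maps a log error to a price ratio.
error_factor = math.exp(rmse_log)

print(f"avg R2 = {avg_r2:.3f}, avg MSE = {avg_mse:.3f}")
print(f"RMSE (log) = {rmse_log:.3f}, factor = {error_factor:.2f}")
```

Under that assumption, an RMSE of about 0.41 log units means a typical prediction is off by a factor of roughly 1.5 in either direction.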

Interpretation 💡

The model explains ~85.6% of the variance in Price_log (R² = 0.856), indicating a strong fit.
The MSE of 0.166 is measured on the log scale, which corresponds to an RMSE of about 0.41 log units.
A Pearson correlation of 0.81 between Area and Price highlights a strong positive relationship.
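
For reference, the Pearson coefficient mentioned above is the covariance of the two columns divided by the product of their standard deviations. A small dependency-free sketch follows; the function name `pearson` and the toy numbers are hypothetical and are not taken from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values for illustration only (not from the dataset).
areas = [50.0, 75.0, 100.0, 120.0]
prices = [40000.0, 70000.0, 85000.0, 130000.0]
r = pearson(areas, prices)

# With the data in a DataFrame, pandas gives the same quantity:
# df["Area"].corr(df["Price(USD)"])
```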

Usage 🚀

To run the code:

1. Ensure the required libraries are installed (numpy, pandas, matplotlib, seaborn, scikit-learn).
2. Place "Apartments_information.csv" in the working directory.
3. Run the Jupyter notebook (House_price_prediction.ipynb).

About

In this project, a model for predicting house prices has been created using regression.
