This project uses linear regression to predict house prices in Tehran based on apartment features such as area, number of rooms, parking availability, warehouse availability, and address.
The dataset, sourced from "Apartments_information.csv", contains information about apartments in Tehran with the following columns:
- Area: Apartment area in square meters (initially string, converted to float).
- Room: Number of rooms (integer).
- Parking: Availability of parking (True/False).
- Warehouse: Availability of a warehouse (True/False).
- Elevator: Availability of an elevator (True/False).
- Address: Apartment address (categorical, later encoded).
- Price(USD): Price in USD (float).
Initial size: 3479 rows; after preprocessing: 3452 rows.
- Loaded the dataset from "Apartments_information.csv".
- Dropped rows with missing values (23 rows).
- Converted Area from string to float, replacing commas with dots, then dropped 4 more rows that had missing values after the conversion (27 rows removed in total, 3479 to 3452).
- Applied log transformations to Area and Price (stored as Area_log and Price_log) to handle skewness.
- Applied target encoding to Address within cross-validation, using the mean Price_log per address computed from the training data only.
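The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `preprocess` helper and the sample rows are hypothetical, the natural logarithm is an assumption (the base is not stated here), and a small in-memory frame stands in for "Apartments_information.csv".

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for "Apartments_information.csv".
raw = pd.DataFrame({
    "Area": ["63", "85,5", "70", "unknown", "120"],
    "Room": [1, 2, 2, 2, 3],
    "Parking": [True, True, False, True, True],
    "Warehouse": [True, False, True, True, True],
    "Elevator": [True, True, False, True, True],
    "Address": ["Shahran", "Pardis", None, "Pardis", "Shahran"],
    "Price(USD)": [61666.67, 30083.33, 18333.33, 25000.0, 110000.0],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the steps listed above."""
    # Step 1: drop rows that already contain missing values.
    df = df.dropna().copy()
    # Step 2: replace commas with dots and convert Area to float;
    # strings that cannot be parsed become NaN.
    area_text = df["Area"].astype(str).str.replace(",", ".", regex=False)
    df["Area"] = pd.to_numeric(area_text, errors="coerce")
    # Step 3: drop rows left without a usable Area.
    df = df.dropna(subset=["Area"])
    # Step 4: log-transform the skewed columns (natural log assumed).
    df["Area_log"] = np.log(df["Area"])
    df["Price_log"] = np.log(df["Price(USD)"])
    return df.reset_index(drop=True)


clean = preprocess(raw)
print(clean[["Area", "Area_log", "Price_log"]])
```

On the real file, `raw` would come from `pd.read_csv("Apartments_information.csv")` instead of the inline frame.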
Algorithm: Linear Regression (sklearn.linear_model.LinearRegression)
Features:
- Room
- Area (log-transformed, as Area_log)
- Parking
- Warehouse
- Address_encoded
Target: Price_log (log of Price(USD))
Evaluation: 5-fold cross-validation (KFold, shuffle=True, random_state=42)
R² Scores:
Fold 1: 0.854
Fold 2: 0.875
Fold 3: 0.837
Fold 4: 0.857
Fold 5: 0.856
Average R²: 0.856
MSE Scores:
Fold 1: 0.164
Fold 2: 0.142
Fold 3: 0.183
Fold 4: 0.172
Fold 5: 0.169
Average MSE: 0.166
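For orientation, the evaluation loop behind these scores might look like the sketch below. It runs on synthetic data, so its numbers will not match the ones reported here. The use of Area_log as a feature, the 0/1 coding of Parking and Warehouse, and the global-mean fallback for addresses unseen in a training fold are all assumptions, not details confirmed by the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

# Synthetic stand-in for the cleaned dataset (values are made up).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Room": rng.integers(1, 5, size=n),
    "Area_log": rng.normal(4.5, 0.4, size=n),
    "Parking": rng.integers(0, 2, size=n),    # True/False coded as 1/0
    "Warehouse": rng.integers(0, 2, size=n),  # True/False coded as 1/0
    "Address": rng.choice(["A", "B", "C", "D"], size=n),
})
address_effect = data["Address"].map({"A": 0.0, "B": 0.5, "C": 1.0, "D": 1.5})
noise = rng.normal(0.0, 0.2, size=n)
data["Price_log"] = 6.0 + 1.2 * data["Area_log"] + address_effect + noise

features = ["Room", "Area_log", "Parking", "Warehouse", "Address_encoded"]
kf = KFold(n_splits=5, shuffle=True, random_state=42)
r2_scores, mse_scores = [], []

for train_idx, test_idx in kf.split(data):
    train = data.iloc[train_idx].copy()
    test = data.iloc[test_idx].copy()

    # Target encoding: mean Price_log per address, from the training fold only,
    # so no information from the test fold leaks into the feature.
    address_means = train.groupby("Address")["Price_log"].mean()
    global_mean = train["Price_log"].mean()  # assumed fallback for unseen addresses
    train["Address_encoded"] = train["Address"].map(address_means)
    test["Address_encoded"] = test["Address"].map(address_means).fillna(global_mean)

    model = LinearRegression().fit(train[features], train["Price_log"])
    pred = model.predict(test[features])
    r2_scores.append(r2_score(test["Price_log"], pred))
    mse_scores.append(mean_squared_error(test["Price_log"], pred))

print(f"Average R^2: {np.mean(r2_scores):.3f}")
print(f"Average MSE: {np.mean(mse_scores):.3f}")
```

Computing the address means inside the loop, rather than once on the full dataset, is what keeps each fold's score an honest estimate of out-of-sample performance.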
The model explains ~85.6% of the variance in Price_log (R² = 0.856), indicating a strong fit.
The average MSE of 0.166 is measured on the log scale, so it describes relative rather than absolute price errors.
A Pearson correlation of 0.81 between Area and Price highlights a strong positive relationship.
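To make the log-scale MSE easier to read, it can be converted into an approximate multiplicative error on the price itself. The short calculation below assumes the natural logarithm was used for Price_log, which is not stated here. Under that assumption, an RMSE of about 0.41 in log space corresponds to predictions that are typically off by a factor of roughly 1.5.

```python
import math

avg_mse_log = 0.166  # average MSE reported above, on the log-price scale

# Root mean squared error, still in log units.
rmse_log = math.sqrt(avg_mse_log)

# Assuming natural logs, an additive error in log space is a
# multiplicative factor in price space.
typical_factor = math.exp(rmse_log)

print(f"RMSE (log scale): {rmse_log:.3f}")
print(f"Typical multiplicative error: x{typical_factor:.2f}")
```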
To run the code:
1. Ensure the required libraries are installed (numpy, pandas, matplotlib, seaborn, scikit-learn). Note that sklearn is the import name; the package to install is scikit-learn.
2. Place "Apartments_information.csv" in the working directory.
3. Execute the Jupyter notebook (House_price_prediction.ipynb).