omtekyav/BursaKiraTahmini

Bursa Rental Price Prediction Using ML and Regression Models

Project Overview

This project focuses on predicting rental prices for apartments in Bursa, Turkey using machine learning regression models. Instead of relying on pre-existing datasets from platforms like Kaggle, I prioritized collecting real-world data to ensure practicality and relevance.

The primary goal is to build an accurate predictive model that can estimate rental prices based on features like apartment size, room count, floor, location, and age of the building.

Data Collection

Source: hepsiemlak.com

Why Hepsiemlak?

While platforms like Sahibinden are more popular, Hepsiemlak provided a more accessible and structured dataset. The website's URL structure and embedded details made scraping smoother.

Tool Used:

Data was scraped using the Instant Data Scraper Chrome extension. The collected data covers 32 pages of rental listings in Bursa, yielding 800+ apartment records.

Challenges Faced:

Extracting district names posed challenges, especially when some districts appeared in multiple URL segments (e.g., bursa-nilufer-29-ekim). After several hours of refining extraction methods, I successfully handled these inconsistencies.
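One way to handle slugs such as bursa-nilufer-29-ekim is to match the leading tokens against a list of known districts, so that hyphens inside the neighborhood name are not mistaken for separators. The sketch below is only an illustration of that idea, not the extraction code used in this project: the function name `split_location_slug` and the partial `KNOWN_DISTRICTS` set are assumptions.

```python
# Assumed, partial set of Bursa district slugs; extend it to match the real data.
KNOWN_DISTRICTS = {"nilufer", "osmangazi", "yildirim", "mudanya", "gemlik", "inegol"}


def split_location_slug(slug, city="bursa"):
    """Split a 'city-district-neighborhood' slug whose neighborhood may contain hyphens."""
    tokens = slug.lower().strip("/").split("-")
    if tokens and tokens[0] == city:
        tokens = tokens[1:]
    # Try the longest candidate first so multi-token district names would win.
    for end in range(len(tokens), 0, -1):
        candidate = "-".join(tokens[:end])
        if candidate in KNOWN_DISTRICTS:
            neighborhood = "-".join(tokens[end:]) or None
            return {"city": city, "district": candidate, "neighborhood": neighborhood}
    # No known district found: keep the remainder so nothing is silently dropped.
    return {"city": city, "district": None, "neighborhood": "-".join(tokens) or None}


print(split_location_slug("bursa-nilufer-29-ekim"))
# {'city': 'bursa', 'district': 'nilufer', 'neighborhood': '29-ekim'}
```

Slugs with no recognizable district come back with `district` set to None, which makes them easy to review by hand.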

Data Preprocessing

Key Steps:

Feature Engineering: Extracted and cleaned key features:

room (Number of rooms)

living_room (Number of living rooms)

area (Square meters)

floor (Floor number)

age (Building age)

city, district, neighborhood (Location details)

Handled missing values and inconsistent formats.

Encoded categorical variables using One-Hot Encoding and scaled numerical features using StandardScaler.

Current Status:

Preprocessing for primary features is complete.
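The encoding and scaling step listed under Key Steps could look roughly like the following with scikit-learn. This is a minimal sketch, not the project's actual notebook code: the column names follow the feature list above, the three rows are made up for illustration, and `handle_unknown="ignore"` is an assumption about how unseen neighborhoods should be treated.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["room", "living_room", "area", "floor", "age"]
categorical_cols = ["district", "neighborhood"]

# Scale numeric columns and one-hot encode categorical ones in a single step.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Made-up rows for illustration only; not taken from the scraped dataset.
listings = pd.DataFrame({
    "room": [2, 3, 1],
    "living_room": [1, 1, 0],
    "area": [95, 140, 55],
    "floor": [2, 5, 0],
    "age": [10, 3, 25],
    "district": ["nilufer", "osmangazi", "nilufer"],
    "neighborhood": ["29-ekim", "demirtas", "gorukle"],
})

X = preprocessor.fit_transform(listings)
print(X.shape)  # 5 scaled numeric columns + 2 district + 3 neighborhood columns
```

Fitting the transformer on the training split only, then reusing it on the test split, keeps test-set statistics out of the scaling.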

Visualizations

Values were grouped into bins so that the distributions are easier to read, and the resulting frequency densities were analyzed.
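As an illustration of binning and frequency density, the sketch below groups a handful of rents into fixed-width bins with NumPy. The rent values and the bin edges are made up for the example and do not come from the scraped data.

```python
import numpy as np

# Made-up monthly rents (TRY) for illustration; not from the scraped data.
rents = np.array([8000, 9500, 12000, 12500, 15000, 18000, 22000, 30000])

# Assumed bin edges: 5000, 15000, 25000, 35000.
edges = np.arange(5000, 35001, 10000)
counts, edges = np.histogram(rents, bins=edges)

# Frequency density: share of listings per unit of rent, so the bar areas sum to 1.
density = counts / (counts.sum() * np.diff(edges))

for left, right, c, d in zip(edges[:-1], edges[1:], counts, density):
    print(f"{left:>6}-{right:<6} count={c} density={d:.6f}")
```

Using densities instead of raw counts keeps histograms comparable when bin widths differ.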

Model Development

I experimented with various regression models to predict rental prices:

Models Tested:

Linear Regression

Random Forest Regressor

Gradient Boosting Regressor

Lasso, Ridge, and ElasticNet Regression

SVM

Overall Values

[Image: overall metric values for the tested models]

Optimization:

Used GridSearchCV for hyperparameter tuning.

Optimized parameters like n_estimators, max_depth, and learning_rate for ensemble models.
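A grid search over Random Forest settings might be set up as in the sketch below. It runs on synthetic data from `make_regression` rather than the rental dataset, and the grid values, `cv=3`, and the RMSE-based scoring are assumptions chosen for the example, not the configuration used in this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed rental features.
X, y = make_regression(n_samples=150, n_features=15, noise=10.0, random_state=0)

# Assumed search grid; real grids would typically cover more values.
param_grid = {"n_estimators": [100, 200], "max_features": [6, 12]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print("cross-validated RMSE:", -search.best_score_)
```

For Gradient Boosting, the same pattern applies with `learning_rate` and `max_depth` added to the grid.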

Best Performing Model:

Random Forest Regressor with:

max_features = 12

n_estimators = 200

Achieved:

R² Score: 0.6282

RMSE: 3378.78

Results & Evaluation

The optimized Random Forest model outperformed the Linear Regression baseline on both metrics:

| Model | RMSE | R² Score | Notes |
| --- | --- | --- | --- |
| Linear Regression | 4724.32 | 0.57 | Baseline model |
| Random Forest (optimized) | 3378.78 | 0.6282 | Best performing model |

Insights:

The optimized Random Forest reduced RMSE by approximately 28% relative to the Linear Regression baseline (3378.78 vs. 4724.32).

Higher n_estimators and max_features contributed to better stability and generalization.
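For reference, the two metrics and the relative RMSE gap between the reported models can be computed as follows. The helper functions are a plain-Python sketch of the standard definitions; only the two RMSE values are taken from the results above, and the toy vectors are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy check with made-up values.
print(rmse([1, 2, 3], [1, 2, 4]), r2_score([1, 2, 3], [1, 2, 4]))

# Relative RMSE reduction using the figures reported above.
baseline_rmse = 4724.32   # Linear Regression
tuned_rf_rmse = 3378.78   # Random Forest (optimized)
reduction = (baseline_rmse - tuned_rf_rmse) / baseline_rmse
print(f"{reduction:.1%}")  # 28.5%
```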

Conclusion

This project demonstrates the application of machine learning regression models to predict rental prices in Bursa, Turkey. By using real-world data collected from hepsiemlak.com rather than a pre-existing dataset, I aimed to make the analysis and models reflect actual market conditions.

Data Collection & Preprocessing:

Collected 800+ rental listings with features such as apartment size, room count, floor number, and location.

Addressed challenges like extracting district names from complex URL structures.

Handled missing values, performed feature engineering, and applied data scaling and encoding to prepare the data for modeling.

Exploratory Data Analysis (EDA):

Used bins to group data for better visualization of frequency densities.

Identified Nilüfer and Osmangazi as the most active rental markets in Bursa.

Discovered clear relationships between apartment area and rental price.

Model Development & Evaluation:

Tested various regression models, including Linear Regression, Random Forest, Gradient Boosting, and Ridge Regression.

Performed hyperparameter tuning using GridSearchCV, which improved model performance significantly.

The optimized Random Forest Regressor emerged as the best model, achieving an R² score of 0.6282 and an RMSE of 3378.78, outperforming the baseline models.

Future Work

To further enhance the model and analysis, future work may include:

Dataset expansion: Extend the dataset beyond Bursa to cover the entire Marmara region, providing a broader perspective on regional rental trends. A larger volume of data should improve the model's generalizability and enable comparisons across multiple cities.

Feature enrichment: Incorporate additional features like proximity to schools, public transportation, and amenities to better capture price determinants.
