This project focuses on predicting rental prices for apartments in Bursa, Turkey using machine learning regression models. Instead of relying on pre-existing datasets from platforms like Kaggle, I prioritized collecting real-world data to ensure practicality and relevance.
The primary goal is to build an accurate predictive model that can estimate rental prices based on features like apartment size, room count, floor, location, and age of the building.
While platforms like Sahibinden are more popular, Hepsiemlak provided a more accessible and structured dataset. The website's URL structure and embedded details made scraping smoother.
Data was scraped using the Instant Data Scraper Chrome extension.
Collected data covers 32 pages of rental listings in Bursa, resulting in 800+ apartment records.
Extracting district names posed challenges, especially when some districts appeared in multiple URL segments (e.g., bursa-nilufer-29-ekim). After several hours of refining extraction methods, I successfully handled these inconsistencies.
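The kind of slug parsing described above can be sketched as follows. This is only an illustration, not the extraction code actually used in the project: the function name `split_location_slug`, the `KNOWN_DISTRICTS` set, and the assumption that a slug always reads city, then district, then neighborhood are all hypothetical.

```python
# Hypothetical set of district slugs; the real project may have used a different list.
KNOWN_DISTRICTS = {"nilufer", "osmangazi", "yildirim", "mudanya", "gemlik", "inegol"}


def split_location_slug(slug, city="bursa", districts=KNOWN_DISTRICTS):
    """Split a slug such as 'bursa-nilufer-29-ekim' into (city, district, neighborhood).

    Assumes the slug is ordered city-district-neighborhood, as in the example above.
    """
    parts = slug.strip("-").lower().split("-")
    # Drop the leading city segment if it is present.
    if parts and parts[0] == city:
        parts = parts[1:]
    # Try the longest candidate first so that multi-segment district names still match.
    for n in range(len(parts), 0, -1):
        candidate = "-".join(parts[:n])
        if candidate in districts:
            neighborhood = "-".join(parts[n:]) or None
            return city, candidate, neighborhood
    # No known district found: keep the remainder so the row can be reviewed by hand.
    return city, None, "-".join(parts) or None


result = split_location_slug("bursa-nilufer-29-ekim")
```

Matching against a fixed list of districts avoids guessing where the district ends and the neighborhood begins, which is the ambiguity a slug like bursa-nilufer-29-ekim introduces.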
Feature Engineering:
Extracted and cleaned key features:
- room (number of rooms)
- living_room (number of living rooms)
- area (square meters)
- floor (floor number)
- age (building age)
- city, district, neighborhood (location details)
Handled missing values and inconsistent formats. Encoded categorical variables using One-Hot Encoding and scaled numerical features using StandardScaler.
Current Status:
Preprocessing for primary features is complete.
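A minimal sketch of the encoding and scaling step, assuming scikit-learn's ColumnTransformer is used to combine StandardScaler and OneHotEncoder. The column names follow the feature list above, but the sample rows are made up for illustration and the project's actual pipeline may differ. The city column is left out here on the assumption that it is constant (Bursa) for every listing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["room", "living_room", "area", "floor", "age"]
categorical_cols = ["district", "neighborhood"]

preprocessor = ColumnTransformer(
    transformers=[
        # Scale numeric features to zero mean and unit variance.
        ("num", StandardScaler(), numeric_cols),
        # One-hot encode locations; categories unseen at fit time become all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Made-up rows for illustration only, not taken from the scraped dataset.
listings = pd.DataFrame(
    {
        "room": [2, 3, 1, 3],
        "living_room": [1, 1, 1, 2],
        "area": [90, 130, 55, 160],
        "floor": [2, 5, 0, 3],
        "age": [10, 3, 25, 0],
        "district": ["Nilüfer", "Osmangazi", "Nilüfer", "Yıldırım"],
        "neighborhood": ["29 Ekim", "Soğanlı", "Görükle", "Millet"],
    }
)

# Numeric columns come first in the output, followed by the one-hot columns.
features = preprocessor.fit_transform(listings)
```

Fitting the transformer on the training split only, then reusing it on the test split, keeps test-set statistics out of the scaling step.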
Bins were used to group the data distribution in order to obtain clear and understandable results, and frequency densities were analyzed.
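The binning idea can be sketched in plain Python as below. The helper name `bin_frequency_density`, the bin width, and the sample rents are assumptions for illustration; frequency density is taken here to mean the count divided by the bin width, which may not match the exact definition used in the analysis.

```python
def bin_frequency_density(values, bin_width):
    """Group values into equal-width bins and return (start, end, count, density) rows.

    Density is count / bin_width. Bins that contain no values are not listed.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = {}
    for value in values:
        # Floor each value to the lower edge of its bin.
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [
        (start, start + bin_width, count, count / bin_width)
        for start, count in sorted(counts.items())
    ]


# Made-up monthly rents, for illustration only.
sample_rents = [7500, 9000, 12000, 12500, 14000, 26000]
rows = bin_frequency_density(sample_rents, bin_width=5000)
for start, end, count, density in rows:
    print(f"{start}-{end}: count={count}, density={density:.4f}")
```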
I experimented with several regression models to predict rental prices: Linear Regression, Ridge Regression, Random Forest, and Gradient Boosting.
Used GridSearchCV for hyperparameter tuning.
Optimized parameters like n_estimators, max_depth, and learning_rate for ensemble models.
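A hedged sketch of what such a search might look like for the Random Forest. The synthetic data, the grid values, the 3-fold split, and the RMSE-based scoring are all assumptions made so the example runs on its own; they are not the project's actual settings. Note that learning_rate is not a RandomForestRegressor parameter, so it would only appear in a grid for a boosting model such as GradientBoostingRegressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed feature matrix and rental prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 12))
y = 15000 + 3000 * X[:, 0] + 1500 * X[:, 1] + rng.normal(scale=500, size=120)

# Illustrative grid; every combination is evaluated with cross-validation.
param_grid = {
    "n_estimators": [50, 200],
    "max_features": [4, 12],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_params, round(best_rmse, 2))
```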
Random Forest Regressor with:
- max_features = 12
- n_estimators = 200
Achieved:
- R² Score: 0.6282
- RMSE: 3378.78
Results & Evaluation
The optimized Random Forest model outperformed the Linear Regression baseline on both RMSE and R²:
| Model | RMSE | R² Score | Notes |
| --- | --- | --- | --- |
| Linear Regression | 4724.32 | 0.57 | Baseline model |
| Random Forest (optimized) | 3378.78 | 0.6282 | Best performing model |
Insights:
- Hyperparameter tuning with GridSearchCV produced a Random Forest whose RMSE is approximately 28% lower than the Linear Regression baseline (3378.78 vs. 4724.32).
- Higher n_estimators and max_features contributed to better stability and generalization.
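For reference, the two metrics and the relative RMSE change can be computed as in the sketch below. The helper names are hypothetical and written in plain Python for clarity; only the two RMSE values come from the results above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# RMSE values reported in the results above.
baseline_rmse = 4724.32  # Linear Regression
forest_rmse = 3378.78    # optimized Random Forest

reduction_pct = 100 * relative_reduction(baseline_rmse, forest_rmse)
print(f"RMSE reduction vs. baseline: {reduction_pct:.1f}%")
```

Because RMSE is expressed in the same unit as the rental price, the absolute gap between the two models (about 1345) is as informative as the percentage.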
This project demonstrates the application of machine learning regression models to predict rental prices in Bursa, Turkey. By working with real-world data collected from hepsiemlak.com rather than a pre-existing dataset, I ensured that the analysis and models reflect actual market conditions.
Collected more than 800 rental listings with features such as apartment size, room count, floor number, and location. Addressed challenges like extracting district names from complex URL structures. Handled missing values, performed feature engineering, and applied data scaling and encoding to prepare the data for modeling.
Used bins to group data for better visualization of frequency densities. Identified Nilüfer and Osmangazi as the most active rental markets in Bursa. Discovered clear relationships between apartment area and rental price.
Tested various regression models, including Linear Regression, Random Forest, Gradient Boosting, and Ridge Regression. Performed hyperparameter tuning using GridSearchCV, which improved model performance significantly. The optimized Random Forest Regressor emerged as the best model, achieving an R² score of 0.6282 and an RMSE of 3378.78, outperforming the baseline models.
To further enhance the model and analysis, future work may include:
- Dataset expansion: Extend the dataset beyond Bursa to cover the entire Marmara region, providing a broader perspective on regional rental trends. A larger dataset should improve the model’s generalizability and enable comparisons across multiple cities.
- Feature enrichment: Incorporate additional features like proximity to schools, public transportation, and amenities to better capture price determinants.