A machine learning project that uses logistic regression to predict real estate sale probability and support business decision-making.
The Real Estate Valuation dataset from the UCI Machine Learning Repository contains 414 property records with the following key features:
- House Age: Age of the property in years
- Distance to MRT: Distance to nearest Mass Rapid Transit station (meters)
- Number of Convenience Stores: Count of nearby convenience stores
- Unit Price: House price per unit area (10,000 NT$ per ping, where 1 ping ≈ 3.3 m²)
- Location: Latitude and longitude coordinates
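A minimal loading sketch with pandas. It assumes the spreadsheet keeps the UCI column order: a row-number column, X1 (transaction date) through X6 (longitude), and then the price Y. The short column names and the `standardize_columns` / `load_dataset` helpers are hypothetical and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical short names, assuming the UCI column order:
# row number, X1 transaction date ... X6 longitude, then Y (unit price).
COLUMN_NAMES = [
    "no",
    "transaction_date",
    "house_age",
    "mrt_distance",
    "convenience_stores",
    "latitude",
    "longitude",
    "unit_price",
]


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns by position and drop the row-number column."""
    if df.shape[1] != len(COLUMN_NAMES):
        raise ValueError(f"expected {len(COLUMN_NAMES)} columns, got {df.shape[1]}")
    renamed = df.copy()
    renamed.columns = COLUMN_NAMES
    return renamed.drop(columns="no")


def load_dataset(path: str = "data/Real estate valuation data set.xlsx") -> pd.DataFrame:
    """Read the original spreadsheet (pandas needs openpyxl for .xlsx files)."""
    return standardize_columns(pd.read_excel(path))
```

Renaming by position rather than by the original header text keeps the sketch independent of the exact header strings in the file. The trade-off is that it will mislabel columns silently if the order differs.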
Objective: Predict whether a real estate property will sell based on its characteristics and listing price.
Target Variable: Binary classification
- 1 = Property will likely sell (priced within 120% of market average)
- 0 = Property will likely NOT sell (priced above 120% of market average)
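A sketch of how this labeling rule could be computed. It assumes that "market average" means the mean unit price across the whole dataset and that "within 120%" means at or below 1.2 times that mean. The function name `make_sale_labels` is hypothetical.

```python
import numpy as np


def make_sale_labels(unit_prices, threshold_ratio=1.2):
    """Return 1 (likely to sell) where price <= threshold_ratio * mean price, else 0.

    Assumes "market average" means the mean price over the whole dataset.
    """
    prices = np.asarray(unit_prices, dtype=float)
    market_average = prices.mean()
    return (prices <= threshold_ratio * market_average).astype(int)
```

For example, with prices of 10, 20, and 30 the mean is 20 and the cut-off is 24, so `make_sale_labels([10, 20, 30])` yields `[1, 1, 0]`. The label is a fixed function of price, and the listing price is also a model input. If both come from the same column, the classifier mainly has to recover this single threshold, which is worth keeping in mind when reading the accuracy figures below.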
Model Performance:
- Training Accuracy: 99.7%
- Test Accuracy: 99.2%
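Accuracy is the share of properties whose label the model predicts correctly. A plain-Python sketch that compares it with a majority-class baseline follows; this baseline is a useful reference point when one class dominates. The helper names are hypothetical, and the notebook may use scikit-learn's `accuracy_score` instead.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most common class (labels 0/1)."""
    if len(y_true) == 0:
        raise ValueError("y_true must be non-empty")
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)
```

If most properties fall under the 120% threshold, a model that always predicts "will sell" already scores well. Comparing against `majority_baseline_accuracy` shows how much the model adds on top of that.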
This logistic regression model supports several critical business decisions:
- Pricing Strategy: Determine optimal listing price to maximize sale probability
- Market Analysis: Identify properties likely to sell quickly vs. those requiring price adjustments
- Investment Decisions: Assess property attractiveness based on location and characteristics
- Risk Assessment: Quantify the probability of a successful sale before listing
Business Impact: Real estate agents and property investors can use this model to make data-driven pricing decisions, with the goal of reducing time on market and improving profitability.
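For the pricing-strategy use case, one simple approach is to sweep candidate listing prices and keep the highest price whose predicted sale probability still meets a chosen target. In this sketch, `predict_sale_probability` stands for the trained model's `predict_proba` with the other property features held fixed. `toy_probability` is a made-up function used only to make the example runnable, and the 0.8 target is arbitrary.

```python
import math


def highest_price_meeting_target(predict_sale_probability, prices, target=0.8):
    """Return the highest candidate price whose predicted sale probability is
    at least `target`, or None if no candidate qualifies.

    Every candidate is checked, so the probability does not need to fall
    monotonically as the price rises.
    """
    best = None
    for price in sorted(prices):
        if predict_sale_probability(price) >= target:
            best = price
    return best


def toy_probability(price):
    """Hypothetical stand-in for a trained model: probability drops as price passes 50."""
    return 1.0 / (1.0 + math.exp(price - 50))


if __name__ == "__main__":
    print(highest_price_meeting_target(toy_probability, range(40, 61), target=0.8))  # prints 48
```

An exhaustive scan is used instead of a binary search because a model with several interacting features is not guaranteed to give a probability that falls steadily with price.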
- Jupyter Notebook Analysis: Complete data exploration, model training, and evaluation
- Interactive Streamlit App: Web interface for real-time property sale probability predictions
- Model Persistence: Trained model saved for production use
- Visualization: Price vs. probability charts and business insights
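The trained model is saved as `real_estate_model.pkl`. The sketch below shows how that is commonly done with `joblib.dump` and `joblib.load`, assuming the notebook serializes the fitted estimator (or pipeline) directly. The helper names are hypothetical. Loading a pickle can execute arbitrary code, so only load files from a trusted source.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


def save_model(model, path="real_estate_model.pkl"):
    """Serialize a fitted estimator (or pipeline) to disk."""
    joblib.dump(model, path)


def load_model(path="real_estate_model.pkl"):
    """Load a previously saved estimator; only load files you trust."""
    return joblib.load(path)


if __name__ == "__main__":
    # Round-trip a toy model through a temporary file.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X, y)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pkl")
        save_model(model, path)
        restored = load_model(path)
    print((restored.predict(X) == model.predict(X)).all())  # prints True
```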
real_estate/
├── businessAnalysis_realEstateValuation.ipynb # Main analysis notebook
├── real_estate_app.py # Streamlit web application
├── real_estate_model.pkl # Trained logistic regression model
├── data/
│ ├── Real estate valuation data set.xlsx # Original dataset
│ └── real+estate+valuation+data+set.zip # Backup data
└── README.md # This file
- Clone the repository:
git clone https://github.com/mobius29er/realEstateValuation.git
cd realEstateValuation
- Install required packages:
pip install pandas numpy scikit-learn matplotlib streamlit joblib openpyxl
- Run the Jupyter Notebook:
jupyter notebook businessAnalysis_realEstateValuation.ipynb
- Launch the Streamlit App:
streamlit run real_estate_app.py
To deploy this app on Streamlit Cloud:
- Fork this repository to your GitHub account (or push a clone of it to a repository you own)
- Visit share.streamlit.io
- Connect your GitHub account and select this repository
- Set the main file path to real_estate_app.py
- Deploy: Streamlit Cloud will automatically install dependencies from requirements.txt
Live Demo: The app can be deployed and accessed via Streamlit Cloud for interactive demonstrations.
Open businessAnalysis_realEstateValuation.ipynb to explore:
- Data loading and preprocessing
- Model training and evaluation
- Price sensitivity analysis
- Business insights and recommendations
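A minimal training-and-evaluation sketch with scikit-learn. It assumes `X` holds the four model inputs (house age, MRT distance, convenience store count, listing price) and `y` holds the 0/1 label. The 80/20 split, the `StandardScaler` step, `stratify=y`, and `random_state=42` are illustrative choices and not necessarily what the notebook uses. Scaling helps here because MRT distance is measured in meters while the store count is a small integer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(X, y, test_size=0.2, random_state=42):
    """Fit a scaled logistic regression; return (model, train_acc, test_acc)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )
    # Standardize features, then fit the classifier on the scaled values.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, model.predict(X_train))
    test_acc = accuracy_score(y_test, model.predict(X_test))
    return model, train_acc, test_acc
```

Keeping the scaler inside the pipeline means it is fitted on the training split only. It also means the saved model applies the same scaling automatically at prediction time.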
Launch the app to interactively:
- Input property characteristics (age, MRT distance, convenience stores, price)
- Get real-time sale probability predictions
- Receive business recommendations
- Visualize results with color-coded feedback
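The color-coded feedback can be driven by a small function that maps a predicted probability to a color and a message. The 0.7 and 0.4 cut-offs and the message wording are hypothetical placeholders, not the app's documented values. In Streamlit, the three outcomes could, for example, be shown with `st.success`, `st.warning`, and `st.error`.

```python
def sale_recommendation(probability, high=0.7, low=0.4):
    """Map a predicted sale probability to a (color, message) pair.

    The default cut-offs are hypothetical placeholders.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high:
        return "green", "Likely to sell at this price."
    if probability >= low:
        return "orange", "Borderline: consider a modest price reduction."
    return "red", "Unlikely to sell: review the listing price."
```

Keeping this logic outside the Streamlit script makes the thresholds easy to unit-test and adjust.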
- Algorithm: Logistic Regression
- Features: House age, MRT distance, convenience stores count, listing price
- Target: Binary classification (will sell / won't sell)
- Performance: 99.2% test accuracy
- Labeling Rule: Properties priced within 120% of the market average are labeled as likely to sell (1); properties priced above it are labeled 0
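Logistic regression turns the features into a sale probability by applying the sigmoid to a linear score: P(sell) = 1 / (1 + exp(-(b0 + w1·x1 + ... + w4·x4))). The sketch below recomputes this by hand from a fitted scikit-learn `LogisticRegression` (`coef_`, `intercept_`) and checks the result against `predict_proba`. It uses random toy data, not the real dataset. If the saved model is a pipeline with a scaling step, the features would need to be transformed before applying the coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def manual_sale_probability(model, X):
    """Recompute P(class 1) = 1 / (1 + exp(-(b0 + w . x))) from a fitted binary model."""
    z = X @ model.coef_[0] + model.intercept_[0]  # linear score per row
    return 1.0 / (1.0 + np.exp(-z))               # sigmoid maps score to (0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))
    y = (X[:, 3] < 0).astype(int)  # toy 0/1 labels
    model = LogisticRegression().fit(X, y)
    # Column 1 of predict_proba corresponds to model.classes_[1], i.e. label 1.
    print(np.allclose(manual_sale_probability(model, X), model.predict_proba(X)[:, 1]))  # prints True
```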
This model can be used by:
- Real Estate Agents: Optimize listing prices for faster sales
- Property Investors: Assess investment opportunities
- Homeowners: Determine competitive pricing strategies
- Market Analysts: Understand pricing dynamics
- Properties priced within 120% of market average have high sale probability, as expected given that this threshold defines the target label
- Distance to MRT station significantly impacts sale likelihood
- Number of nearby convenience stores affects property attractiveness
- House age influences buyer preferences
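Claims about which features matter most can be checked by comparing coefficient magnitudes from a model fitted on standardized features. Each coefficient's exponential gives the odds ratio for a one-standard-deviation increase in that feature. The sketch assumes a pipeline such as `make_pipeline(StandardScaler(), LogisticRegression())`, whose coefficients are available as `model[-1].coef_[0]`. The function name is hypothetical. Coefficient size reflects influence within the model; it is not a test of statistical significance.

```python
import numpy as np


def rank_features_by_influence(feature_names, standardized_coefs):
    """Sort features by |coefficient| and report each odds ratio exp(coef).

    Coefficients must come from a model fitted on standardized features;
    otherwise their magnitudes are not comparable across units.
    """
    coefs = np.asarray(standardized_coefs, dtype=float)
    order = np.argsort(-np.abs(coefs))  # largest magnitude first
    return [(feature_names[i], float(coefs[i]), float(np.exp(coefs[i]))) for i in order]
```

An odds ratio below 1 means higher values of that feature lower the odds of a sale. For MRT distance, for example, this would be the expected direction if being farther from a station makes a property less attractive.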
- Python 3.13+
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- scikit-learn: Machine learning algorithms
- matplotlib: Data visualization
- streamlit: Web application framework
- joblib: Model serialization
- openpyxl: Reading the original .xlsx dataset
This project is part of an educational assignment for AI/ML coursework.
This is an educational project. Feel free to fork and experiment with different models or features.