Property Predictions

Predictive modeling of Real Estate using Machine Learning and dimensionality reduction

Data Analysis on Redfin sales data to explore predictive possibilities

By Sravani, Lee, Roman, and Siege

Data Source and Objective

Using monthly sales data from Redfin spanning 2012-2020, we wanted to explore relationships between the Median Sales Price of homes in any given Seattle neighborhood and metadata about related sales. This metadata included the number of new listings on the market, the total inventory of homes, and the length of time the average home was for sale, among other measures.

Discoveries

The bulk of our research was data exploration: we ran a variety of models to find correlations and determine whether there was any meaningful predictability. We found very little correlation between most of the measures, and the few correlations that did exist were self-referential.
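As a minimal sketch of how the correlation between one measure and the target could be checked, the snippet below computes a Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sample values are assumptions made for illustration; they are not taken from the project notebooks or from the Redfin data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly values, invented for illustration only.
inventory = [120, 135, 150, 140, 160, 155]
median_sale_price = [610_000, 605_000, 620_000, 640_000, 635_000, 650_000]

r = pearson_r(inventory, median_sale_price)
print(f"Pearson r between inventory and median sale price: {r:.3f}")
```

A value near 0 indicates little linear relationship, while a value near 1 or -1 between two measures derived from the same underlying quantity would be an example of a self-referential correlation.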

Seattle City Analysis

As a first step, analysis was done solely on Seattle city data (not neighborhood-specific). Predictive modeling was done using ARMA, ARIMA, LSTM RNN, Linear Regression, Balanced Random Forest, and Random Forest models. All models used median home sale price as the target variable. For models that required training and testing, a 70/30 train/test split was used. Because the data is a time series, it was not shuffled before splitting, so the chronological order of the observations was preserved.
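The unshuffled 70/30 split described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the notebooks: `chronological_split` is a hypothetical helper, and the month labels stand in for real rows of data.

```python
def chronological_split(rows, train_fraction=0.7):
    """Split time-ordered rows into (train, test) without shuffling.

    The earliest `train_fraction` of the rows become the training set and
    the remaining, most recent rows become the test set.
    """
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Ten hypothetical monthly periods, already sorted from oldest to newest.
months = [f"2012-{m:02d}" for m in range(1, 11)]

train, test = chronological_split(months)
print("train:", train)
print("test: ", test)
```

Keeping the test set strictly later than the training set means the model is always evaluated on periods it has not seen, which is the situation it would face when forecasting.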

After compiling and running all models, we found that none offered strong predictive capability. Comparing the feature importances of the Balanced Random Forest and Random Forest models revealed some differences. The top two features for the Balanced Random Forest model were new listings and inventory, whereas the top two features for the Random Forest model were inventory and average sale to list. Images of model predictions vs. actual values, feature importance graphs, and numerous other visuals can be found in the Images folder for reference.

One final interesting finding from the Seattle city data was in the similarities and differences among the predictions of the LSTM, Linear Regression, Balanced Random Forest, and Random Forest models. As the image below shows, three of the four models produced similar predictions. The LSTM model was the most accurate, but all predictions hovered around $700K.

Neighborhood Analysis

Modeling Seattle as a whole using ARMA, ARIMA, LSTM, and Regression showed no significant capacity to predict home prices, so the project scope was extended to model each neighborhood in the city separately. We computed the standard deviation of the median price data for each neighborhood and picked the five least volatile and the five most volatile neighborhoods to analyze.

The neighborhoods analyzed were:

Least Volatile:

Belltown
Broadway
International District
Pinehurst
Dunlap

Most Volatile:

Laurelhurst
Madison Park
Portage Bay
Seattle Central District
Denny Blaine
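The volatility ranking used to pick these neighborhoods can be sketched as below. This is a minimal illustration that assumes volatility is the sample standard deviation of each neighborhood's median price series; `rank_by_volatility`, the placeholder names A to D, and the prices are all hypothetical and not drawn from the dataset.

```python
from statistics import stdev


def rank_by_volatility(prices_by_neighborhood, k=5):
    """Return (k least volatile, k most volatile) neighborhood names.

    Volatility is the sample standard deviation of each price series.
    Series with fewer than two points are skipped, since the sample
    standard deviation is undefined for them.
    """
    volatility = {
        name: stdev(prices)
        for name, prices in prices_by_neighborhood.items()
        if len(prices) >= 2
    }
    ranked = sorted(volatility, key=volatility.get)  # lowest stdev first
    least = ranked[:k]
    most = ranked[-k:][::-1]  # highest stdev first
    return least, most


# Hypothetical median prices (in $K) for four placeholder neighborhoods.
sample = {
    "A": [500, 505, 495],
    "B": [500, 700, 300],
    "C": [500, 550, 450],
    "D": [500, 500, 500],
}

least, most = rank_by_volatility(sample, k=2)
print("least volatile:", least)
print("most volatile: ", most)
```

If `k` is larger than half the number of neighborhoods, the two lists overlap, so `k` should be chosen with the size of the dataset in mind.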

A Balanced Random Forest feature importance analysis of these neighborhoods led to the following observations:

Most important features for

Low-volatility neighborhoods:

Average Sale Price to List Price
Inventory

High-volatility neighborhoods:

Average Sale Price to List Price
Days on Market

Least important features for

Low-volatility neighborhoods:

Homes Sold
Days on Market

High-volatility neighborhoods:

New Listings
Inventory

Images of the model performance and correlations are available in the Images folder for reference.

Conclusions

Given the dataset, no real meaningful connections can be made. However, we posit that a few additional data points may produce more meaningful relationships. For example, the average salary of neighborhood residents would likely correlate more strongly with the Median Sales Price of a home in any given neighborhood. In addition, fundamental details about the houses in each neighborhood, such as room count, number of floors, floor area, and year built, would be helpful for predicting Median Sales Price.

