The KingHouse project is an end-to-end data analysis project focused on understanding house sales in King County, Washington. The project demonstrates the use of SQL for data cleaning and analysis, as well as Tableau for interactive data visualization. The insights gained from this project can assist real estate agencies, investors, and homebuyers in making informed decisions.
You can interact with the dashboard here.
- Analyze house sales data to uncover trends and insights.
- Utilize SQL for data cleaning, transformation, and exploratory data analysis (EDA).
- Create an interactive Tableau dashboard to visualize and communicate findings effectively.
The dataset used in this project contains information on house sales, including details about price, location, size, condition, and additional features such as waterfront views.
- id: Unique identifier for each property.
- date: Date of the sale.
- price: Sale price of the property.
- bedrooms: Number of bedrooms.
- bathrooms: Number of bathrooms.
- sqft_living: Square footage of living space.
- sqft_lot: Square footage of the lot.
- floors: Number of floors in the property.
- waterfront: Indicates if the property has a waterfront view.
- view: Quality of the view.
- condition: Overall condition of the house.
- grade: Overall grade given to the house based on its design and construction.
- yr_built: Year the house was built.
- zipcode: Zip code of the property.
- lat: Latitude coordinate of the property.
- long: Longitude coordinate of the property.
Below is the SQL code used to create the HouseData
table:
-- DDL Script
CREATE TABLE HouseData (
id BIGINT PRIMARY KEY,
date DATE,
price DOUBLE PRECISION,
bedrooms INTEGER,
bathrooms DOUBLE PRECISION,
sqft_living INTEGER,
sqft_lot INTEGER,
floors DOUBLE PRECISION,
waterfront VARCHAR(50),
view VARCHAR(50),
condition VARCHAR(50),
grade INTEGER,
yr_built INTEGER,
zipcode INTEGER,
lat DOUBLE PRECISION,
long DOUBLE PRECISION
);
Below is the SQL code used to insert data into the HouseData
table:
-- DML Script
INSERT INTO HouseData (id, date, price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade, yr_built, zipcode, lat, long)
VALUES
-- Add your data rows here.
(...);
- Daily Average House Sale Price:
- Displays the trend of average house prices over time.
- Distribution of House Pricing:
- Visualizes the frequency of house prices in different price ranges.
- Map Visualization:
- Highlights price variations across King County by zip code.
- View vs. Condition Analysis:
- Explores how house conditions and views affect pricing.
- Price Trends: Average house prices fluctuate daily but show consistent patterns over time.
- Geographical Insights: Certain zip codes consistently show higher property prices, indicating premium areas.
- Impact of Features:
- Waterfront properties command significantly higher prices.
- Better condition and higher grades correlate with increased pricing.
- SQL: For data cleaning, transformation, and exploratory analysis.
- Tableau: For interactive visualization and dashboard creation.
- PostgreSQL: For database management.
- Database Setup:
- Use the provided DDL script to create the
HouseData
table in your database. - Use the DML script to insert the dataset into the table.
- Use the provided DDL script to create the
- SQL Queries:
- Run SQL queries to extract insights and prepare the data for visualization.
- Tableau Dashboard:
- Use Tableau to create interactive visualizations using the exported query results.
- Incorporate machine learning models to predict house prices.
- Analyze additional datasets, such as economic indicators or demographics, to enrich insights.
- Automate data cleaning and visualization processes.
Mina Wahba
Feel free to explore the project and provide feedback!