This project is a web-based application built with Dash to analyze and visualize Yelp business and review data. The app integrates AWS Athena for querying large datasets stored in S3 and provides interactive data visualizations using Plotly.
- Integration with AWS Athena: Query large-scale Yelp datasets efficiently.
- Interactive Dash Visualizations:
  - Distribution of star ratings across businesses.
  - Analysis of popular business categories.
  - Visualization of the most reviewed cities.
  - User review distribution insights.
- Geospatial Visualization: (Optional) Explore data points on an interactive world map.
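As a rough illustration of the Athena integration listed above, the sketch below submits a query, waits for it to finish, and reads back the first page of results. It is not taken from `app.py`. The function name `run_athena_query` is hypothetical, and the `client` argument is assumed to be a boto3 Athena client (`boto3.client("athena")`). The three client calls used are `start_query_execution`, `get_query_execution`, and `get_query_results`.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Run one SQL statement on Athena and return the first page of rows as dicts.

    `client` is assumed to be a boto3 Athena client. Pagination via NextToken
    is left out to keep the sketch short.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)

    rows = client.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []

    # For SELECT statements the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, (cell.get("VarCharValue") for cell in row["Data"])))
        for row in rows[1:]
    ]


# Hypothetical usage (needs boto3 and valid AWS credentials). The `stars`
# column and the "athena-results/" prefix are assumptions:
#   import boto3
#   client = boto3.client("athena", region_name="us-west-1")
#   rows = run_athena_query(
#       client,
#       "SELECT stars, COUNT(*) AS n FROM yelp_academic_dataset_business "
#       "GROUP BY stars ORDER BY stars",
#       "yelp-db",
#       "s3://cityu-cs519-tp/athena-results/",
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without touching AWS.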
root/
├── app.py            # Main application logic
├── requirements.txt
└── README.md         # Project documentation
To run this application, you need the following:
- AWS Credentials: Configured with permissions to query Athena and access S3.
- Athena Database and Tables: Ensure `yelp-db` is set up with the `yelp_academic_dataset_business` and `yelp_academic_dataset_review` tables.
- Python Environment: Python 3.8 or later with required libraries (see `requirements.txt`).
- Clone the repository:
  git clone https://github.com/your-username/yelp-analysis-app.git
  cd yelp-analysis-app
- Install the dependencies:
  pip install -r requirements.txt
- Set the following environment variables:
  AWS_ACCESS_KEY_ID=<your-access-key>
  AWS_SECRET_ACCESS_KEY=<your-secret-key>
  REGION_NAME=us-west-1
  S3_BUCKET_NAME=cityu-cs519-tp
- Run the application:
  python app.py
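The four environment variables in the setup steps could be read and validated at startup along the following lines. This is a minimal sketch, not the code in `app.py`. The names `REQUIRED_VARS` and `load_settings` are hypothetical, and treating all four variables as mandatory is an assumption.

```python
import os

# Variable names taken from the setup steps; requiring all four is an assumption.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "REGION_NAME",
    "S3_BUCKET_NAME",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing fast with the list of missing names tends to be easier to debug than a credentials error raised later by the first Athena query.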
Star Rating Distribution: Displays the distribution of star ratings across businesses.
Popular Business Categories: Highlights the top 20 business categories by frequency.
Most Reviewed Cities: Bar chart showing cities with the most reviews.
User Review Insights: KDE plot of user review activity and cumulative distribution of user reviews.
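To make two of the aggregations above concrete, here is a plain-Python sketch of a top-N category count and a cumulative distribution of per-user review counts. The app itself may compute these in Athena SQL or with a dataframe library, which the source does not show. The function names are hypothetical, and the sketch assumes each business's categories arrive as one comma-separated string.

```python
from collections import Counter


def top_categories(category_strings, n=20):
    """Count individual categories and return the n most frequent.

    Assumes each entry is a comma-separated string such as
    "Restaurants, Mexican"; None or empty entries are skipped.
    """
    counts = Counter()
    for raw in category_strings:
        if not raw:
            continue
        counts.update(part.strip() for part in raw.split(",") if part.strip())
    return counts.most_common(n)


def cumulative_distribution(review_counts):
    """Return (value, fraction of users with at most that many reviews) pairs."""
    freq = Counter(review_counts)
    total = sum(freq.values())
    running = 0
    points = []
    for value in sorted(freq):
        running += freq[value]
        points.append((value, running / total))
    return points
```

The list returned by `cumulative_distribution` can be passed directly as x and y values to a line chart, and `top_categories` yields label and count pairs suitable for a bar chart.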