A data analytics project that scrapes Amazon electronics data (laptops, smartphones, headphones) using Selenium, stores it in MySQL, and creates interactive visualizations through a Power BI dashboard.

ikomalgupta/amazon-product-analytics

Amazon Product Analysis

Project Overview

This project focuses on analyzing Amazon electronics products (laptops, smartphones, headphones) by scraping data from the Amazon website using Selenium.
The scraped data is stored directly in a MySQL database and also exported as a CSV file.
The final step involves connecting MySQL to Power BI to create an interactive dashboard for data visualization and insights.

Technologies Used

  • Python (for web scraping and data processing)
  • Selenium (for automating Amazon website scraping)
  • MySQL (for storing scraped data efficiently)
  • Pandas (for data cleaning and CSV creation)
  • Power BI (for visualization and dashboard creation)

Prerequisites

Before running this project, ensure you have the following installed:

  • Python (≥ 3.7)
  • MySQL Server
  • Power BI Desktop
  • Required Python libraries:
pip install -r requirements.txt

Project Workflow

Step 1: Web Scraping with Selenium

  • Extract product details such as name, price, ratings, discount, and category from Amazon
  • Automate the process to retrieve data dynamically

Step 2: Data Storage in MySQL

  • Establish a connection between Jupyter Notebook and MySQL
  • Ensure that all scraped data is stored in a structured database format

Step 3: CSV File Generation

  • Clean and structure the data using Pandas
  • Export the dataset into a CSV file for further analysis
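The cleaning and export step can be sketched as follows. This is a minimal illustration, not the project's code: it uses only the standard library so it runs without dependencies, whereas the project does this work with Pandas. The helper names (`parse_number`, `clean_record`, `to_csv`), the column names, and the sample strings are all hypothetical, and the raw formats Amazon actually returns may differ.

```python
import csv
import io
import re
from typing import Optional

# Matches the first number in a string, allowing thousands separators.
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

# Hypothetical column order for the exported file.
FIELDS = ["name", "category", "price", "rating"]


def parse_number(raw: Optional[str]) -> Optional[float]:
    """Return the first number found in a scraped string, or None."""
    match = _NUMBER.search(raw or "")
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def clean_record(record: dict) -> dict:
    """Normalize whitespace and case, and convert numeric fields."""
    return {
        "name": " ".join((record.get("name") or "").split()),
        "category": (record.get("category") or "").strip().lower(),
        "price": parse_number(record.get("price")),
        "rating": parse_number(record.get("rating")),
    }


def to_csv(records: list) -> str:
    """Serialize cleaned records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample row, not taken from the scraped dataset.
raw_row = {
    "name": "  Dell   Inspiron 15 ",
    "category": " Laptops ",
    "price": "₹57,990",
    "rating": "4.2 out of 5 stars",
}
print(to_csv([clean_record(raw_row)]))
```

With Pandas, the same helpers could be applied per column through `Series.map`, and the export would be a call to `DataFrame.to_csv`.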

CSV File

Scraped_Data

Step 4: Power BI Dashboard Creation

  • Connect MySQL to Power BI for data visualization
  • Build an interactive dashboard with multiple insights and filters

Project Dashboard

Power BI Dashboard

[Image: Amazon Dashboard]

Key Insights from Analysis

  • Average rating across all products is 3.93, indicating a generally positive response
  • The dataset covers 514 brands, showcasing a diverse market
  • Median product price is 13K, meaning half the products cost no more than this
  • Average discount percentage is 39%, suggesting frequent price reductions
  • Most expensive products include MSI and Dell laptops, with the highest price reaching 0.50M
  • The largest group of products (438) falls in the "Very Good" rating category, followed by 419 in the "Good" category
  • Laptops have the highest average price (57K), while headphones are the cheapest (4K)
  • Microsoft has the highest average product price (0.18M) among brands
  • Rating vs. Discount scatter plot shows that discounts vary widely across ratings, with no strong correlation
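Figures like the average rating, the median price, and the rating-versus-discount relationship can be checked outside Power BI. The sketch below shows one way to do so, assuming the cleaned values are available as plain Python lists. The `pearson` helper is hypothetical, and the sample numbers are invented for illustration rather than taken from the dataset.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Invented sample values, for illustration only.
ratings = [4.1, 3.8, 4.5, 3.2, 4.0]
discounts = [35.0, 50.0, 20.0, 45.0, 40.0]
prices = [57000.0, 13000.0, 4000.0, 21000.0, 9000.0]

print("average rating:", round(statistics.mean(ratings), 2))
print("median price:", statistics.median(prices))
print("rating vs. discount correlation:", round(pearson(ratings, discounts), 2))
```

A coefficient near 0 would be consistent with the "no strong correlation" reading of the scatter plot, while values near 1 or -1 would indicate a strong linear relationship.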

Connect MySQL and Import Data

  • Ensure MySQL is set up and running
  • Use the provided scripts to store and retrieve data

Open Power BI and Load the Database

  • Load the MySQL database into Power BI
  • Explore the interactive dashboard

Challenges & Solutions

1. Challenge: Handling Amazon's Dynamic Content

  • Problem: Amazon loads product details dynamically, making direct scraping difficult
  • Solution: Used Selenium with proper waits to ensure elements are fully loaded before extraction
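The idea behind an explicit wait can be illustrated without a browser: poll a condition until it returns a truthy value or a timeout expires. The sketch below is a simplified stand-in for that pattern, not the project's scraper and not Selenium's own implementation. The name `wait_until` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if no truthy value is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


# Simulated page: the product list is empty for the first two checks.
attempts = {"count": 0}


def find_products():
    attempts["count"] += 1
    return ["product card"] if attempts["count"] >= 3 else []


print(wait_until(find_products, timeout=2.0, poll=0.01))
```

In Selenium itself, the equivalent is `WebDriverWait(driver, timeout).until(...)` combined with a condition from `selenium.webdriver.support.expected_conditions`, such as `presence_of_element_located((By.CSS_SELECTOR, ...))`. The selector to wait on depends on Amazon's current page markup.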

2. Challenge: Data Storage & Handling Large Volumes

  • Problem: Managing large datasets efficiently in MySQL
  • Solution: Implemented indexing and batch inserts to optimize performance
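Batch inserts and indexing can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table name, columns, index, and `store_products` function are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3


def store_products(conn, rows, batch_size=500):
    """Insert (name, category, price, rating) tuples in batches.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " category TEXT NOT NULL,"
        " price REAL,"
        " rating REAL)"
    )
    # An index on category speeds up the per-category filters a dashboard runs.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_products_category ON products (category)"
    )
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # One executemany call per batch instead of one statement per row.
        conn.executemany(
            "INSERT INTO products (name, category, price, rating) VALUES (?, ?, ?, ?)",
            batch,
        )
        conn.commit()
        inserted += len(batch)
    return inserted


# Invented sample rows, for illustration only.
connection = sqlite3.connect(":memory:")
sample = [("Item %d" % i, "laptops" if i % 2 else "headphones", 100.0 + i, 4.0)
          for i in range(1200)]
print(store_products(connection, sample))
```

MySQL drivers for Python expose the same `executemany` pattern, but typically use `%s` placeholders instead of `?`, and the table and index definitions would need adapting to MySQL's syntax and column types.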

3. Challenge: Cleaning & Structuring Data

  • Problem: Extracted raw data had inconsistencies
  • Solution: Used Pandas to clean, format, and standardize data before storing/exporting

4. Challenge: Power BI Integration with MySQL

  • Problem: Connecting MySQL to Power BI and handling real-time updates
  • Solution: Configured automated refresh settings and optimized queries for better performance

Contribution

Contributions are always welcome!
To contribute:

  1. Fork the repository
  2. Create a new branch (feature-branch)
  3. Commit your changes
  4. Push the branch
  5. Open a Pull Request

For major changes, please open an issue first to discuss what you'd like to change.

Contact

Komal Gupta
LinkedIn

Feel free to contribute or reach out if you have any questions!
