This project analyzes Amazon electronics products (laptops, smartphones, headphones) using data scraped from the Amazon website with Selenium.
The scraped data is stored in a MySQL database and also exported as a CSV file.
Finally, MySQL is connected to Power BI to build an interactive dashboard for visualization and insights.
- Python (for web scraping and data processing)
- Selenium (for automating Amazon website scraping)
- MySQL (for storing scraped data efficiently)
- Pandas (for data cleaning and CSV creation)
- Power BI (for visualization and dashboard creation)
Before running this project, ensure you have the following installed:
- Python (≥ 3.7)
- MySQL Server
- Power BI Desktop
- Required Python libraries, installed with: pip install -r requirements.txt
- Extract product details such as name, price, ratings, discount, and category from Amazon
- Automate the process to retrieve data dynamically
- Establish a connection between Jupyter Notebook and MySQL
- Ensure that all scraped data is stored in a structured database format
- Clean and structure the data using Pandas
- Export the dataset into a CSV file for further analysis
- Connect MySQL to Power BI for data visualization
- Build an interactive dashboard with multiple insights and filters
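The cleaning and export steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the project's actual notebook code: the column names (`name`, `category`, `price`, `mrp`), the price formats, and the helper names are all assumptions. The same helpers could be applied column by column with Pandas.

```python
import csv
import io
import re
from typing import Dict, List, Optional


def parse_price(raw: Optional[str]) -> Optional[float]:
    """Turn a scraped price string such as '₹1,29,990' into a float.

    Returns None when the text contains no number (e.g. 'N/A').
    """
    # Drop thousands separators, then pick out the first numeric token.
    match = re.search(r"\d+(?:\.\d+)?", (raw or "").replace(",", ""))
    return float(match.group()) if match else None


def discount_percent(list_price: Optional[float],
                     sale_price: Optional[float]) -> Optional[float]:
    """Percentage reduction from list price to sale price, rounded to 1 decimal."""
    if list_price is None or sale_price is None or list_price <= 0:
        return None
    return round((list_price - sale_price) / list_price * 100, 1)


def clean_rows(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Normalize text fields, parse prices, and skip rows without a usable price."""
    cleaned = []
    for row in rows:
        price = parse_price(row.get("price"))
        if price is None:
            continue  # assumption: listings without a price are not analyzed
        cleaned.append({
            "name": row.get("name", "").strip(),
            "category": row.get("category", "").strip().lower(),
            "price": price,
            "discount_pct": discount_percent(parse_price(row.get("mrp")), price),
        })
    return cleaned


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize cleaned rows to CSV text (write it to a file to export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "category", "price", "discount_pct"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Parsing prices before storage keeps numeric columns numeric in both MySQL and the CSV, so Power BI does not have to re-interpret currency strings.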
- Average rating across all products is 3.93, indicating a generally positive response
- The dataset covers 514 brands, showcasing a diverse market
- Median product price is 13K, meaning half the products are priced at or below this value
- Average discount percentage is 39%, suggesting frequent price reductions
- Most expensive products include MSI and Dell laptops, with the highest price reaching 0.50M
- The largest group of products (438) falls in the "Very Good" rating band, followed by 419 in the "Good" band
- Laptops have the highest average price (57K), while headphones are the cheapest (4K)
- Microsoft has the highest average product price (0.18M) among brands
- Rating vs. Discount scatter plot shows that discounts vary widely across ratings, with no strong correlation
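For readers who want to reproduce figures of this kind outside Power BI, the sketch below shows how a median price, per-category averages, and a rating/discount correlation could be computed. The sample rows and field names are made up purely for illustration and are not taken from the project's dataset.

```python
from collections import defaultdict
from statistics import mean, median


def average_by(rows, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {key: mean(values) for key, values in groups.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; values near 0 mean no linear relationship."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical sample rows, for illustration only.
sample = [
    {"category": "laptop", "price": 60000, "rating": 4.0, "discount_pct": 20},
    {"category": "laptop", "price": 40000, "rating": 3.0, "discount_pct": 40},
    {"category": "headphones", "price": 2000, "rating": 4.0, "discount_pct": 40},
    {"category": "headphones", "price": 4000, "rating": 3.0, "discount_pct": 20},
]

median_price = median(row["price"] for row in sample)
avg_price_by_category = average_by(sample, "category", "price")
rating_discount_r = pearson(
    [row["rating"] for row in sample],
    [row["discount_pct"] for row in sample],
)
```

A correlation coefficient close to zero is the numeric counterpart of a scatter plot in which discounts are spread evenly across ratings.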
- Ensure MySQL is set up and running
- Use the provided scripts to store and retrieve data
- Load the MySQL database into Power BI
- Explore the interactive dashboard
- Problem: Amazon loads product details dynamically, making direct scraping difficult
- Solution: Used Selenium with proper waits to ensure elements are fully loaded before extraction
- Problem: Managing large datasets efficiently in MySQL
- Solution: Implemented indexing and batch inserts to optimize performance
- Problem: Extracted raw data had inconsistencies
- Solution: Used Pandas to clean, format, and standardize data before storing/exporting
- Problem: Connecting MySQL to Power BI and handling real-time updates
- Solution: Created automated refresh settings and optimized queries for better performance
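The indexing and batch-insert approach mentioned above can be illustrated with a small sketch. To keep it runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table layout, index name, and batch size are assumptions, not the project's actual schema. With a MySQL driver such as `mysql-connector-python`, the same `cursor.executemany` pattern applies, but placeholders are written as `%s` instead of `?`.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical products table plus an index on category."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category TEXT NOT NULL,
            price REAL,
            rating REAL
        )
        """
    )
    # The index speeds up dashboard-style filters and group-bys on category.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_products_category ON products (category)"
    )


def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches, committing once per batch."""
    sql = "INSERT INTO products (name, category, price, rating) VALUES (?, ?, ?, ?)"
    cursor = conn.cursor()
    inserted = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany(sql, batch)  # one call per batch, not per row
        conn.commit()
        inserted += len(batch)
    return inserted
```

Committing per batch rather than per row reduces round trips and transaction overhead, while still bounding how much work is lost if a batch fails.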
Contributions are always welcome!
To contribute:
- Fork the repository
- Create a new branch (for example, feature-branch)
- Commit your changes
- Push the branch
- Open a Pull Request
For major changes, please open an issue first to discuss what you'd like to change.
Komal Gupta
LinkedIn
Feel free to contribute or reach out if you have any questions!