This Python script scrapes product data from various Amazon India Best Sellers categories like Grocery, Electronics, Beauty, Health & Personal Care, and Baby Products. The script fetches product details such as rank, name, rating, price, and more, and outputs the data as a CSV file with the current date in the filename.
- Scrapes from multiple Best Seller categories
- Navigates nested subcategories recursively
- Collects:
  - Product Rank
  - Product ID
  - Product Name
  - Rating
  - Number of People Rated
  - Price
  - Product Image URL
  - Full Category Hierarchy
- Exports to CSV (dated)
A CSV file named like:
With columns:
| Category | Sub_Category_1 | Rank | Name | Rating | People | Prize | Image_Link |
|---|---|---|---|---|---|---|---|
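
The dated export can be sketched with the standard library alone. This is a minimal illustration, not the script's actual export code: the `bestsellers` prefix, the ISO date format, and the helper names `dated_filename` and `rows_to_csv` are hypothetical, and only the column names listed above are used (the real output may contain more).

```python
import csv
import io
from datetime import date

# Column names as listed above; the real script may emit more columns.
COLUMNS = ["Category", "Sub_Category_1", "Rank", "Name",
           "Rating", "People", "Prize", "Image_Link"]


def dated_filename(prefix="bestsellers", today=None):
    """Build '<prefix>_<YYYY-MM-DD>.csv'. Prefix and date format are assumptions."""
    today = today or date.today()
    return f"{prefix}_{today.isoformat()}.csv"


def rows_to_csv(rows, out):
    """Write dict rows to a file-like object, leaving missing fields blank."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Illustrative row with made-up values, not scraped data.
sample = [{"Category": "Grocery", "Sub_Category_1": "Tea", "Rank": 1, "Name": "Example Tea"}]
buffer = io.StringIO()
rows_to_csv(sample, buffer)
```

To write to disk instead of a buffer, pass `open(dated_filename(), "w", newline="", encoding="utf-8")` as `out`.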
- Initial URLs: It begins with hardcoded Best Seller URLs for 5 categories from Amazon India.
- HTML Parsing: Uses `BeautifulSoup` to parse HTML.
- AJAX Handling: Uses a combination of `GET` and `POST` to extract dynamic content using ACP path logic.
- Category Tree: Recursively walks through category trees using `role="group"` and `role="treeitem"`.
- Data Cleaning: Handles edge cases, fills missing ranks, removes duplicates.
- Output: Final dataset is concatenated and exported as a dated CSV.
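
The category tree step can be illustrated on a small static snippet. The sketch below is not the script's code and makes several assumptions: the markup shape (a nested `role="group"` placed right after the `role="treeitem"` it belongs to) is a guess at the sidebar layout, the category names and `/example/...` links are made up, and the function name `walk` is hypothetical. It also swaps in the standard library's `xml.etree.ElementTree` for `BeautifulSoup`, which only works because the sample is well-formed; real pages need a tolerant parser. No network requests are made.

```python
import xml.etree.ElementTree as ET

# Hand-written, well-formed sample; names and hrefs are invented for illustration.
SAMPLE = """
<div role="group">
  <div role="treeitem"><a href="/example/all">Any Department</a></div>
  <div role="group">
    <div role="treeitem"><span>Electronics</span></div>
    <div role="group">
      <div role="treeitem"><a href="/example/cameras">Cameras</a></div>
      <div role="treeitem"><a href="/example/headphones">Headphones</a></div>
    </div>
  </div>
</div>
"""


def walk(group, path=()):
    """Yield (hierarchy, href) for every treeitem under a role="group" element."""
    current = path
    for child in group:
        role = child.get("role")
        if role == "treeitem":
            name = "".join(child.itertext()).strip()
            link = child.find(".//a")
            current = path + (name,)
            yield current, (link.get("href") if link is not None else None)
        elif role == "group":
            # Assumption: a nested group holds the children of the treeitem before it.
            yield from walk(child, current)


results = list(walk(ET.fromstring(SAMPLE.strip())))
for hierarchy, href in results:
    print(" > ".join(hierarchy), href)
```

Each yielded tuple carries the full category hierarchy, which is the shape needed to fill columns such as `Category` and `Sub_Category_1`.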