This repository contains Python-based scrapers for extracting product listings and detailed product information from Costco. These scrapers leverage the Crawlbase Crawling API to handle JavaScript rendering, CAPTCHA challenges, and anti-bot protections. The extracted data is processed using BeautifulSoup for HTML parsing and Pandas for structured storage.
➡ Read the full blog here to learn more.
The Costco Product Listing Scraper (costco_listing_scraper.py) extracts:
- Product Title
- Price
- Product URL
- Rating
- Thumbnail Image
The scraper supports pagination, so it collects products from every results page rather than only the first. The extracted data is saved in a JSON file.
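As a rough illustration of how the listed fields could be pulled out with BeautifulSoup, here is a minimal sketch. The sample markup, the CSS selectors, and the `parse_listings` name are all assumptions made for this example. They are not Costco's real page structure and not necessarily what costco_listing_scraper.py does, so inspect the live page before reusing them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page will differ.
SAMPLE_HTML = """
<div class="product">
  <a class="product-link" href="https://www.example.com/item-1">
    <img class="thumb" src="https://www.example.com/item-1.jpg" />
    <span class="title"> Example Item </span>
  </a>
  <span class="price">$19.99</span>
  <span class="rating">4.5</span>
</div>
<div class="product">
  <a class="product-link" href="https://www.example.com/item-2">
    <span class="title">Other Item</span>
  </a>
</div>
"""


def _text(node):
    """Return stripped text for a tag, or None if the tag was not found."""
    return node.get_text(strip=True) if node else None


def _attr(node, name):
    """Return an attribute value for a tag, or None if the tag was not found."""
    return node.get(name) if node else None


def parse_listings(html):
    """Extract one dict per product card (selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):
        products.append({
            "title": _text(card.select_one(".title")),
            "price": _text(card.select_one(".price")),
            "url": _attr(card.select_one("a.product-link"), "href"),
            "rating": _text(card.select_one(".rating")),
            "thumbnail": _attr(card.select_one("img.thumb"), "src"),
        })
    return products


if __name__ == "__main__":
    for item in parse_listings(SAMPLE_HTML):
        print(item)
```

Missing fields come back as `None` instead of raising, which keeps one incomplete product card from aborting a whole page.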
The Costco Product Detail Scraper (costco_product_scraper.py) extracts detailed product information, including:
- Product Title
- Full Description
- Price
- Specifications
- Rating
- Image URL
The extracted data is saved in a JSON file.
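Saving the extracted records to JSON could look roughly like the sketch below. It uses only the standard library. The `save_to_json` helper and the sample record are hypothetical, and the actual scripts may write their output differently.

```python
import json
from pathlib import Path


def save_to_json(records, path):
    """Write a list of dicts to `path` as UTF-8 JSON and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII product names readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = [{"title": "Example Item", "price": "$19.99", "rating": "4.5"}]
    print(save_to_json(sample, "costco_product_details.json"))
```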
Ensure that Python is installed on your system. Check the version using:
# Use python3 if required (for Linux/macOS)
python --version
Next, install the required dependencies:
pip install crawlbase beautifulsoup4
- Crawlbase – Handles JavaScript rendering and bypasses bot protections.
- BeautifulSoup – Parses and extracts structured data from HTML.
- Sign up for Crawlbase here to get an API token.
- Use the JS token for Costco scraping, as the site uses JavaScript-rendered content.
Replace "CRAWLBASE_JS_TOKEN" in the script with your Crawlbase JS Token.
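For orientation, a request through the Crawlbase Python client might be wrapped as in the sketch below. `make_client` and `fetch_html` are hypothetical helper names. The response keys (`status_code`, `headers`, `body`, `pc_status`) follow the pattern commonly shown in Crawlbase examples, and the `ajax_wait` and `page_wait` values are illustrative. Check the Crawlbase documentation for the options your token supports.

```python
def make_client(token):
    """Build a Crawlbase client; imported lazily so fetch_html stays testable."""
    from crawlbase import CrawlingAPI
    return CrawlingAPI({"token": token})


def fetch_html(api, url, options=None):
    """Fetch `url` through `api` and return the page HTML as a string.

    `api` is any object with a `get(url, options)` method that returns a dict
    shaped like the Crawlbase client's response (an assumption of this sketch).
    """
    response = api.get(url, options or {})
    pc_status = response.get("headers", {}).get("pc_status")
    ok = response.get("status_code") == 200 and pc_status in (None, "200", 200)
    if not ok:
        raise RuntimeError(
            f"Request failed: status_code={response.get('status_code')}, "
            f"pc_status={pc_status}"
        )
    body = response["body"]
    # The body may arrive as bytes; decode it before handing it to a parser.
    return body.decode("utf-8") if isinstance(body, bytes) else body


if __name__ == "__main__":
    client = make_client("CRAWLBASE_JS_TOKEN")  # replace with your JS token
    html = fetch_html(
        client,
        "https://www.costco.com/",
        {"ajax_wait": "true", "page_wait": "5000"},  # example values only
    )
    print(len(html))
```

Passing the client in as an argument keeps the fetch logic independent of the network, so it can be tried against a stub before spending API credits.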
Run the Scraper
# For product listing scraping
python costco_listing_scraper.py
# For product detail scraping
python costco_product_scraper.py
The scraped data will be saved in costco_product_listings.json or costco_product_details.json, depending on the script used.
- Expand scrapers to extract additional product details like discounted prices and available coupons.
- Optimize data storage and add support for CSV and database integration.
- Implement asynchronous requests to speed up data extraction.
- Enhance scraper efficiency with Crawlbase Smart Proxy to prevent blocks.
- Automate scheduled scraping for real-time price monitoring and product tracking.
- ✔ Bypasses anti-bot protections with Crawlbase.
- ✔ Handles JavaScript-rendered content seamlessly.
- ✔ Extracts accurate and structured product data efficiently.