Goodfirms.com Search Listing and Company Page Scraper. To handle JS rendering and CAPTCHAs, it uses the Crawlbase Crawling API.

goodfirms-scraper

Description

This repository contains Python-based scrapers for extracting company data from GoodFirms. These scrapers leverage the Crawlbase Crawling API to handle JavaScript rendering, CAPTCHA challenges, and anti-bot protections. The extracted data provides valuable insights into various businesses, including company names, locations, ratings, services, and profile details.

➡ Read the full blog here to learn more.

Scrapers Overview

GoodFirms Search Listings Scraper

The GoodFirms Search Listings Scraper (goodfirms_serp_scraper.py) extracts structured company information from search listings, including:

  1. Company Name
  2. Location
  3. Service Category
  4. Rating
  5. Company Profile URL

It supports pagination, ensuring that multiple pages of search results can be scraped efficiently. Extracted data is stored in a structured JSON file.
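The parsing step for search listings can be sketched as below. The HTML snippet and CSS class names (`firm-card`, `firm-name`, etc.) are illustrative assumptions, not GoodFirms' actual markup; the real selectors in `goodfirms_serp_scraper.py` will differ.

```python
from bs4 import BeautifulSoup

# Illustrative listing markup -- the real GoodFirms page structure differs.
SAMPLE_LISTING_HTML = """
<div class="firm-card">
  <h3 class="firm-name"><a href="/company/acme-soft">Acme Soft</a></h3>
  <span class="firm-location">Austin, TX</span>
  <span class="firm-category">Web Development</span>
  <span class="firm-rating">4.8</span>
</div>
"""

def parse_listings(html):
    """Extract one record per company card from a search-listings page."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select("div.firm-card"):
        link = card.select_one("h3.firm-name a")
        results.append({
            "name": link.get_text(strip=True),
            "profile_url": link["href"],  # relative path to the company profile
            "location": card.select_one("span.firm-location").get_text(strip=True),
            "category": card.select_one("span.firm-category").get_text(strip=True),
            "rating": card.select_one("span.firm-rating").get_text(strip=True),
        })
    return results
```

Each page of results yields a list of such records; paginating is then a matter of repeating this over successive search-result URLs and appending to one JSON array.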

GoodFirms Company Profile Scraper

The GoodFirms Company Profile Scraper (goodfirms_company_page_scraper.py) extracts detailed company data from individual profile pages, including:

  1. Company Name
  2. Description
  3. Hourly Rate
  4. Number of Employees
  5. Year Founded
  6. Services Offered

It takes profile URLs from the search listings scraper and extracts detailed business information, saving the data in a JSON file.
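A minimal sketch of the profile-page extraction follows the same pattern. Again, the markup and selectors here are invented for illustration; the actual structure parsed by `goodfirms_company_page_scraper.py` is different.

```python
from bs4 import BeautifulSoup

# Illustrative profile markup -- real GoodFirms selectors will differ.
SAMPLE_PROFILE_HTML = """
<div class="profile">
  <h1 class="company-name">Acme Soft</h1>
  <p class="company-desc">Custom software development studio.</p>
  <span class="hourly-rate">$50 - $99/hr</span>
  <span class="employees">50 - 249</span>
  <span class="founded">2012</span>
  <ul class="services"><li>Web Development</li><li>Mobile Apps</li></ul>
</div>
"""

def parse_profile(html):
    """Extract the detailed fields from a single company profile page."""
    soup = BeautifulSoup(html, "html.parser")
    text = lambda sel: soup.select_one(sel).get_text(strip=True)
    return {
        "name": text("h1.company-name"),
        "description": text("p.company-desc"),
        "hourly_rate": text("span.hourly-rate"),
        "employees": text("span.employees"),
        "founded": text("span.founded"),
        "services": [li.get_text(strip=True) for li in soup.select("ul.services li")],
    }
```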

Environment Setup

Ensure that Python is installed on your system. Check the version using:

# Use python3 if you're on Linux/macOS
python --version

Install the required dependencies:

pip install crawlbase beautifulsoup4

  • Crawlbase – Handles JavaScript rendering and bypasses bot protections.
  • BeautifulSoup – Parses and extracts structured data from HTML.

Running the Scrapers

  1. Get Your Crawlbase Access Token

    • Sign up for Crawlbase here to get an API token.
    • Replace "YOUR_CRAWLBASE_TOKEN" in the script with your Crawlbase Token.
  2. Run the Scraper

# Use python3 if required (for Linux/macOS)
python SCRAPER_FILE_NAME.py

Replace "SCRAPER_FILE_NAME.py" with the actual script name (goodfirms_serp_scraper.py or goodfirms_company_page_scraper.py).

To-Do List

  • Extend scrapers to extract additional company details like contact information and portfolios.
  • Optimize the scraping process for better performance.
  • Implement multi-threading for large-scale data extraction.
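The multi-threading item above could be approached with the standard library's `concurrent.futures`; a rough sketch is shown here, where `scrape_profile` is a stand-in for the real per-URL fetch-and-parse work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def scrape_profile(url):
    # Placeholder: in the real script this would call the Crawlbase API
    # and parse the page; here it just echoes the URL back.
    return {"profile_url": url}

def scrape_all(urls, max_workers=5):
    """Scrape many profile pages concurrently; result order is not guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scrape_profile, u) for u in urls]
        for fut in as_completed(futures):
            results.append(fut.result())
    return results
```

Keeping `max_workers` modest also avoids hammering the API with parallel requests.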

Why Use This Scraper?

  • Bypasses anti-bot protections using Crawlbase.
  • Handles JavaScript-rendered content efficiently.
  • Extracts structured company data for business analysis.
