This repository contains Python-based scrapers for extracting company data from GoodFirms. These scrapers leverage the Crawlbase Crawling API to handle JavaScript rendering, CAPTCHA challenges, and anti-bot protections. The extracted data provides valuable insights into various businesses, including company names, locations, ratings, services, and profile details.
➡ Read the full blog here to learn more.
The GoodFirms Search Listings Scraper (goodfirms_serp_scraper.py) extracts structured company information from search listings, including:
- Company Name
- Location
- Service Category
- Rating
- Company Profile URL
It supports pagination, ensuring that multiple pages of search results can be scraped efficiently. Extracted data is stored in a structured JSON file.
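The listing-parsing step can be sketched roughly as follows. This is an illustrative sketch, not the script's exact code: the CSS selectors (`div.firm-wrapper`, `h3.firm-name`, and so on) are placeholder assumptions, since the real class names on GoodFirms may differ.

```python
from bs4 import BeautifulSoup


def parse_listings(html):
    """Parse one page of GoodFirms search listings into a list of dicts.

    NOTE: the selectors below are illustrative assumptions, not the
    exact ones used by goodfirms_serp_scraper.py.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.select("div.firm-wrapper"):  # assumed listing-card class
        link = card.select_one("h3.firm-name a")
        results.append({
            "company_name": link.get_text(strip=True) if link else None,
            "profile_url": link.get("href") if link else None,
            "location": _text(card, "div.firm-location"),
            "service_category": _text(card, "div.firm-category"),
            "rating": _text(card, "span.rating-number"),
        })
    return results


def _text(card, selector):
    """Return stripped text for the first match of selector, or None."""
    tag = card.select_one(selector)
    return tag.get_text(strip=True) if tag else None
```

Pagination would then amount to a loop that fetches each `?page=N` URL through Crawlbase, calls `parse_listings` on every response, and finally dumps the accumulated list to a JSON file.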
The GoodFirms Company Profile Scraper (goodfirms_company_page_scraper.py) extracts detailed company data from individual profile pages, including:
- Company Name
- Description
- Hourly Rate
- Number of Employees
- Year Founded
- Services Offered
It takes profile URLs from the search listings scraper and extracts detailed business information, saving the data in a JSON file.
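The profile-parsing step follows the same pattern. Again, this is a hedged sketch: the selectors (`h1.company-name`, `span.hourly-rate`, etc.) are assumptions standing in for whatever the script actually targets on the live page.

```python
from bs4 import BeautifulSoup


def parse_profile(html):
    """Extract company details from a single GoodFirms profile page.

    NOTE: the selectors are illustrative assumptions, not the exact
    ones used by goodfirms_company_page_scraper.py.
    """
    soup = BeautifulSoup(html, "html.parser")

    def text(selector):
        tag = soup.select_one(selector)
        return tag.get_text(strip=True) if tag else None

    return {
        "company_name": text("h1.company-name"),
        "description": text("div.company-description"),
        "hourly_rate": text("span.hourly-rate"),
        "employees": text("span.employees"),
        "year_founded": text("span.founded"),
        "services": [li.get_text(strip=True)
                     for li in soup.select("ul.services li")],
    }
```

Running this over every `profile_url` collected by the search listings scraper yields the per-company records that get written to JSON.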
Ensure that Python is installed on your system. Check the version using:
# Use python3 if you're on Linux/macOS
```bash
python --version
```
Install the required dependencies:
```bash
pip install crawlbase beautifulsoup4
```
- Crawlbase – Handles JavaScript rendering and bypasses bot protections.
- BeautifulSoup – Parses and extracts structured data from HTML.
Get Your Crawlbase Access Token
- Sign up for Crawlbase here to get an API token.
- Replace "YOUR_CRAWLBASE_TOKEN" in the script with your Crawlbase Token.
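The token is what authenticates requests against the Crawling API. A minimal sketch of the client setup the scripts perform, assuming the standard Crawlbase Python library interface (the `fetch` helper is illustrative, not a function from the scripts):

```python
TOKEN = "YOUR_CRAWLBASE_TOKEN"  # replace with your actual Crawlbase token


def fetch(url, token=TOKEN):
    """Fetch a URL through the Crawlbase Crawling API and return the body.

    The import is done lazily here so this sketch fails with a clear
    message if the crawlbase package is not installed.
    """
    from crawlbase import CrawlingAPI
    api = CrawlingAPI({"token": token})
    response = api.get(url)
    # Crawlbase returns a dict-like response; "body" holds the rendered HTML
    return response["body"]
```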
Run the Scraper
```bash
# Use python3 if required (for Linux/macOS)
python SCRAPER_FILE_NAME.py
```
Replace "SCRAPER_FILE_NAME.py" with the actual script name (goodfirms_serp_scraper.py or goodfirms_company_page_scraper.py).
- Extend scrapers to extract additional company details like contact information and portfolios.
- Optimize the scraping process for better performance.
- Implement multi-threading for large-scale data extraction.
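The multi-threading idea above can be sketched with only the standard library. The `fetch` argument is an assumed stand-in for any callable that downloads and parses one URL (for example, a Crawlbase-backed fetcher); it is injected so the pattern stays testable without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_concurrently(urls, fetch, max_workers=5):
    """Apply `fetch` across many URLs in parallel threads.

    Keep max_workers modest: scraping through an API usually means
    staying within the provider's rate limits.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results align with urls
        return list(pool.map(fetch, urls))
```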
- Bypasses anti-bot protections using Crawlbase.
- Handles JavaScript-rendered content efficiently.
- Extracts structured company data for business analysis.