This repository contains a Python-based scraper that extracts data from Google Search Results (SERPs) using the Crawlbase Crawling API. The scraper uses the Google SERP scraper provided by Crawlbase, which bypasses CAPTCHA challenges and anti-bot protections, and handles JavaScript-rendered content seamlessly.
The extracted data is parsed and saved in JSON format, including relevant information such as search result titles, URLs, snippets, related searches, and ads.
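At its core, the script sends a Google search URL through the Crawling API and parses the response. Here is a minimal sketch of that call, assuming Crawlbase's `scraper=google-serp` parameter and a placeholder token (the actual script may structure this differently):

```python
import json
from urllib.parse import quote_plus

from crawlbase import CrawlingAPI

# Placeholder token; use the JS token from your Crawlbase dashboard.
api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})

query = "web scraping tools"
url = f"https://www.google.com/search?q={quote_plus(query)}"

# The 'scraper' option asks Crawlbase to return parsed SERP data
# instead of raw HTML ('google-serp' per the Crawlbase docs).
response = api.get(url, {'scraper': 'google-serp'})

if response['status_code'] == 200:
    data = json.loads(response['body'])
    print(data)
```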
➡ For detailed instructions, visit the full blog here.
The scraper extracts the following information from Google Search Results:
- Search Results – Includes the position, title, URL, and description for each result.
- Related Searches – Suggestions for related searches on Google.
- Ads – Paid advertisements appearing on the results page (if available).
- People Also Ask – FAQs related to the search query (if available).
- Snack Pack – Local business listings or other special results (if available).
The scraper also handles pagination and saves the results in a clean JSON file for easy further processing.
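To make the field list above concrete, here is an illustrative shape for the saved data. The exact key names depend on the script and on Crawlbase's response format, so treat these as assumptions:

```python
# Illustrative only: key names are assumptions, not the script's guaranteed schema.
example_output = {
    "search_results": [
        {
            "position": 1,
            "title": "Top 10 Web Scraping Tools",
            "url": "https://example.com/tools",
            "description": "A short snippet shown under the result...",
        }
    ],
    "related_searches": ["best web scraping tools", "web scraping python"],
    "ads": [],               # populated only when ads appear on the page
    "people_also_ask": [],   # FAQs, when Google shows them
    "snack_pack": [],        # local business listings, when present
}
```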
- Python 3.x (check your version with `python --version`)
- Required libraries:
  - `crawlbase` – For handling JavaScript-rendered content and bypassing CAPTCHAs.
  - `json` – For parsing and saving data in JSON format (part of the Python standard library, so no installation is needed).

To install the required library, run:

```bash
pip install crawlbase
```
- Get Your Crawlbase Access Token
  - Sign up for Crawlbase here to obtain your API token.
  - Use the JavaScript (JS) token, since Google serves JavaScript-rendered content.
- Update the Scraper with Your Token
  - In the `google_serp_scraper.py` script, replace `"YOUR_CRAWLBASE_TOKEN"` with your Crawlbase API token.
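For orientation, the relevant line likely looks something like this (a sketch; the exact variable name in `google_serp_scraper.py` may differ):

```python
from crawlbase import CrawlingAPI

# Swap in the JS token from your Crawlbase dashboard.
api = CrawlingAPI({'token': 'YOUR_CRAWLBASE_TOKEN'})
```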
- Clone this repository or download the script.
- Open a terminal and navigate to the folder containing the script.
- Run the scraper:
```bash
python google_serp_scraper.py
```
This will scrape the Google Search results for the query specified in the script (e.g., `"web scraping tools"`) and save the results in a JSON file (`google_search_results.json`).
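Once the run finishes, the output file can be loaded back for further processing. A minimal sketch, assuming the default output filename and the key names illustrated earlier:

```python
import json

with open("google_search_results.json", encoding="utf-8") as f:
    results = json.load(f)

# Print the title of each organic result (key names assumed, not guaranteed).
for item in results.get("search_results", []):
    print(item["position"], item["title"])
```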
You can customize the scraper by changing the search query and adjusting the maximum number of pages to scrape.
- Change the query by modifying the `query` variable in the script.
- Adjust the number of pages to scrape by modifying the `max_pages` variable (see the sketch below).
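Here is a sketch of how those two variables might drive pagination, assuming the script pages through results with Google's `start` offset parameter (10 organic results per page):

```python
from urllib.parse import quote_plus

query = "web scraping tools"  # your search term
max_pages = 3                 # how many result pages to fetch

for page in range(max_pages):
    # Google paginates organic results with the start offset.
    url = f"https://www.google.com/search?q={quote_plus(query)}&start={page * 10}"
    # ...pass url to the Crawling API as shown earlier...
```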
- Handle more complex Google result types (e.g., images, videos).
- Implement better error handling for failed requests.
- Add support for saving data in additional formats like CSV or databases.
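As a starting point for the CSV idea, here is a minimal export sketch, assuming the output shape illustrated earlier:

```python
import csv
import json

with open("google_search_results.json", encoding="utf-8") as f:
    results = json.load(f)

with open("google_search_results.csv", "w", newline="", encoding="utf-8") as f:
    # extrasaction="ignore" skips any keys beyond the chosen columns.
    writer = csv.DictWriter(
        f,
        fieldnames=["position", "title", "url", "description"],
        extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(results.get("search_results", []))
```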
- Bypasses CAPTCHAs and anti-bot protections with Crawlbase.
- Handles JavaScript-rendered content seamlessly.
- Extracts structured Google Search results efficiently.
- Supports easy pagination for scraping multiple pages.