SearchCrawl

A FastAPI project providing a search and crawl API, with optional content extraction using LLMs. Simply provide a search query, and it automatically searches and crawls websites. If desired, you can also extract structured content from the crawled pages using your custom instructions with LLMs.

Features

Search, Crawl, and Extract in a Single Step Perform search queries, crawl resulting websites, and extract content using custom instructions with LLMs—all in one request. You can specify the format for passing crawled results to the LLM. By default, the entire page content is provided in Markdown format.
Undetected search Powered by SearXNG for stealthy, meta search.
Undetected crawl Powered by Patchright for stealthy web crawling, with support for JavaScript-rendered content.
Flexible crawl scope Follow pagination links, internal links, or all links based on configuration. Supports multi-page crawling with configurable depth, page limits, and concurrency.
Cache system Stores search and crawl results persistently with a 24-hour default TTL, preventing frequent requests from triggering IP bans. Cache settings are configurable.
OpenAPI support Provides an OpenAPI specification. This means you can automatically generate API clients in many languages (e.g., Python, TypeScript, Java) using tools like openapi-generator-cli.
Prebuilt Python client A ready-to-use Python API client is included.

API Endpoints

Search API

/search: Search for websites by query.
/search-crawl: Combine search and crawl functionality.
/search-crawl-extract: Search, crawl, and extract structured data in one step.

Crawl API

/crawl: Crawl a website using a crawl request.
/crawl-many: Crawl multiple websites concurrently using a crawl many request.
/crawl-extract: Crawl and immediately extract structured data.

Getting Started

1: Prepare compose.yaml

Create a compose.yaml in your project. Remote include requires Docker Compose >= v2.21.0:

# compose.yaml
include:
  - https://github.com/CatBraaain/search-crawl.git

Alternative: Traditional way or older Docker Compose

git clone https://github.com/CatBraaain/search-crawl
cd search-crawl

2: Prepare .env

Set environment variables for extract function:

# .env
LLM_MODEL="xxxxxxxxxx"
LLM_API_KEY="xxxxxxxxxx"

The model name should follow the LiteLLM documentation Examples: "openai/gpt-5", "gemini/gemini-2.5-pro", "anthropic/claude-4", "deepseek/deepseek-chat"

3: Run Server

Run the service:

docker compose up --wait

If Docker Compose version < v2.34.0

SET COMPOSE_EXPERIMENTAL_GIT_REMOTE=True
docker compose up --wait

Test the API

Request via curl

docker compose up --wait
# Linux / macOS
curl http://localhost:8000/search --json '{"q":"hello world"}'
# Windows (PowerShell)
curl http://localhost:8000/search --json "{\"q\":\"hello world\"}"

Request via Python SDK

Install the Python client:

uv init
uv add git+https://github.com/CatBraaain/search-crawl.git#subdirectory=search_crawl_client

Run examples from the examples directory

Search + Crawl:

uv run examples/search_crawl.py

Expected output:

URL: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
TITLE: "Hello, World!" program - Wikipedia
MARKDOWN:
Traditional first example of a computer programming language
A **"Hello, World!" program** is usually a simple [computer program](/wiki/Computer_program "Computer program") that emits (or displays) t...

Search + Crawl + Extract:

uv run examples/search_crawl_extract.py

Expected output:

population=8005176000 source_url='https://worldpopulationreview.com'

OpenAPI Document

After starting the service, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Name		Name	Last commit message	Last commit date
Latest commit History 219 Commits
.github/workflows		.github/workflows
examples		examples
search_crawl_client		search_crawl_client
searxng		searxng
src/search_crawl		src/search_crawl
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
compose.yaml		compose.yaml
justfile		justfile
lefthook.yaml		lefthook.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SearchCrawl

Features

API Endpoints

Search API

Crawl API

Getting Started

1: Prepare compose.yaml

2: Prepare .env

3: Run Server

Test the API

Request via curl

Request via Python SDK

Search + Crawl:

Search + Crawl + Extract:

OpenAPI Document

About

Uh oh!

Releases

Packages

Languages

CatBraaain/search-crawl

Folders and files

Latest commit

History

Repository files navigation

SearchCrawl

Features

API Endpoints

Search API

Crawl API

Getting Started

1: Prepare compose.yaml

2: Prepare .env

3: Run Server

Test the API

Request via curl

Request via Python SDK

Search + Crawl:

Search + Crawl + Extract:

OpenAPI Document

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages