A FastAPI project providing a search and crawl API, with optional content extraction using LLMs. Simply provide a search query, and it automatically searches and crawls websites. If desired, you can also extract structured content from the crawled pages using your custom instructions with LLMs.
- **Search, Crawl, and Extract in a Single Step**: Perform search queries, crawl the resulting websites, and extract content using custom instructions with LLMs, all in one request. You can specify the format in which crawled results are passed to the LLM; by default, the entire page content is provided in Markdown.
- **Undetected search**: Powered by SearXNG for stealthy meta search.
- **Undetected crawl**: Powered by Patchright for stealthy web crawling, with support for JavaScript-rendered content.
- **Flexible crawl scope**: Follow pagination links, internal links, or all links based on configuration. Supports multi-page crawling with configurable depth, page limits, and concurrency.
- **Cache system**: Stores search and crawl results persistently with a 24-hour default TTL, preventing frequent requests from triggering IP bans. Cache settings are configurable.
- **OpenAPI support**: Provides an OpenAPI specification, so you can automatically generate API clients in many languages (e.g., Python, TypeScript, Java) using tools like `openapi-generator-cli`.
- **Prebuilt Python client**: A ready-to-use Python API client is included.
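The cache behavior described above (persistent results with a 24-hour default TTL) can be illustrated with a minimal in-memory sketch. This is not the project's actual implementation, just the idea:

```python
import time


class TTLCache:
    """Minimal illustration of a TTL cache; not the project's actual implementation."""

    def __init__(self, ttl_seconds: float = 24 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable clock, so expiry is testable
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        # Record the value together with the time it was stored.
        self._store[key] = (self.clock(), value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value
```

Serving a repeated query from such a cache instead of re-crawling is what keeps request volume low enough to avoid IP bans.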
- `/search`: Search for websites by query.
- `/search-crawl`: Combine search and crawl functionality.
- `/search-crawl-extract`: Search, crawl, and extract structured data in one step.
- `/crawl`: Crawl a website using a crawl request.
- `/crawl-many`: Crawl multiple websites concurrently using a crawl-many request.
- `/crawl-extract`: Crawl and immediately extract structured data.
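Calling these endpoints is a plain JSON POST. A standard-library sketch is below; the `q` field matches the documented `/search` example, but the `instruction` field on the extract payload is an assumption for illustration only — consult the OpenAPI spec for the real schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default service address


def post_json(endpoint: str, payload: dict) -> bytes:
    """POST a JSON payload to the service and return the raw response body."""
    request = urllib.request.Request(
        f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Hypothetical extract payload: "q" follows the documented /search example;
# "instruction" is an assumed field name, not the documented schema.
payload = {"q": "world population", "instruction": "Extract the current world population."}
# post_json("/search-crawl-extract", payload)  # requires the service to be running
```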
Create a `compose.yaml` in your project. Remote include requires Docker Compose >= v2.21.0:

```yaml
# compose.yaml
include:
  - https://github.com/CatBraaain/search-crawl.git
```
Alternative: clone the repository the traditional way (also works with older Docker Compose versions):

```shell
git clone https://github.com/CatBraaain/search-crawl
cd search-crawl
```
Set environment variables for the extract functionality:

```shell
# .env
LLM_MODEL="xxxxxxxxxx"
LLM_API_KEY="xxxxxxxxxx"
```

The model name should follow the LiteLLM documentation. Examples: `"openai/gpt-5"`, `"gemini/gemini-2.5-pro"`, `"anthropic/claude-4"`, `"deepseek/deepseek-chat"`.
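As the examples suggest, LiteLLM-style model names use a `provider/model` convention, so the provider can be read off the prefix:

```python
# LiteLLM-style identifiers are "provider/model"; splitting on the first
# slash separates the two parts.
model = "gemini/gemini-2.5-pro"
provider, _, name = model.partition("/")
```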
Run the service:

```shell
docker compose up --wait
```

If your Docker Compose version is < v2.34.0, enable the experimental Git remote support first:

```shell
SET COMPOSE_EXPERIMENTAL_GIT_REMOTE=True
docker compose up --wait
```
```shell
# Linux / macOS
curl http://localhost:8000/search --json '{"q":"hello world"}'
```

```shell
# Windows (PowerShell)
curl.exe http://localhost:8000/search --json "{\"q\":\"hello world\"}"
```

The `--json` flag requires curl 7.82.0 or later. On PowerShell, use `curl.exe` explicitly, since `curl` is aliased to `Invoke-WebRequest`, which does not accept `--json`.
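The same `/search` call can be made from Python with only the standard library (equivalent to the curl commands above):

```python
import json
import urllib.request

# Same request as the curl examples: POST {"q": "hello world"} to /search.
body = json.dumps({"q": "hello world"}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:8000/search",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# with urllib.request.urlopen(request) as response:  # requires the service to be running
#     results = json.loads(response.read())
```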
Install the Python client:

```shell
uv init
uv add git+https://github.com/CatBraaain/search-crawl.git#subdirectory=search_crawl_client
```

Run the examples from the examples directory:

```shell
uv run examples/search_crawl.py
```
Expected output:

```text
URL: https://en.wikipedia.org/wiki/%22Hello,_World!%22_program
TITLE: "Hello, World!" program - Wikipedia
MARKDOWN:
Traditional first example of a computer programming language
A **"Hello, World!" program** is usually a simple [computer program](/wiki/Computer_program "Computer program") that emits (or displays) t...
```
```shell
uv run examples/search_crawl_extract.py
```

Expected output:

```text
population=8005176000 source_url='https://worldpopulationreview.com'
```
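The extract output above reads like the repr of a typed result model with `population` and `source_url` fields. As an illustration only (the actual client ships its own generated models), such a result shape can be sketched with a dataclass:

```python
from dataclasses import dataclass


@dataclass
class PopulationResult:
    """Illustrative shape of the extract example's result; not the client's real model."""
    population: int
    source_url: str


result = PopulationResult(
    population=8005176000,
    source_url="https://worldpopulationreview.com",
)
```

Defining a typed schema like this is what lets the extract endpoints return structured data rather than raw page text.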
After starting the service, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc