Crawlio is a Python SDK for the Crawlio API, a service for web scraping, crawling, and content analysis. It supports single-page scraping, full-site crawling, batch operations, and structured search across results.
pip install crawlio-py
from crawlio.client import Crawlio
from crawlio.types import ScrapeOptions
client = Crawlio(api_key="your-api-key")
options: ScrapeOptions = {
    "url": "https://example.com",
    "markdown": True
}
result = client.scrape(options)
print(result["markdown"])
You must pass your Crawlio api_key when instantiating the client:
from crawlio.client import Crawlio
client = Crawlio(api_key="your_api_key")
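To keep the key out of source control, one option is to read it from an environment variable. This is a minimal sketch: the variable name `CRAWLIO_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something the SDK defines.

```python
import os


def load_api_key(env_var: str = "CRAWLIO_API_KEY") -> str:
    """Return the API key stored in env_var (a hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Usage (commented out so the snippet runs without the SDK installed):
# from crawlio.client import Crawlio
# client = Crawlio(api_key=load_api_key())
```

Failing early with a clear message avoids sending a request that would only come back as an authentication error.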
Scrape a single webpage.
from crawlio.types import ScrapeOptions
client.scrape({
    "url": "https://example.com",
    "exclude": ["nav", "footer"],
    "markdown": True
})
Start a full-site crawl.
from crawlio.types import CrawlOptions
client.crawl({
    "url": "https://example.com",
    "count": 10,
    "sameSite": True
})
Check the status of a crawl job.
client.crawl_status("crawl123")
Get results from a completed crawl.
client.crawl_results("crawl123")
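Since a crawl runs as a job, the status and results calls are typically combined in a polling loop. The sketch below takes any object exposing `crawl_status` and `crawl_results` as shown above. The shape of the status response is not documented here, so the `"status"` key and the state names (`"completed"`, `"failed"`, and so on) are assumptions you would need to adjust to the real API; `wait_for_crawl` itself is a hypothetical helper.

```python
import time


def wait_for_crawl(client, crawl_id, poll_seconds=5.0, timeout=300.0,
                   done_states=("completed", "done"),
                   failed_states=("failed", "error")):
    """Poll crawl_status until the job finishes, then return crawl_results.

    ASSUMPTION: crawl_status returns a dict with a "status" string whose
    values match done_states / failed_states. Check the API docs for the
    actual field and values.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.crawl_status(crawl_id)
        state = str(status.get("status", "")).lower()
        if state in done_states:
            return client.crawl_results(crawl_id)
        if state in failed_states:
            raise RuntimeError(f"Crawl {crawl_id} ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Crawl {crawl_id} not finished after {timeout}s")
        time.sleep(poll_seconds)  # wait before asking again


# Usage, given a configured client:
# results = wait_for_crawl(client, "crawl123")
```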
Search through previously scraped content.
client.search("privacy policy", {"site": "example.com"})
Submit multiple URLs for scraping at once.
client.batch_scrape({
    "url": ["https://a.com", "https://b.com"],
    "options": {"markdown": True}
})
Check the status of a batch scrape.
client.batch_scrape_status("batch456")
Retrieve results of a completed batch scrape.
client.batch_scrape_result("batch456")
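For long URL lists it can be convenient to split the work into several batch requests. The helper below is a sketch that only builds payloads in the same shape as the `batch_scrape` example above. The name `build_batch_payloads` is hypothetical, and the default chunk size of 50 is an arbitrary illustration, not a documented API limit.

```python
def build_batch_payloads(urls, options=None, chunk_size=50):
    """Split urls into payload dicts with "url" and "options" keys.

    chunk_size=50 is an arbitrary default for illustration only.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    return [
        {"url": unique[i:i + chunk_size], "options": dict(options or {})}
        for i in range(0, len(unique), chunk_size)
    ]


# Usage, given a configured client:
# for payload in build_batch_payloads(all_urls, {"markdown": True}):
#     client.batch_scrape(payload)
```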
All exceptions inherit from CrawlioError.
Exception Class | Description |
---|---|
CrawlioError | Base error class |
CrawlioRateLimit | Too many requests |
CrawlioLimitExceeded | API usage limit exceeded |
CrawlioAuthenticationError | Invalid or missing API key |
CrawlioInternalServerError | Server error |
CrawlioFailureError | Other client or server failure |
Example:
from crawlio.exception import CrawlioError
try:
    result = client.scrape({"url": "https://example.com"})
except CrawlioError as e:
    print(f"Error: {e}, Details: {e.response}")
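Rate-limit errors are usually transient, so a common pattern is to retry with exponential backoff. The sketch below is deliberately generic: `call_with_retry` is a hypothetical helper, not part of the SDK, and the caller supplies the exception classes to retry on. Importing `CrawlioRateLimit` from `crawlio.exception` in the usage comment is an assumption based on where `CrawlioError` lives.

```python
import time


def call_with_retry(call, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call(); on an exception listed in retry_on, back off and retry.

    Delays double each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    The last failure is re-raised once attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage (assumes CrawlioRateLimit is importable from crawlio.exception):
# from crawlio.exception import CrawlioRateLimit
# result = call_with_retry(
#     lambda: client.scrape({"url": "https://example.com"}),
#     retry_on=(CrawlioRateLimit,),
# )
```

Retrying only on the rate-limit class keeps authentication and usage-limit errors failing fast, since repeating those requests would not help.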
{
  "jobId": "abc123",
  "html": "<html>...</html>",
  "markdown": "## Title",
  "meta": { "title": "Example" },
  "urls": ["https://example.com/about"],
  "url": "https://example.com"
}
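Assuming the JSON above is the shape of a scrape result, the `urls` list can be filtered to links on the same host as the scraped page, for example to decide what to scrape next. `same_host_links` is a hypothetical helper that relies only on the `url` and `urls` fields shown.

```python
from urllib.parse import urlparse


def same_host_links(result):
    """Return entries of result["urls"] whose host matches result["url"]."""
    host = urlparse(result["url"]).netloc.lower()
    return [
        link for link in result.get("urls", [])
        if urlparse(link).netloc.lower() == host
    ]


sample = {
    "url": "https://example.com",
    "urls": ["https://example.com/about", "https://other.org/page"],
}
print(same_host_links(sample))
```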
MIT License