# Scrapester

Turn any website into LLM-ready clean data.

Scrapester is a powerful web scraping tool that converts website content into clean, markdown-formatted data, perfect for LLM processing. With support for both single-page scraping and full website crawling, Scrapester makes it easy to gather web content in a structured, consistent format.
## Features

- 🔍 Smart Content Extraction: Automatically removes noise and extracts meaningful content
- 📝 Markdown Output: Clean, structured content perfect for LLMs
- 🕷️ Website Crawling: Scrape entire websites with configurable depth and limits
- 📚 Multiple SDKs: Official Python and JavaScript support
- ⚡ High Performance: Built for speed and reliability
- 🛡️ Error Handling: Robust error handling and rate limiting protection
## Installation

Python:

```bash
pip install scrapester
```

JavaScript:

```bash
npm install scrapester
# or
yarn add scrapester
```
## Quick Start

### Python

```python
from scrapester import ScrapesterApp

# Initialize the client
app = ScrapesterApp(api_key="your-api-key")

# Scrape a single page
result = app.scrape("https://example.com")
print(result.markdown)

# Crawl an entire website
results = app.crawl(
    "https://example.com",
    options={
        "max_pages": 10,
        "max_depth": 2,
    },
)
```
### JavaScript

```javascript
import { ScrapesterApp } from 'scrapester';

// Initialize the client
const app = new ScrapesterApp('your-api-key');

// Scrape a single page
const result = await app.scrape('https://example.com');
console.log(result.markdown);

// Crawl an entire website
const results = await app.crawl('https://example.com', {
  maxPages: 10,
  maxDepth: 2
});
```
## Response Format

Scrapester returns clean, structured data in the following format:

```typescript
interface CrawlerResponse {
  url: string;        // The scraped URL
  markdown: string;   // Clean, markdown-formatted content
  metadata: {         // Page metadata
    title: string;
    description: string;
    // ... other meta tags
  };
  timestamp: string;  // ISO timestamp of when the page was scraped
}
```
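To make the shape concrete, here is a small self-contained sketch. The interface is restated locally, and `toPromptChunk` is a hypothetical helper (not part of the SDK) showing one way a response might be turned into a labeled chunk for an LLM prompt:

```typescript
// Local mirror of the CrawlerResponse shape documented above.
interface CrawlerResponse {
  url: string;
  markdown: string;
  metadata: { title: string; description: string };
  timestamp: string;
}

// Hypothetical helper: label the markdown with its title and source URL.
function toPromptChunk(page: CrawlerResponse): string {
  return `# ${page.metadata.title} (${page.url})\n\n${page.markdown}`;
}

const example: CrawlerResponse = {
  url: "https://example.com",
  markdown: "Example Domain",
  metadata: { title: "Example", description: "An example page" },
  timestamp: "2024-01-01T00:00:00Z",
};

console.log(toPromptChunk(example));
```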
## API Reference

### Constructor

```typescript
new ScrapesterApp(
  apiKey: string,
  baseUrl?: string,  // default: "http://localhost:8000"
  timeout?: number   // default: 600 seconds
)
```
### `scrape(url)`

Scrapes a single URL and returns clean, markdown-formatted content.
### `crawl(url, options)`

Crawls a website starting from the given URL. Options include:

- `maxPages`: Maximum number of pages to crawl
- `maxDepth`: Maximum crawl depth
- `includePatterns`: URL patterns to include
- `excludePatterns`: URL patterns to exclude
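The SDK's actual pattern-matching rules (glob, regex, or substring) are not specified above, so as a rough illustration of the include/exclude semantics, here is a hypothetical `shouldCrawl` helper that uses simple substring matching:

```typescript
// Hypothetical sketch only: the real SDK may match patterns differently.
function shouldCrawl(
  url: string,
  includePatterns: string[] = [],
  excludePatterns: string[] = []
): boolean {
  // Exclusions win over inclusions.
  if (excludePatterns.some((p) => url.includes(p))) return false;
  // With no include patterns, everything not excluded is allowed.
  if (includePatterns.length === 0) return true;
  return includePatterns.some((p) => url.includes(p));
}

console.log(shouldCrawl("https://example.com/blog/post-1", ["/blog/"], ["/admin/"])); // true
console.log(shouldCrawl("https://example.com/admin/login", ["/blog/"], ["/admin/"])); // false
```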
## Error Handling

Scrapester provides detailed error information through the `APIError` class:

```typescript
class APIError extends Error {
  statusCode?: number;
  response?: object;
}
```
Common error scenarios:

- `429`: Rate limit exceeded
- `400`: Invalid request
- `401`: Invalid API key
- `500`: Server error
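Since `429` indicates rate limiting, callers may want to retry with backoff. The sketch below is a hypothetical caller-side helper (the SDK itself may already retry internally); a local `APIError` mirroring the documented shape is included so the snippet is self-contained:

```typescript
// Local mirror of the documented APIError shape, for a self-contained example.
class APIError extends Error {
  statusCode?: number;
  response?: object;
  constructor(message: string, statusCode?: number) {
    super(message);
    this.statusCode = statusCode;
  }
}

// Hypothetical helper: retry a request a few times when the API rate-limits (429).
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await fn();
    } catch (err) {
      const retryable = err instanceof APIError && err.statusCode === 429;
      if (!retryable || i >= attempts - 1) throw err;
      // Exponential backoff: 500 ms, 1000 ms, ...
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** i));
    }
  }
}
```

In application code this might wrap an SDK call, e.g. `const result = await withRetry(() => app.scrape(url));`, assuming `app` from the Quick Start above.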
## Testing

```bash
# Python
pytest tests/

# JavaScript
npm test
```
## Development

```bash
# Python
pip install -e ".[dev]"

# JavaScript
npm install
npm run build
```
## Contributing

We welcome contributions! Please see our Contributing Guidelines for details.
## Support

- 📖 Documentation
- 💬 Discord Community
- 📧 Email Support
## License

This project is licensed under the MIT License - see the LICENSE file for details.