
Scrapester


Documentation | Python SDK | JavaScript SDK | Playground

Turn any website into clean, LLM-ready data.

Overview

Scrapester is a powerful web scraping tool that converts website content into clean, markdown-formatted data perfect for LLM processing. With support for both single-page scraping and full website crawling, Scrapester makes it easy to gather web content in a structured, consistent format.

Features

  • 🔍 Smart Content Extraction: Automatically removes noise and extracts meaningful content
  • 📝 Markdown Output: Clean, structured content perfect for LLMs
  • 🕷️ Website Crawling: Scrape entire websites with configurable depth and limits
  • 🚀 Multiple SDKs: Official Python and JavaScript support
  • ⚡ High Performance: Built for speed and reliability
  • 🛡️ Error Handling: Robust error handling and rate-limit protection

Installation

Python

pip install scrapester

JavaScript/TypeScript

npm install scrapester
# or
yarn add scrapester

Quick Start

Python

from scrapester import ScrapesterApp

# Initialize the client
app = ScrapesterApp(api_key="your-api-key")

# Scrape a single page
result = app.scrape("https://example.com")
print(result.markdown)

# Crawl an entire website
results = app.crawl(
    "https://example.com",
    options={
        "max_pages": 10,
        "max_depth": 2
    }
)

JavaScript/TypeScript

import { ScrapesterApp } from 'scrapester';

// Initialize the client
const app = new ScrapesterApp('your-api-key');

// Scrape a single page
const result = await app.scrape('https://example.com');
console.log(result.markdown);

// Crawl an entire website
const results = await app.crawl('https://example.com', {
    maxPages: 10,
    maxDepth: 2
});

Response Format

Scrapester returns clean, structured data in the following format:

interface CrawlerResponse {
    url: string;          // The scraped URL
    markdown: string;     // Clean, markdown-formatted content
    metadata: {          // Page metadata
        title: string,
        description: string,
        // ... other meta tags
    };
    timestamp: string;   // ISO timestamp of when the page was scraped
}
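
For example, the fields of a CrawlerResponse can be read directly off the result of a scrape call (the values shown in the comments are illustrative):

const result = await app.scrape('https://example.com');

// Each CrawlerResponse field is available on the result
console.log(result.url);             // "https://example.com"
console.log(result.metadata.title);  // the page's <title> text
console.log(result.timestamp);       // e.g. "2025-01-01T12:00:00Z"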

API Reference

ScrapesterApp

Constructor

new ScrapesterApp(
    apiKey: string,
    baseUrl?: string,    // default: "http://localhost:8000"
    timeout?: number     // default: 600 seconds
)
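
For example, pointing the client at a self-hosted instance with a longer timeout (the URL below is a placeholder):

import { ScrapesterApp } from 'scrapester';

// baseUrl and timeout are optional; defaults are shown above
const app = new ScrapesterApp(
    'your-api-key',
    'https://scrapester.example.com',  // baseUrl (placeholder)
    900                                // timeout in seconds
);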

Methods

scrape(url: string)

Scrapes a single URL and returns clean, markdown-formatted content.

crawl(url: string, options?)

Crawls a website starting from the given URL. Options include:

  • maxPages: Maximum number of pages to crawl
  • maxDepth: Maximum crawl depth
  • includePatterns: URL patterns to include
  • excludePatterns: URL patterns to exclude
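
As a sketch, a crawl restricted to a site's blog section might look like the following. The exact pattern syntax is not specified here; glob-style patterns are assumed, so check the documentation for the supported format:

const results = await app.crawl('https://example.com', {
    maxPages: 50,
    maxDepth: 3,
    includePatterns: ['/blog/*'],      // only follow blog URLs (glob syntax assumed)
    excludePatterns: ['/blog/tag/*']   // skip tag index pages
});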

Error Handling

Scrapester provides detailed error information through the APIError class:

class APIError extends Error {
    statusCode?: number;
    response?: object;
}

Common error scenarios:

  • 429: Rate limit exceeded
  • 400: Invalid request
  • 401: Invalid API key
  • 500: Server error
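
A minimal sketch of handling these cases around a scrape call, assuming APIError is exported by the SDK alongside ScrapesterApp:

import { ScrapesterApp, APIError } from 'scrapester';

const app = new ScrapesterApp('your-api-key');

try {
    const result = await app.scrape('https://example.com');
    console.log(result.markdown);
} catch (error) {
    if (error instanceof APIError) {
        if (error.statusCode === 429) {
            // Rate limit exceeded: back off before retrying
            console.error('Rate limited, retry later:', error.response);
        } else {
            console.error(`API error ${error.statusCode}:`, error.response);
        }
    } else {
        throw error;  // network failures, bugs, etc.
    }
}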

Development

Running Tests

# Python
pytest tests/

# JavaScript
npm test

Building from Source

# Python
pip install -e ".[dev]"

# JavaScript
npm install
npm run build

Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.
