crawlio-js is a Node.js SDK for interacting with the Crawlio web scraping and crawling API. It provides programmatic access to scraping, crawling, and batch processing endpoints with built-in error handling.
```bash
npm install crawlio.js
```
```js
import { Crawlio } from 'crawlio.js'

const client = new Crawlio({ apiKey: 'your-api-key' })
const result = await client.scrape({ url: 'https://example.com' })
console.log(result.html)
```
Creates a new Crawlio client.
Options:
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| apiKey | string | ✅ | Your Crawlio API key |
| baseUrl | string | ❌ | API base URL (default: `https://crawlio.xyz`) |
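For example, a client that reads its key from the environment and overrides the base URL (the override value below is illustrative):

```js
import { Crawlio } from 'crawlio.js'

const client = new Crawlio({
  apiKey: process.env.CRAWLIO_API_KEY, // required
  baseUrl: 'https://crawlio.xyz',      // optional; this is the default
})
```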
Scrapes a single page.
```js
await client.scrape({ url: 'https://example.com' })
```
ScrapeOptions:
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | ✅ | Target URL |
| exclude | string[] | ✅ | CSS selectors to exclude |
| includeOnly | string[] | ❌ | CSS selectors to include |
| markdown | boolean | ❌ | Convert HTML to Markdown |
| returnUrls | boolean | ❌ | Return all discovered URLs |
| workflow | Workflow[] | ❌ | Custom workflow steps to execute |
| normalizeBase64 | boolean | ❌ | Normalize base64 content |
| cookies | CookiesInfo[] | ❌ | Cookies to include in the request |
| userAgent | string | ❌ | Custom User-Agent header for the request |
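A sketch of a scrape call combining several of these options; the selectors and user agent string are placeholders:

```js
const result = await client.scrape({
  url: 'https://example.com/blog',
  exclude: ['nav', 'footer', '.ads'], // drop boilerplate elements
  markdown: true,                     // also return a Markdown rendering
  returnUrls: true,                   // collect links discovered on the page
  userAgent: 'MyCrawler/1.0',
})

console.log(result.markdown)
console.log(result.urls)
```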
Initiates a site-wide crawl.
CrawlOptions:
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | ✅ | Root URL to crawl |
| count | number | ✅ | Number of pages to crawl |
| sameSite | boolean | ❌ | Limit crawl to same domain |
| patterns | string[] | ❌ | URL patterns to match |
| exclude | string[] | ❌ | CSS selectors to exclude |
| includeOnly | string[] | ❌ | CSS selectors to include |
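For example, a crawl restricted to one domain (the pattern string is illustrative):

```js
const crawl = await client.crawl({
  url: 'https://example.com',
  count: 50,          // stop after 50 pages
  sameSite: true,     // stay on example.com
  patterns: ['/blog/*'],
})
// The response is assumed to carry a job id used for status polling below.
```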
Checks the status of a crawl job.
Gets results from a completed crawl.
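Continuing the crawl above, a polling sketch; the method names `crawlStatus` and `crawlResults` and the `id` field are assumptions for illustration — check the package's type definitions for the exact API:

```js
// Poll until the crawl leaves the queue/running states.
let status = await client.crawlStatus(crawl.id)
while (status.status === 'IN_QUEUE' || status.status === 'RUNNING') {
  await new Promise((resolve) => setTimeout(resolve, 2000)) // wait 2s between polls
  status = await client.crawlStatus(crawl.id)
}

if (status.status === 'SUCCESS') {
  const results = await client.crawlResults(crawl.id)
  console.log(results)
}
```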
Performs a search on scraped content.
SearchOptions:
| Name | Type | Description |
| --- | --- | --- |
| site | string | Limit search to a specific domain |
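A sketch of a search call; the exact signature isn't documented above, so the query argument is an assumption:

```js
// Assumed signature: a query string plus SearchOptions.
const hits = await client.search('getting started', { site: 'example.com' })
console.log(hits)
```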
Initiates scraping for multiple URLs in one request.
BatchScrapeOptions:
| Name | Type | Description |
| --- | --- | --- |
| url | string[] | List of URLs |
| options | Omit<ScrapeOptions, 'url'> | Common options for all URLs |
Checks the status of a batch scrape job.
Fetches results from a completed batch scrape.
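Putting the batch flow together; `batchScrapeStatus`, `batchScrapeResults`, and the `id` field are assumed names, mirroring the crawl polling flow above:

```js
const batch = await client.batchScrape({
  url: ['https://example.com/a', 'https://example.com/b'],
  options: { markdown: true, exclude: ['nav'] }, // applied to every URL
})

const status = await client.batchScrapeStatus(batch.id)
if (status.status === 'SUCCESS') {
  const results = await client.batchScrapeResults(batch.id)
  console.log(results)
}
```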
All Crawlio errors extend from `CrawlioError`. You can catch and inspect these for more context.
- `CrawlioError`
- `CrawlioRateLimit`
- `CrawlioLimitExceeded`
- `CrawlioAuthenticationError`
- `CrawlioInternalServerError`
- `CrawlioFailureError`
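A sketch of branching on these error classes, assuming they are exported from the package root:

```js
import { Crawlio, CrawlioError, CrawlioRateLimit, CrawlioAuthenticationError } from 'crawlio.js'

const client = new Crawlio({ apiKey: 'your-api-key' })

try {
  await client.scrape({ url: 'https://example.com' })
} catch (err) {
  if (err instanceof CrawlioRateLimit) {
    // back off and retry later
  } else if (err instanceof CrawlioAuthenticationError) {
    // check the API key
  } else if (err instanceof CrawlioError) {
    console.error('Crawlio request failed:', err.message)
  } else {
    throw err // not a Crawlio error
  }
}
```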
Scrape result:

```ts
{
  jobId: string
  html: string
  markdown: string
  meta: Record<string, string>
  urls?: string[]
  url: string
}
```
Job status (crawl and batch scrape):

```ts
{
  id: string
  status: 'IN_QUEUE' | 'RUNNING' | 'LIMIT_EXCEEDED' | 'ERROR' | 'SUCCESS'
  error: number
  success: number
  total: number
}
```
CookiesInfo:

```ts
{
  name: string
  value: string
  path: string
  expires?: number
  httpOnly: boolean
  secure: boolean
  domain: string
  sameSite: 'Strict' | 'Lax' | 'None'
}
```
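A sketch of passing a cookie to `scrape` (the cookie values are placeholders):

```js
const result = await client.scrape({
  url: 'https://example.com/dashboard',
  cookies: [
    {
      name: 'session',
      value: 'abc123', // placeholder session token
      path: '/',
      domain: 'example.com',
      httpOnly: true,
      secure: true,
      sameSite: 'Lax',
    },
  ],
})
```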