Ferret CLI is a command-line interface for the Ferret web scraping system. Ferret uses its own query language called FQL (Ferret Query Language) - a SQL-like language designed specifically for web scraping, browser automation, and data extraction tasks.
- About Ferret CLI
- What is FQL?
- Key Features
- Installation
- Quick Start
- Options
- Configuration
- Browser Management
- Advanced Usage
- Examples
- Troubleshooting
- Development
- Contributors
FQL (Ferret Query Language) is a declarative language that combines the familiar syntax of SQL with powerful web automation capabilities. It allows you to:
- Navigate web pages and interact with elements
- Extract data from HTML documents
- Handle dynamic content and JavaScript-heavy sites
- Manage browser sessions and cookies
- Perform complex data transformations
- Execute parallel scraping operations
- Fast and Efficient: Built-in concurrency and optimized execution
- Browser Automation: Full Chrome/Chromium browser control
- Dynamic Content: Handle SPAs and JavaScript-heavy sites
- Data Processing: Built-in functions for data manipulation
- Flexible Runtime: Run locally or on remote workers
- Session Management: Persistent cookies and browser state
- Configuration: Extensive customization options
Documentation is available at our website.
You can download the latest binaries from here.
go install github.com/MontFerret/cli/ferret@latest
curl https://raw.githubusercontent.com/MontFerret/cli/master/install.sh | sh
The simplest way to get started is with the interactive REPL:
ferret exec
Welcome to Ferret REPL
Please use `exit` or `Ctrl-D` to exit this program.
>>> RETURN "Hello, Ferret!"
"Hello, Ferret!"
Create a simple script (example.fql) to scrape a webpage:
// Navigate to a website and extract data
LET page = DOCUMENT("https://news.ycombinator.com")
LET items = (
FOR item IN ELEMENTS(page, ".athing")
LET title = ELEMENT(item, ".titleline > a")
RETURN {
title: title.innerText,
url: title.href
}
)
RETURN items
Run the script:
ferret exec example.fql
ferret exec my-script.fql
For JavaScript-heavy sites, use browser automation:
# Open browser window for debugging
ferret exec --browser-open my-script.fql
# Run headlessly for production
ferret exec --browser-headless my-script.fql
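The choice between the two flags above can be made once per environment rather than per command. A minimal sketch, assuming an environment variable named FERRET_DEBUG; both that variable and the function name are hypothetical and are not read by Ferret CLI itself:

```shell
# Hypothetical helper: pick a browser flag from an environment variable.
# FERRET_DEBUG is an assumed name, not something Ferret CLI reads on its own.
ferret_browser_flag() {
  if [ "${FERRET_DEBUG:-0}" = "1" ]; then
    printf '%s\n' "--browser-open"      # visible window for debugging
  else
    printf '%s\n' "--browser-headless"  # no window, for production runs
  fi
}
```

With ferret on the PATH, this could be used as `ferret exec "$(ferret_browser_flag)" my-script.fql`.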
Example browser automation script:
// Browser automation example
LET page = DOCUMENT("https://example.com", { driver: "cdp" })
CLICK(page, "#search-button")
WAIT_ELEMENT(page, "#results")
RETURN ELEMENTS(page, ".result-item")
Pass dynamic values to your scripts:
ferret exec -p 'url:"https://example.com"' -p 'limit:10' my-script.fql
Use parameters in your FQL script:
LET page = DOCUMENT(@url) // Use the url parameter
LET items = ELEMENTS(page, ".item")
RETURN items
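When a parameter value comes from a shell variable, quoting it by hand is error-prone. The sketch below assumes the value after the colon is parsed as JSON, as the quoted string and bare number in the example above suggest. The function name is hypothetical, and it escapes only backslashes and double quotes:

```shell
# Hypothetical helper: format one -p argument as name:"value".
# Escapes backslashes and double quotes so the value stays a valid JSON string.
# Control characters such as newlines are NOT handled.
fql_string_param() {
  name=$1
  value=$2
  escaped=$(printf '%s' "$value" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '%s:"%s"\n' "$name" "$escaped"
}
```

For example, `ferret exec -p "$(fql_string_param url "$TARGET_URL")" -p 'limit:10' my-script.fql`, where TARGET_URL is a variable you define yourself.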
Execute scripts on remote Ferret workers:
ferret exec --runtime 'https://my-worker.com' my-script.fql
Usage:
ferret [flags]
ferret [command]
Available Commands:
browser Manage Ferret browsers
config Manage Ferret configs
exec Execute a FQL script or launch REPL
help Help about any command
update Update Ferret CLI
version Show the CLI version information
Flags:
-h, --help help for ferret
-l, --log-level string Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
Use "ferret [command] --help" for more information about a command.
Ferret CLI can be configured using the config command or configuration files.
# Set a global configuration value
ferret config set browser.address "http://localhost:9222"
# Set user agent
ferret config set browser.userAgent "MyBot 1.0"
# Set default runtime
ferret config set runtime.type "builtin"
# List all configuration values
ferret config list
# Get a specific value
ferret config get browser.address
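When several settings have to be applied together, the config set calls above can be driven from a list. A minimal sketch, assuming a plain key=value input format; the function name and the FERRET_BIN override are hypothetical and not part of Ferret CLI:

```shell
# Hypothetical helper: read key=value lines from stdin and apply each one
# with "ferret config set". FERRET_BIN is an assumed override that lets the
# command be swapped out, for example with "echo" for a dry run.
apply_ferret_config() {
  while IFS='=' read -r key value; do
    [ -n "$key" ] || continue   # skip blank lines
    # stdin is redirected so the command cannot consume the remaining lines
    "${FERRET_BIN:-ferret}" config set "$key" "$value" </dev/null
  done
}
```

A dry run such as `printf 'browser.address=http://localhost:9222\n' | FERRET_BIN=echo apply_ferret_config` prints the command arguments instead of changing any configuration.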
Configuration files are stored in:
- Linux/macOS: ~/.config/ferret/config.yaml
- Windows: %APPDATA%\ferret\config.yaml
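A script that needs to inspect or back up the configuration file can derive its path from the platform. This is a sketch that simply mirrors the locations listed above; the function name is hypothetical, and the Windows branch assumes a Unix-like shell such as Git Bash, MSYS, or Cygwin:

```shell
# Hypothetical helper: print the config file path for the current platform,
# following the locations listed above.
ferret_config_path() {
  case "$(uname -s)" in
    MINGW*|MSYS*|CYGWIN*)
      # Unix-like shells on Windows: APPDATA is provided by the OS
      printf '%s\n' "${APPDATA}\\ferret\\config.yaml"
      ;;
    *)
      # Linux and macOS
      printf '%s\n' "${HOME}/.config/ferret/config.yaml"
      ;;
  esac
}
```

For example, `cat "$(ferret_config_path)"` would show the current file if it exists.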
Key | Description | Default |
---|---|---|
browser-address | Chrome DevTools Protocol address | http://127.0.0.1:9222 |
user-agent | Default User-Agent header | System default |
browser-cookies | Keep cookies between queries | false |
runtime | Runtime type (builtin/url) | builtin |
log-level | Logging level | info |
# Open a new browser instance
ferret browser open
# Open with specific debugging address
ferret browser open --address "http://localhost:9223"
# Close browser
ferret browser close
# Close specific browser by address
ferret browser close --address "http://localhost:9223"
// E-commerce product scraping with error handling
LET page = DOCUMENT("https://shop.example.com/products")
LET products = (
FOR product IN ELEMENTS(page, ".product-card")
LET name = ELEMENT(product, ".product-name")
LET price = ELEMENT(product, ".price")
LET image = ELEMENT(product, ".product-image")
// Handle missing elements gracefully
RETURN name != NONE ? {
name: TRIM(name.innerText),
price: REGEX_MATCH(price.innerText, /\$[\d.]+/)[0],
image: image.src,
url: CONCAT("https://shop.example.com", product.href)
} : NONE
)
// Filter out null results
LET validProducts = (
FOR product IN products
FILTER product != NONE
RETURN product
)
RETURN validProducts
// Login form automation
LET page = DOCUMENT("https://example.com/login", { driver: "cdp" })
// Fill in form fields
INPUT(page, "#username", "myuser")
INPUT(page, "#password", "mypassword")
// Submit form and wait for navigation
CLICK(page, "#login-button")
WAIT_NAVIGATION(page)
// Extract user data after login
RETURN {
loggedIn: ELEMENT(page, ".user-menu") != NONE,
username: ELEMENT(page, ".username").innerText
}
// Scrape multiple pages in parallel
LET urls = [
"https://news.ycombinator.com",
"https://reddit.com/r/programming",
"https://dev.to"
]
LET results = (
FOR url IN urls
LET page = DOCUMENT(url)
RETURN {
url: url,
title: ELEMENT(page, "title").innerText,
headlines: (
FOR headline IN ELEMENTS(page, "h1, h2, h3")
RETURN headline.innerText
)
}
)
RETURN results
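The FQL loop above visits every URL within a single script run; whether those page loads overlap depends on the runtime. If process-level concurrency is wanted, one option is to start a separate ferret process per URL. A minimal sketch, assuming the script reads its target from a @url parameter and that xargs supports -P (both GNU and BSD versions do). The function name, the FERRET_BIN override, and the limit of three jobs are all illustrative choices:

```shell
# Hypothetical helper: run one ferret process per URL, at most 3 at a time.
# Usage: scrape_urls SCRIPT URL...
# FERRET_BIN is an assumed override so the command can be swapped for a dry run.
scrape_urls() {
  script=$1
  shift
  for url in "$@"; do
    printf '%s\n' "$url"          # one URL per line for xargs
  done | xargs -P 3 -I {} "${FERRET_BIN:-ferret}" exec -p 'url:"{}"' "$script"
}
```

Output from the concurrent processes arrives in completion order, so results may need to be sorted or keyed by URL afterwards. URLs containing quotes or backslashes are not handled by this sketch.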
// Combine web scraping with API calls
LET page = DOCUMENT("https://github.com/trending")
LET repos = ELEMENTS(page, ".Box-row")
LET details = (
FOR repo IN repos[0:5]
LET repoName = ELEMENT(repo, "h1 a").innerText
LET apiUrl = CONCAT("https://api.github.com/repos/", repoName)
// Make API call
LET apiData = DOCUMENT(apiUrl, { driver: "http" })
RETURN {
name: repoName,
description: ELEMENT(repo, "p").innerText,
stars: apiData.stargazers_count,
language: apiData.language
}
)
RETURN details
Extract table data
// Extract data from HTML tables
LET page = DOCUMENT("https://example.com/data-table")
LET table = ELEMENT(page, "table")
LET headers = (
FOR header IN ELEMENTS(table, "thead th")
RETURN header.innerText
)
LET rows = ELEMENTS(table, "tbody tr")
LET data = (
FOR row IN rows
LET cells = (
FOR cell IN ELEMENTS(row, "td")
RETURN cell.innerText
)
// Pair each header with the cell in the same column
RETURN ZIP(headers, cells)
)
RETURN data
Mobile viewport simulation
// Test mobile-responsive sites
LET page = DOCUMENT("https://example.com", {
driver: "cdp",
viewport: {
width: 375,
height: 667,
mobile: true
},
userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)"
})
// Check mobile-specific elements
LET mobileMenu = ELEMENT(page, ".mobile-menu")
LET desktopMenu = ELEMENT(page, ".desktop-menu")
RETURN {
isMobile: mobileMenu != NONE,
isDesktop: desktopMenu != NONE,
viewport: {
width: page.viewport.width,
height: page.viewport.height
}
}
Browser connection failed
# Check if Chrome is running with remote debugging
google-chrome --remote-debugging-port=9222
# Or use Ferret's browser management
ferret browser open
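Before starting another browser, it can help to check whether one is already listening. Chrome exposes a /json/version endpoint on its remote debugging port, so a quick probe is possible with curl. The sketch below assumes curl is installed; the function name and the two-second timeout are illustrative choices:

```shell
# Hypothetical helper: succeed if a Chrome DevTools endpoint answers at the
# given address (defaults to the address used elsewhere in this document).
cdp_is_up() {
  addr=${1:-http://127.0.0.1:9222}
  # /json/version is the DevTools HTTP endpoint that reports browser metadata
  curl --silent --fail --max-time 2 "${addr}/json/version" >/dev/null 2>&1
}
```

A typical use would be `cdp_is_up || ferret browser open`, which only opens a new browser when nothing responds on the default address.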
Script execution timeout
// Increase timeouts for slow pages
LET page = DOCUMENT("https://slow-site.com", {
driver: "cdp",
timeout: 30000 // 30 seconds
})
Element not found errors
// Use WAIT_ELEMENT for dynamic content
LET page = DOCUMENT("https://spa.example.com", { driver: "cdp" })
WAIT_ELEMENT(page, "#dynamic-content", 10000)
LET element = ELEMENT(page, "#dynamic-content")
Memory issues with large datasets
// Process data in chunks using supported syntax
LET items = ELEMENTS(page, ".item")
LET batchSize = 100
FOR i IN RANGE(0, LENGTH(items), batchSize)
FOR item IN SLICE(items, i, batchSize) // only the items in the current batch
// Process individual items...
RETURN item.innerText
Enable debug logging for troubleshooting:
ferret exec --log-level debug my-script.fql
- Use CSS selectors efficiently: Specific selectors are faster than broad ones
- Minimize DOM queries: Store elements in variables when reusing
- Use headless mode: --browser-headless is faster for production
- Implement timeouts: Always set appropriate timeouts for reliability
- Handle errors gracefully: Use conditional logic to handle missing elements
# Clone the repository
git clone https://github.com/MontFerret/cli.git
cd cli
# Install dependencies
go mod download
# Build the binary
make compile
# Run tests
make test
- Fork the repository
- Create a feature branch: git checkout -b my-new-feature
- Make your changes and add tests
- Run the test suite: make test
- Submit a pull request
# Install development tools
make install-tools
# Format code
make fmt
# Run linters
make lint
# Run all checks
make build