Ferret CLI

About Ferret CLI

Ferret CLI is a command-line interface for the Ferret web scraping system. Ferret uses its own query language called FQL (Ferret Query Language) - a SQL-like language designed specifically for web scraping, browser automation, and data extraction tasks.

🚀 Fast and Efficient: Built-in concurrency and optimized execution
🌐 Browser Automation: Full Chrome/Chromium browser control
🔄 Dynamic Content: Handle SPAs and JavaScript-heavy sites
📊 Data Processing: Built-in functions for data manipulation
🛠️ Flexible Runtime: Run locally or on remote workers
💾 Session Management: Persistent cookies and browser state
🔧 Configuration: Extensive customization options

Documentation is available at our website.

Installation

Binary

You can download the latest binaries from the project's GitHub releases page.

Source (Go >= 1.18)

go install github.com/MontFerret/cli/ferret@latest

Shell

curl https://raw.githubusercontent.com/MontFerret/cli/master/install.sh | sh

Quick start

Your First FQL Query

The simplest way to get started is with the interactive REPL:

ferret exec
Welcome to Ferret REPL

Please use `exit` or `Ctrl-D` to exit this program.
>>> RETURN "Hello, Ferret!"
"Hello, Ferret!"

Basic Web Scraping

Create a simple script (example.fql) to scrape a webpage:

// Navigate to a website and extract data
LET page = DOCUMENT("https://news.ycombinator.com")
LET items = (
    FOR item IN ELEMENTS(page, ".athing")
        LET title = ELEMENT(item, ".storylink")
        RETURN {
            title: title.innerText,
            url: title.href
        }
)
RETURN items

Run the script:

ferret exec example.fql

Script execution

ferret exec my-script.fql

Browser Automation

For JavaScript-heavy sites, use browser automation:

# Open browser window for debugging
ferret exec --browser-open my-script.fql

# Run headlessly for production
ferret exec --browser-headless my-script.fql

Example browser automation script:

// Browser automation example
LET page = DOCUMENT("https://example.com", { driver: "cdp" })
CLICK(page, "#search-button")
WAIT_ELEMENT(page, "#results")
RETURN ELEMENTS(page, ".result-item")

Query Parameters

Pass dynamic values to your scripts:

ferret exec -p 'url:"https://example.com"' -p 'limit:10' my-script.fql

Use parameters in your FQL script:

LET page = DOCUMENT(@url)  // Use the url parameter
FOR item IN ELEMENTS(page, ".item")
    LIMIT @limit  // Use the limit parameter
    RETURN item
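Judging from the command above, each -p value is a name:value pair in which strings are wrapped in double quotes and numbers are left bare. The sketch below builds such pairs from plain shell values. The helper name fql_param is hypothetical and not part of Ferret CLI, and the quoting rule is an assumption drawn only from the example above. The last line prints the command rather than running it.

```shell
#!/bin/sh
# fql_param NAME VALUE  (hypothetical helper, not part of Ferret CLI)
# Prints NAME:VALUE in the shape used by the -p examples above.
# true/false and values made only of digits, dots, and minus signs stay bare;
# anything else is wrapped in double quotes with \ and " escaped.
fql_param() {
    name=$1
    value=$2
    case $value in
        true|false)
            printf '%s:%s\n' "$name" "$value" ;;
        ''|*[!0-9.-]*)
            escaped=$(printf '%s' "$value" | sed 's/\\/\\\\/g; s/"/\\"/g')
            printf '%s:"%s"\n' "$name" "$escaped" ;;
        *)
            printf '%s:%s\n' "$name" "$value" ;;
    esac
}

# Dry run: print the command instead of executing it.
echo ferret exec \
    -p "$(fql_param url https://example.com)" \
    -p "$(fql_param limit 10)" \
    my-script.fql
```

Building the pairs in one place keeps the quoting consistent when values come from shell variables instead of being typed by hand.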

Remote Runtime

Execute scripts on remote Ferret workers:

ferret exec --runtime 'https://my-worker.com' my-script.fql

Options

Usage:
  ferret [flags]
  ferret [command]

Available Commands:
  browser     Manage Ferret browsers
  config      Manage Ferret configs
  exec        Execute a FQL script or launch REPL
  help        Help about any command
  update      Update Ferret CLI
  version     Show the CLI version information

Flags:
  -h, --help               help for ferret
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")

Use "ferret [command] --help" for more information about a command.

Configuration

Ferret CLI can be configured using the config command or configuration files.

Setting Configuration Values

# Set the browser (Chrome DevTools Protocol) address
ferret config set browser-address "http://localhost:9222"

# Set user agent
ferret config set user-agent "MyBot 1.0"

# Set default runtime
ferret config set runtime "builtin"

Viewing Configuration

# List all configuration values
ferret config list

# Get a specific value
ferret config get browser-address

Configuration File Locations

Configuration files are stored in:

Linux/macOS: ~/.config/ferret/config.yaml
Windows: %APPDATA%\ferret\config.yaml
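The locations above can be resolved from a script, for example to check whether a config file already exists. This is a minimal sketch: the function name ferret_config_path is hypothetical, and detecting Windows through uname (Git Bash, MSYS, or Cygwin) is an assumption about the shell in use.

```shell
#!/bin/sh
# ferret_config_path  (hypothetical helper, not part of Ferret CLI)
# Prints the config file location listed above for the current platform.
ferret_config_path() {
    case "$(uname -s)" in
        MINGW*|MSYS*|CYGWIN*)
            # Windows shells: %APPDATA%\ferret\config.yaml
            printf '%s\n' "${APPDATA}\\ferret\\config.yaml" ;;
        *)
            # Linux/macOS: ~/.config/ferret/config.yaml
            printf '%s\n' "${HOME}/.config/ferret/config.yaml" ;;
    esac
}

path=$(ferret_config_path)
if [ -f "$path" ]; then
    echo "config found: $path"
else
    echo "no config file at: $path"
fi
```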

Available Configuration Options

| Key | Description | Default |
| --- | --- | --- |
| `browser-address` | Chrome DevTools Protocol address | `http://127.0.0.1:9222` |
| `user-agent` | Default User-Agent header | System default |
| `browser-cookies` | Keep cookies between queries | `false` |
| `runtime` | Runtime type (builtin/url) | `builtin` |
| `log-level` | Logging level | `info` |

Browser Management

Starting a Browser Instance

# Open a new browser instance
ferret browser open

# Open with specific debugging address
ferret browser open --address "http://localhost:9223"

Closing Browser Instances

# Close browser
ferret browser close

# Close specific browser by address
ferret browser close --address "http://localhost:9223"

Advanced Usage

Complex Data Extraction

// E-commerce product scraping with error handling
LET page = DOCUMENT("https://shop.example.com/products")
LET products = (
    FOR product IN ELEMENTS(page, ".product-card")
        LET name = ELEMENT(product, ".product-name")
        LET price = ELEMENT(product, ".price")
        LET image = ELEMENT(product, ".product-image")
        
        // Handle missing elements gracefully
        RETURN name != NONE ? {
            name: TRIM(name.innerText),
            price: REGEX_MATCH(price.innerText, "[$][0-9.]+")[0],
            image: image.src,
            url: CONCAT("https://shop.example.com", product.href)
        } : NONE
)
// Filter out NONE results
LET validProducts = (
    FOR product IN products
        FILTER product != NONE
        RETURN product
)
RETURN validProducts

Working with Forms

// Login form automation
LET page = DOCUMENT("https://example.com/login", { driver: "cdp" })

// Fill in form fields
INPUT(page, "#username", "myuser")
INPUT(page, "#password", "mypassword")  

// Submit form and wait for navigation
CLICK(page, "#login-button")
WAIT_NAVIGATION(page)

// Extract user data after login
RETURN {
    loggedIn: ELEMENT(page, ".user-menu") != NONE,
    username: ELEMENT(page, ".username").innerText
}

Parallel Processing

// Scrape multiple pages in parallel
LET urls = [
    "https://news.ycombinator.com",
    "https://reddit.com/r/programming", 
    "https://dev.to"
]

LET results = (
    FOR url IN urls
        LET page = DOCUMENT(url)
        RETURN {
            url: url,
            title: ELEMENT(page, "title").innerText,
            headlines: (
                FOR headline IN ELEMENTS(page, "h1, h2, h3")
                RETURN headline.innerText
            )
        }
)

RETURN results

Working with APIs

// Combine web scraping with API calls
LET page = DOCUMENT("https://github.com/trending")
LET repos = ELEMENTS(page, ".Box-row")

LET details = (
    FOR repo IN repos
        LIMIT 5  // Only look up the first five repositories
        LET repoName = ELEMENT(repo, "h1 a").innerText
        LET apiUrl = CONCAT("https://api.github.com/repos/", repoName)
        
        // Make API call
        LET apiData = DOCUMENT(apiUrl, { driver: "http" })
        
        RETURN {
            name: repoName,
            description: ELEMENT(repo, "p").innerText,
            stars: apiData.stargazers_count,
            language: apiData.language
        }
)

RETURN details

Examples

Web Scraping Examples

📊 Extract table data

// Extract data from HTML tables
LET page = DOCUMENT("https://example.com/data-table")
LET table = ELEMENT(page, "table")
LET headers = (
    FOR header IN ELEMENTS(table, "thead th")
    RETURN header.innerText
)
LET rows = ELEMENTS(table, "tbody tr")

LET data = (
    FOR row IN rows
        LET cells = (
            FOR cell IN ELEMENTS(row, "td")
            RETURN cell.innerText
        )

        // Pair each header with the cell in the same column
        RETURN ZIP(headers, cells)
)

RETURN data

📱 Mobile viewport simulation

// Test mobile-responsive sites
LET page = DOCUMENT("https://example.com", {
    driver: "cdp",
    viewport: {
        width: 375,
        height: 667,
        mobile: true
    },
    userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)"
})

// Check mobile-specific elements
LET mobileMenu = ELEMENT(page, ".mobile-menu")
LET desktopMenu = ELEMENT(page, ".desktop-menu")

RETURN {
    isMobile: mobileMenu != NONE,
    isDesktop: desktopMenu != NONE,
    viewport: {
        width: page.viewport.width,
        height: page.viewport.height
    }
}

Troubleshooting

Common Issues

Browser connection failed

# Check if Chrome is running with remote debugging
google-chrome --remote-debugging-port=9222

# Or use Ferret's browser management
ferret browser open

Script execution timeout

// Increase timeouts for slow pages
LET page = DOCUMENT("https://slow-site.com", {
    driver: "cdp", 
    timeout: 30000  // 30 seconds
})

Element not found errors

// Use WAIT_ELEMENT for dynamic content
LET page = DOCUMENT("https://spa.example.com", { driver: "cdp" })
WAIT_ELEMENT(page, "#dynamic-content", 10000)
LET element = ELEMENT(page, "#dynamic-content")

Memory issues with large datasets

// Process one batch per run; pass the window as parameters, e.g.
// ferret exec -p 'url:"https://example.com"' -p 'offset:0' -p 'batchSize:100' my-script.fql
LET page = DOCUMENT(@url)

FOR item IN ELEMENTS(page, ".item")
    LIMIT @offset, @batchSize
    RETURN item.innerText
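One way to keep memory bounded is to run the script once per batch from the shell. The sketch below prints one invocation per batch. It assumes my-script.fql reads two parameters named @offset and @batchSize (hypothetical names) and that the total item count is known in advance. The helper batch_offsets is illustrative and not part of Ferret CLI.

```shell
#!/bin/sh
# batch_offsets TOTAL BATCH_SIZE  (hypothetical helper)
# Prints the starting offset of each batch, one per line.
batch_offsets() {
    total=$1
    size=$2
    # Refuse a non-positive batch size, which would loop forever.
    [ "$size" -gt 0 ] || return 1
    offset=0
    while [ "$offset" -lt "$total" ]; do
        printf '%s\n' "$offset"
        offset=$((offset + size))
    done
}

# Dry run: print the commands instead of executing them.
# Remove "echo" to actually run each batch.
for offset in $(batch_offsets 250 100); do
    echo ferret exec -p "offset:$offset" -p "batchSize:100" my-script.fql
done
```

Each run then holds at most one batch of results, at the cost of loading the page once per batch.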

Debug Mode

Enable debug logging for troubleshooting:

ferret exec --log-level debug my-script.fql

Performance Tips

Use CSS selectors efficiently: Specific selectors are faster than broad ones
Minimize DOM queries: Store elements in variables when reusing
Use headless mode: --browser-headless is faster for production
Implement timeouts: Always set appropriate timeouts for reliability
Handle errors gracefully: Use conditional logic to handle missing elements

Development

Building from Source

# Clone the repository
git clone https://github.com/MontFerret/cli.git
cd cli

# Install dependencies
go mod download

# Build the binary
make compile

# Run tests
make test

Contributing

Fork the repository
Create a feature branch: git checkout -b my-new-feature
Make your changes and add tests
Run the test suite: make test
Submit a pull request

Development Commands

# Install development tools
make install-tools

# Format code
make fmt

# Run linters
make lint

# Run all checks
make build
