MontFerret/cli

Ferret CLI

Go Report Status Build Status Discord Chat Ferret release Apache-2.0 License

About Ferret CLI

Ferret CLI is a command-line interface for the Ferret web scraping system. Ferret uses its own query language called FQL (Ferret Query Language) - a SQL-like language designed specifically for web scraping, browser automation, and data extraction tasks.

What is FQL?

FQL (Ferret Query Language) is a declarative language that combines the familiar syntax of SQL with powerful web automation capabilities. It allows you to:

  • Navigate web pages and interact with elements
  • Extract data from HTML documents
  • Handle dynamic content and JavaScript-heavy sites
  • Manage browser sessions and cookies
  • Perform complex data transformations
  • Execute parallel scraping operations

Key Features

  • 🚀 Fast and Efficient: Built-in concurrency and optimized execution
  • 🌐 Browser Automation: Full Chrome/Chromium browser control
  • 🔄 Dynamic Content: Handle SPAs and JavaScript-heavy sites
  • 📊 Data Processing: Built-in functions for data manipulation
  • 🛠️ Flexible Runtime: Run locally or on remote workers
  • 💾 Session Management: Persistent cookies and browser state
  • 🔧 Configuration: Extensive customization options

Documentation is available at our website.

Installation

Binary

You can download the latest binaries from the releases page.

Source (Go >= 1.18)

go install github.com/MontFerret/cli/ferret@latest

Shell

curl https://raw.githubusercontent.com/MontFerret/cli/master/install.sh | sh
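
Whichever install method you use, it can help to confirm that the binary ended up on your PATH. The sketch below is one way to check. `find_tool` is a hypothetical helper name, not part of Ferret. Binaries installed with `go install` land in `$(go env GOPATH)/bin` unless `GOBIN` is set, so that directory needs to be on PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ferret): report where a tool lives on PATH.
find_tool() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        # Print the resolved location of the tool
        command -v "$tool"
    else
        echo "$tool not found on PATH"
        return 1
    fi
}

# Example: find_tool ferret && ferret version
```

If the check fails after `go install`, adding the Go bin directory to PATH is usually the fix.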

Quick start

Your First FQL Query

The simplest way to get started is with the interactive REPL:

ferret exec
Welcome to Ferret REPL

Please use `exit` or `Ctrl-D` to exit this program.
>>> RETURN "Hello, Ferret!"
"Hello, Ferret!"

Basic Web Scraping

Create a simple script (example.fql) to scrape a webpage:

// Navigate to a website and extract data
LET page = DOCUMENT("https://news.ycombinator.com")
LET items = (
    FOR item IN ELEMENTS(page, ".athing")
        // Story links sit inside ".titleline" in the current Hacker News markup
        LET title = ELEMENT(item, ".titleline > a")
        RETURN {
            title: title.innerText,
            url: title.attributes.href
        }
)
RETURN items

Run the script:

ferret exec example.fql

Script execution

ferret exec my-script.fql

Browser Automation

For JavaScript-heavy sites, use browser automation:

# Open browser window for debugging
ferret exec --browser-open my-script.fql

# Run headlessly for production
ferret exec --browser-headless my-script.fql
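
If the same script runs both on a workstation and in automation, the choice between the two flags above can be made by a small wrapper. This is only a sketch: `ferret_browser_flag` is a hypothetical name, and treating a non-empty `CI` environment variable as "unattended run" is an assumption about your environment, not something Ferret defines.

```shell
#!/bin/sh
# Hypothetical helper: pick a browser flag for `ferret exec`.
# Assumption: a non-empty CI variable means an unattended run.
ferret_browser_flag() {
    if [ -n "${CI:-}" ]; then
        echo "--browser-headless"
    else
        echo "--browser-open"
    fi
}

# Example: ferret exec "$(ferret_browser_flag)" my-script.fql
```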

Example browser automation script:

// Browser automation example
LET page = DOCUMENT("https://example.com", { driver: "cdp" })
CLICK(page, "#search-button")
WAIT_ELEMENT(page, "#results")
RETURN ELEMENTS(page, ".result-item")

Query Parameters

Pass dynamic values to your scripts:

ferret exec -p 'url:"https://example.com"' -p 'limit:10' my-script.fql

Use parameters in your FQL script:

LET page = DOCUMENT(@url)  // Use the url parameter
LET items = ELEMENTS(page, ".item")
RETURN items
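
Quoting is the usual stumbling block with `-p`: in the example above the string value carries its own double quotes inside the shell's single quotes, while the number is passed bare. The helpers below are a sketch for building such `name:value` pairs. The function names are hypothetical, and the sketch assumes values are written as JSON-style literals, as the example suggests.

```shell
#!/bin/sh
# Hypothetical helpers: format one `name:value` pair for `ferret exec -p`.
# Assumption: values are JSON-style literals, as in the example above.

# String value: wrap in double quotes, escaping backslashes and quotes.
fql_string_param() {
    name=$1
    value=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '%s:"%s"\n' "$name" "$value"
}

# Number, boolean, or other literal: pass through unchanged.
fql_raw_param() {
    printf '%s:%s\n' "$1" "$2"
}

# Example:
#   ferret exec \
#     -p "$(fql_string_param url 'https://example.com')" \
#     -p "$(fql_raw_param limit 10)" \
#     my-script.fql
```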

Remote Runtime

Execute scripts on remote Ferret workers:

ferret exec --runtime 'https://my-worker.com' my-script.fql

Options

Usage:
  ferret [flags]
  ferret [command]

Available Commands:
  browser     Manage Ferret browsers
  config      Manage Ferret configs
  exec        Execute a FQL script or launch REPL
  help        Help about any command
  update      Update Ferret CLI
  version     Show the CLI version information

Flags:
  -h, --help               help for ferret
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")

Use "ferret [command] --help" for more information about a command.

Configuration

Ferret CLI can be configured using the config command or configuration files.

Setting Configuration Values

# Set a global configuration value
ferret config set browser-address "http://localhost:9222"

# Set user agent
ferret config set user-agent "MyBot 1.0"

# Set default runtime
ferret config set runtime "builtin"

Viewing Configuration

# List all configuration values
ferret config list

# Get a specific value
ferret config get browser-address

Configuration File Locations

Configuration files are stored in:

  • Linux/macOS: ~/.config/ferret/config.yaml
  • Windows: %APPDATA%\ferret\config.yaml

Available Configuration Options

Key              Description                       Default
browser-address  Chrome DevTools Protocol address  http://127.0.0.1:9222
user-agent       Default User-Agent header         System default
browser-cookies  Keep cookies between queries      false
runtime          Runtime type (builtin/url)        builtin
log-level        Logging level                     info
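
To apply several of these options in one go, a short loop over `key=value` lines may be convenient. The sketch below uses the keys from the table together with the `ferret config set` command. `apply_ferret_config` is a hypothetical helper, and the `key=value` input format is an assumption of this example, not a Ferret file format. Passing `echo` as the first argument gives a dry run that prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: read `key=value` lines on stdin and run
# `ferret config set` for each. Pass "echo" as $1 for a dry run.
apply_ferret_config() {
    runner=${1:-}
    while IFS='=' read -r key value; do
        # Skip blank lines
        [ -z "$key" ] && continue
        # Redirect stdin so the command cannot consume the remaining lines
        $runner ferret config set "$key" "$value" </dev/null
    done
}

# Dry run:
#   printf 'browser-address=http://127.0.0.1:9222\nlog-level=debug\n' \
#     | apply_ferret_config echo
```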

Browser Management

Starting a Browser Instance

# Open a new browser instance
ferret browser open

# Open with specific debugging address
ferret browser open --address "http://localhost:9223"

Closing Browser Instances

# Close browser
ferret browser close

# Close specific browser by address
ferret browser close --address "http://localhost:9223"

Advanced Usage

Complex Data Extraction

// E-commerce product scraping with error handling
LET page = DOCUMENT("https://shop.example.com/products")
LET products = (
    FOR product IN ELEMENTS(page, ".product-card")
        LET name = ELEMENT(product, ".product-name")
        LET price = ELEMENT(product, ".price")
        LET image = ELEMENT(product, ".product-image")
        
        // Handle missing elements gracefully
        RETURN name != NONE ? {
            name: TRIM(name.innerText),
            price: REGEX_MATCH(price.innerText, /\$[\d.]+/)[0],
            image: image.attributes.src,
            url: CONCAT("https://shop.example.com", product.attributes.href)
        } : NONE
)
// Filter out null results  
LET validProducts = (
    FOR product IN products
        FILTER product != NONE
        RETURN product
)
RETURN validProducts

Working with Forms

// Login form automation
LET page = DOCUMENT("https://example.com/login", { driver: "cdp" })

// Fill in form fields
INPUT(page, "#username", "myuser")
INPUT(page, "#password", "mypassword")  

// Submit form and wait for navigation
CLICK(page, "#login-button")
WAIT_NAVIGATION(page)

// Extract user data after login
RETURN {
    loggedIn: ELEMENT(page, ".user-menu") != NONE,
    username: ELEMENT(page, ".username").innerText
}

Parallel Processing

// Scrape multiple pages in parallel
LET urls = [
    "https://news.ycombinator.com",
    "https://reddit.com/r/programming", 
    "https://dev.to"
]

LET results = (
    FOR url IN urls
        LET page = DOCUMENT(url)
        RETURN {
            url: url,
            title: ELEMENT(page, "title").innerText,
            headlines: (
                FOR headline IN ELEMENTS(page, "h1, h2, h3")
                RETURN headline.innerText
            )
        }
)

RETURN results

Working with APIs

// Combine web scraping with API calls
LET page = DOCUMENT("https://github.com/trending")
LET repos = ELEMENTS(page, ".Box-row")

LET details = (
    FOR repo IN repos
        // Only look at the first five repositories
        LIMIT 5
        LET repoName = ELEMENT(repo, "h1 a").innerText
        LET apiUrl = CONCAT("https://api.github.com/repos/", repoName)
        
        // Make API call
        LET apiData = DOCUMENT(apiUrl, { driver: "http" })
        
        RETURN {
            name: repoName,
            description: ELEMENT(repo, "p").innerText,
            stars: apiData.stargazers_count,
            language: apiData.language
        }
)

RETURN details

Examples

Web Scraping Examples

📊 Extract table data

// Extract data from HTML tables
LET page = DOCUMENT("https://example.com/data-table")
LET table = ELEMENT(page, "table")
LET headers = (
    FOR header IN ELEMENTS(table, "thead th")
    RETURN header.innerText
)
LET rows = ELEMENTS(table, "tbody tr")

LET data = (
    FOR row IN rows
        LET cells = (
            FOR cell IN ELEMENTS(row, "td")
            RETURN cell.innerText
        )

        // Pair each header with the cell in the same column
        RETURN ZIP(headers, cells)
)

RETURN data

📱 Mobile viewport simulation

// Test mobile-responsive sites
LET page = DOCUMENT("https://example.com", {
    driver: "cdp",
    viewport: {
        width: 375,
        height: 667,
        mobile: true
    },
    userAgent: "Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X)"
})

// Check mobile-specific elements
LET mobileMenu = ELEMENT(page, ".mobile-menu")
LET desktopMenu = ELEMENT(page, ".desktop-menu")

RETURN {
    isMobile: mobileMenu != NONE,
    isDesktop: desktopMenu != NONE,
    viewport: {
        width: page.viewport.width,
        height: page.viewport.height
    }
}

Troubleshooting

Common Issues

Browser connection failed

# Check if Chrome is running with remote debugging
google-chrome --remote-debugging-port=9222

# Or use Ferret's browser management
ferret browser open
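
Before digging further, it can be worth confirming that something is listening on the debugging address. Chrome serves a small HTTP endpoint at `/json/version` on the remote debugging port, so a plain `curl` request is enough. The sketch below wraps that check. `cdp_check` is a hypothetical helper name, and the default address is the `browser-address` default listed under Configuration.

```shell
#!/bin/sh
# Hypothetical helper: check whether a Chrome DevTools endpoint answers.
# Defaults to the browser-address default from the Configuration section.
cdp_check() {
    addr=${1:-http://127.0.0.1:9222}
    if curl --silent --fail --max-time 3 "$addr/json/version" >/dev/null 2>&1; then
        echo "reachable: $addr"
    else
        echo "unreachable: $addr"
        return 1
    fi
}

# Example: cdp_check || ferret browser open
```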

Script execution timeout

// Increase timeouts for slow pages
LET page = DOCUMENT("https://slow-site.com", {
    driver: "cdp", 
    timeout: 30000  // 30 seconds
})

Element not found errors

// Use WAIT_ELEMENT for dynamic content
LET page = DOCUMENT("https://spa.example.com", { driver: "cdp" })
WAIT_ELEMENT(page, "#dynamic-content", 10000)
LET element = ELEMENT(page, "#dynamic-content")

Memory issues with large datasets

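// Process data in chunks (assumes `page` was loaded earlier)
LET items = ELEMENTS(page, ".item")
LET batchSize = 100

FOR offset IN RANGE(0, LENGTH(items), batchSize)
    // SLICE takes a start index and a length, so each pass sees one batch
    FOR item IN SLICE(items, offset, batchSize)
        // Process individual items...
        RETURN item.innerText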

Debug Mode

Enable debug logging for troubleshooting:

ferret exec --log-level debug my-script.fql

Performance Tips

  1. Use CSS selectors efficiently: Specific selectors are faster than broad ones
  2. Minimize DOM queries: Store elements in variables when reusing
  3. Use headless mode: --browser-headless is faster for production
  4. Implement timeouts: Always set appropriate timeouts for reliability
  5. Handle errors gracefully: Use conditional logic to handle missing elements

Development

Building from Source

# Clone the repository
git clone https://github.com/MontFerret/cli.git
cd cli

# Install dependencies
go mod download

# Build the binary
make compile

# Run tests
make test

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b my-new-feature
  3. Make your changes and add tests
  4. Run the test suite: make test
  5. Submit a pull request

Development Commands

# Install development tools
make install-tools

# Format code
make fmt

# Run linters
make lint

# Run all checks
make build

Contributors