# web_llm_interactor πŸ€–

A toolkit for automating interactions with web-based Large Language Models (LLMs) like Qwen, Perplexity, and more. This project leverages AppleScript to control a real Chrome browser (bypassing bot detection) and Python to extract structured responses from the resulting HTML.

## Why web_llm_interactor? πŸ’‘

Many cutting-edge LLMs are accessible only through browser-based interfaces, with no public or affordable API. Unlike models that expose REST endpoints, they are effectively closed to agents, scripts, and CLI workflows.

web_llm_interactor bridges that gap by letting you interact with web-based LLMs as if they had API endpoints. It automates browser actions with AppleScript to mimic human behavior, submits your query, waits for the response, and extracts structured data (e.g., JSON) from the page, making web-only LLMs fully compatible with your automation workflows.

## How It Works βš™οΈ

```mermaid
graph TD
    A["πŸ‘€ User/Agent calls CLI"] --> B["🍎 AppleScript activates Chrome"]
    B --> C["🍎 AppleScript injects JavaScript"]
    C --> D["πŸ€– LLM processes query"]
    D --> E["🍎 AppleScript polls for response"]
    E --> F["🍎 AppleScript saves HTML"]
    F --> G["🐍 Python parses HTML, extracts JSON"]
    G --> H["πŸ’» CLI formats data"]
    H --> I["πŸ“Š Structured JSON returned"]
    I --> A

    classDef userAction fill:#f8f8f8,stroke:#505050,stroke-width:1.5px
    classDef appleScript fill:#f8f8f8,stroke:#505050,stroke-width:1.5px
    classDef llmAction fill:#f8f8f8,stroke:#505050,stroke-width:1.5px
    classDef dataProcess fill:#f8f8f8,stroke:#505050,stroke-width:1.5px

    class A,I userAction
    class B,C,E,F appleScript
    class D llmAction
    class G,H dataProcess
```
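The Python side of the pipeline (parse the saved HTML, extract JSON) can be sketched roughly as below. This is an illustrative simplification, not the package's actual extractor: `extract_json_objects` is a hypothetical name, and the real implementation uses BeautifulSoup and json-repair to cope with malformed markup, while this sketch only catches flat, well-formed objects.

```python
import json
import re

def extract_json_objects(html_text):
    """Find candidate JSON objects embedded in page text and parse them.

    Naive sketch: scan for brace-delimited spans without nesting and
    keep the ones that parse as JSON dictionaries.
    """
    objects = []
    for match in re.finditer(r"\{[^{}]*\}", html_text):
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue  # not valid JSON; skip this span
        if isinstance(obj, dict):
            objects.append(obj)
    return objects
```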

## Features ✨

- **Bypass Bot Detection**: Uses AppleScript to control a real Chrome browser, mimicking human interactions
- **Adaptive Response Polling**: Intelligently waits for responses by monitoring HTML length changes
- **Structured Output**: Extracts responses as JSON with customizable required fields
- **Automatic Form Submission**: Uses multiple strategies to send messages (form submit, button click, Enter key)
- **Multiple LLM Support**: Works with Qwen, Perplexity, and other browser-based LLMs
- **CLI Interface**: Simple command-line interface for easy integration
- **Focus Management**: Properly returns focus to your editor after processing
- **Customizable Fields**: Specify which fields must be present in extracted JSON

## Installation πŸ› οΈ

### Install with UV (Recommended)

```sh
# Clone the repository
git clone https://github.com/grahama1970/web-llm-interactor.git
cd web-llm-interactor

# Install with UV
uv pip install -e .

# Or use the installation script
./scripts/install_cpu_uv.sh
```

### Install with PIP

```sh
# Clone the repository
git clone https://github.com/grahama1970/web-llm-interactor.git
cd web-llm-interactor

# Install in development mode
pip install -e .
```

Requirements (managed by `pyproject.toml`):

- pyperclip
- python-dotenv
- loguru
- typer
- beautifulsoup4
- html2text
- bleach
- json-repair
- macOS (AppleScript support is required)

## Usage πŸš€

⚠️ **IMPORTANT**: Before running any commands, make sure the target LLM website is open in Google Chrome! ⚠️

See USAGE.md for detailed usage instructions and examples.

### Command-Line Interface

```sh
# Basic usage with default settings (Qwen.ai)
web-llm-interactor ask "What is the capital of Georgia?"

# Specify a different LLM site
web-llm-interactor ask "What is the capital of France?" --url "https://chat.qwen.ai/"

# Specify a custom output HTML path
web-llm-interactor ask "What is the tallest mountain?" --output-html "./responses/mountain.html"

# Get all JSON objects, not just the last one
web-llm-interactor ask "List the largest oceans" --all

# Customize required JSON fields
web-llm-interactor ask "Explain quantum computing" --fields "question,answer"

# Skip adding JSON format instructions
web-llm-interactor ask "What's the weather in Tokyo?" --no-json-format

# Configure polling behavior
web-llm-interactor ask "What are the three branches of government?" --poll-interval 3 --stable-polls 2 --timeout 60
```

### Direct AppleScript Usage

```sh
# Basic usage
osascript src/web_llm_interactor/send_enter_save_source.applescript "What is the capital of Georgia?" "https://chat.qwen.ai/" "./output.html"

# Get all responses
osascript src/web_llm_interactor/send_enter_save_source.applescript "What is the capital of Florida?" "https://chat.qwen.ai/" "./output.html" "--all"

# Specify required fields
osascript src/web_llm_interactor/send_enter_save_source.applescript "Explain quantum computing" "https://chat.qwen.ai/" "./output.html" "--fields" "question,answer"
```

### Python Integration

```python
import subprocess
import json

def ask_web_llm(question, url="https://chat.qwen.ai/", custom_fields=None, get_all=False):
    """Query a web-based LLM and get a structured JSON response."""
    cmd = ["web-llm-interactor", "ask", question, "--url", url]

    if get_all:
        cmd.append("--all")

    if custom_fields:
        cmd.extend(["--fields", custom_fields])

    result = subprocess.check_output(cmd, text=True)
    return json.loads(result)

# Example usage
response = ask_web_llm("What is the capital of Idaho?")
print(f"Question: {response['question']}")
print(f"Answer: {response['answer']}")

# Get a response with custom fields
custom_response = ask_web_llm(
    "Explain quantum computing in simple terms",
    custom_fields="question,answer"
)
print(custom_response["answer"])
```

## Why AppleScript Instead of Selenium? πŸ›‘οΈ

- **Stealth**: AppleScript drives a real Chrome browser, making interactions indistinguishable from a human user
- **Reliability**: Selenium is often detected via browser fingerprinting or `navigator.webdriver`; this approach works with sites that block bots
- **Simplicity**: No browser drivers or extra configuration needed

## How Polling Works

The system uses a simple but effective approach to detect when an LLM has finished responding:

1. Record the initial HTML length when the message is sent
2. Poll the page at regular intervals (configurable with `--poll-interval`)
3. Once the HTML has grown significantly from its initial state (>500 characters), start tracking stability
4. When the HTML length stays the same for N consecutive polls (configurable with `--stable-polls`), consider the response complete
5. If the maximum wait time is reached (configurable with `--timeout`), proceed with the current content

This approach is more efficient than fixed wait times and works across different LLM interfaces.
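The stability check above can be sketched in Python. This is a minimal illustration, not the package's actual implementation (the real polling runs inside the AppleScript); `get_html` is a hypothetical callable that returns the current page source.

```python
import time

def wait_for_response(get_html, initial_len, poll_interval=3,
                      stable_polls=2, timeout=60, growth_threshold=500):
    """Poll until the page HTML grows past `growth_threshold` characters
    and then stays the same length for `stable_polls` consecutive polls."""
    deadline = time.time() + timeout
    last_len = initial_len
    stable = 0
    while time.time() < deadline:
        current_len = len(get_html())
        if current_len > initial_len + growth_threshold:
            if current_len == last_len:
                stable += 1
                if stable >= stable_polls:
                    return True  # length is stable: response looks complete
            else:
                stable = 0  # still growing; reset the stability counter
        last_len = current_len
        time.sleep(poll_interval)
    return False  # timed out; caller proceeds with the current content
```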

## Project Structure πŸ“‚

```
web_llm_interactor/
β”œβ”€β”€ src/
β”‚   └── web_llm_interactor/
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ cli.py                        # Command-line interface
β”‚       β”œβ”€β”€ send_enter_save_source.applescript  # Browser automation script
β”‚       β”œβ”€β”€ extract_json_from_html.py     # HTML-to-JSON extractor
β”‚       β”œβ”€β”€ file_utils.py                 # File handling utilities
β”‚       └── json_utils.py                 # JSON parsing utilities
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ demo.sh                           # Demo script showing usage examples
β”‚   └── cleanup.sh                        # Script to clean up temporary files
β”œβ”€β”€ README.md
└── pyproject.toml
```

## Troubleshooting πŸ”

- **No Chrome Tab Found**: Make sure Chrome is open at the correct URL (e.g., https://chat.qwen.ai/). This is a required step before running any command!
- **Empty Response**: Try increasing the timeout with `--timeout 60`
- **JSON Extraction Failed**: Ensure the LLM is responding with properly formatted JSON, or specify required fields with `--fields`
- **Response Too Slow**: Adjust polling parameters with `--poll-interval` and `--stable-polls`
- **Command Not Found**: Ensure you've installed the package with `uv pip install -e .` and are using the correct command: `web-llm-interactor ask "Your question"`

For more detailed troubleshooting, see USAGE.md.

## License πŸ“œ

MIT License


web_llm_interactor lets agents and CLI workflows harness web-only LLMs, delivering API-like functionality with minimal setup. 🌟
