
v1.4.0: Universal LLM Support + Visual Intelligence

@Tristan-Stoltz-ERC Tristan-Stoltz-ERC released this 14 Sep 23:32
· 22 commits to main since this release

WebPilot v1.4.0 Release Notes

🎉 WebPilot Goes Universal - Use with ANY LLM!

We're thrilled to announce WebPilot v1.4.0, our most ambitious release yet! This version transforms WebPilot from a web automation tool into a universal automation platform that works with ANY Large Language Model.

🌟 Highlights

  • 🤖 Universal LLM Support: Use WebPilot with ChatGPT, Claude, Llama, or any LLM
  • 🖥️ Universal CLI: Natural language automation from your terminal
  • 👁️ Visual Intelligence: Interact with pages using visual descriptions
  • 🔄 Self-Healing Automation: Agents that recover from failures automatically
  • 📝 Natural Language Tests: Write tests in plain English

🚀 Major Features

1. Universal LLM Support - Use with ANY Model

WebPilot now works with virtually any LLM through multiple integration paths:

OpenAI Integration (ChatGPT/GPT-4)

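from openai import OpenAI
from webpilot.adapters import OpenAIAdapter

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

adapter = OpenAIAdapter()
functions = adapter.get_functions()  # 60+ function schemas ready for OpenAI

# Pass the schemas to OpenAI's function calling
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Click the login button"}],
    functions=functions,  # legacy parameter; newer API versions prefer `tools`
)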
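
Once the model picks a function, the caller still has to route that choice to real code. The sketch below shows one way to do that routing. The handler names (click, navigate) are hypothetical placeholders, not WebPilot's actual function names, and the dispatcher is an illustration rather than part of the adapter.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical handlers keyed by function name. These are placeholders,
# not the functions WebPilot actually exposes.
def click(selector: str) -> str:
    return f"clicked {selector}"


def navigate(url: str) -> str:
    return f"navigated to {url}"


HANDLERS: Dict[str, Callable[..., Any]] = {"click": click, "navigate": navigate}


def dispatch_function_call(
    name: str,
    arguments_json: str,
    handlers: Dict[str, Callable[..., Any]] = HANDLERS,
) -> Any:
    """Decode the JSON arguments chosen by the model and call the matching handler."""
    if name not in handlers:
        raise KeyError(f"No handler registered for function {name!r}")
    kwargs = json.loads(arguments_json) if arguments_json else {}
    return handlers[name](**kwargs)


# With the legacy `functions` parameter, the model's choice is available as
# response.choices[0].message.function_call, with .name and .arguments (a JSON string).
result = dispatch_function_call("click", '{"selector": "#login"}')
print(result)
```

Keeping the dispatch table explicit means an unexpected function name from the model fails loudly instead of being silently ignored.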

REST API for Any Language/LLM

# Start the API server
webpilot serve

# Use from any language
curl -X POST http://localhost:8000/execute/natural \
  -H "Content-Type: application/json" \
  -d '{"command": "Go to Google and search for Python"}'
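
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above. The helper names are illustrative, and the response is returned as raw text because the release notes do not describe the response schema.

```python
import json
import urllib.request


def build_natural_request(
    command: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST request for the /execute/natural endpoint."""
    body = json.dumps({"command": command}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/execute/natural",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_natural(
    command: str, base_url: str = "http://localhost:8000", timeout: float = 30.0
) -> str:
    """Send the command to a running `webpilot serve` instance and return the raw body."""
    request = build_natural_request(command, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage, with the server from `webpilot serve` already running:
# print(execute_natural("Go to Google and search for Python"))
```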

Local LLM Support (Ollama/LM Studio)

# Works with local models
webpilot execute "Take a screenshot" --llm ollama --model llama2
webpilot execute "Click submit" --llm lmstudio

LangChain Integration (100+ Models)

from webpilot.integrations import create_webpilot_agent
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
agent = create_webpilot_agent(llm)
agent.run("Navigate to example.com and extract the main content")

2. Universal CLI - Natural Language from Terminal

The new CLI understands natural language and works with multiple backends:

# Direct execution
webpilot execute "Go to GitHub and search for web automation"

# With specific LLM
webpilot execute "Fill the login form" --llm openai --model gpt-4

# Run automation scripts
webpilot run automation.yaml --variables env=production

# Batch operations
webpilot screenshot https://google.com https://github.com --full-page

# Start REST API server
webpilot serve --port 8000

# Test your setup
webpilot test --llm ollama
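
To call the CLI from a Python script, for example in a scheduled job, a thin subprocess wrapper is usually enough. The sketch below only uses the execute subcommand and the --llm and --model flags shown above. The wrapper functions themselves are hypothetical helpers, not part of WebPilot.

```python
import shlex
import subprocess
from typing import List, Optional


def build_execute_command(
    instruction: str, llm: Optional[str] = None, model: Optional[str] = None
) -> List[str]:
    """Assemble the argv list for `webpilot execute`, adding flags only when set."""
    argv = ["webpilot", "execute", instruction]
    if llm:
        argv += ["--llm", llm]
    if model:
        argv += ["--model", model]
    return argv


def run_execute(
    instruction: str, llm: Optional[str] = None, model: Optional[str] = None
) -> subprocess.CompletedProcess:
    """Run the command without a shell, so the instruction needs no manual quoting."""
    return subprocess.run(
        build_execute_command(instruction, llm, model),
        capture_output=True,
        text=True,
        check=False,
    )


# Print the equivalent shell command for inspection; nothing is executed here.
print(shlex.join(build_execute_command("Fill the login form", llm="openai", model="gpt-4")))
```

Passing the arguments as a list avoids shell interpolation of the natural-language instruction.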

3. Visual Intelligence - See and Interact Like Humans

WebPilot can now understand and interact with pages visually:

from webpilot.intelligence import VisualWebPilot

pilot = VisualWebPilot()
pilot.navigate("https://example.com")

# Click using visual descriptions
pilot.visual_click("the blue submit button")
pilot.visual_type("search box at the top", "Python tutorials")

# Analyze page visually
analysis = pilot.get_visual_analysis()
print(f"Page layout: {analysis['layout']}")
print(f"Navigation options: {analysis['navigation']}")
print(f"Suggested actions: {analysis['suggested_actions']}")

# For vision LLMs (GPT-4V, Claude, etc.)
screenshot_base64 = pilot.capture_for_vision_llm()

Features:

  • OCR text extraction with Tesseract
  • UI element detection with OpenCV
  • Layout analysis and understanding
  • Form and navigation detection
  • Works without CSS selectors
  • Exports analysis for LLM consumption
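
To illustrate how a visual description could be resolved without CSS selectors, the sketch below matches a description against text boxes of the kind an OCR pass produces and returns a click point. It is a simplified word-overlap heuristic under assumed data shapes (the TextBox type is hypothetical), not WebPilot's actual matching logic.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class TextBox:
    """A piece of recognized text and its bounding box in pixels (hypothetical shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.left + self.width // 2, self.top + self.height // 2)


def _words(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def best_match(description: str, boxes: List[TextBox], threshold: float = 0.5) -> Optional[TextBox]:
    """Return the box whose words overlap most with the description, if above the threshold."""
    wanted = _words(description)
    best, best_score = None, 0.0
    for box in boxes:
        box_words = _words(box.text)
        if not box_words:
            continue
        score = len(box_words & wanted) / len(box_words)
        if score > best_score:
            best, best_score = box, score
    return best if best_score >= threshold else None


boxes = [TextBox("Cancel", 10, 200, 80, 30), TextBox("Submit", 100, 200, 80, 30)]
match = best_match("the blue submit button", boxes)
print(match.text, match.center())
```

A real implementation would also weigh color and shape cues from element detection; this sketch uses text alone.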

4. Autonomous Agents - Self-Healing Automation

Create automations that recover from failures automatically:

import asyncio

from webpilot.intelligence import AutonomousAgent

agent = AutonomousAgent(
    max_recovery_attempts=3,
    enable_visual_fallback=True,
    enable_learning=True
)

async def main():
    # Create a plan
    plan = agent.create_plan("Login and download report")

    # Execute with automatic recovery (execute_plan is a coroutine)
    completed_plan = await agent.execute_plan(plan)
    print(f"Success rate: {completed_plan.success_rate}%")

asyncio.run(main())

Recovery Strategies:

  • Visual Fallback: Use visual recognition when selectors fail
  • Wait and Retry: Handle timing issues automatically
  • Refresh and Retry: Recover from stale elements
  • Alternative Paths: Try different approaches
  • Learning System: Improve over time
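
The strategies above amount to a retry loop that applies a recovery step between attempts. The following is a minimal, synchronous sketch of that pattern, assuming each recovery step is a plain callable. It is not the AutonomousAgent implementation, and run_with_recovery is a hypothetical helper.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_recovery(
    action: Callable[[], T],
    recoveries: List[Callable[[], None]],
    max_recovery_attempts: int = 3,
) -> T:
    """Run the action; after each failure apply the next recovery step and retry."""
    last_error = None
    for attempt in range(max_recovery_attempts + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            if attempt == max_recovery_attempts or not recoveries:
                break
            # Cycle through the recovery steps in order.
            recoveries[attempt % len(recoveries)]()
    raise RuntimeError(f"Action failed after {attempt + 1} attempts") from last_error


# Demo: an action that fails twice (a timing issue) before succeeding.
calls = {"count": 0}
log: List[str] = []


def flaky_click() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("element not ready")
    return "clicked"


result = run_with_recovery(
    flaky_click,
    [lambda: log.append("wait"), lambda: log.append("refresh")],
)
print(result, log)
```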

5. Natural Language Test Generation

Convert plain English test descriptions into executable code:

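# Assumption: TestFramework and Language are exported from webpilot.testing
from webpilot.testing import NaturalLanguageTestGenerator, TestFramework, Language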

generator = NaturalLanguageTestGenerator(
    framework=TestFramework.PYTEST,
    language=Language.PYTHON
)

test_description = """
Test: User can search for products
1. Go to shop.example.com
2. Type "laptop" in the search field
3. Click the search button
4. Verify that results are displayed
5. Verify that each result contains "laptop"
"""

# Generate executable test code
test_code = generator.generate_from_description(test_description)
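
To make the input format concrete, the sketch below parses a description like the one above into a title and ordered steps, and flags which steps are assertions. It is a simplified illustration with hypothetical names (ParsedTest, parse_description), not the generator's own parser.

```python
import re
from typing import List, NamedTuple


class ParsedTest(NamedTuple):
    title: str
    steps: List[str]


def parse_description(description: str) -> ParsedTest:
    """Extract the 'Test:' title and the numbered steps from a plain-English description."""
    title = ""
    steps: List[str] = []
    for raw_line in description.strip().splitlines():
        line = raw_line.strip()
        title_match = re.match(r"^Test:\s*(.+)$", line)
        step_match = re.match(r"^\d+\.\s*(.+)$", line)
        if title_match:
            title = title_match.group(1)
        elif step_match:
            steps.append(step_match.group(1))
    return ParsedTest(title, steps)


def is_assertion(step: str) -> bool:
    """Treat steps that start with 'Verify' as checks rather than actions."""
    return step.lower().startswith("verify")


parsed = parse_description("""
Test: User can search for products
1. Go to shop.example.com
2. Type "laptop" in the search field
3. Click the search button
4. Verify that results are displayed
5. Verify that each result contains "laptop"
""")
print(parsed.title)
for step in parsed.steps:
    print("ASSERT" if is_assertion(step) else "ACTION", step)
```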

Supported Frameworks:

  • pytest (Python)
  • Jest (JavaScript)
  • Cypress (JavaScript)
  • Playwright (Python/JavaScript/TypeScript)
  • Selenium (Multiple languages)
  • Custom frameworks

Features:

  • BDD/Gherkin support
  • Page Object Model generation
  • Data-driven test generation
  • Test recording and playback
  • Multi-language output

📊 Performance Improvements

  • 10x Faster Execution: Optimized selectors and caching
  • Async Everything: Full async/await support
  • Connection Pooling: Reuse browser sessions
  • Lazy Loading: Load only what's needed
  • Smart Caching: Remember successful selectors

🔧 Installation

# Basic installation
pip install --upgrade claude-webpilot

# With all features (quotes keep shells such as zsh from expanding the brackets)
pip install --upgrade "claude-webpilot[all]"

# Specific features
pip install "claude-webpilot[vision]"  # Visual intelligence
pip install "claude-webpilot[llm]"     # LLM integrations

📚 Documentation

New Guides

API Documentation

🎯 Real-World Use Cases

For QA Engineers

# Generate test from requirements
webpilot execute "Generate a test that verifies user can complete checkout"

# Visual regression testing
webpilot screenshot https://production.example.com https://staging.example.com

For Developers

# Automate repetitive tasks
webpilot run daily-checks.yaml

# Debug with visual analysis
webpilot execute "Describe what's on this page" --llm openai

For DevOps

# Health checks with visual validation
webpilot execute "Verify the dashboard shows all services as green"

# Deployment validation
webpilot run deployment-validation.yaml --variables env=prod

For AI Researchers

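# Web agent with any LLM (your_llm is a model instance such as the Ollama example above)
from webpilot.integrations import create_webpilot_agent

agent = create_webpilot_agent(your_llm)
agent.run("Research Python web frameworks and create a comparison")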

🔄 Migration Guide

From v1.3.0

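The main change is in the CLI. The webpilot command itself is unchanged, but the navigate subcommand is replaced by browse, or by a natural-language execute command: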

# Old (v1.3.0)
webpilot navigate https://example.com

# New (v1.4.0)
webpilot browse https://example.com
webpilot execute "navigate to https://example.com"

API Changes

# Old
from webpilot import WebPilot

# New - with visual intelligence
from webpilot.intelligence import VisualWebPilot

# New - with autonomous agents
from webpilot.intelligence import AutonomousAgent

🐛 Bug Fixes

  • Fixed import errors in CLI module
  • Resolved async execution issues
  • Fixed OCR dependencies
  • Improved error messages
  • Better handling of headless mode

🚀 Coming Next (v1.5.0)

  • Distributed Execution: Run on multiple machines
  • Cloud Integration: Native cloud provider support
  • AI Training Mode: Train custom models on your automations
  • Workflow Designer: Visual automation builder
  • Mobile Support: Android and iOS automation

🙏 Thank You!

Thanks to our amazing community for the feedback and contributions that made this release possible. Special thanks to:

  • Contributors who submitted PRs
  • Users who reported issues
  • The open source projects we build upon

📞 Get Support

📈 Stats

  • 500+ Tests: Comprehensive test coverage
  • 60+ Tools: Extensive MCP tool library
  • 100+ LLMs: Compatible through various adapters
  • 20+ Examples: Ready-to-use scripts
  • 6 Test Frameworks: Multi-framework support
  • 3 Languages: Python, JavaScript, TypeScript

Upgrade today and experience the future of web automation!

pip install --upgrade "claude-webpilot[all]"
webpilot test  # Verify your installation
webpilot execute "Let's automate something amazing!"