WebPilot v1.4.0 Release Notes

🎉 WebPilot Goes Universal - Use with ANY LLM!

We're thrilled to announce WebPilot v1.4.0, our most ambitious release yet! This version transforms WebPilot from a web automation tool into a universal automation platform that works with ANY Large Language Model.

🌟 Highlights

🤖 Universal LLM Support: Use WebPilot with ChatGPT, Claude, Llama, or any LLM
🖥️ Universal CLI: Natural language automation from your terminal
👁️ Visual Intelligence: Interact with pages using visual descriptions
🔄 Self-Healing Automation: Agents that recover from failures automatically
📝 Natural Language Tests: Write tests in plain English

🚀 Major Features

1. Universal LLM Support - Use with ANY Model

WebPilot now works with virtually any LLM through multiple integration paths:

OpenAI Integration (ChatGPT/GPT-4)

from webpilot.adapters import OpenAIAdapter

adapter = OpenAIAdapter()
functions = adapter.get_functions()  # 60+ functions ready for OpenAI

# Use with OpenAI's function calling
response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Click the login button"}],
    functions=functions
)

REST API for Any Language/LLM

# Start the API server
webpilot serve

# Use from any language
curl -X POST http://localhost:8000/execute/natural \
  -H "Content-Type: application/json" \
  -d '{"command": "Go to Google and search for Python"}'

Local LLM Support (Ollama/LM Studio)

# Works with local models
webpilot execute "Take a screenshot" --llm ollama --model llama2
webpilot execute "Click submit" --llm lmstudio

LangChain Integration (100+ Models)

from webpilot.integrations import create_webpilot_agent
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")
agent = create_webpilot_agent(llm)
agent.run("Navigate to example.com and extract the main content")

2. Universal CLI - Natural Language from Terminal

The new CLI understands natural language and works with multiple backends:

# Direct execution
webpilot execute "Go to GitHub and search for web automation"

# With specific LLM
webpilot execute "Fill the login form" --llm openai --model gpt-4

# Run automation scripts
webpilot run automation.yaml --variables env=production

# Batch operations
webpilot screenshot https://google.com https://github.com --full-page

# Start REST API server
webpilot serve --port 8000

# Test your setup
webpilot test --llm ollama

3. Visual Intelligence - See and Interact Like Humans

WebPilot can now understand and interact with pages visually:

from webpilot.intelligence import VisualWebPilot

pilot = VisualWebPilot()
pilot.navigate("https://example.com")

# Click using visual descriptions
pilot.visual_click("the blue submit button")
pilot.visual_type("search box at the top", "Python tutorials")

# Analyze page visually
analysis = pilot.get_visual_analysis()
print(f"Page layout: {analysis['layout']}")
print(f"Navigation options: {analysis['navigation']}")
print(f"Suggested actions: {analysis['suggested_actions']}")

# For vision LLMs (GPT-4V, Claude, etc.)
screenshot_base64 = pilot.capture_for_vision_llm()

Features:

OCR text extraction with Tesseract
UI element detection with OpenCV
Layout analysis and understanding
Form and navigation detection
Works without CSS selectors
Exports analysis for LLM consumption

4. Autonomous Agents - Self-Healing Automation

Create automations that recover from failures automatically:

from webpilot.intelligence import AutonomousAgent

agent = AutonomousAgent(
    max_recovery_attempts=3,
    enable_visual_fallback=True,
    enable_learning=True
)

# Create a plan
plan = agent.create_plan("Login and download report")

# Execute with automatic recovery
completed_plan = await agent.execute_plan(plan)
print(f"Success rate: {completed_plan.success_rate}%")

Recovery Strategies:

Visual Fallback: Use visual recognition when selectors fail
Wait and Retry: Handle timing issues automatically
Refresh and Retry: Recover from stale elements
Alternative Paths: Try different approaches
Learning System: Improve over time

5. Natural Language Test Generation

Convert plain English test descriptions into executable code:

from webpilot.testing import NaturalLanguageTestGenerator

generator = NaturalLanguageTestGenerator(
    framework=TestFramework.PYTEST,
    language=Language.PYTHON
)

test_description = """
Test: User can search for products
1. Go to shop.example.com
2. Type "laptop" in the search field
3. Click the search button
4. Verify that results are displayed
5. Verify that each result contains "laptop"
"""

# Generate executable test code
test_code = generator.generate_from_description(test_description)

Supported Frameworks:

pytest (Python)
Jest (JavaScript)
Cypress (JavaScript)
Playwright (Python/JavaScript/TypeScript)
Selenium (Multiple languages)
Custom frameworks

Features:

BDD/Gherkin support
Page Object Model generation
Data-driven test generation
Test recording and playback
Multi-language output

📊 Performance Improvements

10x Faster Execution: Optimized selectors and caching
Async Everything: Full async/await support
Connection Pooling: Reuse browser sessions
Lazy Loading: Load only what's needed
Smart Caching: Remember successful selectors

🔧 Installation

# Basic installation
pip install --upgrade claude-webpilot

# With all features
pip install --upgrade claude-webpilot[all]

# Specific features
pip install claude-webpilot[vision]  # Visual intelligence
pip install claude-webpilot[llm]     # LLM integrations

📚 Documentation

New Guides

API Documentation

🎯 Real-World Use Cases

For QA Engineers

# Generate test from requirements
webpilot execute "Generate a test that verifies user can complete checkout"

# Visual regression testing
webpilot screenshot production.example.com staging.example.com

For Developers

# Automate repetitive tasks
webpilot run daily-checks.yaml

# Debug with visual analysis
webpilot execute "Describe what's on this page" --llm openai

For DevOps

# Health checks with visual validation
webpilot execute "Verify the dashboard shows all services as green"

# Deployment validation
webpilot run deployment-validation.yaml --variables env=prod

For AI Researchers

# Web agent with any LLM
agent = create_webpilot_agent(your_llm)
agent.run("Research Python web frameworks and create a comparison")

🔄 Migration Guide

From v1.3.0

The main change is the CLI entry point:

# Old (v1.3.0)
webpilot navigate https://example.com

# New (v1.4.0)  
webpilot browse https://example.com
webpilot execute "navigate to https://example.com"

API Changes

# Old
from webpilot import WebPilot

# New - with visual intelligence
from webpilot.intelligence import VisualWebPilot

# New - with autonomous agents
from webpilot.intelligence import AutonomousAgent

🐛 Bug Fixes

Fixed import errors in CLI module
Resolved async execution issues
Fixed OCR dependencies
Improved error messages
Better handling of headless mode

🚀 Coming Next (v1.5.0)

Distributed Execution: Run on multiple machines
Cloud Integration: Native cloud provider support
AI Training Mode: Train custom models on your automations
Workflow Designer: Visual automation builder
Mobile Support: Android and iOS automation

🙏 Thank You!

Thanks to our amazing community for the feedback and contributions that made this release possible. Special thanks to:

Contributors who submitted PRs
Users who reported issues
The open source projects we build upon

📞 Get Support

GitHub Issues: Report bugs or request features
Discussions: Ask questions and share ideas
Documentation: Full documentation

📈 Stats

500+ Tests: Comprehensive test coverage
60+ Tools: Extensive MCP tool library
100+ LLMs: Compatible through various adapters
20+ Examples: Ready-to-use scripts
6 Test Frameworks: Multi-framework support
3 Languages: Python, JavaScript, TypeScript

Upgrade today and experience the future of web automation!

pip install --upgrade claude-webpilot[all]
webpilot test  # Verify your installation
webpilot execute "Let's automate something amazing!"

v1.4.0: Universal LLM Support + Visual Intelligence