
AppTester Agent via Browser Automation #196

Open
planger opened this issue May 1, 2025 · 0 comments
planger commented May 1, 2025

Description

This feature enhances the AI-powered application testing agent by enabling it to autonomously build, launch, interact with, and validate scenarios in a running web application using natural language commands provided through the chat.

In particular, we want to develop an AI agent, named @AppTester, which handles the complete validation flow, from building and launching the application to inspecting it for a user-specified test scenario through the Playwright MCP server.

In this first iteration, the user needs to specify the scenario to be tested explicitly, e.g. "Open the AI Configuration View, navigate to the Token Usage tab, and click reset", or it must be available in the context (e.g. via #191). If the user or context doesn't specify the scenario in enough detail to run it, the agent should tell the user what's missing.

The AI agent will support the following:

  • Build the application via tasks (see PR #15504 for more details)
  • Launch the application using launch configurations (to be developed in the context of this task)
  • Connect to a running application using Playwright MCP
  • Perform UI interactions (Playwright MCP should already support this)
    • Navigate through the UI
    • Invoke UI elements (clicking buttons, entering text)
    • Inspect styling and layout
    • Execute sequences of interactions to evaluate behavior
  • Shut down the application after testing (to be developed in the context of this task)
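
To make the scope above concrete, the capabilities could be surfaced to the agent as a flat list of tool descriptors. The following TypeScript sketch is purely illustrative: the `ToolDescriptor` shape and all tool names are assumptions for this issue, not existing Theia or Playwright MCP APIs.

```typescript
// Illustrative sketch only: ToolDescriptor and the tool names below are
// assumptions, not existing Theia or Playwright MCP APIs.
interface ToolDescriptor {
  name: string;
  description: string;
  parameters: Record<string, string>; // minimal stand-in for a JSON schema
}

const appTesterTools: ToolDescriptor[] = [
  { name: 'listTasks', description: 'List the available build tasks', parameters: {} },
  { name: 'runTask', description: 'Run a build task by label', parameters: { label: 'string' } },
  { name: 'launchApp', description: 'Start the app via a launch configuration', parameters: { config: 'string' } },
  { name: 'navigate', description: 'Navigate the controlled browser to a URL', parameters: { url: 'string' } },
  { name: 'click', description: 'Click a UI element', parameters: { selector: 'string' } },
  { name: 'stopApp', description: 'Shut down the launched application', parameters: {} },
];

// The agent framework would pick from this list when resolving a user request.
console.log(appTesterTools.map(t => t.name).join(', '));
// → listTasks, runTask, launchApp, navigate, click, stopApp
```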

This will enable developers to quickly validate UI changes and interaction behaviors without manual testing, and can help other AI coding agents iterate on UI improvements with real-time validation.

Example Use Case

Feature: End-to-End Testing with Build and Launch
  As a developer
  I want the AI to handle building and launching my application
  So that I can validate changes without manual setup

  Scenario: Testing after code changes
    Given I have made changes to application code
    When I ask in the chat "@AppTester Can you check if the reset button in the Token Usage tab of the AI Configuration view now works?"
    Then the AI agent should:
      | Identify the appropriate build task             |
      | Run the build process                           |
      | Launch the application                          |
      | Navigate to the view and test the reset button  |
    
    And the AI should respond with:
      | Confirmation of successful build     |
      | Analysis of the tested functionality |
      | Details of any issues encountered    |
      
Feature: Behavior Check After UI Interaction
  As a developer
  I want the AI to perform UI interactions and verify behavior
  So that I can confirm functionality without manual testing

  Scenario: Validating button functionality
    Given I have a running application
    When I ask in the chat "@AppTester Connect to localhost:8000 and verify that if I open the AI configuration view, go to Token Usage and click the 'Reset' button, that the values become 0."
    Then the AI agent should:
      | Connect to the application              |
      | Find and open the AI configuration view |
      | Navigate to the Token Usage tab         |
      | Click the Reset button                  |
      | Check the resulting state               |
    
    And the AI should respond with:
      | Confirmation of the interaction outcome |
      | Details about the changed state         |

Hints and Suggested Architecture

The implementation should follow an extensible design with the following components:

  • Agent Implementation: All functionality should be encapsulated in a new agent that can be addressed in the chat, e.g., @AppTester. The functionality below should be implemented as tool calls available to this agent (build tasks, launch, MCP Playwright, stop earlier launch).

  • Playwright MCP Server Integration: Install and register the Playwright MCP server and add its tool functions to the agent's system prompt.
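
One simple way to add the server's tool functions to the system prompt is to render their names and descriptions as a bulleted section. A minimal sketch, assuming a simplified `McpTool` shape (the actual MCP tool metadata is richer, and the tool names shown are hypothetical placeholders):

```typescript
// Illustrative sketch: rendering MCP tool metadata into a system prompt section.
// McpTool is a simplification of the actual MCP tool schema.
interface McpTool {
  name: string;
  description: string;
}

function renderToolSection(tools: McpTool[]): string {
  const lines = tools.map(t => `- ${t.name}: ${t.description}`);
  return ['You can control the running application with these browser tools:', ...lines].join('\n');
}

// Hypothetical subset of tools a Playwright MCP server might advertise.
const playwrightTools: McpTool[] = [
  { name: 'browser_navigate', description: 'Open a URL in the controlled browser' },
  { name: 'browser_click', description: 'Click the element matching a selector' },
];

const promptSection = renderToolSection(playwrightTools);
console.log(promptSection);
```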

  • Build Task Integration: Add the existing tool functions for listing and running tasks (PR #15504) to the agent's system prompt so that the agent can manage the application's build process.

  • Launch Configuration Support: Implement a dedicated set of tool functions, similar to those in PR #15504, for launching the application under development. Add these tool functions to the @AppTester agent to enable it to start the application in the appropriate context.

  • Application Lifecycle Management: Consider storing state about the application lifecycle in the chat model so it can be retrieved in subsequent tool calls. This would enable stopping the application after the test via another tool function to be created in the context of this story. The chat model is available as a parameter of the tool call handler.
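
The steps above can be sketched as follows. Everything here is hypothetical: `ChatModelState` stands in for whatever chat-model storage is available to tool handlers, and the process handling is stubbed out.

```typescript
// Hypothetical sketch: keeping launch state in a chat-session-scoped store so a
// later stopApp tool call can find the process started by launchApp.
// ChatModelState is illustrative, not an existing Theia API.
class ChatModelState {
  private data = new Map<string, unknown>();
  set(key: string, value: unknown): void { this.data.set(key, value); }
  get<T>(key: string): T | undefined { return this.data.get(key) as T | undefined; }
  delete(key: string): void { this.data.delete(key); }
}

interface LaunchHandle { pid: number; configName: string; }

function launchApp(state: ChatModelState, configName: string): LaunchHandle {
  // A real implementation would spawn the process for the launch configuration.
  const handle: LaunchHandle = { pid: 4242 /* placeholder */, configName };
  state.set('appTester.launch', handle);
  return handle;
}

function stopApp(state: ChatModelState): boolean {
  const handle = state.get<LaunchHandle>('appTester.launch');
  if (!handle) return false; // nothing to stop
  // A real implementation would terminate the process behind handle.pid here.
  state.delete('appTester.launch');
  return true;
}

const state = new ChatModelState();
launchApp(state, 'Launch Browser Backend');
console.log(stopApp(state)); // → true
console.log(stopApp(state)); // → false (already stopped)
```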

Dependencies

  • PR #15504: AI tools for listing and running tasks (merged)
  • Issue #191: Task management system for tracking requirements and user-facing scenarios

Follow-up Stories:

@planger planger self-assigned this May 1, 2025
@planger planger changed the title from "Agent for evaluating E2E scenarios (see use case)" to "AI Agent for Application Interaction and UI Inspection via Browser Automation" May 1, 2025
@planger planger changed the title from "AI Agent for Application Interaction and UI Inspection via Browser Automation" to "AppTester Agent via Browser Automation" May 7, 2025