
AppTester Agent with Screenshot Support #205

Open
@planger


PO @planger

Description

This feature extends the AI application testing agent introduced in #196 with screenshot capabilities, enabling visual validation of UI elements and layouts. Screenshots provide visual confirmation of application behavior and styling, enhancing the agent's ability to validate UI changes without manual inspection.

The screenshot functionality will be powered by Playwright MCP, which can take screenshots when its server is started with the --vision flag.

Note: Playwright MCP currently supports either screenshot (vision) mode or snapshot mode, but not both simultaneously. See playwright-mcp#420.

Open Question: To use both modes, we may want to investigate registering the MCP server twice (once with --vision, once without) and offering two different sets of tool functions.
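One way to explore the open question is to register the same Playwright MCP package twice under different names. The sketch assumes a host that launches MCP servers from a name/command/args triple; the `McpServerRegistration` interface and the `playwright-visual` name are hypothetical, and `npx -y @playwright/mcp@latest` is only one possible way to start the server.

```typescript
// Hypothetical shape of an MCP server registration; the real Theia
// configuration schema may differ.
interface McpServerRegistration {
  name: string;
  command: string;
  args: string[];
}

// Build two registrations of the same Playwright MCP server:
// one in the default snapshot mode and one in vision (screenshot) mode.
function playwrightRegistrations(
  packageSpec = "@playwright/mcp@latest"
): McpServerRegistration[] {
  const base = ["-y", packageSpec];
  return [
    { name: "playwright", command: "npx", args: [...base] },
    // Hypothetical second registration that enables screenshots.
    { name: "playwright-visual", command: "npx", args: [...base, "--vision"] },
  ];
}

for (const r of playwrightRegistrations()) {
  console.log(`${r.name}: ${r.command} ${r.args.join(" ")}`);
}
```

Whether the two servers' tools need extra disambiguation depends on how the host namespaces MCP tools, which is part of what this open question would need to settle.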

The screenshot functionality will enable the agent to:

  • Capture screenshots of the full page or specific elements
  • Send these base64-encoded images to the LLM (see PR #15410 for details); if necessary, finish that PR for this use case as part of this issue.
  • Analyze and comment on visual states before and after interactions
  • Validate styling changes with visual confirmation
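The encoding step for sending a screenshot to the LLM can be sketched as follows, assuming the screenshot bytes arrive as a `Uint8Array`. The `ImagePart` shape and the data-URL helper are hypothetical stand-ins; the message types actually introduced by PR #15410 may look different.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical image message part; the real types from PR #15410 may differ.
interface ImagePart {
  type: "image";
  mimeType: string;
  base64data: string;
}

// Encode raw screenshot bytes (e.g. a PNG) as a base64 image part.
function toImagePart(bytes: Uint8Array, mimeType = "image/png"): ImagePart {
  return { type: "image", mimeType, base64data: Buffer.from(bytes).toString("base64") };
}

// Data URLs are one common way to embed base64 images in provider requests.
function toDataUrl(part: ImagePart): string {
  return `data:${part.mimeType};base64,${part.base64data}`;
}

// The 8-byte PNG file signature, used here as stand-in screenshot data.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const part = toImagePart(pngSignature);
console.log(toDataUrl(part)); // data:image/png;base64,iVBORw0KGgo=
```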

Example Use Case

Feature: UI Change Validation
  As a developer
  I want to get AI feedback on UI styling changes with visual evidence
  So that I can confirm visual changes without manual inspection

  Scenario: Confirming styling changes
    Given I have a running application with recent styling changes
    When I ask in the chat "@AppTester Connect to localhost:8000: Can you confirm if the token usage in the AI Configuration view is now aligned left?"
    Then the AI agent should:
      | Connect to the application               |
      | Navigate to the relevant view            |
      | Take a screenshot of the area            |
      | Visually analyse the screenshot          |

    And the AI should respond with:
      | Analysis of the styling                  |
      | Comment on whether the styling is correct|

Hints and Suggested Architecture

The implementation should extend the AI application testing agent (#196) and ensure that:

  • The Playwright MCP server is launched with the --vision flag to enable screenshot functionality
  • The agent can receive and forward base64-encoded screenshots to the LLM using the image support provided by PR #15410
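On the agent side, forwarding could mean splitting each MCP tool result into text and image items. MCP tool results carry images as content items of the form `{ type: "image", data, mimeType }` with base64 `data`; the `ForwardedToolResult` structure and the `splitToolResult` function are hypothetical and only illustrate where the image support from PR #15410 might plug in.

```typescript
// Simplified mirrors of MCP tool-result content items.
type TextContent = { type: "text"; text: string };
type ImageContent = { type: "image"; data: string; mimeType: string };
type ToolResultContent = TextContent | ImageContent;

// Hypothetical structure handed to the LLM request builder.
interface ForwardedToolResult {
  text: string;
  images: { mimeType: string; base64data: string }[];
}

// Keep text in the tool message and collect images separately so they can
// be forwarded as image parts to a vision-capable model.
function splitToolResult(content: ToolResultContent[]): ForwardedToolResult {
  const texts: string[] = [];
  const images: ForwardedToolResult["images"] = [];
  for (const item of content) {
    if (item.type === "text") {
      texts.push(item.text);
    } else {
      images.push({ mimeType: item.mimeType, base64data: item.data });
    }
  }
  return { text: texts.join("\n"), images };
}

const result = splitToolResult([
  { type: "text", text: "Took a screenshot of the current page" },
  { type: "image", data: "iVBORw0KGgo=", mimeType: "image/png" },
]);
console.log(result.text, result.images.length);
```

Keeping text and images apart would also let the request builder fall back to text only for models without image support.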

Related Issues and Dependencies
