AppTester Agent with Screenshot Support #205

Open
planger opened this issue May 7, 2025 · 0 comments

Description

This feature extends the AI application testing agent introduced in #196 with screenshot capabilities, enabling visual validation of UI elements and layouts. Screenshots provide visual confirmation of application behavior and styling, enhancing the agent's ability to validate UI changes without manual inspection.

Screenshots will be taken through Playwright MCP. The Playwright MCP server must be started with the --vision flag to enable this feature.

Note: Playwright MCP currently supports either screenshot mode or snapshot mode, but not both at the same time. See playwright-mcp#420.

Open Question: To use both, we may want to investigate registering the MCP server twice (once with --vision, once without) and offering two different tool functions.
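As a starting point for that investigation, a minimal sketch of the double registration could look like the following. The `McpServerDescription` shape, the server names, and the `npx -y @playwright/mcp@latest` launch command are assumptions made for illustration, not the actual registration API; only the --vision flag comes from this issue.

```typescript
// Hypothetical registration shape, for illustration only.
interface McpServerDescription {
    name: string;
    command: string;
    args: string[];
}

// Build two registrations for the same Playwright MCP package:
// one in the default (snapshot) mode and one with --vision for screenshots.
// The package name and npx invocation are assumptions.
function playwrightRegistrations(pkg: string = '@playwright/mcp@latest'): McpServerDescription[] {
    const baseArgs = ['-y', pkg];
    return [
        { name: 'playwright', command: 'npx', args: [...baseArgs] },
        { name: 'playwright-vision', command: 'npx', args: [...baseArgs, '--vision'] }
    ];
}

// Prefix a tool with its server name so both modes can be offered side by side.
function qualifiedToolName(server: McpServerDescription, tool: string): string {
    return `${server.name}_${tool}`;
}
```

With this layout the agent prompt could reference, for example, `qualifiedToolName(servers[1], 'some_tool')` for the screenshot variant, where `some_tool` stands in for whatever tool name the server actually exposes.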

The screenshot functionality will enable the agent to:

  • Capture screenshots of the full page or specific elements
  • Send these base64-encoded images to the LLM (see PR #15410 for details). If necessary, finish that PR for this use case as part of this issue.
  • Analyze and comment on visual states before and after interactions
  • Validate styling changes with visual confirmation
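
To illustrate the before/after analysis, here is a rough sketch of how captured screenshots could be packed into a single LLM request. The `Screenshot` and `MessagePart` types and the `buildComparisonMessage` helper are hypothetical; the real message types would come from the image support in PR #15410.

```typescript
// Hypothetical types, for illustration only.
interface Screenshot {
    label: string;      // e.g. 'before' or 'after'
    base64Png: string;  // base64-encoded PNG bytes
}

type MessagePart =
    | { kind: 'text'; text: string }
    | { kind: 'image'; mimeType: string; base64: string };

// Put the question first, then a caption followed by each screenshot,
// so the model can tell the captured states apart.
function buildComparisonMessage(question: string, shots: Screenshot[]): MessagePart[] {
    const parts: MessagePart[] = [{ kind: 'text', text: question }];
    for (const shot of shots) {
        parts.push({ kind: 'text', text: `Screenshot (${shot.label}):` });
        parts.push({ kind: 'image', mimeType: 'image/png', base64: shot.base64Png });
    }
    return parts;
}
```

Captioning each image is a design choice rather than a requirement: it keeps the ordering explicit even if the model provider reorders or batches image parts.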

Example Use Case

Feature: UI Change Validation
  As a developer
  I want to get AI feedback on UI styling changes with visual evidence
  So that I can confirm visual changes without manual inspection

  Scenario: Confirming styling changes
    Given I have a running application with recent styling changes
    When I ask in the chat "@AppTester Connect to localhost:8000: Can you confirm if the token usage in the AI Configuration view is now aligned left?"
    Then the AI agent should:
      | Connect to the application               |
      | Navigate to the relevant view            |
      | Take a screenshot of the area            |
      | Visually analyse the screenshot          |

    And the AI should respond with:
      | Analysis of the styling                   |
      | Comment on whether the styling is correct |

Hints and Suggested Architecture

The implementation should extend the AI application testing agent (#196) and ensure that:

  • The Playwright MCP server is launched with the --vision flag to enable screenshot functionality
  • The agent can receive and forward base64-encoded screenshots to the LLM using the image support provided by PR #15410
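
A possible shape for the forwarding step is sketched below. It assumes the screenshot arrives as MCP image content (`type`, `data`, `mimeType`) and converts it into a data URL. The `LlmImagePart` type and `toLlmImagePart` helper are hypothetical and may not match what PR #15410 provides.

```typescript
// Image content as returned by an MCP tool call.
interface McpImageContent {
    type: 'image';
    data: string;      // base64-encoded bytes
    mimeType: string;  // e.g. 'image/png'
}

// Hypothetical LLM-facing image part, for illustration only.
interface LlmImagePart {
    kind: 'image';
    dataUrl: string;
}

const BASE64_PATTERN = /^[A-Za-z0-9+/]+={0,2}$/;

// Validate the payload before forwarding so that a malformed screenshot
// fails early instead of producing an unreadable image for the model.
function toLlmImagePart(content: McpImageContent): LlmImagePart {
    if (content.mimeType.indexOf('image/') !== 0) {
        throw new Error(`Unsupported MIME type: ${content.mimeType}`);
    }
    if (content.data.length % 4 !== 0 || !BASE64_PATTERN.test(content.data)) {
        throw new Error('Screenshot payload is not valid base64');
    }
    return { kind: 'image', dataUrl: `data:${content.mimeType};base64,${content.data}` };
}
```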

Related Issues and Dependencies
