AppTester Agent with DOM Access #206

Open
eclipse-theia/theia
#15734
@planger

Description

PO @planger

This feature enables advanced DOM inspection and custom script execution capabilities for the AI-powered application testing agent by leveraging a hybrid architecture: Puppeteer will be used to launch and manage the browser process, while Playwright MCP will connect to the browser instance using the Chrome DevTools Protocol (CDP) endpoint.

Playwright MCP cannot be extended with custom tools and offers no direct DOM access; it only provides static snapshots or screenshots. By integrating Puppeteer for browser process control and DOM access, we gain full flexibility to execute custom logic while still benefiting from Playwright MCP's structured interface for interaction and snapshotting.

Goal

  • Launch a browser process using Puppeteer
  • Expose the DevTools protocol endpoint via --remote-debugging-port
  • Connect the Playwright MCP server to the browser instance using --cdp-endpoint
  • Enable:
    • Full DOM access through Puppeteer (not possible with Playwright MCP)
    • Programmatic control of browser lifecycle
    • Seamless integration with Playwright MCP for vision or snapshot features
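
The wiring described by these goals could be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `buildCdpWiring` and `CdpWiring` are hypothetical names, the port 9222 is an arbitrary choice, and starting the MCP server through the `@playwright/mcp` package is an assumption. Only the two flags, `--remote-debugging-port` and `--cdp-endpoint`, come from this issue.

```typescript
// Hypothetical helper: derives both sides of the CDP connection from one port,
// so the browser flag and the MCP endpoint cannot drift apart.
interface CdpWiring {
  /** Extra browser flags to hand to the Puppeteer launch call. */
  browserArgs: string[];
  /** HTTP endpoint that the Playwright MCP server is pointed at. */
  cdpEndpoint: string;
  /** Arguments for starting the Playwright MCP server (package name is an assumption). */
  mcpArgs: string[];
}

function buildCdpWiring(port: number, host = 'localhost'): CdpWiring {
  // Reject values that cannot be a TCP port before any process is started.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid debugging port: ${port}`);
  }
  const cdpEndpoint = `http://${host}:${port}`;
  return {
    browserArgs: [`--remote-debugging-port=${port}`],
    cdpEndpoint,
    mcpArgs: ['@playwright/mcp@latest', '--cdp-endpoint', cdpEndpoint],
  };
}

// Example: print the flags that would be used for port 9222.
const wiring = buildCdpWiring(9222);
console.log(wiring.browserArgs.join(' '));
console.log(['npx', ...wiring.mcpArgs].join(' '));
```

In a real backend, `wiring.browserArgs` would presumably be passed as the `args` option of Puppeteer's `launch` call, and `wiring.mcpArgs` to whatever mechanism ends up starting the MCP server.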

Example Use Case

Feature: DOM Validation via Puppeteer and Playwright MCP
  As a developer
  I want the AI agent to access and inspect the live DOM tree
  So that I can validate structure and behavior of my application in detail

  Scenario: Validating token value DOM state
    Given I have a running application
    When I ask in the chat "@AppTester Connect to localhost:8000 and read the innerText of the token value element with selector '.token-value'"
    Then the AI agent should:
      | Start the browser with Puppeteer                |
      | Expose the remote debugging port                |
      | Connect Playwright MCP to the running browser   |
      | Use Puppeteer to query the DOM using selector   |
    
    And the AI should respond with:
      | The innerText of the requested element          |
      | DOM snippet if requested                        |

Hints and Suggested Architecture

  • Browser Launching: Puppeteer will be responsible for launching the browser process on the backend and exposing the CDP endpoint (--remote-debugging-port).
  • Playwright MCP CDP Connection: Connect the MCP instance to the Puppeteer-launched browser via --cdp-endpoint. Note: Programmatic MCP launch is TBD.
  • Browser Lifecycle Management: Puppeteer manages starting and closing the browser process. Ensure integration into the agent lifecycle to avoid orphaned processes.
  • Webpack Compatibility: Avoid importing playwright-core directly into frontend or Webpack-bundled environments as it includes SVG and HTML resources that are incompatible with Webpack.
  • Backend Integration: All logic must be implemented on the backend. Tools should invoke service classes for modularity and reuse.
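
One possible shape for the lifecycle hint above is a small backend service that owns the browser handle. The sketch below is an assumption-laden illustration: `BrowserLifecycleService`, `BrowserHandle`, and `BrowserLauncher` are hypothetical names, and the launcher is injected so the class does not import Puppeteer itself. A Puppeteer `Browser` would fit `BrowserHandle` because it exposes `close()`.

```typescript
// Minimal structural stand-in for a launched browser.
interface BrowserHandle {
  close(): Promise<void>;
}

// Hypothetical factory type; in practice this would wrap the Puppeteer launch call.
type BrowserLauncher = () => Promise<BrowserHandle>;

// Hypothetical backend service that guarantees at most one browser per agent
// and closes it on dispose, to avoid orphaned processes.
class BrowserLifecycleService {
  private readonly launcher: BrowserLauncher;
  private browser: BrowserHandle | undefined;
  private launching: Promise<BrowserHandle> | undefined;

  constructor(launcher: BrowserLauncher) {
    this.launcher = launcher;
  }

  /** Returns the running browser, launching it on first use. */
  async ensureBrowser(): Promise<BrowserHandle> {
    if (this.browser) {
      return this.browser;
    }
    if (!this.launching) {
      // Share one in-flight launch between concurrent callers.
      this.launching = this.launcher().then(
        browser => {
          this.browser = browser;
          this.launching = undefined;
          return browser;
        },
        error => {
          this.launching = undefined;
          throw error;
        },
      );
    }
    return this.launching;
  }

  /** Closes the browser if one is running; safe to call repeatedly. */
  async dispose(): Promise<void> {
    if (this.launching) {
      // Wait for a pending launch so its process is not leaked.
      await this.launching.catch(() => undefined);
    }
    const browser = this.browser;
    this.browser = undefined;
    if (browser) {
      await browser.close();
    }
  }
}
```

Tools would call `ensureBrowser()` on demand, while the agent's shutdown hook would call `dispose()`. Where that hook lives in the agent lifecycle is left open here.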

Tool API (Initial Proposal)

  • queryDom(selector?: string): string
    → returns the DOM sub-tree from the matching selector as HTML
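
A rough sketch of how `queryDom` might be backed by a Puppeteer page is shown below. Several points are assumptions rather than decisions from this issue: `DomPage` is a hypothetical structural interface modeled on Puppeteer's `page.content()` and `page.$eval()`, the function is asynchronous (so it returns `Promise<string>` rather than `string`), and an omitted selector is interpreted as "return the whole document".

```typescript
// Hypothetical minimal view of a page, modeled on Puppeteer's Page methods.
interface DomPage {
  /** Serialized HTML of the whole document. */
  content(): Promise<string>;
  /** Runs fn against the first element matching selector and returns its result. */
  $eval(selector: string, fn: (el: { outerHTML: string }) => string): Promise<string>;
}

// Sketch of the proposed tool: returns the matching sub-tree as HTML.
async function queryDom(page: DomPage, selector?: string): Promise<string> {
  // Assumption: with no selector, fall back to the full document.
  if (selector === undefined || selector.trim() === '') {
    return page.content();
  }
  // Serialize the first matching element, including its descendants.
  return page.$eval(selector, el => el.outerHTML);
}
```

For the use case above, the tool handler would call something like `queryDom(page, '.token-value')` and pass the resulting HTML back to the agent. How a selector without a match should be reported to the agent is not specified here.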

Additional tool APIs can be added to support future requirements.

Related Issues and Dependencies
