ScreenEnv

A powerful Python library for creating and managing isolated desktop environments using Docker containers. ScreenEnv provides a sandboxed Ubuntu desktop environment with XFCE4 that you can programmatically control for GUI automation, testing, and development.

Features

🖥️ Isolated Desktop Environment: Full Ubuntu desktop with XFCE4 running in Docker
🎮 GUI Automation: Complete mouse and keyboard control
🌐 Web Automation: Built-in browser automation with Playwright
📹 Screen Recording: Capture video recordings of all actions
📸 Screenshot Capabilities: Desktop and browser screenshots
🖱️ Mouse Control: Click, drag, scroll, and mouse movement
⌨️ Keyboard Input: Text typing and key combinations
🪟 Window Management: Launch, activate, and close applications
📁 File Operations: Upload, download, and file management
🐚 Terminal Access: Execute commands and capture output
🤖 MCP Server Support: Model Context Protocol integration for AI/LLM automation
🐳 Docker Ready: Pre-built Docker image with all dependencies

Quick Start

Installation

Clone the repository:

git clone https://github.com/huggingface/screenenv
cd screenenv

Install the package (choose one):

latest release:

pip install screenenv
# or
uv pip install screenenv

from source:

pip install .
# or
uv sync

Basic Usage

from screenenv import Sandbox

# Create a sandbox environment
sandbox = Sandbox()

try:
    # Launch a terminal
    sandbox.launch("xfce4-terminal")

    # Type some text
    sandbox.write("echo 'Hello from ScreenEnv!'")
    sandbox.press("Enter")

    # Take a screenshot
    screenshot = sandbox.screenshot()
    with open("screenshot.png", "wb") as f:
        f.write(screenshot)

finally:
    # Clean up
    sandbox.close()

For usage, see the source code in examples/sandbox_demo.py

MCP Server Support

ScreenEnv includes full support for the Model Context Protocol (MCP), enabling seamless integration with AI/LLM systems for desktop automation.

What is MCP?

The Model Context Protocol (MCP) is a standard for AI assistants to interact with external tools and data sources. ScreenEnv's MCP server provides desktop automation capabilities that can be used by any MCP-compatible AI system.

MCP Server Features

30+ Automation Tools: Complete desktop control via MCP
Streamable HTTP Transport: Efficient communication protocol

Starting the MCP Server

from screenenv import MCPRemoteServer

# Start MCP server
server = MCPRemoteServer()

print(f"MCP Server URL: {server.server_url}")
print(f"Server Configuration: {server.mcp_server_json}")

MCP Client Usage

import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client
from screenenv import MCPRemoteServer

async def mcp_automation():
    # Start MCP server
    server = MCPRemoteServer(headless=False)

    try:
        # Connect to MCP server
        async with streamablehttp_client(server.server_url) as (
            read_stream, write_stream, _
        ):
            async with ClientSession(read_stream, write_stream) as session:
                await session.initialize()

                # Launch terminal
                await session.call_tool("launch", {
                    "application": "xfce4-terminal",
                    "wait_for_window": True
                })

                # Type commands
                await session.call_tool("write", {"text": "echo 'Hello MCP!'"})
                await session.call_tool("press", {"key": ["Enter"]})

                # Take screenshot
                response = await session.call_tool("screenshot", {})
                screenshot_base64 = response.content[0].data

                screenshot_bytes = base64.b64decode(screenshot_base64)
                image = Image.open(io.BytesIO(screenshot_bytes))
                image.save("screenshot.png")
                ...

                print("MCP automation completed!")

    finally:
        server.close()

# Run the automation
asyncio.run(mcp_automation())

Available MCP Tools

System Operations

execute_command - Execute shell commands
get_platform - Get system platform information
get_screen_size - Get screen dimensions
get_desktop_path - Get desktop directory path
get_directory_tree - List directory contents
get_file - Get file contents
download_file - Download file from URL
start_recording - Start screen recording
end_recording - End screen recording

Application Management

wait - Wait for specified milliseconds
open - Open file or URL
launch - Launch application
get_current_window_id - Get current window ID
get_application_windows - Get windows for application
get_window_name - Get window name/title
get_window_size - Get window size
activate_window - Activate window
close_window - Close window
get_terminal_output - Get terminal output

GUI Automation

screenshot - Take screenshot
left_click - Left click at coordinates
double_click - Double click at coordinates
right_click - Right click at coordinates
middle_click - Middle click at coordinates
scroll - Scroll mouse wheel
move_mouse - Move mouse to coordinates
mouse_press - Press mouse button
mouse_release - Release mouse button
get_cursor_position - Get cursor position
write - Type text
press - Press keys
drag - Drag mouse from one position to another

MCP Server Configuration

# Advanced MCP server configuration
server = MCPRemoteServer(
    os_type="Ubuntu",
    provider_type="docker",
    headless=True,
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)

Sandbox Instantiation

Basic Configuration

from screenenv import Sandbox

# Minimal configuration
sandbox = Sandbox()

# With custom settings
sandbox = Sandbox(
    os_type="Ubuntu",           # Currently only Ubuntu is supported
    provider_type="docker",     # Currently only Docker is supported
    headless=True,              # Run without VNC viewer
    resolution=(1920, 1080),
    disk_size="32G",
    ram_size="4G",
    cpu_cores="4",
    session_password="your_password",
    stream_server=True,
    dpi=96,
    timeout=1000
)

Core Features

Mouse Control

# Click operations
sandbox.left_click(x=100, y=200)
sandbox.right_click(x=300, y=400)
sandbox.double_click(x=500, y=600)

# Mouse movement
sandbox.move_mouse(x=800, y=900)

# Drag and drop
sandbox.drag(fr=(100, 100), to=(200, 200))

# Scrolling
sandbox.scroll(direction="down", amount=3)

sandbox.mouse_release(button="left")

sandbox.mouse_press(button="left")
sandbox.mouse_release(button="left")

Keyboard Input

# Type text
sandbox.write("Hello, World!", delay_in_ms=50)

# Key combinations
sandbox.press(["Ctrl", "C"])  # Copy
sandbox.press(["Ctrl", "V"])  # Paste
sandbox.press(["Alt", "Tab"]) # Switch windows
sandbox.press("Enter")        # Single key

Application Management

# Launch applications
sandbox.launch("xfce4-terminal")
sandbox.launch("libreoffice --writer")
sandbox.open("https://www.google.com")

# Window management
windows = sandbox.get_application_windows("xfce4-terminal")
window_id = windows[0]
sandbox.activate_window(window_id)

window_id = sandbox.get_current_window_id() # get the current activate window id.
sandbox.window_size(window_id)
sandbox.get_window_title(window_id)
sandbox.close_window(window_id)

File Operations

# Upload files to sandbox
sandbox.upload_file_to_remote("local_file.txt", "/home/user/remote_file.txt")

# Download files from sandbox
sandbox.download_file_from_remote("/home/user/remote_file.txt", "local_file.txt")

# Download from URL
sandbox.download_url_file_to_remote("https://example.com/file.txt", "/home/user/file.txt")

Screenshots and Recording

# Start recording
sandbox.start_recording()

# Take screenshots
desktop_screenshot = sandbox.desktop_screenshot()

# Stop recording and save it locally to a file 'demo.mp4'
sandbox.end_recording("demo.mp4")

Terminal Operations

# Execute commands
response = sandbox.execute_command("ls -la")
print(response.output)

# Python commands
response = sandbox.execute_python_command("print('Hello')", ["os"])
print(response.output)

# Get terminal output
output = sandbox.get_terminal_output() # Only if a desktop terminal application is running. To get command output, use execute_command() instead.

Examples

Complete GUI Automation Demo

from screenenv import Sandbox
import time

def demo_automation():
    sandbox = Sandbox(headless=False)

    try:
        # Launch terminal
        sandbox.launch("xfce4-terminal")
        time.sleep(2)

        # Type commands
        sandbox.write("echo 'Starting automation demo'")
        sandbox.press("Enter")

        # Open web browser
        sandbox.open("https://www.python.org")
        time.sleep(3)

        # Take screenshot
        screenshot = sandbox.screenshot()
        with open("demo_screenshot.png", "wb") as f:
            f.write(screenshot)

    finally:
        sandbox.close()

if __name__ == "__main__":
    demo_automation()

Web Automation with Playwright

from screenenv import Sandbox

def web_automation():
    sandbox = Sandbox(headless=True)

    try:
        # Open website
        sandbox.open("https://www.example.com")

        # Take browser screenshot
        screenshot = sandbox.playwright_screenshot(full_page=True)
        with open("web_screenshot.png", "wb") as f:
            f.write(screenshot)

        playwright_browser = sandbox.playwright_browser()

    finally:
        sandbox.close()

MCP Server Demo

python -m examples.mcp_server_demo # or sudo -E python -m examples.mcp_server_demo if not in docker group

Sandbox Demo

python -m examples.sandbox_demo # or sudo -E python -m examples.sandbox_demo if not in docker group

Desktop Agent Demo

python -m examples.desktop_agent # or sudo -E python -m examples.desktop_agent if not in docker group

System Requirements

Docker: Must be installed and running
Python: 3.10 or higher
Playwright: For web automation features
Memory: At least 4GB RAM recommended

Sandbox Image

The sandbox uses a custom Ubuntu 22.04 Docker image with:

XFCE4 desktop environment
VNC server for remote access
Google Chrome/Chromium browser
LibreOffice suite
Python development tools
MCP server support

Name		Name	Last commit message	Last commit date
Latest commit History 100 Commits
.github		.github
.vscode		.vscode
examples		examples
src/screenenv		src/screenenv
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py

huggingface/screenenv

Folders and files

Latest commit

History

Repository files navigation

ScreenEnv

Features

Quick Start

Installation

Basic Usage

MCP Server Support

What is MCP?

MCP Server Features

Starting the MCP Server

MCP Client Usage

Available MCP Tools

System Operations

Application Management

GUI Automation

MCP Server Configuration

Sandbox Instantiation

Basic Configuration

Core Features

Mouse Control

Keyboard Input

Application Management

File Operations

Screenshots and Recording

Terminal Operations

Examples

Complete GUI Automation Demo

Web Automation with Playwright

MCP Server Demo

Sandbox Demo

Desktop Agent Demo

System Requirements

Sandbox Image

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages