MCP Screenshot Tool

A screenshot capture and AI image analysis tool that integrates with the Model Context Protocol (MCP) ecosystem. It is built on a three-layer architecture (Core, CLI, MCP) so that each interface can be extended and maintained independently.

Features

Screenshot Capture: Capture full screen, specific regions, or web pages
AI-Powered Analysis: Describe images using Vertex AI/Gemini models
Expert Verification: Analyze visualizations with customizable AI expertise
Screenshot History: Store and organize screenshots automatically
BM25 Text Search: Find screenshots by content using SQLite FTS5
Image Similarity: Find visually similar screenshots using perceptual hashing
Combined Search: Flexible search by text content and/or visual similarity with normalized weights
MCP Integration: Use as an MCP server for AI agents
CLI & JSON Support: Unix-like command interface with JSON output

Installation

# Clone the repository
git clone https://github.com/grahama1970/mcp-screenshot.git
cd mcp-screenshot

# Install with uv (recommended)
uv venv --python=3.10.11 .venv
source .venv/bin/activate
uv pip install -e .

# Or install with pip
pip install -e .

Configuration

Create a .env file in the project root:

VERTEX_AI_PROJECT=your-project-id
VERTEX_AI_LOCATION=us-central1
VERTEX_AI_SERVICE_ACCOUNT_FILE=./vertex_ai_service_account.json
VERTEX_AI_MODEL=vertex_ai/gemini-2.5-flash-preview-04-17

Then place your Vertex AI service account JSON file in the project root, at the path set in VERTEX_AI_SERVICE_ACCOUNT_FILE.
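As a rough illustration of how these settings might be read and checked at startup, here is a minimal sketch. The `parse_env` and `missing_required` helpers are hypothetical and not part of the tool; the list of required names follows the Environment Variables table in this README.

```python
from typing import Dict, List

# Variables this README marks as "Required" (see "Environment Variables").
REQUIRED = ("VERTEX_AI_PROJECT", "VERTEX_AI_SERVICE_ACCOUNT_FILE")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_required(settings: Dict[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not settings.get(name)]


example = """
VERTEX_AI_PROJECT=your-project-id
VERTEX_AI_LOCATION=us-central1
"""
print(missing_required(parse_env(example)))
```

Run against the incomplete example above, the check reports the service account variable as missing, which is the kind of early failure that is easier to debug than an authentication error later on.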

Quick Start

# Take a screenshot and describe it
mcp-screenshot capture
mcp-screenshot describe --file screenshot.jpg

# Verify a visualization with expert mode
mcp-screenshot verify chart.png --expert chart

# Get JSON output for scripting
mcp-screenshot --json describe --file image.jpg

Command Reference

Global Options

Option	Description	Example
`--json`	Output as JSON	`mcp-screenshot --json capture`
`--debug`	Enable debug logging	`mcp-screenshot --debug capture`
`--help`	Show help message	`mcp-screenshot --help`

capture - Screenshot Capture

Capture screenshots of screen regions or web pages.

Option	Short	Description	Default	Example
`--url`	`-u`	URL to capture	None	`--url https://example.com`
`--quality`	`-q`	Image quality (30-90)	70	`--quality 80`
`--region`	`-r`	Screen region	full	`--region left-half`
`--output`	`-o`	Output filename	timestamped	`--output screen.jpg`
`--output-dir`	`-d`	Output directory	./	`--output-dir ./screenshots`
`--wait`	`-w`	Wait seconds (for URLs)	3	`--wait 5`
`--zoom-center`	`-z`	Center point for zoom (x,y)	None	`--zoom-center 640,480`
`--zoom-factor`	`-zf`	Zoom factor (1.0-10.0)	1.0	`--zoom-factor 2.0`

Available regions: full, left-half, right-half, top-half, bottom-half, center

# Examples
mcp-screenshot capture                           # Full screen
mcp-screenshot capture --region left-half        # Left half of screen
mcp-screenshot capture --url https://d3js.org    # Web page
mcp-screenshot capture --output shot.jpg --quality 85

# Zoom examples
mcp-screenshot capture --zoom-center 800,600 --zoom-factor 2.0
mcp-screenshot capture --zoom-center 400,300 --zoom-factor 4.0 --output zoomed.jpg
mcp-screenshot capture --region center --zoom-center 640,360 --zoom-factor 1.5
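The named regions can be thought of as bounding boxes derived from the screen size. The sketch below is illustrative only: `region_box` is a hypothetical helper, and the geometry chosen for `center` (the middle half of each dimension) is an assumption. Run `mcp-screenshot regions` to see the definitions the tool actually uses.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def region_box(name: str, screen_w: int, screen_h: int) -> Box:
    """Map a region name to a bounding box for a screen of the given size."""
    half_w, half_h = screen_w // 2, screen_h // 2
    boxes: Dict[str, Box] = {
        "full": (0, 0, screen_w, screen_h),
        "left-half": (0, 0, half_w, screen_h),
        "right-half": (half_w, 0, screen_w - half_w, screen_h),
        "top-half": (0, 0, screen_w, half_h),
        "bottom-half": (0, half_h, screen_w, screen_h - half_h),
        # Assumption: "center" is the middle half of each dimension.
        "center": (screen_w // 4, screen_h // 4, half_w, half_h),
    }
    if name not in boxes:
        raise ValueError(f"unknown region: {name}")
    return boxes[name]


print(region_box("right-half", 1920, 1080))
```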

describe - AI Image Description

Get AI-powered descriptions of images.

Option	Short	Description	Default	Example
`--url`	`-u`	URL to capture & describe	None	`--url https://example.com`
`--file`	`-f`	Image file to describe	None	`--file chart.png`
`--prompt`	`-p`	Custom prompt	"Describe this image"	`--prompt "What colors are used?"`
`--model`	`-m`	AI model	gemini-2.5-flash	`--model vertex_ai/gemini-2.0`
`--quality`	`-q`	Quality (for URLs)	70	`--quality 80`

# Examples
mcp-screenshot describe --file graph.png
mcp-screenshot describe --url https://d3js.org --prompt "What type of visualization is this?"
mcp-screenshot --json describe --file chart.jpg

verify - Visualization Verification

Analyze visualizations with expert modes and feature detection.

Option	Short	Description	Default	Example
`target`		URL or file path	Required	`chart.png`
`--expert`	`-e`	Expert mode	None	`--expert chart`
`--features`	`-f`	Expected features	None	`--features "axes,legend"`
`--prompt`	`-p`	Custom prompt	None	`--prompt "Analyze the color scheme"`
`--model`	`-m`	AI model	gemini-2.5-flash	`--model vertex_ai/gemini-2.0`
`--quality`	`-q`	Screenshot quality	70	`--quality 80`
`--wait`	`-w`	Wait seconds (URLs)	3	`--wait 5`

Expert modes: d3, chart, graph, data-viz, or custom string

# Examples
mcp-screenshot verify chart.png --expert chart
mcp-screenshot verify https://d3js.org/example --expert d3 --wait 5
mcp-screenshot verify graph.jpg --features "nodes,edges,labels"
mcp-screenshot verify ui.png --prompt "As a UX expert, evaluate this interface"

regions - List Screen Regions

Show available screen regions for capture.

mcp-screenshot regions
mcp-screenshot --json regions

history - Screenshot History

View and manage screenshot history.

Option	Short	Description	Default	Example
`--limit`	`-l`	Number of screenshots	10	`--limit 20`
`--region`	`-r`	Filter by region	None	`--region center`

# Show recent screenshots
mcp-screenshot history

# Show more results
mcp-screenshot history --limit 20

# Filter by region
mcp-screenshot history --region left-half

search - Search by Content

Search screenshots by text content using BM25 ranking.

Option	Short	Description	Default	Example
`query`		Search query	Required	`"error message"`
`--limit`	`-l`	Maximum results	10	`--limit 5`
`--region`	`-r`	Filter by region	None	`--region center`
`--from`		Start date	None	`--from 2024-01-01`
`--to`		End date	None	`--to 2024-06-30`

# Search by content
mcp-screenshot search "login form"

# Limit results
mcp-screenshot search "dashboard" --limit 5

# Filter by date
mcp-screenshot search "chart" --from 2024-01-01 --to 2024-06-30
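To show what BM25 ranking over SQLite FTS5 looks like in practice, here is a small self-contained sketch using Python's built-in `sqlite3` module. The `shots` table, its columns, and the sample rows are hypothetical and are not the tool's real schema; the sketch also assumes the local SQLite build includes FTS5, which is the case for most Python distributions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one FTS5 row per screenshot description.
# UNINDEXED keeps the filename out of the full-text index.
conn.execute("CREATE VIRTUAL TABLE shots USING fts5(filename UNINDEXED, description)")
conn.executemany(
    "INSERT INTO shots (filename, description) VALUES (?, ?)",
    [
        ("a.jpg", "login form with username and password fields"),
        ("b.jpg", "sales dashboard with a bar chart"),
        ("c.jpg", "error message dialog over a login form"),
        ("d.jpg", "settings page with toggle switches"),
        ("e.jpg", "line chart of monthly traffic"),
    ],
)


def search(query: str, limit: int = 10):
    """Return (filename, score) pairs, best match first."""
    # FTS5's bm25() gives numerically lower values to better matches,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT filename, bm25(shots) AS score FROM shots "
        "WHERE shots MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


# Double quotes inside the query make FTS5 treat it as a phrase.
for filename, score in search('"login form"'):
    print(filename, score)
```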

similar - Find Similar Images

Find visually similar screenshots using perceptual hashing.

Option	Short	Description	Default	Example
`image`		Reference image	Required	`image.jpg`
`--threshold`	`-t`	Similarity threshold	0.8	`--threshold 0.9`
`--limit`	`-l`	Maximum results	10	`--limit 5`
`--region`	`-r`	Filter by region	None	`--region center`

# Find similar images
mcp-screenshot similar reference.jpg

# Adjust similarity threshold
mcp-screenshot similar logo.png --threshold 0.9

# Limit results
mcp-screenshot similar ui.jpg --limit 5
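The tool relies on the ImageHash library for perceptual hashing. To illustrate the idea without that dependency, the sketch below implements a simplified average hash in pure Python over small grayscale grids. Converting the number of differing bits into a 0 to 1 similarity score is an assumption about how a threshold such as 0.8 could be interpreted, not a description of the tool's internals.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0-255


def average_hash(pixels: Grid) -> List[int]:
    """One bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(hash_a: List[int], hash_b: List[int]) -> float:
    """Assumed score: 1.0 minus the fraction of differing bits."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 1.0 - differing / len(hash_a)


# Two tiny 4x4 "images": the second differs from the first in one pixel.
image_a = [[10, 10, 200, 200] for _ in range(4)]
image_b = [row[:] for row in image_a]
image_b[0][0] = 250

score = similarity(average_hash(image_a), average_hash(image_b))
print(score, score >= 0.8)  # prints: 0.9375 True
```

Only one of the 16 hash bits differs here, so the pair would pass the default threshold of 0.8 and also a stricter 0.9.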

combined - Flexible Search by Text and/or Image

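Search by text content, visual similarity, or both. The two weights are normalized before scoring, so only their ratio matters: --text-weight 7 --image-weight 3 means 70% text and 30% image.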

Option	Short	Description	Default	Example
`--text`	`-t`	Text to search for	None	`--text "login form"`
`--image`	`-i`	Reference image	None	`--image login.jpg`
`--text-weight`	`-tw`	Text search weight	1.0	`--text-weight 7`
`--image-weight`	`-iw`	Image search weight	1.0	`--image-weight 3`
`--limit`	`-l`	Maximum results	10	`--limit 5`
`--region`	`-r`	Filter by region	None	`--region center`

# Combined search (text + image, equal weights)
mcp-screenshot combined --text "error message" --image error.jpg

# Text-only search
mcp-screenshot combined --text "login form"

# Image-only search 
mcp-screenshot combined --image reference.jpg

# Adjust weights (70% text, 30% image)
mcp-screenshot combined --text "dashboard" --image ui.png --text-weight 7 --image-weight 3

# For agents (get JSON output)
mcp-screenshot --json combined --text "button" --image button.png
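The weight handling can be sketched as follows. Both helper functions are hypothetical, and the sketch assumes that the text and image scores have already been scaled to the 0 to 1 range before they are combined; the tool's actual scoring may differ.

```python
from typing import Tuple


def normalize_weights(text_weight: float, image_weight: float) -> Tuple[float, float]:
    """Scale two non-negative weights so that they sum to 1."""
    if text_weight < 0 or image_weight < 0:
        raise ValueError("weights must be non-negative")
    total = text_weight + image_weight
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return text_weight / total, image_weight / total


def combined_score(
    text_score: float,
    image_score: float,
    text_weight: float = 1.0,
    image_weight: float = 1.0,
) -> float:
    """Weighted average of two scores already scaled to the 0-1 range."""
    w_text, w_image = normalize_weights(text_weight, image_weight)
    return w_text * text_score + w_image * image_score


print(normalize_weights(7, 3))
print(combined_score(0.9, 0.5, text_weight=7, image_weight=3))
```

Because the weights are divided by their sum, `--text-weight 7 --image-weight 3` and `--text-weight 0.7 --image-weight 0.3` would rank results identically under this scheme.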

stats - History Statistics

Show screenshot history statistics.

mcp-screenshot stats
mcp-screenshot --json stats

cleanup - Delete Old Screenshots

Clean up old screenshots from history.

Option	Short	Description	Default	Example
`--days`	`-d`	Days to keep	30	`--days 7`
`--force`	`-f`	Skip confirmation	False	`--force`

# Clean up screenshots older than 30 days
mcp-screenshot cleanup

# Clean up screenshots older than 7 days (forced)
mcp-screenshot cleanup --days 7 --force
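The retention rule behind `--days` amounts to comparing each capture time against a cutoff. A minimal sketch, assuming each history entry carries a capture timestamp; the `expired` helper and the sample data are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def expired(shots: Dict[str, datetime], days: int, now: datetime) -> List[str]:
    """Return names of screenshots captured more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sorted(name for name, taken in shots.items() if taken < cutoff)


now = datetime(2024, 6, 30, 12, 0)
history = {
    "old.jpg": datetime(2024, 5, 1, 9, 0),
    "recent.jpg": datetime(2024, 6, 28, 9, 0),
}
print(expired(history, days=30, now=now))  # prints: ['old.jpg']
```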

version - Show Version

Display version information.

mcp-screenshot version

MCP Server Usage

Run as an MCP server for AI agent integration:

python -m mcp_screenshot.mcp.server

Configure in your MCP client (e.g., Claude Code):

{
  "mcpServers": {
    "screenshot": {
      "command": "python",
      "args": ["-m", "mcp_screenshot.mcp.server"],
      "env": {
        "VERTEX_AI_PROJECT": "your-project-id",
        "VERTEX_AI_LOCATION": "us-central1"
      }
    }
  }
}

MCP Tools Available

Tool	Description	Parameters
`capture_screenshot`	Capture a screenshot	quality, region, output_dir, zoom_center, zoom_factor
`describe_image`	Describe an image with AI	file_path, prompt, model
`verify_visualization`	Verify with expert mode	file_path, expert_mode, features
`search_screenshots`	Search screenshot history	query, limit, region, date_from, date_to
`find_similar_images`	Find visually similar screenshots	image_path, threshold, limit, region
`combined_search`	Search by text and image	text_query, image_path, text_weight, image_weight

History and Search

The tool automatically records every screenshot and its description in a SQLite database, so past captures can be listed, searched, and cleaned up later.

# View recent history
mcp-screenshot history

# Search by content
mcp-screenshot search "error message"

# Find similar images
mcp-screenshot similar reference.jpg

# Combined search (text + image)
mcp-screenshot combined --text "login form" --image login.jpg

# Get statistics
mcp-screenshot stats

# Clean up old screenshots
mcp-screenshot cleanup --days 30

Technical Details

Database: SQLite with FTS5 virtual table for full-text search
Text Ranking: BM25 relevance scoring (the same ranking function Elasticsearch uses by default)
Image Similarity: Perceptual hashing using the ImageHash library
Combined Search: Weighted combination of text and image scores with automatic normalization
Storage: Files are organized in ~/.mcp_screenshot/
- Database: ~/.mcp_screenshot/history.db
- Screenshots: ~/.mcp_screenshot/screenshots/

Advanced Usage

Zoom Feature

The capture command supports zooming in on specific coordinates:

--zoom-center x,y: Specify the center point for zoom (e.g., 640,480)
--zoom-factor: Set the zoom multiplication factor (1.0 to 10.0)

Zoom works by:

1. Capturing the full screenshot
2. Cropping around the specified center point
3. Enlarging the cropped area by the zoom factor

# Zoom 2x into the center of the screen
mcp-screenshot capture --zoom-center 640,360 --zoom-factor 2.0

# Zoom 3x into a specific UI element
mcp-screenshot capture --zoom-center 1200,400 --zoom-factor 3.0 --output ui_detail.jpg

# Combine with regions for more control
mcp-screenshot capture --region center --zoom-center 640,360 --zoom-factor 1.5

Note: Zoom is only available for screen captures, not URL captures.
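The crop step of the zoom procedure can be sketched as plain arithmetic. This is an illustration under assumptions: the cropped area is taken to be the full size divided by the zoom factor (so that enlarging it restores the original dimensions), and the box is shifted inward when the center lies near an edge. `zoom_crop_box` is a hypothetical helper, not the tool's API.

```python
from typing import Tuple


def zoom_crop_box(
    width: int, height: int, center: Tuple[int, int], factor: float
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the area to crop before enlarging."""
    if not 1.0 <= factor <= 10.0:
        raise ValueError("zoom factor must be between 1.0 and 10.0")
    crop_w, crop_h = round(width / factor), round(height / factor)
    cx, cy = center
    # Shift the box so it stays inside the image when the center is near an edge.
    left = min(max(cx - crop_w // 2, 0), width - crop_w)
    top = min(max(cy - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h


print(zoom_crop_box(1280, 720, (640, 360), 2.0))   # prints: (320, 180, 960, 540)
print(zoom_crop_box(1280, 720, (1200, 400), 3.0))  # prints: (853, 280, 1280, 520)
```

In the second call the requested center is close to the right edge, so the box is clamped to the image bounds and is no longer centered on the requested point.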

JSON Output for Scripting

# Capture and get JSON output
OUTPUT=$(mcp-screenshot --json capture)
FILE=$(echo "$OUTPUT" | jq -r '.file')

# Describe the captured image
mcp-screenshot --json describe --file "$FILE" | jq '.description'

# Find similar images
REFERENCE=$(mcp-screenshot --json capture | jq -r '.file')
SIMILAR=$(mcp-screenshot --json similar "$REFERENCE" | jq -r '.results[0].storage_path')

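# Compare them (note: `compare` is not listed in the Command Reference above; confirm it exists with `mcp-screenshot --help`)
mcp-screenshot compare "$REFERENCE" "$SIMILAR"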

# Search history
mcp-screenshot --json search "dashboard" > search_results.json
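The same pattern works from Python by running the CLI as a subprocess and parsing its standard output. In this sketch `run_json` is a hypothetical helper, and a stand-in command is used so the example runs without the tool installed. The `file` key mirrors the `jq` example above.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_json(command: List[str]) -> Any:
    """Run a command and parse its standard output as JSON."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


# Stand-in command so the sketch runs anywhere; with the tool installed this
# would be ["mcp-screenshot", "--json", "capture"].
fake_capture = [sys.executable, "-c", "print('{\"file\": \"shot.jpg\"}')"]
print(run_json(fake_capture)["file"])  # prints: shot.jpg
```

Passing `check=True` makes a non-zero exit status raise `subprocess.CalledProcessError`, so a failed capture stops the script instead of producing a confusing JSON parse error.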

Expert Mode Examples

# D3.js visualization analysis
mcp-screenshot verify d3_chart.html --expert d3 --features "axes,data-points,tooltip"

# Chart analysis
mcp-screenshot verify sales_chart.png --expert chart --features "trend-line,labels,legend"

# Custom expertise
mcp-screenshot verify ui_mockup.png --prompt "As a UI/UX expert, evaluate the visual hierarchy"

Batch Processing

# Process multiple images
for img in screenshots/*.png; do
  mcp-screenshot --json describe --file "$img" > "${img%.png}_description.json"
done

Environment Variables

Variable	Description	Default
`VERTEX_AI_PROJECT`	Google Cloud project ID	Required
`VERTEX_AI_LOCATION`	Vertex AI location	us-central1
`VERTEX_AI_SERVICE_ACCOUNT_FILE`	Path to service account JSON	Required
`VERTEX_AI_MODEL`	AI model to use	vertex_ai/gemini-2.5-flash-preview-04-17
`MAX_WIDTH`	Max image width for AI	1280
`MAX_HEIGHT`	Max image height for AI	1024
`DEFAULT_QUALITY`	Default JPEG quality	70
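To make the effect of `MAX_WIDTH` and `MAX_HEIGHT` concrete, here is a sketch of the size calculation. It assumes that the aspect ratio is preserved and that images smaller than the limits are left alone; `fit_within` is a hypothetical helper, and the tool's actual resizing may differ.

```python
from typing import Tuple


def fit_within(
    width: int, height: int, max_width: int = 1280, max_height: int = 1024
) -> Tuple[int, int]:
    """Shrink (width, height) to fit the limits, keeping the aspect ratio."""
    # Never scale up: the factor is capped at 1.0.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(3840, 2160))  # prints: (1280, 720)
print(fit_within(800, 600))    # prints: (800, 600)
```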

Three-Layer Architecture

This project follows a strict three-layer architecture:

1. Core Layer (`src/mcp_screenshot/core/`)

Pure business logic
No UI or integration dependencies
Screenshot capture, image processing, AI analysis
History storage and search (SQLite, BM25, perceptual hashing)

2. CLI Layer (`src/mcp_screenshot/cli/`)

Command-line interface
Rich formatting and user interaction
Depends only on Core layer

3. MCP Layer (`src/mcp_screenshot/mcp/`)

MCP server implementation
Protocol-specific wrappers
Depends on Core layer
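The dependency direction can be illustrated in a single file. Everything in this sketch is hypothetical (function names, the result type, and the response shape); it only shows the rule that the CLI and MCP layers both call into Core, while Core imports neither.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict


# Core layer: plain data and logic, no CLI or MCP imports.
@dataclass
class CaptureResult:
    file: str
    region: str
    quality: int


def capture(region: str = "full", quality: int = 70) -> CaptureResult:
    """Hypothetical core function; a real one would grab pixels here."""
    if not 30 <= quality <= 90:
        raise ValueError("quality must be between 30 and 90")
    return CaptureResult(file=f"screenshot_{region}.jpg", region=region, quality=quality)


# CLI layer: formats core results for people or scripts.
def cli_capture(region: str, quality: int, as_json: bool) -> str:
    result = capture(region, quality)
    return json.dumps(asdict(result)) if as_json else f"Saved {result.file}"


# MCP layer: wraps the same core call in a tool-style response.
def mcp_capture_screenshot(arguments: Dict[str, Any]) -> Dict[str, Any]:
    result = capture(arguments.get("region", "full"), arguments.get("quality", 70))
    return {"success": True, "result": asdict(result)}


print(cli_capture("left-half", 80, as_json=False))
print(mcp_capture_screenshot({"region": "center"}))
```

Keeping validation such as the quality range in the core function means both interfaces reject bad input in the same way.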

Requirements

Python 3.10+
Google Cloud account with Vertex AI enabled
Service account with Vertex AI permissions
Display environment for screen capture (optional for file operations)

Troubleshooting

Common Issues

"$DISPLAY not set" error
- This occurs in headless environments
- Use file-based operations or web capture instead
- Consider using Xvfb for headless screen capture
"No module named 'google'" error
- Install Google Cloud dependencies: pip install google-cloud-aiplatform
Authentication errors
- Ensure service account JSON has Vertex AI permissions
- Check VERTEX_AI_PROJECT environment variable
Image too large errors
- Images are automatically resized to MAX_WIDTH/MAX_HEIGHT
- Adjust quality settings with --quality flag

License

MIT License - see LICENSE file for details

Contributing

Follow the three-layer architecture
Add tests for all new features
Update documentation for CLI changes
Ensure all commands have JSON output support

Support

For issues, questions, or contributions, please visit the GitHub repository.
