An AI-powered screenshot capture and image analysis tool that integrates with the Model Context Protocol (MCP) ecosystem. It is built on a three-layer architecture that keeps the code extensible and maintainable.
- Screenshot Capture: Capture full screen, specific regions, or web pages
- AI-Powered Analysis: Describe images using Vertex AI/Gemini models
- Expert Verification: Analyze visualizations with customizable AI expertise
- Screenshot History: Store and organize screenshots automatically
- BM25 Text Search: Find screenshots by content using SQLite FTS5
- Image Similarity: Find visually similar screenshots using perceptual hashing
- Combined Search: Flexible search by text content and/or visual similarity with normalized weights
- MCP Integration: Use as an MCP server for AI agents
- CLI & JSON Support: Unix-like command interface with JSON output
# Clone the repository
git clone https://github.com/grahama1970/mcp-screenshot.git
cd mcp-screenshot
# Install with uv (recommended)
uv venv --python=3.10.11 .venv
source .venv/bin/activate
uv pip install -e .
# Or install with pip
pip install -e .
- Create a .env file in the project root:
VERTEX_AI_PROJECT=your-project-id
VERTEX_AI_LOCATION=us-central1
VERTEX_AI_SERVICE_ACCOUNT_FILE=./vertex_ai_service_account.json
VERTEX_AI_MODEL=vertex_ai/gemini-2.5-flash-preview-04-17
- Add your Vertex AI service account JSON file to the project root
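The settings above can be checked before the first run. The following is a minimal sketch, assuming the .env file holds plain KEY=VALUE lines; `parse_env_text` and `load_settings` are hypothetical helpers written for illustration and are not part of mcp-screenshot.

```python
# Hypothetical helpers, not part of mcp-screenshot.
REQUIRED = ("VERTEX_AI_PROJECT", "VERTEX_AI_SERVICE_ACCOUNT_FILE")
DEFAULTS = {"VERTEX_AI_LOCATION": "us-central1"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_text: str) -> dict[str, str]:
    """Merge defaults with .env values and fail early on missing keys."""
    settings = {**DEFAULTS, **parse_env_text(env_text)}
    missing = [name for name in REQUIRED if not settings.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return settings
```

Calling `load_settings(open(".env").read())` raises a `ValueError` that names any required variable left unset.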
# Take a screenshot and describe it
mcp-screenshot capture
mcp-screenshot describe --file screenshot.jpg
# Verify a visualization with expert mode
mcp-screenshot verify chart.png --expert chart
# Get JSON output for scripting
mcp-screenshot --json describe --file image.jpg
| Option | Short | Description | Example |
|---|---|---|---|
| --json | | Output as JSON | mcp-screenshot --json capture |
| --debug | | Enable debug logging | mcp-screenshot --debug capture |
| --help | | Show help message | mcp-screenshot --help |
Capture screenshots of screen regions or web pages.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| --url | -u | URL to capture | None | --url https://example.com |
| --quality | -q | Image quality (30-90) | 70 | --quality 80 |
| --region | -r | Screen region | full | --region left-half |
| --output | -o | Output filename | timestamped | --output screen.jpg |
| --output-dir | -d | Output directory | ./ | --output-dir ./screenshots |
| --wait | -w | Wait seconds (for URLs) | 3 | --wait 5 |
| --zoom-center | -z | Center point for zoom (x,y) | None | --zoom-center 640,480 |
| --zoom-factor | -zf | Zoom factor (1.0-10.0) | 1.0 | --zoom-factor 2.0 |
Available regions: full, left-half, right-half, top-half, bottom-half, center
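To make the region names concrete, here is a minimal sketch that maps each name to a pixel bounding box. It is an illustration only: `region_box` is a hypothetical function, and the definition of "center" as the middle half of each axis is an assumption, since the README does not define it.

```python
def region_box(name: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a named region."""
    boxes = {
        "full": (0, 0, width, height),
        "left-half": (0, 0, width // 2, height),
        "right-half": (width // 2, 0, width, height),
        "top-half": (0, 0, width, height // 2),
        "bottom-half": (0, height // 2, width, height),
        # Assumption: "center" covers the middle 50% of each axis.
        "center": (width // 4, height // 4, width - width // 4, height - height // 4),
    }
    if name not in boxes:
        raise ValueError(f"Unknown region: {name}")
    return boxes[name]
```

The `mcp-screenshot regions` command reports the regions the tool actually supports.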
# Examples
mcp-screenshot capture # Full screen
mcp-screenshot capture --region left-half # Left half of screen
mcp-screenshot capture --url https://d3js.org # Web page
mcp-screenshot capture --output shot.jpg --quality 85
# Zoom examples
mcp-screenshot capture --zoom-center 800,600 --zoom-factor 2.0
mcp-screenshot capture --zoom-center 400,300 --zoom-factor 4.0 --output zoomed.jpg
mcp-screenshot capture --region center --zoom-center 640,360 --zoom-factor 1.5
Get AI-powered descriptions of images.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| --url | -u | URL to capture & describe | None | --url https://example.com |
| --file | -f | Image file to describe | None | --file chart.png |
| --prompt | -p | Custom prompt | "Describe this image" | --prompt "What colors are used?" |
| --model | -m | AI model | gemini-2.5-flash | --model vertex_ai/gemini-2.0 |
| --quality | -q | Quality (for URLs) | 70 | --quality 80 |
# Examples
mcp-screenshot describe --file graph.png
mcp-screenshot describe --url https://d3js.org --prompt "What type of visualization is this?"
mcp-screenshot --json describe --file chart.jpg
Analyze visualizations with expert modes and feature detection.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| target | | URL or file path | Required | chart.png |
| --expert | -e | Expert mode | None | --expert chart |
| --features | -f | Expected features | None | --features "axes,legend" |
| --prompt | -p | Custom prompt | None | --prompt "Analyze the color scheme" |
| --model | -m | AI model | gemini-2.5-flash | --model vertex_ai/gemini-2.0 |
| --quality | -q | Screenshot quality | 70 | --quality 80 |
| --wait | -w | Wait seconds (URLs) | 3 | --wait 5 |
Expert modes: d3, chart, graph, data-viz, or custom string
# Examples
mcp-screenshot verify chart.png --expert chart
mcp-screenshot verify https://d3js.org/example --expert d3 --wait 5
mcp-screenshot verify graph.jpg --features "nodes,edges,labels"
mcp-screenshot verify ui.png --prompt "As a UX expert, evaluate this interface"
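One way to picture how an expert mode, a feature list, and a custom prompt could combine is as a single instruction sent to the model. The sketch below is an assumption for illustration, not the project's actual prompt logic: `EXPERT_ROLES`, its wording, and `build_verify_prompt` are all hypothetical.

```python
# Hypothetical role descriptions; the real tool's wording is not documented here.
EXPERT_ROLES = {
    "d3": "a D3.js visualization expert",
    "chart": "a chart analysis expert",
    "graph": "a network graph expert",
    "data-viz": "a data visualization expert",
}


def build_verify_prompt(expert=None, features=None, prompt=None):
    """Assemble one prompt string from the verify options."""
    parts = []
    if expert:
        # A mode outside the known list is treated as a custom expertise string.
        parts.append(f"You are {EXPERT_ROLES.get(expert, expert)}.")
    parts.append(prompt or "Analyze this visualization.")
    if features:
        wanted = [name.strip() for name in features.split(",") if name.strip()]
        parts.append("Check whether these features are present: " + ", ".join(wanted) + ".")
    return " ".join(parts)
```

For example, `build_verify_prompt(expert="chart", features="axes,legend")` produces a prompt that names the role and lists both expected features.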
Show available screen regions for capture.
mcp-screenshot regions
mcp-screenshot --json regions
View and manage screenshot history.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| --limit | -l | Number of screenshots | 10 | --limit 20 |
| --region | -r | Filter by region | None | --region center |
# Show recent screenshots
mcp-screenshot history
# Show more results
mcp-screenshot history --limit 20
# Filter by region
mcp-screenshot history --region left-half
Search screenshots by text content using BM25 ranking.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| query | | Search query | Required | "error message" |
| --limit | -l | Maximum results | 10 | --limit 5 |
| --region | -r | Filter by region | None | --region center |
| --from | | Start date | None | --from 2024-01-01 |
| --to | | End date | None | --to 2024-06-30 |
# Search by content
mcp-screenshot search "login form"
# Limit results
mcp-screenshot search "dashboard" --limit 5
# Filter by date
mcp-screenshot search "chart" --from 2024-01-01 --to 2024-06-30
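For readers unfamiliar with FTS5, the sketch below shows BM25-ranked search over image descriptions using Python's built-in `sqlite3` module. It assumes a SQLite build with FTS5 enabled; the table name `shots` and its columns are made up for this example and do not reflect the project's real schema.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (path, description) rows."""
    conn = sqlite3.connect(":memory:")
    # path is stored but not indexed; only description is searchable.
    conn.execute("CREATE VIRTUAL TABLE shots USING fts5(path UNINDEXED, description)")
    conn.executemany("INSERT INTO shots (path, description) VALUES (?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Return (path, score) rows; FTS5's bm25() is lower for better matches."""
    sql = (
        "SELECT path, bm25(shots) AS score FROM shots "
        "WHERE shots MATCH ? ORDER BY score LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()
```

Because `bm25()` returns smaller values for better matches, sorting in ascending order puts the most relevant screenshot first.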
Find visually similar screenshots using perceptual hashing.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| image | | Reference image | Required | image.jpg |
| --threshold | -t | Similarity threshold | 0.8 | --threshold 0.9 |
| --limit | -l | Maximum results | 10 | --limit 5 |
| --region | -r | Filter by region | None | --region center |
# Find similar images
mcp-screenshot similar reference.jpg
# Adjust similarity threshold
mcp-screenshot similar logo.png --threshold 0.9
# Limit results
mcp-screenshot similar ui.jpg --limit 5
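The idea behind perceptual hashing can be shown with a small average hash written in plain Python. This is an illustration and does not use the ImageHash library: it assumes the image has already been reduced to a small grayscale grid, and the way the tool turns hash distance into the 0 to 1 threshold is an assumption.

```python
def average_hash(pixels):
    """Hash a small grayscale grid: 1 where a pixel is above the mean, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def similarity(hash_a, hash_b):
    """Fraction of matching bits, from 0.0 (all differ) to 1.0 (identical)."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    differing = sum(a != b for a, b in zip(hash_a, hash_b))
    return 1.0 - differing / len(hash_a)
```

Under this scheme, a uniformly brightened copy of an image keeps the same hash, so it would pass a threshold such as 0.8, while an inverted copy would not.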
Search by text content and/or visual similarity with intelligent weight normalization.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| --text | -t | Text to search for | None | --text "login form" |
| --image | -i | Reference image | None | --image login.jpg |
| --text-weight | -tw | Text search weight | 1.0 | --text-weight 7 |
| --image-weight | -iw | Image search weight | 1.0 | --image-weight 3 |
| --limit | -l | Maximum results | 10 | --limit 5 |
| --region | -r | Filter by region | None | --region center |
# Combined search (text + image, equal weights)
mcp-screenshot combined --text "error message" --image error.jpg
# Text-only search
mcp-screenshot combined --text "login form"
# Image-only search
mcp-screenshot combined --image reference.jpg
# Adjust weights (70% text, 30% image)
mcp-screenshot combined --text "dashboard" --image ui.png --text-weight 7 --image-weight 3
# For agents (get JSON output)
mcp-screenshot --json combined --text "button" --image button.png
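The 70%/30% example above suggests that each weight is divided by the sum of both. Here is a minimal sketch of that normalization, assuming a missing query side contributes no weight and that text and image scores are already scaled to the 0 to 1 range; both function names are hypothetical.

```python
def normalize_weights(text_weight=1.0, image_weight=1.0, has_text=True, has_image=True):
    """Scale the two weights so they sum to 1.0, ignoring an absent query side."""
    tw = float(text_weight) if has_text else 0.0
    iw = float(image_weight) if has_image else 0.0
    total = tw + iw
    if total <= 0:
        raise ValueError("Provide a text query, an image, or both, with positive weights")
    return tw / total, iw / total


def combined_score(text_score, image_score, text_weight=1.0, image_weight=1.0):
    """Weighted sum of two scores that are each assumed to lie in 0..1."""
    tw, iw = normalize_weights(text_weight, image_weight)
    return tw * text_score + iw * image_score
```

With `--text-weight 7 --image-weight 3` the normalized weights come out as roughly 0.7 and 0.3, and a text-only search gives the text side the full weight.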
Show screenshot history statistics.
mcp-screenshot stats
mcp-screenshot --json stats
Clean up old screenshots from history.
| Option | Short | Description | Default | Example |
|---|---|---|---|---|
| --days | -d | Days to keep | 30 | --days 7 |
| --force | -f | Skip confirmation | False | --force |
# Clean up screenshots older than 30 days
mcp-screenshot cleanup
# Clean up screenshots older than 7 days (forced)
mcp-screenshot cleanup --days 7 --force
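The retention rule behind `--days` amounts to comparing each capture time with a cutoff. The following sketch assumes entries are (path, captured_at) pairs; `select_expired` is a hypothetical helper, and the real command may select entries differently.

```python
from datetime import datetime, timedelta


def select_expired(entries, days=30, now=None):
    """Return paths whose capture time is older than the retention window."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return [path for path, captured_at in entries if captured_at < cutoff]
```

Passing a fixed `now` makes the selection reproducible, which helps when previewing what a cleanup run would remove.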
Display version information.
mcp-screenshot version
Run as an MCP server for AI agent integration:
python -m mcp_screenshot.mcp.server
Configure in your MCP client (e.g., Claude Code):
{
"mcpServers": {
"screenshot": {
"command": "python",
"args": ["-m", "mcp_screenshot.mcp.server"],
"env": {
"VERTEX_AI_PROJECT": "your-project-id",
"VERTEX_AI_LOCATION": "us-central1"
}
}
}
}
| Tool | Description | Parameters |
|---|---|---|
| capture_screenshot | Capture a screenshot | quality, region, output_dir, zoom_center, zoom_factor |
| describe_image | Describe an image with AI | file_path, prompt, model |
| verify_visualization | Verify with expert mode | file_path, expert_mode, features |
| search_screenshots | Search screenshot history | query, limit, region, date_from, date_to |
| find_similar_images | Find visually similar screenshots | image_path, threshold, limit, region |
| combined_search | Search by text and image | text_query, image_path, text_weight, image_weight |
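MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `capture_screenshot`. The tool and parameter names come from the table above, while the argument values and the `tool_call_request` helper are assumptions made for illustration.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 message for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Example values are assumptions; check the server for accepted formats.
request = tool_call_request(1, "capture_screenshot", {"region": "center", "quality": 80})
payload = json.dumps(request)
```

In practice an MCP client library builds and sends this message, so a hand-written payload like this is mainly useful for debugging.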
The tool automatically saves all screenshots and descriptions in a SQLite database.
# View recent history
mcp-screenshot history
# Search by content
mcp-screenshot search "error message"
# Find similar images
mcp-screenshot similar reference.jpg
# Combined search (text + image)
mcp-screenshot combined --text "login form" --image login.jpg
# Get statistics
mcp-screenshot stats
# Clean up old screenshots
mcp-screenshot cleanup --days 30
- Database: SQLite with FTS5 virtual table for full-text search
- Text Ranking: BM25 for relevance scoring (the same ranking function Elasticsearch uses by default)
- Image Similarity: Perceptual hashing using the ImageHash library
- Combined Search: Weighted combination of text and image scores with automatic normalization
- Storage: Files are organized in ~/.mcp_screenshot/
  - Database: ~/.mcp_screenshot/history.db
  - Screenshots: ~/.mcp_screenshot/screenshots/
The capture command supports zooming in on specific coordinates:
- --zoom-center x,y: Specify the center point for zoom (e.g., 640,480)
- --zoom-factor: Set the zoom multiplication factor (1.0 to 10.0)
Zoom works by:
- Capturing the full screenshot
- Cropping around the specified center point
- Enlarging the cropped area by the zoom factor
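The crop step can be sketched as a small calculation. This example assumes the cropped area is the image size divided by the zoom factor and that the box is shifted to stay inside the image; `zoom_crop_box` is a hypothetical name, and the tool's exact rounding and edge handling may differ.

```python
def zoom_crop_box(width, height, center, factor):
    """Return the (left, top, right, bottom) box to crop before enlarging."""
    if not 1.0 <= factor <= 10.0:
        raise ValueError("zoom factor must be between 1.0 and 10.0")
    crop_w = round(width / factor)
    crop_h = round(height / factor)
    cx, cy = center
    # Shift the box so it never extends past the image edges.
    left = min(max(cx - crop_w // 2, 0), width - crop_w)
    top = min(max(cy - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h
```

An imaging library such as Pillow could then crop to this box and resize the result back to the original dimensions.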
# Zoom 2x into the center of the screen
mcp-screenshot capture --zoom-center 640,360 --zoom-factor 2.0
# Zoom 3x into a specific UI element
mcp-screenshot capture --zoom-center 1200,400 --zoom-factor 3.0 --output ui_detail.jpg
# Combine with regions for more control
mcp-screenshot capture --region center --zoom-center 640,360 --zoom-factor 1.5
Note: Zoom is only available for screen captures, not URL captures.
# Capture and get JSON output
OUTPUT=$(mcp-screenshot --json capture)
FILE=$(echo "$OUTPUT" | jq -r '.file')
# Describe the captured image
mcp-screenshot --json describe --file "$FILE" | jq '.description'
# Find similar images
REFERENCE=$(mcp-screenshot --json capture | jq -r '.file')
SIMILAR=$(mcp-screenshot --json similar "$REFERENCE" | jq -r '.results[0].storage_path')
# Compare them
mcp-screenshot compare "$REFERENCE" "$SIMILAR"
# Search history
mcp-screenshot --json search "dashboard" > search_results.json
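The same JSON workflow can be driven from Python. This is a minimal sketch, assuming `mcp-screenshot` is on the PATH and prints a single JSON document to stdout; `run_json` is a hypothetical wrapper, and the `file` key follows the jq example above.

```python
import json
import subprocess


def run_json(args, runner=subprocess.run):
    """Run `mcp-screenshot --json <args>` and return the parsed JSON output."""
    completed = runner(
        ["mcp-screenshot", "--json", *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

A call such as `run_json(["capture"])["file"]` would return the captured file path, and the `runner` parameter lets tests substitute a fake process.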
# D3.js visualization analysis
mcp-screenshot verify d3_chart.html --expert d3 --features "axes,data-points,tooltip"
# Chart analysis
mcp-screenshot verify sales_chart.png --expert chart --features "trend-line,labels,legend"
# Custom expertise
mcp-screenshot verify ui_mockup.png --prompt "As a UI/UX expert, evaluate the visual hierarchy"
# Process multiple images
for img in screenshots/*.png; do
mcp-screenshot --json describe --file "$img" > "${img%.png}_description.json"
done
| Variable | Description | Default |
|---|---|---|
| VERTEX_AI_PROJECT | Google Cloud project ID | Required |
| VERTEX_AI_LOCATION | Vertex AI location | us-central1 |
| VERTEX_AI_SERVICE_ACCOUNT_FILE | Path to service account JSON | Required |
| VERTEX_AI_MODEL | AI model to use | vertex_ai/gemini-2.5-flash-preview-04-17 |
| MAX_WIDTH | Max image width for AI | 1280 |
| MAX_HEIGHT | Max image height for AI | 1024 |
| DEFAULT_QUALITY | Default JPEG quality | 70 |
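To show how MAX_WIDTH and MAX_HEIGHT might bound an image before it is sent for analysis, here is a sketch of a fit-within calculation. It assumes the aspect ratio is preserved and that small images are never enlarged; `fit_within` is a hypothetical helper and the tool's actual resizing rule may differ.

```python
def fit_within(width, height, max_width=1280, max_height=1024):
    """Scale dimensions down to fit the limits while keeping the aspect ratio."""
    # The 1.0 cap means images already within the limits are left unchanged.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a 2560x1440 capture would be reduced to 1280x720 with the default limits.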
This project follows a strict three-layer architecture:
- Core layer
  - Pure business logic
  - No UI or integration dependencies
  - Screenshot capture, image processing, AI analysis
  - History storage and search (SQLite, BM25, perceptual hashing)
- CLI layer
  - Command-line interface
  - Rich formatting and user interaction
  - Depends only on Core layer
- MCP layer
  - MCP server implementation
  - Protocol-specific wrappers
  - Depends on Core layer
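The dependency rule can be illustrated with one core function and two thin wrappers. This is a sketch of the pattern only: every function name here is hypothetical, and the MCP wrapper simply returns a text content item in the shape MCP tool results use.

```python
import json


# Core layer: pure logic with no printing and no protocol types.
def summarize_history(entries):
    by_region = {}
    for entry in entries:
        by_region[entry["region"]] = by_region.get(entry["region"], 0) + 1
    return {"total": len(entries), "by_region": by_region}


# CLI layer: formats the core result for a terminal or as JSON.
def cli_stats(entries, as_json=False):
    stats = summarize_history(entries)
    if as_json:
        return json.dumps(stats, sort_keys=True)
    lines = [f"Total screenshots: {stats['total']}"]
    lines += [f"  {region}: {count}" for region, count in sorted(stats["by_region"].items())]
    return "\n".join(lines)


# MCP layer: wraps the same core result in a tool-style response.
def mcp_stats_tool(entries):
    text = json.dumps(summarize_history(entries), sort_keys=True)
    return {"content": [{"type": "text", "text": text}]}
```

Both wrappers call the core function and never call each other, so a change in presentation or protocol leaves the business logic untouched.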
- Python 3.10+
- Google Cloud account with Vertex AI enabled
- Service account with Vertex AI permissions
- Display environment for screen capture (optional for file operations)
- "$DISPLAY not set" error
  - This occurs in headless environments
  - Use file-based operations or web capture instead
  - Consider using Xvfb for headless screen capture
- "No module named 'google'" error
  - Install Google Cloud dependencies: pip install google-cloud-aiplatform
- Authentication errors
  - Ensure service account JSON has Vertex AI permissions
  - Check VERTEX_AI_PROJECT environment variable
- Image too large errors
  - Images are automatically resized to MAX_WIDTH/MAX_HEIGHT
  - Adjust quality settings with --quality flag
MIT License - see LICENSE file for details
- Follow the three-layer architecture
- Add tests for all new features
- Update documentation for CLI changes
- Ensure all commands have JSON output support
For issues, questions, or contributions, please visit the GitHub repository.