A Next.js application demonstrating DigitalOcean's AI platform capabilities, featuring:
- Gradient Integration: Chat with multiple LLMs powered by DigitalOcean's Gradient platform
- Playwright Browser Automation: Remote browser control through MCP (Model Context Protocol)
- DigitalOcean Spaces: Automatic file upload and optimization for media content
- Interactive Web Tools: Screenshot capture and browser automation capabilities
- Multi-Model Support: Access to various LLMs through DigitalOcean's Gradient (requires models with tool support - see Limitations section)
- Browser Control: AI can navigate websites, take screenshots, fill forms, and interact with web pages (OpenAI models recommended)
- Visual AI: Support for vision capabilities - AI can see and understand screenshots
- PDF Processing: AI can read and process PDF documents
- Media Support: Display images, videos, audio, PDFs, and documents inline
- Multi-Browser Support: Chromium, Firefox, Safari (WebKit), and Microsoft Edge
- Device Emulation: Simulate various devices (iPhones, iPads, Android devices)
- Resolution Presets: Common desktop and mobile resolutions
- Full Page Screenshots: Capture entire scrollable pages
- High Quality Mode: Toggle between compressed and high-quality screenshots
- Responsive Design: Full-width messages with proper mobile support
- Resizable Sidebar: Drag to resize between 280px-600px
- Syntax Highlighting: Code blocks with VS Code Dark+ theme
- Message Styling:
  - User messages: Blue background (#3b82f6)
  - Assistant messages: Green background (#22c55e)
- Collapsible Content: Large outputs automatically collapse with expand/collapse controls
- Debug Mode: Toggle to view raw message JSON for development
- Model Parameters: Adjustable temperature, max tokens, top P, and frequency penalty
- Streaming Responses: Real-time token streaming with visual indicators
- Error Handling: Graceful error display with retry capabilities
- Token Optimization: Large base64 strings are automatically uploaded to DigitalOcean Spaces and replaced with presigned URLs
- Concurrent Processing: Batch uploads with concurrency limits for performance
- MCP Protocol Support: Full implementation of Model Context Protocol for tool integration
- Streamable HTTP Transport: Real-time communication with MCP servers
- Keyboard Shortcuts:
  - OS-aware shortcut display (shows ⌘ on Mac, Ctrl on others) - see the sketch below
  - Clear chat and start a new conversation
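A minimal sketch of the OS-aware shortcut display. This is illustrative only, not the template's actual code; the helper names are hypothetical:

```ts
// Hypothetical helper: detect macOS in the browser so shortcut hints
// can render "⌘" instead of "Ctrl".
export function isMacPlatform(): boolean {
  if (typeof navigator === "undefined") return false; // guard for SSR
  // navigator.platform is deprecated but remains the common check here.
  return /Mac|iPhone|iPad/i.test(navigator.platform);
}

export function shortcutLabel(key: string): string {
  return `${isMacPlatform() ? "⌘" : "Ctrl"}+${key}`;
}

// e.g. shortcutLabel("K") -> "⌘+K" on macOS, "Ctrl+K" elsewhere
```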
- **Next.js Web App (Port 3000)**
  - Main application with chat and screenshotter interfaces
  - Server-side API routes for AI and browser operations
  - React components with TypeScript
- **Playwright Server (Port 8081)**
  - Headless browser instance management
  - WebSocket API for browser control (see the connection sketch after this list)
  - Supports Chromium, Firefox, WebKit, and Edge
- **Playwright MCP Server (Port 8080)**
  - Model Context Protocol implementation
  - Bridges AI tools with browser automation
  - Provides screenshot, navigation, and interaction capabilities
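As a rough illustration of how a client can talk to the browser server, here is a sketch using the `playwright` package's `chromium.connect()` against the server's WebSocket endpoint. The endpoint URL and device choice are assumptions based on the defaults above, not the template's exact code:

```ts
import { chromium, devices } from "playwright";

async function main() {
  // Attach to the remote browser server over WebSocket (port 8081 above).
  const browser = await chromium.connect("ws://localhost:8081/");

  // Optional device emulation via Playwright's built-in descriptors.
  const context = await browser.newContext({ ...devices["iPhone 13"] });
  const page = await context.newPage();

  await page.goto("https://example.com");
  // Full-page capture, as offered by the screenshotter feature.
  await page.screenshot({ path: "example.png", fullPage: true });

  await browser.close();
}

main().catch(console.error);
```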
- `/api/chat` - Main chat endpoint with streaming responses
- `/api/gradient-models` - Fetch available AI models
- `/api/screenshot` - Direct screenshot API
- `/api/devices` - Get device emulation profiles
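For example, the read-only routes above can be queried from a small script. The response shapes are not documented here, so treat this as a sketch under that assumption:

```ts
// Sketch (Node 18+, ESM, global fetch): query two of the read-only routes.
const base = process.env.NEXT_PUBLIC_BASE_URL ?? "http://localhost:3000";

const models = await fetch(`${base}/api/gradient-models`).then((r) => r.json());
console.log("available models:", models);

const profiles = await fetch(`${base}/api/devices`).then((r) => r.json());
console.log("device profiles:", profiles);
```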
The chat interface with browser automation requires LLM models that support function calling/tools. Not all models available through Gradient support this feature.
The following models have been tested and confirmed to work with browser automation tools:
| Model ID | Provider | Description | Performance |
|---|---|---|---|
| `openai-gpt-41` | OpenAI | GPT-4.1 | Best overall performance |
| `openai-gpt-4o` | OpenAI | GPT-4o | Better than mini, but not as good as GPT-4.1 |
| `openai-gpt-4o-mini` | OpenAI | GPT-4o Mini | Cost-effective, fast |
| `alibaba-qwen3-32b` | Alibaba | Qwen 3 32B | Excellent open model |
| `deepseek-r1-distill-llama-70b` | DeepSeek | R1 Distilled Llama 70B | Powerful open model |
| `llama3.3-70b-instruct` | Meta | Llama 3.3 70B Instruct | High-quality open model |
| `mistral-nemo-instruct-2407` | Mistral | Nemo Instruct 2407 | Efficient, good tool support |
Note: Other models that support function calling may also work but have not been fully tested.
The following models have limitations with browser automation in this template:
- Anthropic Claude models (Claude 3 Opus, Sonnet, Haiku) - While these models do support tools, the current implementation uses the AI SDK's OpenAI-compatible provider which doesn't properly support tool calling for Anthropic models through Gradient
- Most open-source models without function calling support
- Text-only models without tool capabilities
- Browser automation only works with tool-supporting models
- Without tool support, the chat will function as a standard LLM chat without browser control
- Screenshot tool requires Playwright servers to be running and accessible
- File uploads require configured DigitalOcean Spaces access
- Browser sessions are not maintained between messages - each browser action starts fresh (AI SDK limitation)
- This template uses the AI SDK with an OpenAI-compatible provider to communicate with Gradient (see the sketch after this list)
- Tool calling implementation follows OpenAI's function calling format
- The Playwright MCP server supports sessions for maintaining browser state across requests, but the AI SDK doesn't yet support MCP session management
- Future updates may add:
  - Native support for Anthropic models once the AI SDK's provider properly supports their tool format through Gradient
  - Session support once the AI SDK implements MCP session management
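To make that wiring concrete, here is a hedged sketch of combining an OpenAI-compatible provider with MCP-derived tools in the AI SDK. The packages (`ai`, `@ai-sdk/openai-compatible`, `@modelcontextprotocol/sdk`) are real, but the exact options and endpoints below are assumptions based on the defaults in this README, not the template's actual source:

```ts
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";
import { experimental_createMCPClient, streamText } from "ai";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Gradient exposes an OpenAI-compatible inference endpoint.
const gradient = createOpenAICompatible({
  name: "gradient",
  baseURL: process.env.GRADIENT_BASE_URL ?? "https://inference.do-ai.run/v1",
  apiKey: process.env.GRADIENT_API_KEY,
});

// Fetch tool definitions from the Playwright MCP server over Streamable HTTP.
const mcpClient = await experimental_createMCPClient({
  transport: new StreamableHTTPClientTransport(
    new URL(process.env.PLAYWRIGHT_MCP_ENDPOINT ?? "http://localhost:8080/mcp"),
  ),
});
const tools = await mcpClient.tools();

// Tool calls follow OpenAI's function-calling format, per the note above.
const result = streamText({
  model: gradient("openai-gpt-4o"),
  tools,
  messages: [{ role: "user", content: "Take a screenshot of example.com" }],
});
```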
The application uses DigitalOcean Spaces (S3-compatible object storage) to optimize token usage by automatically uploading base64-encoded files and replacing them with presigned URLs.
1. **Automatic Detection**: The system detects base64 data in:
   - Message content (images and files)
   - Tool inputs (before execution)
   - Tool outputs (after execution)

2. **S3 Upload**: Base64 data is uploaded to S3 with the structure:
   `/uploads/{uuid}/{original-filename}`

3. **URL Replacement**: Base64 data is replaced with presigned URLs that expire after 7 days

4. **Supported Formats**: Most file types are supported, including:
   - Images (PNG, JPEG, GIF, WebP, SVG)
   - Videos (MP4, WebM)
   - Audio (MP3, WAV, OGG)
   - Documents (PDF, JSON, TXT, HTML, CSS, JS)
- Concurrent uploads with batching (max 10 simultaneous; see the upload sketch below)
- Non-blocking async operations
- 7-day presigned URL expiration
- Automatic MIME type detection
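A minimal sketch of this upload-and-presign flow using the AWS SDK v3 against Spaces. The helper name and key layout follow the structure described above, but the template's actual `s3-utils.ts` may differ:

```ts
import { randomUUID } from "node:crypto";
import { S3Client, PutObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({
  region: process.env.DO_SPACES_REGION,
  endpoint: process.env.DO_SPACES_ENDPOINT,
  credentials: {
    accessKeyId: process.env.DO_SPACES_ACCESS_KEY!,
    secretAccessKey: process.env.DO_SPACES_SECRET_KEY!,
  },
});

// Upload one base64 payload and return a presigned URL (7-day expiry).
// In the real flow, many of these calls would be batched with a
// concurrency cap of 10 (e.g., via a p-limit-style queue).
async function uploadBase64(base64: string, filename: string, mime: string) {
  const key = `uploads/${randomUUID()}/${filename}`;
  await s3.send(
    new PutObjectCommand({
      Bucket: process.env.DO_SPACES_BUCKET,
      Key: key,
      Body: Buffer.from(base64, "base64"),
      ContentType: mime,
    }),
  );
  return getSignedUrl(
    s3,
    new GetObjectCommand({ Bucket: process.env.DO_SPACES_BUCKET, Key: key }),
    { expiresIn: 7 * 24 * 60 * 60 }, // 7 days, the presigner's maximum
  );
}
```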
- Node.js 22.14.0 or higher (< 23)
- Yarn 1.22.22
- Docker and Docker Compose (for running Playwright servers)
- DigitalOcean account with Gradient (serverless inference) and Spaces access (see Environment Variables below)
1. **Clone and install:**

   ```bash
   git clone https://github.com/digitalocean/template-app-platform-gradient-cua-chat
   cd template-app-platform-gradient-cua-chat
   yarn install
   ```

2. **Configure environment:**

   ```bash
   cp .env.example .env.local
   ```

3. **Update `.env.local`** with your credentials (see the Environment Variables section below for details)

4. **Start Playwright servers** using Docker Compose locally (recommended):

   ```bash
   docker-compose up -d
   ```

5. **Start the app:**

   ```bash
   yarn dev
   ```

6. **Access the application:**
   - Homepage: http://localhost:3000
   - Chat: http://localhost:3000/chat
   - Screenshotter: http://localhost:3000/screenshotter

7. **Stop services** (when using Docker Compose):

   ```bash
   docker-compose down
   ```
The application requires several environment variables for different services. Copy `.env.example` to `.env.local` and configure:
# Base URL for the Next.js application
# Set to your deployed app URL in production
NEXT_PUBLIC_BASE_URL="http://localhost:3000"
Gradient is DigitalOcean's AI platform for running LLMs.
# Get your API key from: https://cloud.digitalocean.com/ai-ml/inference
# How to create: https://docs.digitalocean.com/products/gradientai-platform/how-to/use-serverless-inference/#create
GRADIENT_API_KEY=your_gradient_api_key_here
# Gradient inference endpoint (typically doesn't need changes)
GRADIENT_BASE_URL="https://inference.do-ai.run/v1"
Spaces is DigitalOcean's S3-compatible object storage for uploading chat media.
# Create a Space: https://docs.digitalocean.com/products/spaces/how-to/create/
# Available regions: nyc3, ams3, sfo3, sgp1, fra1, syd1
DO_SPACES_ENDPOINT=https://nyc3.digitaloceanspaces.com
DO_SPACES_REGION=nyc3
# Generate keys: https://cloud.digitalocean.com/account/api/spaces
DO_SPACES_ACCESS_KEY=your_spaces_access_key_here
DO_SPACES_SECRET_KEY=your_spaces_secret_key_here
# Your Space name (must be globally unique)
DO_SPACES_BUCKET=your_bucket_name_here
The Playwright MCP server enables browser automation in chat:
# Default ports for local services (these are the defaults if not specified)
PLAYWRIGHT_SERVER_ENDPOINT=http://localhost:8081
PLAYWRIGHT_MCP_ENDPOINT=http://localhost:8080/mcp
Note: If these environment variables are not set, the application automatically falls back to the local development defaults shown above (see the sketch below).
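That fallback can be pictured as something like the following. This is illustrative only; the template's actual config code may differ:

```ts
// Illustrative: resolve endpoints with local-development fallbacks.
export const PLAYWRIGHT_SERVER_ENDPOINT =
  process.env.PLAYWRIGHT_SERVER_ENDPOINT ?? "http://localhost:8081";
export const PLAYWRIGHT_MCP_ENDPOINT =
  process.env.PLAYWRIGHT_MCP_ENDPOINT ?? "http://localhost:8080/mcp";
```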
Option 1 - External Access (over the public internet):
PLAYWRIGHT_SERVER_ENDPOINT=https://my-app-name.ondigitalocean.app/playwright-server
PLAYWRIGHT_MCP_ENDPOINT=https://my-app-name.ondigitalocean.app/playwright-mcp/mcp
Option 2 - Internal App Network Access (recommended for performance & security):
PLAYWRIGHT_SERVER_ENDPOINT=http://playwright-server:8081
PLAYWRIGHT_MCP_ENDPOINT=http://playwright-mcp:8080/mcp
- DigitalOcean account with billing enabled
- GitHub account with the repository forked
- The following DigitalOcean services configured: Gradient (serverless inference) and Spaces (object storage)
Click the deploy button above, or follow the manual steps below.
Fork this repository to your GitHub account so App Platform can access it.
- Go to DigitalOcean App Platform
- Click "Create App"
- Choose "GitHub" as your source
- Select your forked repository
You can either:
- Use the UI to configure components
- Upload the provided `.do/app.yaml` spec file
The app requires 3 components:
- Web Service: The Next.js application
- Worker 1: Playwright browser server
- Worker 2: Playwright MCP server
Configure these environment variables in the App Platform settings (see Environment Variables section above for details):
Required Secrets:
- `GRADIENT_API_KEY` - Your Gradient API key
- `DO_SPACES_ACCESS_KEY` - Your Spaces access key
- `DO_SPACES_SECRET_KEY` - Your Spaces secret key
- `DO_SPACES_BUCKET` - Your Spaces bucket name
- `DO_SPACES_ENDPOINT` - Your Spaces endpoint (e.g., https://nyc3.digitaloceanspaces.com)
- `DO_SPACES_REGION` - Your Spaces region (e.g., nyc3)
Required for Production (choose one option):
- For internal networking (recommended):
PLAYWRIGHT_SERVER_ENDPOINT=http://playwright-server:8081
PLAYWRIGHT_MCP_ENDPOINT=http://playwright-mcp:8080/mcp
- For external access:
PLAYWRIGHT_SERVER_ENDPOINT=https://your-app-name.ondigitalocean.app/playwright-server
PLAYWRIGHT_MCP_ENDPOINT=https://your-app-name.ondigitalocean.app/playwright-mcp/mcp
Web Service (Next.js app):
- Instance Size: Basic XXS (512 MB RAM, 1 vCPU)
- HTTP Port: 3000
- Routes: /

Playwright Server (worker):
- Instance Size: Professional XS (1 GB RAM, 1 vCPU)
- Internal Port: 8081
- Dockerfile: Dockerfile.playwright

Playwright MCP Server (worker):
- Instance Size: Professional XS (1 GB RAM, 1 vCPU)
- Internal Port: 8080
- Dockerfile: Dockerfile.mcp
Click "Create Resources" to start the deployment. The initial build may take 10-15 minutes.
- Check that all 3 components show as running and healthy
- Visit your app URL to see the homepage
- Test the Chat interface
- Test the Screenshotter tool
Use the App Platform metrics to monitor:
- CPU and memory usage
- Request rates
- Error logs
If the build fails:
- Check the build logs for errors
- Ensure the correct values are passed as arguments to the build runners
- Verify the Dockerfiles are correct
If services can't communicate:
- Use internal hostnames (playwright-server, playwright-mcp)
- Check the internal ports are correct
- Verify environment variables point to internal URLs
- API Keys: Always use App Platform secrets for sensitive values
- Network: Use internal networking between components
- Spaces: Configure bucket policies to restrict access
- Updates: Keep dependencies updated for security patches
├── app/
│ ├── api/ # API routes
│ │ ├── chat/ # Main chat endpoint
│ │ ├── gradient-models/ # Model listing
│ │ ├── screenshot/ # Screenshot API
│ │ └── devices/ # Device profiles
│ ├── chat/ # Chat interface
│ ├── screenshotter/ # Screenshot tool
│ └── page.tsx # Homepage
├── components/
│ ├── chat/ # Chat UI components
│ │ ├── ChatSidebar.tsx
│ │ ├── Message.tsx
│ │ └── MessagesArea.tsx
│ └── media-renderers/ # Media display components
│ ├── MediaRenderer.tsx # Main router
│ ├── PDFRenderer.tsx # PDF viewer
│ └── DocumentRenderer.tsx # Documents
├── lib/
│ ├── mcp-transport.ts # MCP WebSocket client
│ ├── tool-handlers.tsx # Tool result rendering
│ └── s3-utils.ts # Spaces upload logic
├── hooks/ # React hooks
├── Dockerfile.mcp # MCP server image
└── Dockerfile.playwright # Browser server image
# Development with Turbopack
yarn dev
# Production build
yarn build
yarn start
# Testing
yarn test # Run all tests
yarn test:watch # Watch mode
yarn test:coverage # Coverage report
# Linting
yarn lint
The project includes comprehensive test coverage:
- Unit tests for components
- API route tests
- Hook tests
- Utility function tests
Run `yarn test:coverage` to see the full coverage report.
- **"Bad Request" errors in chat**
  - Most common cause: the Max Tokens setting in Advanced Settings is too high
  - See the "Max Tokens Configuration" section below for a detailed explanation

- **"Cannot connect to Playwright server"**
  - Ensure both Playwright containers are running
  - Check that ports 8080 and 8081 are not in use (when running locally)
  - Verify environment variables are set correctly

- **"Gradient API error"**
  - Verify your API key is correct
  - Check that you have access to Gradient
  - Ensure you're not exceeding rate limits

- **"Spaces upload failed"**
  - Verify the bucket exists and is accessible
  - Check that the access keys have write permissions
  - Ensure the bucket name is globally unique

- **"Screenshot timeout"**
  - Check that the Playwright server is running and reachable
  - Try different browser options
  - Check whether the site requires authentication
The most common cause of "Bad Request" errors in the chat interface is an incorrect Max Tokens setting in the Advanced Settings panel.

The number of tokens a model can generate is determined by:

```
generated_tokens = min(request.max_tokens, model_context_length - prompt_token_length)
```

Where:
- `request.max_tokens` - The value you set in Advanced Settings
- `model_context_length` - The model's total context window (varies by model)
- `prompt_token_length` - Tokens used by your messages + system prompt + tool definitions
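As a worked example: with an 8,192-token context window and a 3,000-token prompt, anything above 5,192 for Max Tokens risks a rejected request. A small clamp sketch (the function name is illustrative, not the template's code):

```ts
// Illustrative: clamp the requested max_tokens to what the context allows.
function safeMaxTokens(
  requestedMaxTokens: number,
  modelContextLength: number,
  promptTokenLength: number,
): number {
  const available = modelContextLength - promptTokenLength;
  return Math.max(0, Math.min(requestedMaxTokens, available));
}

// e.g. safeMaxTokens(32_000, 8_192, 3_000) === 5_192
```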
- **Setting Max Tokens too high**
  - If you set Max Tokens to 32,000 but your prompt uses 30,000 tokens, the model can only generate 2,000 tokens
  - If Max Tokens exceeds the available space, you'll get a "Bad Request" error

- **Solution**
  - Start with a lower Max Tokens value (e.g., 4,096)
  - If you get "max tokens reached" warnings, gradually increase it
  - Monitor the token usage shown in the chat interface

- **Model-Specific Context Limits**
  - Each model has a different context length
  - Check the model's documentation for its specific limit
  - Leave room for both input and output tokens
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
For issues specific to:
- App Platform: DigitalOcean Support
- This application: GitHub Issues
This is a template application provided by DigitalOcean. See LICENSE for details.