HARvest MCP Server enables AI coding agents to programmatically analyze API interactions and generate API wrappers. By analyzing browser network traffic (HAR files), it generates executable code that reproduces entire API workflows including authentication, dependency chains, and data extraction.
- 🧠 AI-Powered Analysis: Uses LLM function calling for intelligent request analysis
- 📊 Dependency Graph Management: Builds and manages complex API dependency chains
- 🔧 Interactive Debugging: Granular control with manual intervention capabilities
- ⚡ High Performance: <30ms analysis, supports 12+ concurrent sessions
- 🔍 Real-Time Inspection: Live access to analysis state via MCP resources
- 🛡️ Type-Safe: Full TypeScript implementation with stricter tsconfig flags and Biome rules.
- ✅ Comprehensive Testing: 100% coverage driven by a strict test-first (TDD) workflow
┌─────────────────────────────────────────────────────────────────┐
│ MCP Server (STDIO) │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Tools │ │ Resources │ │ Prompts │ │
│ │ │ │ │ │ │ │
│ │ session_* │ │ dag.json │ │ full_run │ │
│ │ analysis_* │ │ log.txt │ │ │ │
│ │ debug_* │ │ status.json │ │ │ │
│ │ codegen_* │ │ code.ts │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
# Clone the repository
git clone <repository-url>
cd harvest-mcp
# Install dependencies
bun install
# Build the project
bun run build
Start the server:
bun run start
Connect with an MCP client:
# Using MCP Inspector (if available)
mcp-inspector --transport stdio --command "bun run start"
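If you are scripting against the server from TypeScript rather than using the Inspector, a minimal connection sketch with the MCP TypeScript SDK client might look like the following (the client name/version strings are placeholders, and exact import paths can vary by SDK version):
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the server over STDIO and connect a client to it
const transport = new StdioClientTransport({ command: "bun", args: ["run", "start"] });
const client = new Client({ name: "harvest-client", version: "0.1.0" }, { capabilities: {} });
await client.connect(transport);

// List the tools the server exposes
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));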
Basic workflow:
// 1. Create a session
const session = await tools.session.start({
  harPath: './path/to/traffic.har',
  cookiePath: './path/to/cookies.json', // optional
  prompt: 'Login to the application'
});

// 2. Start primary workflow analysis
await tools.analysis_start_primary_workflow({ sessionId: session.id });

// 3. Process dependencies until the analysis reports completion
while (!(await tools.analysis.is_complete({ sessionId: session.id })).isComplete) {
  await tools.analysis.process_next_node({ sessionId: session.id });
}

// 4. Generate code
const code = await tools.codegen.generate_wrapper_script({ sessionId: session.id });
Initializes a new analysis session.
Parameters:
- `harPath` (string): Path to HAR file
- `cookiePath` (string, optional): Path to cookie file
- `prompt` (string): Description of the action to analyze
- `inputVariables` (object, optional): Pre-defined input variables
Returns:
{
"sessionId": "uuid-string"
}
Lists all active sessions.
Returns:
{
"sessions": [
{
"id": "uuid",
"prompt": "Login to application",
"createdAt": "2025-01-01T00:00:00Z",
"isComplete": false,
"nodeCount": 3
}
]
}
Deletes a session and cleans up resources.
Parameters:
- `sessionId` (string): Session to delete
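A small housekeeping pass can combine the listing and deletion tools to prune finished sessions. The call shapes below follow the `tools.session.*` style used in the examples in this README and may need adapting to your MCP client:
// Delete every session that has already completed its analysis
const { sessions } = await tools.session.list();
for (const s of sessions) {
  if (s.isComplete) {
    await tools.session.delete({ sessionId: s.id });
  }
}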
Discovers all workflows in the HAR file and automatically starts analysis of the highest-priority workflow using the multi-workflow system.
Parameters:
- `sessionId` (string): Session ID
Returns:
{
"success": true,
"workflow": {
"id": "workflow-uuid",
"name": "Login Flow",
"description": "User authentication workflow"
},
"masterNode": {
"url": "https://api.example.com/login",
"method": "POST"
}
}
Processes the next unresolved node in the dependency graph.
Parameters:
- `sessionId` (string): Session ID
Returns:
{
"processedNodeId": "uuid",
"foundDependencies": [
{
"type": "request",
"nodeId": "uuid",
"extractedPart": "auth_token"
}
],
"status": "completed"
}
Checks if the dependency analysis is finished.
Parameters:
- `sessionId` (string): Session ID
Returns:
{
"isComplete": true
}
Returns nodes with unresolved dependencies.
Parameters:
- `sessionId` (string): Session ID
Returns:
{
"unresolvedNodes": [
{
"nodeId": "uuid",
"unresolvedParts": ["auth_token", "session_id"]
}
]
}
Gets detailed information about a specific node.
Parameters:
- `sessionId` (string): Session ID
- `nodeId` (string): Node to inspect
Returns:
{
"nodeType": "curl",
"content": "curl -X POST ...",
"dynamicParts": ["auth_token"],
"extractedParts": ["user_id"],
"inputVariables": {"username": "user@example.com"}
}
Lists all available requests from the HAR file.
Parameters:
- `sessionId` (string): Session ID
Returns:
{
"requests": [
{
"method": "POST",
"url": "https://api.example.com/auth",
"responsePreview": "{\"token\":\"abc123\"...}"
}
]
}
Manually creates a dependency link in the DAG.
Parameters:
- `sessionId` (string): Session ID
- `consumerNodeId` (string): Node that needs the dependency
- `providerNodeId` (string): Node that provides the dependency
- `providedPart` (string): The variable being provided
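Putting the debug tools together, a manual resolution pass might look like the sketch below: list the unresolved parts, scan the HAR requests for a response that appears to provide each part, and force the link. The matching heuristic and the `nodeId` field on listed requests are assumptions for illustration only:
// Manually resolve missing parts (illustrative heuristic only)
const { unresolvedNodes } = await tools.debug.get_unresolved_nodes({ sessionId });
const { requests } = await tools.debug.list_all_requests({ sessionId });

for (const node of unresolvedNodes) {
  for (const part of node.unresolvedParts) {
    // Naive match: any response whose preview mentions the missing part
    const provider = requests.find((r) => r.responsePreview?.includes(part));
    if (provider) {
      await tools.debug.force_dependency({
        sessionId,
        consumerNodeId: node.nodeId,
        providerNodeId: provider.nodeId, // assumes the request listing exposes a node ID
        providedPart: part,
      });
    }
  }
}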
Generates the final TypeScript wrapper script.
Parameters:
- `sessionId` (string): Session ID (analysis must be complete)
Returns:
// Generated TypeScript code
async function authLogin(): Promise<ApiResponse> { /* ... */ }
async function searchDocuments(): Promise<ApiResponse> { /* ... */ }
async function loginAndSearchDocuments(): Promise<ApiResponse> { /* ... */ }
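Assuming the tool returns the script as a plain string (as the examples in this README suggest), it can be written straight to disk with Bun:
// Persist the generated wrapper for use in your own project
const code = await tools.codegen.generate_wrapper_script({ sessionId });
await Bun.write("./generated/api-wrapper.ts", code);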
Real-time JSON representation of the dependency graph.
MIME Type: application/json
Example:
{
"nodes": {
"node-1": {
"type": "master_curl",
"content": "curl -X POST ...",
"dynamicParts": [],
"extractedParts": ["result"]
}
},
"edges": []
}
Plain-text analysis log with timestamps.
MIME Type: text/plain
Current analysis status and progress.
MIME Type: application/json
Generated TypeScript wrapper script (available after code generation).
MIME Type: text/typescript
Complete automated analysis from HAR to code generation.
Arguments:
- `harPath` (string): Path to HAR file
- `cookiePath` (string, optional): Path to cookie file
- `prompt` (string): Description of the action
- `inputVariables` (object, optional): Pre-defined variables
Returns: Complete workflow results including generated code and analysis summary.
// Automated login analysis
const result = await prompts.harvest.full_run({
harPath: './login-traffic.har',
cookiePath: './cookies.json',
prompt: 'Login to the dashboard and navigate to settings'
});
console.log(result.generatedCode); // Complete TypeScript implementation
// Create session
const session = await tools.session.start({
harPath: './complex-workflow.har',
prompt: 'Multi-step document processing'
});
// Start analysis with modern workflow system
await tools.analysis_start_primary_workflow({ sessionId: session.id });
// Monitor progress
while (true) {
const complete = await tools.analysis.is_complete({ sessionId: session.id });
if (complete.isComplete) break;
try {
await tools.analysis.process_next_node({ sessionId: session.id });
} catch (error) {
// Handle failed dependency resolution
const unresolved = await tools.debug.get_unresolved_nodes({ sessionId: session.id });
console.log('Manual intervention needed:', unresolved);
// Example manual fix
await tools.debug.force_dependency({
sessionId: session.id,
consumerNodeId: 'problematic-node-id',
providerNodeId: 'auth-node-id',
providedPart: 'auth_token'
});
}
}
// Generate final code
const code = await tools.codegen.generate_wrapper_script({ sessionId: session.id });
// Monitor analysis state in real-time
const sessionId = 'your-session-id';
// View dependency graph
const dagResource = await resources.read(`harvest://${sessionId}/dag.json`);
console.log('Current DAG:', JSON.parse(dagResource.content));
// Check analysis logs
const logResource = await resources.read(`harvest://${sessionId}/log.txt`);
console.log('Analysis log:', logResource.content);
// Monitor status
const statusResource = await resources.read(`harvest://${sessionId}/status.json`);
const status = JSON.parse(statusResource.content);
console.log(`Progress: ${status.totalNodes - status.nodesRemaining}/${status.totalNodes} nodes processed`);
Based on comprehensive benchmarking:
- Analysis Speed: <60ms for typical workflows
- Tool Response Time: <1ms for non-LLM operations
- Memory Usage: ~16MB per active session
- Concurrent Sessions: Supports 12+ simultaneous sessions
- Bulk Operations: 15 sessions + 20 operations in ~200ms
- Bun: Package manager and runtime
- TypeScript: Strict typing enabled
- Node.js 18+: For compatibility
# Development server with hot reload
bun run dev
# Run tests
bun test # All tests
bun test:unit # Unit tests only
bun test:integration # Integration tests
bun test:e2e # End-to-end tests
bun test:coverage # With coverage report
# Code quality
bun run check # Lint and format check
bun run check:fix # Auto-fix issues
bun run typecheck # TypeScript validation
# Build
bun run build # Production build
src/
├── core/ # Core business logic
│ ├── SessionManager.ts # Stateful session management
│ ├── DAGManager.ts # Dependency graph operations
│ ├── HARParser.ts # HAR file processing
│ ├── LLMClient.ts # OpenAI integration
│ └── CodeGenerator.ts # TypeScript code generation
├── agents/ # Analysis agents
│ ├── DynamicPartsAgent.ts
│ ├── InputVariablesAgent.ts
│ └── DependencyAgent.ts
├── models/ # Data models
│ └── Request.ts # HTTP request modeling
├── types/ # TypeScript definitions
│ └── index.ts # Centralized types
└── server.ts # MCP server entry point
tests/
├── unit/ # Unit tests
├── integration/ # Integration tests
├── e2e/ # End-to-end tests
└── fixtures/ # Test data
New Analysis Agent:
// src/agents/MyNewAgent.ts
export class MyNewAgent {
  static async analyze(session: HarvestSession): Promise<AnalysisResult> {
    // Implementation
  }
}
New MCP Tool:
// In src/server.ts
server.tool('my.new_tool', MyToolSchema, async (params) => {
  // Tool implementation
  return { content: [{ type: 'text', text: 'result' }] };
});
New Resource:
// Add to resource handler
case 'my_resource.json':
  return {
    contents: [{ type: 'text', text: JSON.stringify(data) }],
    mimeType: 'application/json'
  };
- Unit Tests: Test individual functions and classes
- Integration Tests: Test MCP tool workflows
- E2E Tests: Test complete user scenarios
- Performance Tests: Validate speed and memory requirements
All tests use Vitest with comprehensive mocking for LLM calls.
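A typical unit test stubs the LLM client so no network call is made. The interface and helper below are hypothetical stand-ins that illustrate the mocking pattern, not the project's real signatures:
import { describe, expect, it, vi } from "vitest";

// Hypothetical stand-in for the LLM client interface injected into agents
interface LLMLike {
  callFunction(prompt: string): Promise<{ dynamicParts: string[] }>;
}

// Illustrative wrapper around the mocked call; real agents do more work here
async function findDynamicParts(llm: LLMLike, curlCommand: string): Promise<string[]> {
  const { dynamicParts } = await llm.callFunction(curlCommand);
  return dynamicParts;
}

describe("LLM mocking pattern", () => {
  it("resolves dynamic parts from a canned LLM response", async () => {
    const llm: LLMLike = {
      callFunction: vi.fn().mockResolvedValue({ dynamicParts: ["auth_token"] }),
    };
    const parts = await findDynamicParts(llm, "curl -X POST https://api.example.com/login");
    expect(parts).toEqual(["auth_token"]);
    expect(llm.callFunction).toHaveBeenCalledTimes(1);
  });
});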
HARvest MCP Server supports multiple LLM providers. Configure your preferred provider using environment variables:
# LLM Provider Selection (optional)
# Supported values: openai, gemini
# If not set, auto-detects based on available API keys
LLM_PROVIDER=openai
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key-here
# Google Gemini Configuration
GOOGLE_API_KEY=your-google-api-key-here
# Model Configuration (optional)
# Overrides the default model for your provider
LLM_MODEL=gpt-4o # OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo
# Gemini: gemini-1.5-pro, gemini-1.5-flash, gemini-1.0-pro
If `LLM_PROVIDER` is not set, the system automatically selects a provider based on available API keys:
- If `OPENAI_API_KEY` is present → uses OpenAI
- If only `GOOGLE_API_KEY` is present → uses Gemini
- If neither is present → throws a configuration error
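The selection logic described above could be sketched roughly as follows (illustrative, not the project's actual implementation):
// Sketch of the provider auto-detection described above
function resolveProvider(env: NodeJS.ProcessEnv = process.env): "openai" | "gemini" {
  if (env.LLM_PROVIDER === "openai" || env.LLM_PROVIDER === "gemini") {
    return env.LLM_PROVIDER;
  }
  if (env.OPENAI_API_KEY) return "openai";
  if (env.GOOGLE_API_KEY) return "gemini";
  throw new Error("No LLM provider configured: set OPENAI_API_KEY or GOOGLE_API_KEY");
}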
OpenAI Models:
- `gpt-4o` (default) - Latest GPT-4 Optimized
- `gpt-4o-mini` - Smaller, faster variant
- `gpt-4-turbo` - GPT-4 Turbo
- `gpt-4` - Standard GPT-4
- `gpt-3.5-turbo` - Fast, cost-effective
Gemini Models:
- `gemini-1.5-pro` (default) - Most capable
- `gemini-1.5-flash` - Faster, lighter variant
- `gemini-1.0-pro` - Previous generation
// SessionManager configuration
const MAX_SESSIONS = 100; // Maximum concurrent sessions
const SESSION_TIMEOUT = 30 * 60 * 1000; // 30 minutes
const CLEANUP_INTERVAL = 5 * 60 * 1000; // 5 minutes
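Conceptually, these constants drive a cleanup cycle along the lines of the sketch below (illustrative only; the real SessionManager may differ):
// Illustrative cleanup loop driven by the constants above
const sessions = new Map<string, { lastAccessedAt: number }>();

setInterval(() => {
  const now = Date.now();
  for (const [id, session] of sessions) {
    if (now - session.lastAccessedAt > SESSION_TIMEOUT) {
      sessions.delete(id); // expired sessions are dropped and their data freed
    }
  }
}, CLEANUP_INTERVAL);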
1. LLM API Failures
Error: URL identification failed: API call failed
- Check API key environment variables (`OPENAI_API_KEY` or `GOOGLE_API_KEY`)
- Verify the correct provider is selected (check `LLM_PROVIDER`)
- Verify internet connectivity
- Check API status for your provider (OpenAI or Google)
2. HAR File Parsing Errors
Error: Failed to parse HAR file: Invalid JSON
- Ensure HAR file is valid JSON
- Check file permissions
- Verify file path is correct
3. Session Not Found
Error: Session not found: uuid
- Session may have expired (30 min timeout)
- Check session ID is correct
- Verify session was created successfully
4. Memory Issues
Out of memory errors
- Sessions automatically clean up after 30 minutes
- Manually delete unused sessions
- Check for memory leaks in custom code
Enable detailed logging:
# Enable MCP debug logging
DEBUG=mcp:* bun run start
# Server-side logging (stderr)
bun run start 2> debug.log
Use the built-in performance tests:
# Run performance benchmarks
bun test tests/integration/performance.test.ts
# Monitor memory usage
bun test tests/integration/performance.test.ts --reporter=verbose
- Local Execution Only: Server uses STDIO transport, no network binding
- File Path Validation: All file paths are validated to prevent traversal
- No Code Execution: Server only generates code, never executes it
- Session Isolation: Each session is completely isolated
- Automatic Cleanup: Session data is cleared on timeout/termination
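For reference, the kind of traversal guard implied by the file path validation point can be sketched like this (not necessarily the project's exact implementation):
import { resolve, sep } from "node:path";

// Reject any HAR/cookie path that escapes the allowed base directory
function assertInsideBase(baseDir: string, userPath: string): string {
  const base = resolve(baseDir);
  const target = resolve(base, userPath);
  if (target !== base && !target.startsWith(base + sep)) {
    throw new Error(`Path escapes allowed directory: ${userPath}`);
  }
  return target;
}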
Current test metrics:
- Total Tests: 279 tests across 24 files
- Pass Rate: 100.0%, sustained by the test-first workflow
- Coverage Areas: Unit, integration, E2E, and performance tests
- Fork the repository
- Create a feature branch: `git checkout -b feature/my-feature`
- Make changes with tests: `bun test`
- Ensure code quality: `bun run check`
- Submit a pull request
- TypeScript Strict Mode: No `any` types allowed
- Test Coverage: >90% target coverage
- Performance: Tools must respond in <200ms
- Documentation: All public APIs documented
The approach adopted in this project - HAR file creation, DAG traversal, and script generation - is inspired by the amazing work of the Integuru agent.
- Issues: GitHub Issues
- Documentation: This README and inline code documentation
- Examples: See the `tests/` directory for comprehensive examples