An MCP (Model Context Protocol) server that scrapes documentation websites and generates context
files. Built with proven x-crawl patterns for reliable web scraping.
- 🕷️ Smart Documentation Crawling - Uses x-crawl with documentation-specific enhancements
- 🔍 Platform Detection - Automatically detects GitBook, Docusaurus, VuePress, and other platforms
- 🧹 Clean Content Extraction - Removes navigation and ads while preserving formatting
- 📝 Context Generation - Creates both `llms.txt` and `llms-full.txt` formats
- ⚡ MCP Integration - Works seamlessly with any MCP client
- 🦙 Local AI Processing - Full Ollama integration for privacy-focused AI features
- 💾 Robust File Operations - Enterprise-grade file writing with comprehensive error handling
- 🔧 Enhanced Tool Descriptions - LLM-optimized tool schemas with detailed usage guidance
Scrape an entire documentation website and extract content.
```json
{
  "url": "https://docs.example.com",
  "options": {
    "maxPages": 50,
    "maxDepth": 3,
    "outputFormat": "both",
    "delayMs": 1000
  }
}
```
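The option semantics can be made explicit client-side before calling the tool. A hedged TypeScript sketch — the interface and defaults below are illustrative, mirroring the example request above, not the server's actual schema:

```typescript
// Illustrative option types for the crawl request (not the server's actual schema).
interface ScrapeOptions {
  maxPages?: number;                         // hard cap on pages crawled
  maxDepth?: number;                         // link depth from the start URL
  outputFormat?: "llms" | "llms-full" | "both"; // which context file(s) to generate
  delayMs?: number;                          // politeness delay between requests
}

// Fill in the defaults shown in the example request above.
function withDefaults(opts: ScrapeOptions = {}): Required<ScrapeOptions> {
  return {
    maxPages: opts.maxPages ?? 50,
    maxDepth: opts.maxDepth ?? 3,
    outputFormat: opts.outputFormat ?? "both",
    delayMs: opts.delayMs ?? 1000,
  };
}

console.log(withDefaults({ maxPages: 10 }).delayMs); // 1000
```

Keeping `delayMs` at or above the default is a polite choice when crawling sites you do not operate.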
Preview content extraction from a single page.
```json
{
  "url": "https://docs.example.com/getting-started"
}
```
Detect the documentation platform type for optimization.
```json
{
  "url": "https://docs.example.com"
}
```
Generate context format from crawled content.
```json
{
  "crawlResults": [...],
  "options": {
    "format": "full",
    "includeSourceUrls": true
  }
}
```
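For reference, `llms.txt` output follows the llms.txt markdown convention (an H1 title, a short blockquote summary, then sections of links). A hypothetical file for the example site might look like:

```markdown
# Example Docs

> Concise summary of the documentation site, suitable as LLM context.

## Getting Started

- [Installation](https://docs.example.com/install): How to install the project
- [Quickstart](https://docs.example.com/quickstart): First steps and a minimal example
```

The `llms-full.txt` variant additionally inlines the full extracted page content rather than only links.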
To install context-generator-mcp for Claude Desktop automatically via Smithery:

```shell
npx -y @smithery/cli install @pinkpixel-dev/context-generator-mcp --client claude
```

To install and run from source:

```shell
npm install     # install dependencies
npm run build   # compile to dist/
npm start       # start the server
npm run dev     # or run in development mode
```
Add to your MCP client configuration:

```json
{
  "context-generator": {
    "command": "node",
    "args": ["/path/to/context-generator-server/dist/index.js"]
  }
}
```
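Some clients nest server entries under a top-level `mcpServers` key — Claude Desktop's `claude_desktop_config.json` does. If yours does, the same entry would look like:

```json
{
  "mcpServers": {
    "context-generator": {
      "command": "node",
      "args": ["/path/to/context-generator-server/dist/index.js"]
    }
  }
}
```

Check your client's documentation for the exact file location and wrapper key it expects.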
For enhanced crawling and content processing, you can configure AI providers.
Ollama (local) benefits:
- 🔒 Privacy: All data stays on your machine
- 💰 Cost-effective: No API fees
- ⚡ Fast: Local processing
- 🌐 Offline: Works without internet
Quick Setup:
```shell
# 1. Install Ollama: https://ollama.com/download

# 2. Pull a model
ollama pull llama3.1

# 3. Configure environment
echo "OLLAMA_MODEL=llama3.1" >> .env

# 4. Test the integration
npm run test:ollama
```
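Before running the integration test, it can help to confirm which endpoint will be used. A minimal sketch, assuming the server falls back to the documented default endpoint when `OLLAMA_BASE_URL` is unset (see OLLAMA_SETUP.md for the authoritative behavior):

```shell
# Resolve the Ollama endpoint, falling back to the documented default.
# (Assumption: the server applies the same fallback.)
OLLAMA_BASE_URL="${OLLAMA_BASE_URL:-http://localhost:11434}"
echo "Using Ollama at: $OLLAMA_BASE_URL"

# The standard Ollama REST API lists installed models at /api/tags:
#   curl -s "$OLLAMA_BASE_URL/api/tags"
```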
MCP Configuration:
```json
{
  "context-generator": {
    "command": "node",
    "args": ["/path/to/context-generator-server/dist/index.js"],
    "env": {
      "OLLAMA_MODEL": "llama3.1"
    }
  }
}
```
OpenAI (cloud) benefits:
- 🚀 Powerful: Latest GPT models
- ☁️ No setup: Cloud-based processing
- 🔧 Maintenance-free: Always updated
Setup:

```json
{
  "context-generator": {
    "command": "node",
    "args": ["/path/to/context-generator-server/dist/index.js"],
    "env": {
      "OPENAI_API_KEY": "sk-your-openai-key",
      "OPENAI_MODEL": "gpt-4"
    }
  }
}
```
To use both providers together, set both groups of variables:

```json
{
  "context-generator": {
    "command": "node",
    "args": ["/path/to/context-generator-server/dist/index.js"],
    "env": {
      "OPENAI_API_KEY": "sk-your-openai-key",
      "OPENAI_MODEL": "gpt-4",
      "OLLAMA_MODEL": "llama3.1",
      "OLLAMA_BASE_URL": "http://localhost:11434"
    }
  }
}
```
| Variable | Description | Default | Required |
|---|---|---|---|
| **OpenAI** | | | |
| `OPENAI_API_KEY` | OpenAI API key for AI-assisted crawling | - | ✅ (for OpenAI) |
| `OPENAI_MODEL` | OpenAI model to use | `gpt-3.5-turbo` | ❌ |
| **Ollama** | | | |
| `OLLAMA_MODEL` | Ollama model name (e.g., `llama3.1`, `codellama`) | `llama3.1` | ✅ (for Ollama) |
| `OLLAMA_BASE_URL` | Custom Ollama server URL | `http://localhost:11434` | ❌ |
| `OLLAMA_API_KEY` | API key for hosted Ollama instances | - | ❌ |
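Putting the table together, a hypothetical `.env` enabling both providers (all values are placeholders — substitute your own key and models):

```shell
# Hypothetical .env combining the variables from the table above.
cat > .env <<'EOF'
OPENAI_API_KEY=sk-your-openai-key
OPENAI_MODEL=gpt-4
OLLAMA_MODEL=llama3.1
OLLAMA_BASE_URL=http://localhost:11434
EOF
cat .env
```

Only the key for the provider you actually use is required; the rest fall back to the defaults listed above.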
📖 Detailed Setup: See OLLAMA_SETUP.md for complete installation and configuration instructions.

🧪 Testing: Use `npm run test:ollama` to validate your Ollama setup.

⚠️ Note: AI integration is optional. The server works without these variables, but AI-enhanced features won't be available.