Advanced Anti-Detection Web Scraping API with Comprehensive Fingerprinting Control
Unified Solution: Website + API on a single domain
Advanced Anti-Detection: Canvas/WebGL/Audio spoofing, behavioral simulation
Human-like Behavior: Bezier mouse movements, keyboard dynamics, natural scrolling
Deploy Anywhere: Docker, Node.js+PM2, or Development
The future of intelligent web scraping is here
Revolutionary Features Coming:
- AI-Powered Admin Panel - Intelligent task management & automation
- Modern React Frontend - Sleek, responsive dashboard interface
- Smart Automation - AI-driven scraping strategies & optimization
- Advanced Analytics - Real-time insights & performance metrics
- Workflow Builder - Visual scraping pipeline creation
- Enterprise Controls - Advanced user management & permissions
Transform your web scraping experience with the next generation of HeadlessX
- Canvas Fingerprinting Control - Dynamic noise injection with consistent seeds
- WebGL Spoofing - GPU vendor/model spoofing with realistic profiles
- Audio Context Manipulation - Hardware audio fingerprint database
- WebRTC Leak Prevention - Complete IP leak protection
- Hardware Fingerprint Spoofing - CPU, memory, and performance masking
- Bezier Mouse Movement - Natural acceleration and deceleration patterns
- Keyboard Dynamics - Realistic dwell time and flight time variations
- Natural Scroll Patterns - Reader, scanner, browser behavioral profiles
- Attention Model Simulation - Human-like focus and interaction patterns
- Micro-movement Injection - Sub-pixel accuracy for maximum realism
- Cloudflare Bypass - Advanced challenge solving and TLS fingerprinting
- DataDome Evasion - Resource blocking and behavioral pattern matching
- Incapsula/Akamai - Generic WAF bypass with adaptive techniques
- HTTP/2 Fingerprinting - Stream prioritization and header ordering
- 50+ Chrome Profiles - Desktop, mobile, and tablet configurations
- Hardware Consistency - CPU, GPU, memory, and sensor correlation
- Geolocation Intelligence - Timezone, language, and locale matching
- Profile Validation - Real-time consistency checking and scoring
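As an illustration of the "Bezier Mouse Movement" item above, here is a minimal sketch of how a curved pointer path with natural acceleration and deceleration could be generated. It is not HeadlessX's implementation: the function names (`cubic_bezier`, `ease_in_out`, `mouse_path`), the smoothstep easing, and the control-point spread are all assumptions chosen for the example.

```python
import random


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(t):
    """Smoothstep easing: slow start, faster middle, slow end."""
    return t * t * (3.0 - 2.0 * t)


def mouse_path(start, end, steps=20, seed=None):
    """Return steps + 1 points from start to end along a randomized curve."""
    rng = random.Random(seed)  # a fixed seed makes the path reproducible
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Control points sit near 1/3 and 2/3 of the way, pushed off the straight line.
    spread = 0.25 * max(abs(dx), abs(dy), 1.0)
    c1 = (start[0] + dx / 3 + rng.uniform(-spread, spread),
          start[1] + dy / 3 + rng.uniform(-spread, spread))
    c2 = (start[0] + 2 * dx / 3 + rng.uniform(-spread, spread),
          start[1] + 2 * dy / 3 + rng.uniform(-spread, spread))
    # Easing the curve parameter bunches points near both ends of the path.
    return [cubic_bezier(start, c1, c2, end, ease_in_out(i / steps))
            for i in range(steps + 1)]
```

Because the easing is applied to the curve parameter rather than to time, consecutive points are close together near the start and end of the path and farther apart in the middle, which is one simple way to approximate acceleration and deceleration.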
Choose your deployment:
Method | Command | Best For |
---|---|---|
Docker | docker-compose up -d | Production, easy deployment |
Auto Setup | chmod +x scripts/setup.sh && sudo ./scripts/setup.sh | VPS/Server with full control |
Development | npm install && npm start | Local development, testing |
Access your HeadlessX v1.3.0:
Website: https://your-subdomain.yourdomain.com
API: https://your-subdomain.yourdomain.com/api
Stealth: https://your-subdomain.yourdomain.com/api/render/stealth
Testing: https://your-subdomain.yourdomain.com/api/test-fingerprint
Profiles: https://your-subdomain.yourdomain.com/api/profiles
Health: https://your-subdomain.yourdomain.com/api/health
Status: https://your-subdomain.yourdomain.com/api/status?token=YOUR_AUTH_TOKEN
HeadlessX v1.3.0 introduces advanced anti-detection capabilities with comprehensive fingerprinting control, behavioral simulation, and WAF bypass techniques while maintaining the modular architecture from v1.2.0.
- Advanced Anti-Detection: Canvas, WebGL, Audio, WebRTC fingerprinting control
- Behavioral Simulation: Human-like mouse movement with Bezier curves and keyboard dynamics
- WAF Bypass: Cloudflare, DataDome, and advanced evasion techniques
- Device Profiling: Comprehensive desktop and mobile device profiles with hardware spoofing
- Testing Framework: Comprehensive anti-detection testing and validation
- Separation of Concerns: Enhanced modules for fingerprinting, behavioral, and evasion services
- Better Performance: Optimized browser management with intelligent profile-based pooling
- Developer Experience: Development tools, profile generators, and interactive testing
- Production Ready: Enhanced error handling, advanced detection analytics, and profile validation
- Security: Advanced authentication, profile management, and secure fingerprint storage
- Monitoring: Real-time detection monitoring, success rate analytics, and performance benchmarking
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Routes      │───▶│   Controllers   │───▶│    Services     │
│    (api.js)     │    │ (rendering.js)  │    │  (browser.js)   │
│   (admin.js)    │    │  (profiles.js)  │    │  (stealth.js)   │
└─────────────────┘    │ (detection.js)  │    │ (interaction.js)│
         │             └─────────────────┘    └─────────────────┘
         ▼                      │                      │
┌─────────────────┐             ▼                      ▼
│   Middleware    │    ┌─────────────────┐    ┌─────────────────┐
│    (auth.js)    │    │      Utils      │    │     Config      │
│   (error.js)    │    │   (logger.js)   │    │   (index.js)    │
│  (analyzer.js)  │    │  (helpers.js)   │    │  (browser.js)   │
└─────────────────┘    │ (validator.js)  │    │   (profiles/)   │
         │             └─────────────────┘    └─────────────────┘
         ▼                      │                      │
┌─────────────────┐             ▼                      ▼
│ Fingerprinting  │    ┌─────────────────┐    ┌─────────────────┐
│ (canvas-spoof)  │    │   Behavioral    │    │     Evasion     │
│  (webgl-spoof)  │    │ (mouse-movement)│    │  (cloudflare)   │
│ (audio-context) │    │ (keyboard-dyn)  │    │   (datadome)    │
│  (webrtc-ctrl)  │    │ (scroll-pattern)│    │  (waf-bypass)   │
│ (hardware-noise)│    │ (attention-mod) │    │ (tls-fingerpr)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│     Testing     │    │   Development   │    │    Profiles     │
│ (test-framework)│    │   (dev-tools)   │    │  (chrome-prof)  │
│ (detection-test)│    │  (profile-gen)  │    │  (mobile-prof)  │
│  (performance)  │    │ (fingerpr-test) │    │ (firefox-prof)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Migration from v1.2.0:
- All v1.2.0 functionality preserved with enhanced anti-detection capabilities
- New environment variables for fingerprint control and stealth configuration
- Enhanced API endpoints for profile management and detection testing
- Backward compatible with all existing configurations and scripts
Detailed Documentation: MODULAR_ARCHITECTURE.md
# Install Docker (if needed)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER # Log out and back in for the group change to take effect
# Deploy HeadlessX
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure DOMAIN, SUBDOMAIN, AUTH_TOKEN
# Start services
docker-compose up -d
# Optional: Setup SSL
sudo apt install -y certbot python3-certbot-nginx
sudo certbot --nginx -d your-subdomain.yourdomain.com
Docker Management:
docker-compose ps # Check status
docker-compose logs headlessx # View logs
docker-compose restart # Restart services
docker-compose down # Stop services
# Automated setup (recommended)
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Configure environment
chmod +x scripts/setup.sh
sudo ./scripts/setup.sh # Installs dependencies, builds website, starts PM2
Nginx Configuration (auto-handled by setup script):
The setup script configures nginx automatically. To configure it manually instead:
# Copy and configure nginx site
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace placeholders with your actual domain
sudo sed -i 's/SUBDOMAIN\.DOMAIN\.COM/your-subdomain.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Enable the site
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
# Test and reload nginx
sudo nginx -t && sudo systemctl reload nginx
Manual setup (if not using setup script):
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs build-essential
npm install && npm run build
sudo npm install -g pm2
npm run pm2:start
PM2 Management:
npm run pm2:status # Check status
npm run pm2:logs # View logs
npm run pm2:restart # Restart server
npm run pm2:stop # Stop server
git clone https://github.com/SaifyXPRO/HeadlessX.git
cd HeadlessX
cp .env.example .env
nano .env # Set AUTH_TOKEN, DOMAIN=localhost, SUBDOMAIN=headlessx
# Make scripts executable
chmod +x scripts/*.sh
# Install dependencies
npm install
cd website && npm install && npm run build && cd ..
# Start development server
npm start # Access at http://localhost:3000
HeadlessX Routes:
├── /favicon.ico → Favicon
├── /robots.txt → SEO robots file
├── /api/health → Health check (no auth required)
├── /api/status → Server status (requires token)
├── /api/render → Full page rendering
├── /api/html → HTML extraction
├── /api/content → Clean text extraction
├── /api/screenshot → Screenshot generation
├── /api/pdf → PDF generation
└── /api/batch → Batch URL processing
Request Flow:
- Nginx receives request on port 80/443
- Proxies to Node.js server on port 3000
- Server routes based on path:
  - /api/* → API endpoints
  - /* → Website files (built Next.js app)
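The path-based split in the request flow can be sketched in a few lines. This is only an illustration of the rule, not the server's actual router; the function name `route_target` is hypothetical, and treating a bare `/api` as an API path is an assumption.

```python
def route_target(path: str) -> str:
    """Classify a request path as described in the request flow."""
    # Match "/api" exactly or anything under "/api/", so that a website
    # page such as "/apidocs" is not mistaken for an API call.
    if path == "/api" or path.startswith("/api/"):
        return "api"
    return "website"
```

Checking for the trailing slash rather than a plain `startswith("/api")` keeps unrelated website paths that merely begin with the same letters on the website side.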
curl https://your-subdomain.yourdomain.com/api/health
curl -X POST "https://your-subdomain.yourdomain.com/api/render/stealth?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "desktop-chrome",
    "stealthMode": "maximum",
    "behaviorSimulation": true,
    "timeout": 30000
  }'
curl -X POST "https://your-subdomain.yourdomain.com/api/render?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "iphone-14-pro",
    "geolocation": {"latitude": 40.7128, "longitude": -74.0060},
    "behaviorSimulation": true
  }'
curl -X POST "https://your-subdomain.yourdomain.com/api/test-fingerprint?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "profile": "desktop-chrome",
    "testCanvas": true,
    "testWebGL": true,
    "testAudio": true
  }'
curl "https://your-subdomain.yourdomain.com/api/profiles?token=YOUR_AUTH_TOKEN"
curl -X POST "https://your-subdomain.yourdomain.com/api/render?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "profile": "desktop-firefox",
    "cloudflareBypass": true,
    "datadomeBypass": true,
    "mouseMovement": "natural",
    "keyboardDynamics": "human",
    "timeout": 45000
  }'
curl -X POST "https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "timeout": 30000}'
curl "https://your-subdomain.yourdomain.com/api/screenshot?token=YOUR_AUTH_TOKEN&url=https://example.com&fullPage=true" \
-o screenshot.png
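When the target address itself contains a query string, passing it unencoded in a GET request lets its `&` and `=` characters be read as parameters of the API call. The sketch below builds the same screenshot URL with the target percent-encoded. The helper name `screenshot_url` is hypothetical; the parameter names (`token`, `url`, `fullPage`) are the ones used in the curl example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://your-subdomain.yourdomain.com"  # placeholder host from this README


def screenshot_url(target, token, full_page=True):
    """Build a GET URL for /api/screenshot with every parameter encoded."""
    query = urlencode({
        "token": token,
        "url": target,  # urlencode escapes ':', '/', '?', '&' and '='
        "fullPage": "true" if full_page else "false",
    })
    return f"{BASE_URL}/api/screenshot?{query}"
```

For example, `screenshot_url("https://example.com/?q=a&b=1", "YOUR_AUTH_TOKEN")` keeps `b=1` attached to the target page instead of turning it into a separate API parameter.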
curl -X POST "https://your-subdomain.yourdomain.com/api/content?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "waitForSelector": "main"}'
curl -X POST "https://your-subdomain.yourdomain.com/api/pdf?token=YOUR_AUTH_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "format": "A4"}' \
-o document.pdf
HTTP Request Module Configuration:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "headers": {
    "Content-Type": "application/json"
  },
  "qs": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "body": {
    "url": "{{url_to_scrape}}",
    "timeout": 30000,
    "waitForSelector": "{{optional_selector}}"
  }
}
Webhooks by Zapier Setup:
- URL: https://your-subdomain.yourdomain.com/api/html?token=YOUR_AUTH_TOKEN
- Method: POST
- Headers: Content-Type: application/json
- Body:
{
  "url": "{{url_from_trigger}}",
  "timeout": 30000,
  "humanBehavior": true
}
HTTP Request Node:
{
  "url": "https://your-subdomain.yourdomain.com/api/html",
  "method": "POST",
  "authentication": "queryAuth",
  "query": {
    "token": "YOUR_AUTH_TOKEN"
  },
  "headers": {
    "Content-Type": "application/json"
  },
  "body": {
    "url": "={{$json.url}}",
    "timeout": 30000,
    "humanBehavior": true
  }
}
Available via n8n Community Node:
- Install: npm install n8n-nodes-headlessx
- GitHub Repository
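import requests

def scrape_with_headlessx(url, token):
    """Fetch the rendered HTML for `url` through the /api/html endpoint."""
    response = requests.post(
        "https://your-subdomain.yourdomain.com/api/html",
        params={"token": token},
        json={
            "url": url,
            "timeout": 30000,  # render timeout sent to HeadlessX
            "humanBehavior": True
        },
        timeout=60,  # client-side timeout for the HTTP call, in seconds
    )
    response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
    return response.json()

# Usage
result = scrape_with_headlessx("https://example.com", "YOUR_TOKEN")
print(result['html'])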
const axios = require('axios');

async function scrapeWithHeadlessX(url, token) {
  try {
    const response = await axios.post(
      `https://your-subdomain.yourdomain.com/api/html?token=${token}`,
      {
        url: url,
        timeout: 30000,
        humanBehavior: true
      }
    );
    return response.data;
  } catch (error) {
    console.error('Scraping failed:', error.message);
    throw error;
  }
}

// Usage
scrapeWithHeadlessX('https://example.com', 'YOUR_TOKEN')
  .then(result => console.log(result.html))
  .catch(error => console.error(error));
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": [
      "https://example1.com",
      "https://example2.com",
      "https://example3.com"
    ],
    "timeout": 30000,
    "humanBehavior": true
  }'
curl -X POST "https://your-subdomain.yourdomain.com/api/batch?token=YOUR_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com", "https://httpbin.org"],
    "format": "text",
    "options": {"timeout": 30000}
  }'
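Before sending a batch, it can help to validate and de-duplicate the URL list on the client so that one malformed entry does not reach the server. The following is a minimal sketch under assumptions: the helper name `build_batch_payload` is hypothetical, and the payload follows the shape of the first batch example above (a top-level `urls` list and `timeout`); whether the server itself rejects or de-duplicates entries is not stated in this README.

```python
from urllib.parse import urlparse


def build_batch_payload(urls, timeout_ms=30000):
    """Return a JSON-ready body for /api/batch from a list of URLs."""
    seen = set()
    cleaned = []
    for url in urls:
        parsed = urlparse(url)
        # Only absolute http(s) URLs are accepted by this helper.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        if url not in seen:  # drop exact duplicates, keep first-seen order
            seen.add(url)
            cleaned.append(url)
    return {"urls": cleaned, "timeout": timeout_ms}
```

The returned dictionary can be passed as the `json=` argument of `requests.post` in the same way as the single-URL Python example.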
HeadlessX v1.3.0 - Enhanced Anti-Detection Architecture/
├── src/                          # Modular application source
│   ├── config/                   # Configuration management
│   │   ├── index.js              # Main configuration loader
│   │   └── browser.js            # Browser-specific settings
│   ├── utils/                    # Utility functions
│   │   ├── errors.js             # Error handling & categorization
│   │   ├── logger.js             # Structured logging
│   │   └── helpers.js            # Common utilities
│   ├── services/                 # Business logic services
│   │   ├── browser.js            # Browser lifecycle management
│   │   ├── stealth.js            # Anti-detection techniques
│   │   ├── interaction.js        # Human-like behavior
│   │   └── rendering.js          # Core rendering logic
│   ├── middleware/               # Express middleware
│   │   ├── auth.js               # Authentication
│   │   └── error.js              # Error handling
│   ├── controllers/              # Request handlers
│   │   ├── system.js             # Health & status endpoints
│   │   ├── rendering.js          # Main rendering endpoints
│   │   ├── batch.js              # Batch processing
│   │   └── get.js                # GET endpoints & docs
│   ├── routes/                   # Route definitions
│   │   ├── api.js                # API route mappings
│   │   └── static.js             # Static file serving
│   ├── app.js                    # Main application setup
│   ├── server.js                 # Entry point for PM2
│   └── rate-limiter.js           # Rate limiting implementation
├── website/                      # Next.js website (unchanged)
│   ├── app/                      # Next.js 13+ app directory
│   ├── components/               # React components
│   ├── .env.example              # Website environment template
│   ├── next.config.js            # Next.js configuration
│   └── package.json              # Website dependencies
├── scripts/                      # Deployment & management scripts
│   ├── setup.sh                  # Automated installation (updated)
│   ├── update_server.sh          # Server update script (updated)
│   ├── verify-domain.sh          # Domain verification
│   └── test-routing.sh           # Integration testing
├── nginx/                        # Nginx configuration
│   └── headlessx.conf            # Nginx proxy config
├── docker/                       # Docker deployment (updated)
│   ├── Dockerfile                # Container definition
│   └── docker-compose.yml        # Docker Compose setup
├── ecosystem.config.js           # PM2 configuration (moved to root)
├── .env.example                  # Environment template (updated)
├── package.json                  # Server dependencies (updated)
├── docs/
│   └── MODULAR_ARCHITECTURE.md   # Architecture documentation
└── README.md                     # This file
# 1. Install dependencies
npm install
# 2. Build website
cd website
npm install
npm run build
cd ..
# 3. Set environment variables
export AUTH_TOKEN="development_token_123"
export DOMAIN="localhost"
export SUBDOMAIN="headlessx"
# 4. Start server
npm start # Uses src/app.js
# 5. Access locally
# Website: http://localhost:3000
# API: http://localhost:3000/api/health
# Test server and website integration
bash scripts/test-routing.sh localhost
# Test with environment variables
bash scripts/verify-domain.sh
Create your .env file from the template:
cp .env.example .env
nano .env
Required configuration:
# Security Token (Generate a secure random string)
AUTH_TOKEN=your_secure_token_here
# Domain Configuration
DOMAIN=yourdomain.com
SUBDOMAIN=headlessx
# Optional: Browser Settings
BROWSER_TIMEOUT=60000
MAX_CONCURRENT_BROWSERS=5
# Optional: Server Settings
PORT=3000
NODE_ENV=production
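The configuration above asks for a secure random AUTH_TOKEN and three required keys. A small sketch of how both could be handled before starting the server follows; the helper names (`parse_env`, `missing_keys`, `new_token`) are hypothetical, and the parser is deliberately simple: it assumes plain `KEY=value` lines and does not handle quoting or inline comments.

```python
import secrets

REQUIRED_KEYS = ("AUTH_TOKEN", "DOMAIN", "SUBDOMAIN")


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def new_token():
    """Generate a URL-safe random token from 32 bytes of OS randomness."""
    return secrets.token_urlsafe(32)
```

Running `missing_keys(parse_env(open(".env").read()))` before deployment gives an early, readable failure if one of the required values was left out.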
Option 1: Automatic (Recommended)
# The setup script automatically replaces domain placeholders
sudo ./scripts/setup.sh
Option 2: Manual Configuration
# Copy nginx configuration
sudo cp nginx/headlessx.conf /etc/nginx/sites-available/headlessx
# Replace domain placeholders (replace with your actual domain)
sudo sed -i 's/SUBDOMAIN\.DOMAIN\.COM/headlessx.yourdomain.com/g' /etc/nginx/sites-available/headlessx
# Example: If your domain is "api.example.com"
sudo sed -i 's/SUBDOMAIN\.DOMAIN\.COM/api.example.com/g' /etc/nginx/sites-available/headlessx
# Enable site and reload nginx
sudo ln -sf /etc/nginx/sites-available/headlessx /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Your final URLs will be:
- Website: https://your-subdomain.yourdomain.com
- API Health: https://your-subdomain.yourdomain.com/api/health
- API Endpoints: https://your-subdomain.yourdomain.com/api/*
Endpoint | Method | Description | Auth Required |
---|---|---|---|
/api/health | GET | Health check | No |
/api/status | GET | Server status | Yes |
/api/render | POST | Full page rendering (JSON) | Yes |
/api/html | GET/POST | Raw HTML extraction | Yes |
/api/content | GET/POST | Clean text extraction | Yes |
/api/screenshot | GET | Screenshot generation | Yes |
/api/pdf | GET | PDF generation | Yes |
/api/batch | POST | Batch URL processing | Yes |
All endpoints (except /api/health) require a token via:
- Query parameter: ?token=YOUR_TOKEN
- Header: X-Token: YOUR_TOKEN
- Header: Authorization: Bearer YOUR_TOKEN
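The three ways of passing the token can be expressed as keyword arguments for an HTTP client. The sketch below returns them in the form the `requests` library accepts (`params=` and `headers=`); the helper name `auth_kwargs` and the style labels are hypothetical.

```python
def auth_kwargs(token, style="bearer"):
    """Return request keyword arguments that carry the token in the chosen style."""
    if style == "query":
        return {"params": {"token": token}}
    if style == "x-token":
        return {"headers": {"X-Token": token}}
    if style == "bearer":
        return {"headers": {"Authorization": f"Bearer {token}"}}
    raise ValueError(f"unknown auth style: {style!r}")
```

A call then looks like `requests.get(url, timeout=30, **auth_kwargs("YOUR_TOKEN"))`. The header styles keep the token out of the request URL, which is worth considering because URLs, including query strings, are commonly written to web server access logs.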
Visit your HeadlessX website for full API documentation with examples, or check:
curl https://your-subdomain.yourdomain.com/api/health
curl "https://your-subdomain.yourdomain.com/api/status?token=YOUR_TOKEN"
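After a deployment or restart, the health endpoint may take a few seconds to respond. A minimal polling sketch using only the standard library is shown below; the function names (`health_ok`, `wait_until_healthy`) and the retry defaults are assumptions, and the check treats any HTTP 200 response as healthy without inspecting the body.

```python
import time
import urllib.request


def health_ok(url, timeout=5.0):
    """Return True if a GET to the health URL answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200


def wait_until_healthy(check, attempts=5, delay=1.0, sleep=time.sleep):
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        try:
            if check():
                return True
        except OSError:
            pass  # connection refused, timeout, or HTTP error: not ready yet
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

Usage: `wait_until_healthy(lambda: health_ok("https://your-subdomain.yourdomain.com/api/health"))`. Passing the check as a function keeps the retry loop independent of the HTTP client.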
# PM2 logs
npm run pm2:logs
pm2 logs headlessx --lines 100
# Docker logs
docker-compose logs -f headlessx
# Nginx logs
sudo tail -f /var/log/nginx/access.log
git pull origin main
npm run build # Rebuild website
npm run pm2:restart # PM2
# OR
docker-compose restart # Docker
"npm ci" Error (missing package-lock.json):
chmod +x scripts/generate-lockfiles.sh
./scripts/generate-lockfiles.sh # Generate lock files
# OR
npm install --production # Use install instead
"Cannot find module 'express'":
npm install # Install dependencies
System dependency errors (Ubuntu 24.04 package names; on older releases, drop the t64 suffix):
sudo apt update && sudo apt install -y \
libatk1.0-0t64 libatk-bridge2.0-0t64 libcups2t64 \
libatspi2.0-0t64 libasound2t64 libxcomposite1
PM2 not starting:
sudo npm install -g pm2
chmod +x scripts/setup.sh # Make script executable
pm2 start ecosystem.config.js # PM2 config lives in the project root
pm2 logs headlessx # Check errors
Script permission errors:
# Make all scripts executable
chmod +x scripts/*.sh
# Or use the quick setup
chmod +x scripts/quick-setup.sh && ./scripts/quick-setup.sh
Playwright browser installation errors:
# Use dedicated Playwright setup script
chmod +x scripts/setup-playwright.sh
./scripts/setup-playwright.sh
# Or install manually:
sudo apt update && sudo apt install -y \
libgtk-3-0t64 libpangocairo-1.0-0 libcairo-gobject2 \
libgdk-pixbuf-2.0-0 libdrm2 libxss1 libxrandr2 \
libasound2t64 libatk1.0-0t64 libnss3
# Install only Chromium (most stable)
npx playwright install chromium
# Alternative: Use Docker (avoids dependency issues)
docker-compose up -d
- Token Authentication: Secure API access with custom tokens
- Rate Limiting: Nginx-level request throttling
- Security Headers: XSS, CSRF, and clickjacking protection
- Bot Protection: Common attack vector blocking
- SSL/TLS: Automatic HTTPS with Let's Encrypt
We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your input is valuable.
- Report Bugs: Create a bug report
- Suggest Features: Share your ideas
- Improve Docs: Help make our documentation better
- Submit Code: Fork, code, and create a pull request
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Please read CONTRIBUTING.md for detailed guidelines.
Join our growing community of developers, data scientists, and automation enthusiasts!
- Ask Questions: Stuck on something? Start a Q&A discussion
- Share Ideas: Have a feature idea? Create an idea post
- Showcase: Built something cool? Show it off!
- Announcements: Stay updated with the latest news
- Be respectful and inclusive
- Help others learn and grow
- Share knowledge and experiences
- Report issues constructively
- Follow our Code of Conduct
This project is licensed under the MIT License - see the LICENSE file for details.
Resource | Description | Link |
---|---|---|
Documentation | Complete API reference & guides | View Docs |
Bug Reports | Found a bug? Report it here | Report Bug |
Feature Requests | Suggest new features | Request Feature |
Security | Report security vulnerabilities | Security Policy |
Discussions | Community Q&A & discussions | Join Discussions |
Project Board | Track development progress | View Board |
Changelog | See what's new | View Changes |
Roadmap | Future plans & features | View Roadmap |
- Installation Help: SETUP.md
- Troubleshooting: TROUBLESHOOTING.md
- API Documentation: GET_ENDPOINTS.md | POST_ENDPOINTS.md
- Architecture Guide: MODULAR_ARCHITECTURE.md
- Ethics & Responsible Use: ETHICS.md
HeadlessX v1.3.0 - The most advanced open-source anti-detection web scraping solution.
Made with ❤️ by developers, for the developer community.