A powerful, feature-rich Python library to bypass Cloudflare's anti-bot protection with advanced stealth capabilities, async support, and comprehensive monitoring.
- 🔄 Async Support: High-performance concurrent scraping with `AsyncCloudScraper`
- 🎭 Enhanced Stealth Mode: Advanced anti-detection with browser fingerprinting resistance
- 📊 Comprehensive Metrics: Real-time performance monitoring and health checks
- ⚡ Performance Optimization: Memory-efficient session management and request optimization
- 🔧 Configuration Management: YAML/JSON config files with environment variable support
- 🛡️ Advanced Security: Request signing and TLS fingerprinting
- 🧪 Robust Testing: Comprehensive test suite with 95%+ coverage
- 📈 Smart Proxy Management: Intelligent proxy rotation with health monitoring
- Multi-Challenge Support: Handles Cloudflare v1, v2, v3, and Turnstile challenges
- JavaScript Interpreters: js2py, nodejs, and native V8 support
- Browser Emulation: Chrome, Firefox, Safari fingerprinting
- CAPTCHA Integration: Support for 2captcha, Anti-Captcha, and more
- 🎭 Stealth Technology: Human-like browsing patterns with adaptive delays
- 🔄 Async/Await Support: High-throughput concurrent operations
- 📊 Performance Monitoring: Real-time metrics and optimization suggestions
- 🛡️ Security Features: Request signing and TLS fingerprinting
- 🔧 Smart Configuration: YAML/JSON configs with environment variables
- 📈 Intelligent Proxies: Smart rotation with automatic health monitoring
- 💾 Memory Efficient: Automatic cleanup and resource management
- 🧪 Comprehensive Testing: 95%+ test coverage with CI/CD
```bash
pip install cloudscraper
```
```python
import cloudscraper

# Create a CloudScraper instance
scraper = cloudscraper.create_scraper()

# Use it like a regular requests session
response = scraper.get("https://example.com")
print(response.text)
```
```python
import cloudscraper

# Create scraper with advanced options
scraper = cloudscraper.create_scraper(
    browser='chrome',
    debug=True,
    enable_stealth=True,
    stealth_options={
        'min_delay': 1.0,
        'max_delay': 3.0,
        'human_like_delays': True,
        'randomize_headers': True
    },
    rotating_proxies=[
        'http://proxy1:8080',
        'http://proxy2:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    },
    enable_metrics=True,
    session_refresh_interval=3600
)

response = scraper.get('https://protected-site.com')
```
```python
import asyncio
import cloudscraper

async def main():
    async with cloudscraper.create_async_scraper(
        max_concurrent_requests=10,
        enable_stealth=True
    ) as scraper:
        # Single request
        response = await scraper.get('https://example.com')

        # Batch requests
        requests = [
            {'method': 'GET', 'url': f'https://example.com/page{i}'}
            for i in range(5)
        ]
        responses = await scraper.batch_requests(requests)

        # Get performance stats
        stats = scraper.get_stats()
        print(f"Total requests: {stats['total_requests']}")

asyncio.run(main())
```
```python
import cloudscraper

# Load from YAML config
scraper = cloudscraper.create_scraper(config_file='scraper_config.yaml')

# Or from JSON
scraper = cloudscraper.create_scraper(config_file='scraper_config.json')
```
`scraper_config.yaml`:

```yaml
debug: true
interpreter: js2py
enable_stealth: true
stealth_options:
  min_delay: 0.5
  max_delay: 2.0
  human_like_delays: true
  randomize_headers: true
rotating_proxies:
  - "http://proxy1:8080"
  - "http://proxy2:8080"
proxy_options:
  rotation_strategy: "smart"
  ban_time: 300
enable_metrics: true
```
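An equivalent `scraper_config.json` would mirror the YAML keys one-to-one (a sketch, assuming the loader accepts the same option names in JSON):

```json
{
  "debug": true,
  "interpreter": "js2py",
  "enable_stealth": true,
  "stealth_options": {
    "min_delay": 0.5,
    "max_delay": 2.0,
    "human_like_delays": true,
    "randomize_headers": true
  },
  "rotating_proxies": ["http://proxy1:8080", "http://proxy2:8080"],
  "proxy_options": {
    "rotation_strategy": "smart",
    "ban_time": 300
  },
  "enable_metrics": true
}
```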
That's it! The scraper will automatically handle any Cloudflare challenges it encounters.
Cloudflare's anti-bot protection works by presenting JavaScript challenges that must be solved before accessing the protected content. cloudscraper:
- Detects Cloudflare challenges automatically
- Solves JavaScript challenges using embedded interpreters
- Maintains session state and cookies
- Returns the protected content seamlessly
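Because a `CloudScraper` instance behaves like a `requests` session, the clearance cookies earned by the first request persist and are reused on later calls — a minimal sketch:

```python
import cloudscraper

scraper = cloudscraper.create_scraper()

# The first request triggers and solves the challenge; the resulting
# clearance cookies are stored on the session
scraper.get("https://example.com")
print(scraper.cookies.get_dict())

# Later requests send those cookies, so no new challenge needs solving
response = scraper.get("https://example.com/another-page")
```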
For reference, this is what Cloudflare's protection page looks like:
> Checking your browser before accessing website.com.
>
> This process is automatic. Your browser will redirect to your requested content shortly.
>
> Please allow up to 5 seconds...
- Python 3.8+
- requests >= 2.31.0
- js2py >= 0.74 (default JavaScript interpreter)
- Additional optional dependencies for enhanced features
cloudscraper supports multiple JavaScript interpreters:
- js2py (default) - Pure Python implementation
- nodejs - Requires Node.js installation
- native - Built-in Python solver
- ChakraCore - Microsoft's JavaScript engine
- V8 - Google's JavaScript engine
```python
import cloudscraper

# Create scraper instance
scraper = cloudscraper.create_scraper()

# Use like requests
response = scraper.get("https://protected-site.com")
print(response.text)

# Works with all HTTP methods
response = scraper.post("https://protected-site.com/api", json={"key": "value"})
```
Enable stealth techniques for better bypass success:
```python
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 5.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True,
        'simulate_viewport': True,
        'behavioral_patterns': True
    }
)
```
Choose specific browser fingerprints:
```python
# Use Chrome fingerprint
scraper = cloudscraper.create_scraper(browser='chrome')

# Use Firefox fingerprint
scraper = cloudscraper.create_scraper(browser='firefox')

# Advanced browser configuration
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'windows',
        'mobile': False
    }
)
```
```python
# Use a specific interpreter
scraper = cloudscraper.create_scraper(interpreter='js2py')
scraper = cloudscraper.create_scraper(interpreter='nodejs')
scraper = cloudscraper.create_scraper(interpreter='native')
```
```python
# Single proxy
scraper = cloudscraper.create_scraper()
scraper.proxies = {
    'http': 'http://proxy:8080',
    'https': 'http://proxy:8080'
}

# Proxy rotation
proxies = [
    'http://proxy1:8080',
    'http://proxy2:8080',
    'http://proxy3:8080'
]

scraper = cloudscraper.create_scraper(
    rotating_proxies=proxies,
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 300
    }
)
```
For sites with CAPTCHA challenges:
```python
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
```
Supported CAPTCHA providers:
- 2captcha
- anticaptcha
- CapSolver
- CapMonster Cloud
- deathbycaptcha
- 9kw
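The other providers listed above plug into the same `captcha` dict — for example, assuming `'anticaptcha'` as the provider string (the API key is a placeholder):

```python
# Same pattern, different provider
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': 'anticaptcha',
        'api_key': 'your_api_key'
    }
)
```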
```python
import cloudscraper

scraper = cloudscraper.create_scraper()

# Simple GET request
response = scraper.get("https://example.com")
print(response.text)

# POST request with data
response = scraper.post("https://example.com/api", json={"key": "value"})
print(response.json())
```
```python
import cloudscraper

# Maximum compatibility configuration
scraper = cloudscraper.create_scraper(
    interpreter='js2py',
    delay=5,
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 5.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    },
    browser='chrome',
    debug=True
)

response = scraper.get("https://protected-site.com")
```
```python
import cloudscraper

scraper = cloudscraper.create_scraper()

# Log in to a site
login_data = {'username': 'user', 'password': 'pass'}
scraper.post("https://example.com/login", data=login_data)

# Make authenticated requests with the same session
response = scraper.get("https://example.com/dashboard")
```
Challenge solving fails:
```python
# Try a different interpreter
scraper = cloudscraper.create_scraper(interpreter='nodejs')

# Increase the challenge delay
scraper = cloudscraper.create_scraper(delay=10)

# Enable debug mode
scraper = cloudscraper.create_scraper(debug=True)
```
403 Forbidden errors:
```python
# Enable stealth mode and automatic session refresh
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    auto_refresh_on_403=True
)
```
Slow performance:
```python
# Use the faster native interpreter
scraper = cloudscraper.create_scraper(interpreter='native')
```
Enable debug mode to see what's happening:
```python
scraper = cloudscraper.create_scraper(debug=True)
response = scraper.get("https://example.com")

# Debug output shows:
# - Challenge type detected
# - JavaScript interpreter used
# - Challenge solving process
# - Final response status
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `debug` | boolean | `False` | Enable debug output |
| `delay` | float | auto | Override challenge delay |
| `interpreter` | string | `'js2py'` | JavaScript interpreter |
| `browser` | string/dict | `None` | Browser fingerprint |
| `enable_stealth` | boolean | `True` | Enable stealth mode |
| `allow_brotli` | boolean | `True` | Enable Brotli compression |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `disableCloudflareV1` | boolean | `False` | Disable v1 challenges |
| `disableCloudflareV2` | boolean | `False` | Disable v2 challenges |
| `disableCloudflareV3` | boolean | `False` | Disable v3 challenges |
| `disableTurnstile` | boolean | `False` | Disable Turnstile |
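These toggles are passed to `create_scraper()` like any other option — for example, to skip the v1 and Turnstile handlers while keeping v2/v3:

```python
# Sketch: disable selected challenge handlers
scraper = cloudscraper.create_scraper(
    disableCloudflareV1=True,
    disableTurnstile=True
)
```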
| Parameter | Type | Default | Description |
|---|---|---|---|
| `session_refresh_interval` | int | `3600` | Session refresh time (seconds) |
| `auto_refresh_on_403` | boolean | `True` | Auto-refresh on 403 errors |
| `max_403_retries` | int | `3` | Max 403 retry attempts |
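Together these control how aggressively a stale session is recovered — for example, refreshing every 30 minutes and allowing a couple of extra 403 retries:

```python
scraper = cloudscraper.create_scraper(
    session_refresh_interval=1800,  # refresh the session every 30 minutes
    auto_refresh_on_403=True,       # re-solve the challenge on 403
    max_403_retries=5               # retry a 403 up to 5 times
)
```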
Putting the main options together:

```python
scraper = cloudscraper.create_scraper(
    debug=True,
    delay=5,
    interpreter='js2py',
    browser='chrome',
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 5.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    }
)
```
Extract Cloudflare cookies for use in other applications:
```python
import cloudscraper

# Get cookies as a dictionary
tokens, user_agent = cloudscraper.get_tokens("https://example.com")
print(tokens)
# {'cf_clearance': '...', '__cfduid': '...'}

# Get cookies as a string
cookie_string, user_agent = cloudscraper.get_cookie_string("https://example.com")
print(cookie_string)
# "cf_clearance=...; __cfduid=..."
Use cloudscraper tokens with curl or other HTTP clients:
```python
import subprocess
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

result = subprocess.check_output([
    'curl',
    '--cookie', cookie_string,
    '-A', user_agent,
    'https://example.com'
])
```
MIT License. See LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
This tool is for educational and testing purposes only. Always respect website terms of service and use responsibly.