A powerful CORS proxy service designed for Chrome extensions and web applications. This service provides multiple endpoint formats compatible with popular CORS proxy services like AllOrigins, CorsProxy.io, HTMLDriven, and CodeTabs.
- 🌐 Multiple Endpoint Formats: Compatible with various CORS proxy services
- 🔒 Security: Built-in rate limiting and security headers
- 🚀 Performance: Compression and optimized request handling
- 📱 Chrome Extension Ready: Specifically designed for browser extensions
- 🛡️ Error Handling: Comprehensive error handling and validation
- 📊 Health Monitoring: Built-in health check endpoint
- 🧩 Modular Architecture: Extensible page extraction system
- 🎯 Smart Extraction: Automatic extractor selection based on content type
- 📋 Multiple Extraction Methods: Support for different content extraction strategies
- ⚡ Configurable Presets: Fast, comprehensive, mobile, and custom configurations
# Clone the repository
git clone https://github.com/ranbot-ai/cors-proxy
cd cors-proxy
# Install dependencies
npm install
# Start the server
npm start
# For development with auto-reload
npm run dev
Once the server is running, you can access it at http://localhost:3000
GET /get?url=<target-url>
Returns a JSON response with the fetched content wrapped in a contents field, similar to AllOrigins.
Example:
fetch('http://localhost:3000/get?url=https://api.github.com/users/octocat')
.then(response => response.json())
.then(data => console.log(data.contents));
GET|POST|PUT|DELETE /proxy?url=<target-url>
Direct proxy that returns the response as-is.
Example:
fetch('http://localhost:3000/proxy?url=https://api.github.com/users/octocat')
.then(response => response.json())
.then(data => console.log(data));
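The /proxy endpoint also forwards non-GET methods. A minimal sketch of a POST request, assuming the method, headers, and body are passed through to the target unchanged (the target URL and payload here are only illustrative):
fetch('http://localhost:3000/proxy?url=https://api.example.com/items', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ name: 'example' })
})
  .then(response => response.json())
  .then(data => console.log(data));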
GET|POST /v1/proxy?quest=<target-url>
Compatible with CodeTabs proxy format.
Example:
fetch('http://localhost:3000/v1/proxy?quest=https://api.github.com/users/octocat')
.then(response => response.json())
.then(data => console.log(data));
GET /pagecontent?url=<target-url>&extractor=<extractor-name>&method=<extraction-method>
Extracts text content from a webpage using a headless browser. Returns the cleaned document.body.innerText
with normalized whitespace, limited to 25,000 characters by default.
Parameters:
- url (required): Target URL to extract content from
- extractor (optional): Extractor instance to use (default, fast, comprehensive, mobile, etc.)
- method (optional): Extraction method (innerText, textContent, custom)
- config (optional): Configuration preset name
Example:
fetch('http://localhost:3000/pagecontent?url=https://example.com&extractor=comprehensive&method=innerText')
.then(response => response.json())
.then(data => {
console.log('Page title:', data.title);
console.log('Page content:', data.content);
console.log('Content length:', data.contentLength);
});
Response Format:
{
"success": true,
"url": "https://example.com",
"title": "Example Domain",
"content": "Example Domain This domain is for use in illustrative examples...",
"contentLength": 1234,
"description": "This domain is for use in illustrative examples...",
"language": "en",
"timestamp": "2024-01-01T00:00:00.000Z"
}
GET /smart-extract?url=<target-url>
Automatically selects the best extractor based on URL patterns and content type.
Example:
fetch('http://localhost:3000/smart-extract?url=https://news.example.com/article')
.then(response => response.json())
.then(data => {
console.log('Extractor used:', data.extractorUsed);
console.log('Content:', data.content);
});
GET /extract-multiple-methods?url=<target-url>&methods=<method1,method2>
Extracts content using multiple methods for comparison.
Example:
fetch('http://localhost:3000/extract-multiple-methods?url=https://example.com&methods=innerText,textContent')
.then(response => response.json())
.then(data => {
console.log('InnerText result:', data.methods.innerText);
console.log('TextContent result:', data.methods.textContent);
});
GET /extractor-status
Returns the status of all page extractors.
Example:
fetch('http://localhost:3000/extractor-status')
.then(response => response.json())
.then(data => {
console.log('Total extractors:', data.totalExtractors);
console.log('Extractors:', data.extractors);
});
GET|POST|PUT|DELETE /?<target-url>
Accepts the target URL directly as the query string, similar to CorsProxy.io.
Example:
fetch('http://localhost:3000/?https://api.github.com/users/octocat')
.then(response => response.json())
.then(data => console.log(data));
GET /health
Returns server health status and uptime.
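Example (the exact field names in the response body are not documented here, so this sketch just logs the whole object):
fetch('http://localhost:3000/health')
  .then(response => response.json())
  .then(data => console.log('Health:', data));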
The page extraction functionality has been restructured into a modular system for better maintainability and extensibility.
lib/
├── pageExtractor.js # Core extraction logic with Puppeteer
├── extractorConfig.js # Configuration presets and custom extractors
└── extractorManager.js # Manager for multiple extractor instances
- default: Balanced extraction (30s timeout, 25k chars)
- fast: Quick extraction (15s timeout, 10k chars)
- comprehensive: Detailed extraction (45s timeout, 50k chars, metadata)
- mobile: Mobile-optimized extraction (iPhone user agent)
- bot: Bot-friendly extraction (custom user agent)
- debug: Development mode (visible browser, dev tools)
- article: Extracts main article content using semantic selectors
- headings: Extracts only headings (h1-h6)
- links: Extracts links with their URLs
- structured: Extracts structured data (title, description, headings, content)
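These presets can be combined with the /pagecontent parameters described above. A brief sketch using the documented extractor and config names (how the custom extractors such as article are selected is not covered by this sketch):
// Mobile extractor with the fast preset; swap in other names listed above as needed
fetch('http://localhost:3000/pagecontent?url=https://example.com&extractor=mobile&config=fast')
  .then(response => response.json())
  .then(data => console.log(data.contentLength, data.content));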
// Direct module usage
const { ExtractorManager } = require('./lib/extractorManager');
const manager = new ExtractorManager();
// Create extractors with different configurations
manager.createExtractor('fast', 'fast');
manager.createExtractor('comprehensive', 'comprehensive');
// Extract content
const result = await manager.extractContent('https://example.com', {}, 'fast');
// Smart extraction (automatic extractor selection)
const smartResult = await manager.smartExtract('https://news.example.com');
// Multiple methods
const multiResult = await manager.extractWithMultipleMethods(
'https://example.com',
['innerText', 'textContent']
);
// Include the CorsProxyClient class (from chrome-extension-example.js)
const proxyClient = new CorsProxyClient('https://your-proxy-domain.com');
// Handle messages from content scripts
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
if (request.action === 'fetchWithProxy') {
proxyClient.fetchWithProxy(request.url, request.options)
.then(response => sendResponse({ success: true, data: response }))
.catch(error => sendResponse({ success: false, error: error.message }));
return true; // Keep message channel open
}
});
// Function to fetch data through the proxy
function fetchFromContentScript(url, options = {}) {
return new Promise((resolve, reject) => {
chrome.runtime.sendMessage({
action: 'fetchWithProxy',
url,
options
}, response => {
if (response.success) {
resolve(response.data);
} else {
reject(new Error(response.error));
}
});
});
}
// Usage example
fetchFromContentScript('https://api.example.com/data')
.then(response => console.log(response.data))
.catch(error => console.error(error));
Create a .env file in the root directory:
PORT=3000
NODE_ENV=development
RATE_LIMIT_POINTS=100
RATE_LIMIT_DURATION=60
REQUEST_TIMEOUT=10000
MAX_REQUEST_SIZE=10mb
ENABLE_LOGGING=true
The service includes built-in rate limiting:
- Default: 100 requests per minute per IP
- Configurable: Adjust via environment variables
- Error Response: 429 status code when the limit is exceeded (see the handling sketch below)
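When the limit is hit, clients should back off before retrying. A minimal exponential-backoff sketch (the retry count and delays are illustrative, not part of the service):
async function fetchWithBackoff(url, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) return response;
    // Rate limited: wait with an exponentially growing delay, then retry
    await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
  }
  throw new Error('Rate limit still exceeded after retries');
}

fetchWithBackoff('http://localhost:3000/get?url=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data.contents));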
- Helmet.js: Security headers protection
- Rate Limiting: Prevents abuse
- Request Validation: URL validation and sanitization
- CORS Headers: Proper CORS configuration
- Request Size Limits: Prevents large payload attacks
The service provides comprehensive error handling:
{
"error": "Error type",
"message": "Detailed error message",
"status": {
"url": "requested-url",
"http_code": 500
}
}
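A client can branch on the HTTP status and read this body; a small sketch using the field names from the format above (it assumes errors are returned with a non-2xx status code):
fetch('http://localhost:3000/get?url=https://unreachable.example')
  .then(async response => {
    if (!response.ok) {
      const err = await response.json();
      console.error(err.error, err.message, err.status && err.status.http_code);
      return;
    }
    console.log(await response.json());
  });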
Run the included test suite:
# Make sure the server is running first
npm start
# In another terminal, run tests
npm test
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
# Create Heroku app
heroku create your-cors-proxy
# Deploy
git push heroku main
# Set environment variables
heroku config:set NODE_ENV=production
heroku config:set RATE_LIMIT_POINTS=200
- Connect your repository
- Set environment variables
- Deploy with the start command:
npm start
For Chrome extensions, add the proxy domain to your manifest.json:
{
"manifest_version": 3,
"permissions": [
"activeTab"
],
"host_permissions": [
"https://your-proxy-domain.com/*"
]
}
- Use appropriate proxy type: Choose the right endpoint for your use case
- Implement fallback: Use multiple proxy types for reliability (see the sketch after this list)
- Cache responses: Cache frequently accessed data
- Monitor rate limits: Implement client-side rate limiting
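A simple fallback sketch combining the /proxy and /get endpoints documented above (the ordering and error handling are illustrative):
async function fetchWithFallback(targetUrl, base = 'http://localhost:3000') {
  try {
    // Try the direct proxy first
    const direct = await fetch(`${base}/proxy?url=${encodeURIComponent(targetUrl)}`);
    if (direct.ok) return await direct.json();
  } catch (err) {
    // Fall through to the AllOrigins-style wrapped endpoint
  }
  const wrapped = await fetch(`${base}/get?url=${encodeURIComponent(targetUrl)}`);
  const body = await wrapped.json();
  return body.contents; // Raw content string; parse it yourself if the target returned JSON
}

fetchWithFallback('https://api.github.com/users/octocat')
  .then(data => console.log(data))
  .catch(error => console.error(error));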
- CORS errors: Ensure the proxy domain is added to host_permissions
- Rate limiting: Implement exponential backoff
- Timeout errors: Increase timeout values for slow APIs
- Large responses: Check MAX_REQUEST_SIZE setting
Enable debug logging:
ENABLE_LOGGING=true npm start
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the Chrome extension example code