RanBOT CORS Proxy

A powerful CORS proxy service designed for Chrome extensions and web applications. This service provides multiple endpoint formats compatible with popular CORS proxy services like AllOrigins, CorsProxy.io, HTMLDriven, and CodeTabs.

Features

  • 🌐 Multiple Endpoint Formats: Compatible with various CORS proxy services
  • 🔒 Security: Built-in rate limiting and security headers
  • 🚀 Performance: Compression and optimized request handling
  • 📱 Chrome Extension Ready: Specifically designed for browser extensions
  • 🛡️ Error Handling: Comprehensive error handling and validation
  • 📊 Health Monitoring: Built-in health check endpoint
  • 🧩 Modular Architecture: Extensible page extraction system
  • 🎯 Smart Extraction: Automatic extractor selection based on content type
  • 📋 Multiple Extraction Methods: Support for different content extraction strategies
  • ⚙️ Configurable Presets: Fast, comprehensive, mobile, and custom configurations

Quick Start

Installation

# Clone the repository
git clone https://github.com/ranbot-ai/cors-proxy
cd cors-proxy

# Install dependencies
npm install

# Start the server
npm start

# For development with auto-reload
npm run dev

Basic Usage

Once the server is running, it is reachable at http://localhost:3000 by default; set PORT to use a different port.

API Endpoints

1. AllOrigins-style Proxy

GET /get?url=<target-url>

Returns JSON response with wrapped content, similar to AllOrigins.

Example:

fetch('http://localhost:3000/get?url=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data.contents));
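The example above passes the target URL unencoded, which is fine for simple URLs. A target that carries its own query string may have its `&`-separated parameters read as parameters of the proxy instead. The sketch below shows one way to avoid that, assuming the proxy reads the target from the `url` query parameter as in the example; `buildGetUrl` is a hypothetical helper, not part of this project.

```javascript
// Hypothetical helper: builds a /get URL with the target safely percent-encoded.
function buildGetUrl(proxyBase, targetUrl) {
  const endpoint = new URL('/get', proxyBase);
  // searchParams.set encodes '?', '&', '=' and similar characters in the target
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.toString();
}

const proxied = buildGetUrl(
  'http://localhost:3000',
  'https://api.example.com/search?q=cors&page=2'
);
console.log(proxied);
// fetch(proxied).then(response => response.json()).then(data => console.log(data.contents));
```

The same approach applies to the other query-parameter endpoints by changing the path and parameter name.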

2. Simple Proxy

GET|POST|PUT|DELETE /proxy?url=<target-url>

Direct proxy that returns the response as-is.

Example:

fetch('http://localhost:3000/proxy?url=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));
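Since /proxy accepts POST, PUT, and DELETE as well as GET, a request with a JSON body might be prepared as sketched below. This assumes the proxy forwards the method, the Content-Type header, and the body to the target unchanged; `buildProxyRequest` and the `api.example.com` URL are hypothetical.

```javascript
// Hypothetical helper: prepares the arguments for fetch() when sending JSON through /proxy.
// Assumes the proxy forwards the method, Content-Type header, and body to the target.
function buildProxyRequest(proxyBase, targetUrl, method, payload) {
  const endpoint = new URL('/proxy', proxyBase);
  endpoint.searchParams.set('url', targetUrl);
  const init = { method };
  if (payload !== undefined) {
    init.headers = { 'Content-Type': 'application/json' };
    init.body = JSON.stringify(payload);
  }
  return { url: endpoint.toString(), init };
}

const postRequest = buildProxyRequest(
  'http://localhost:3000',
  'https://api.example.com/items',
  'POST',
  { name: 'widget' }
);
console.log(postRequest.url, postRequest.init);
// fetch(postRequest.url, postRequest.init).then(response => response.json()).then(console.log);
```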

3. CodeTabs-style Proxy

GET|POST /v1/proxy?quest=<target-url>

Compatible with CodeTabs proxy format.

Example:

fetch('http://localhost:3000/v1/proxy?quest=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));

4. Page Content Extraction

GET /pagecontent?url=<target-url>&extractor=<extractor-name>&method=<extraction-method>&config=<preset-name>

Extracts text content from a webpage using a headless browser. By default it returns the cleaned document.body.innerText with normalized whitespace, limited to 25,000 characters; other presets use different limits.

Parameters:

  • url (required): Target URL to extract content from
  • extractor (optional): Extractor instance to use (default, fast, comprehensive, mobile, etc.)
  • method (optional): Extraction method (innerText, textContent, custom)
  • config (optional): Configuration preset name

Example:

fetch('http://localhost:3000/pagecontent?url=https://example.com&extractor=comprehensive&method=innerText')
  .then(response => response.json())
  .then(data => {
    console.log('Page title:', data.title);
    console.log('Page content:', data.content);
    console.log('Content length:', data.contentLength);
  });

Response Format:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Example Domain This domain is for use in illustrative examples...",
  "contentLength": 1234,
  "description": "This domain is for use in illustrative examples...",
  "language": "en",
  "timestamp": "2024-01-01T00:00:00.000Z"
}
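A client will usually want to check the `success` flag before using the extracted text. The sketch below is one way to do that, using the field names from the sample response; the shape of a failed response (a `message` or `error` field) is an assumption, and `readPageContent` is a hypothetical helper.

```javascript
// Hypothetical helper: validates a /pagecontent response body and returns the useful fields.
// Field names follow the sample response; the failure shape is an assumption.
function readPageContent(data, maxChars = 25000) {
  if (!data || data.success !== true) {
    const reason = (data && (data.message || data.error)) || 'unknown error';
    throw new Error(`Page extraction failed: ${reason}`);
  }
  const text = data.content || '';
  return {
    title: data.title || '',
    text,
    // Text that reaches the character limit may have been cut off by the extractor
    possiblyTruncated: text.length >= maxChars,
  };
}

const page = readPageContent({
  success: true,
  title: 'Example Domain',
  content: 'Example Domain This domain is for use in illustrative examples...',
});
console.log(page.title, page.possiblyTruncated);
```

Pass a different `maxChars` when using a preset with another limit, such as 10,000 for fast.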

4.1. Smart Content Extraction

GET /smart-extract?url=<target-url>

Automatically selects the best extractor based on URL patterns and content type.

Example:

fetch('http://localhost:3000/smart-extract?url=https://news.example.com/article')
  .then(response => response.json())
  .then(data => {
    console.log('Extractor used:', data.extractorUsed);
    console.log('Content:', data.content);
  });

4.2. Multiple Extraction Methods

GET /extract-multiple-methods?url=<target-url>&methods=<method1,method2>

Extracts content using multiple methods for comparison.

Example:

fetch('http://localhost:3000/extract-multiple-methods?url=https://example.com&methods=innerText,textContent')
  .then(response => response.json())
  .then(data => {
    console.log('InnerText result:', data.methods.innerText);
    console.log('TextContent result:', data.methods.textContent);
  });

4.3. Extractor Status

GET /extractor-status

Returns the status of all page extractors.

Example:

fetch('http://localhost:3000/extractor-status')
  .then(response => response.json())
  .then(data => {
    console.log('Total extractors:', data.totalExtractors);
    console.log('Extractors:', data.extractors);
  });

5. Direct Proxy (CorsProxy.io style)

GET|POST|PUT|DELETE /?<target-url>

The target URL is passed as the entire query string, similar to CorsProxy.io.

Example:

fetch('http://localhost:3000/?https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));

6. Health Check

GET /health

Returns server health status and uptime.
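A script that starts the proxy and then sends requests can poll this endpoint until the server is ready. The sketch below checks only the HTTP status, because the response body format is not documented here; `waitForHealthy` and its options are hypothetical names.

```javascript
// Hypothetical helper: polls GET /health until it answers with a 2xx status.
// Only the HTTP status is checked because the body format is not documented here.
async function waitForHealthy(proxyBase, { attempts = 5, delayMs = 500, fetchImpl = fetch } = {}) {
  const healthUrl = new URL('/health', proxyBase).toString();
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const response = await fetchImpl(healthUrl);
      if (response.ok) return true;
    } catch (error) {
      // Connection errors usually mean the server is still starting, so retry
    }
    if (attempt < attempts) {
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// waitForHealthy('http://localhost:3000').then(ready => console.log('Proxy ready:', ready));
```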

Modular Page Extraction System

The page extraction functionality is split into separate modules so that the extraction logic, configuration, and extractor management can be maintained and extended independently.

Architecture

lib/
├── pageExtractor.js     # Core extraction logic with Puppeteer
├── extractorConfig.js   # Configuration presets and custom extractors
└── extractorManager.js  # Manager for multiple extractor instances

Configuration Presets

  • default: Balanced extraction (30s timeout, 25k chars)
  • fast: Quick extraction (15s timeout, 10k chars)
  • comprehensive: Detailed extraction (45s timeout, 50k chars, metadata)
  • mobile: Mobile-optimized extraction (iPhone user agent)
  • bot: Bot-friendly extraction (custom user agent)
  • debug: Development mode (visible browser, dev tools)

Custom Extraction Methods

  • article: Extracts main article content using semantic selectors
  • headings: Extracts only headings (h1-h6)
  • links: Extracts links with their URLs
  • structured: Extracts structured data (title, description, headings, content)

Usage Examples

// Direct module usage
const { ExtractorManager } = require('./lib/extractorManager');

// await is not allowed at the top level of a CommonJS module, so wrap the calls
async function main() {
  const manager = new ExtractorManager();

  // Create extractors with different configurations
  manager.createExtractor('fast', 'fast');
  manager.createExtractor('comprehensive', 'comprehensive');

  // Extract content with the 'fast' extractor
  const result = await manager.extractContent('https://example.com', {}, 'fast');

  // Smart extraction (automatic extractor selection)
  const smartResult = await manager.smartExtract('https://news.example.com');

  // Multiple methods
  const multiResult = await manager.extractWithMultipleMethods(
    'https://example.com',
    ['innerText', 'textContent']
  );

  console.log(result, smartResult, multiResult);
}

main().catch(console.error);

Chrome Extension Integration

Background Script Setup

// Include the CorsProxyClient class (from chrome-extension-example.js)
const proxyClient = new CorsProxyClient('https://your-proxy-domain.com');

// Handle messages from content scripts
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'fetchWithProxy') {
    proxyClient.fetchWithProxy(request.url, request.options)
      .then(response => sendResponse({ success: true, data: response }))
      .catch(error => sendResponse({ success: false, error: error.message }));
    return true; // Keep message channel open
  }
});
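The CorsProxyClient class itself is not reproduced in this README. For orientation only, the sketch below shows what a minimal client with a `fetchWithProxy` method could look like. It is a hypothetical stand-in that routes everything through /proxy and is not the class from chrome-extension-example.js, which may behave differently.

```javascript
// Hypothetical stand-in for CorsProxyClient; the real class lives in
// chrome-extension-example.js and may differ. Every request goes through /proxy.
class MinimalProxyClient {
  // The default wrapper avoids calling the native fetch with the client as `this`
  constructor(proxyBase, fetchImpl = (...args) => fetch(...args)) {
    this.proxyBase = proxyBase;
    this.fetchImpl = fetchImpl;
  }

  async fetchWithProxy(targetUrl, options = {}) {
    const endpoint = new URL('/proxy', this.proxyBase);
    endpoint.searchParams.set('url', targetUrl);
    const response = await this.fetchImpl(endpoint.toString(), options);
    if (!response.ok) {
      throw new Error(`Proxy request failed with status ${response.status}`);
    }
    // Return plain data: Response objects cannot be passed through sendResponse
    const text = await response.text();
    try {
      return { status: response.status, data: JSON.parse(text) };
    } catch {
      return { status: response.status, data: text };
    }
  }
}
```

Returning a plain object rather than the Response matters in an extension, because messages between scripts must be JSON-serializable.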

Content Script Usage

// Function to fetch data through the proxy
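function fetchFromContentScript(url, options = {}) {
  return new Promise((resolve, reject) => {
    chrome.runtime.sendMessage({
      action: 'fetchWithProxy',
      url,
      options
    }, response => {
      // lastError is set when the background script is unreachable or never responds
      if (chrome.runtime.lastError) {
        reject(new Error(chrome.runtime.lastError.message));
      } else if (response && response.success) {
        resolve(response.data);
      } else {
        reject(new Error(response ? response.error : 'No response from background script'));
      }
    });
  });
}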

// Usage example
fetchFromContentScript('https://api.example.com/data')
  .then(response => console.log(response.data))
  .catch(error => console.error(error));

Configuration

Environment Variables

Create a .env file in the root directory:

PORT=3000
NODE_ENV=development
RATE_LIMIT_POINTS=100
RATE_LIMIT_DURATION=60
REQUEST_TIMEOUT=10000
MAX_REQUEST_SIZE=10mb
ENABLE_LOGGING=true
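For tooling that needs the same settings, the values can be read with fallbacks that mirror the sample file. The loader below is a hypothetical sketch; how the server itself parses these variables is not shown in this README.

```javascript
// Hypothetical config loader; variable names and fallbacks mirror the sample .env.
// This is not the server's own parsing code.
function loadConfig(env = process.env) {
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };
  return {
    port: toInt(env.PORT, 3000),
    nodeEnv: env.NODE_ENV || 'development',
    rateLimitPoints: toInt(env.RATE_LIMIT_POINTS, 100),
    rateLimitDuration: toInt(env.RATE_LIMIT_DURATION, 60),
    requestTimeout: toInt(env.REQUEST_TIMEOUT, 10000),
    maxRequestSize: env.MAX_REQUEST_SIZE || '10mb',
    enableLogging: env.ENABLE_LOGGING === 'true',
  };
}

console.log(loadConfig({ PORT: '8080', ENABLE_LOGGING: 'true' }));
```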

Rate Limiting

The service includes built-in rate limiting:

  • Default: 100 requests per minute per IP
  • Configurable: Adjust via RATE_LIMIT_POINTS (requests per window) and RATE_LIMIT_DURATION (window length in seconds)
  • Error Response: 429 status code when limit exceeded
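A client can stay under the limit by throttling itself before sending requests. The sliding-window limiter below is a minimal sketch of that idea with the default budget of 100 requests per 60 seconds; the class name and its API are hypothetical. The server counts requests per IP, so other clients behind the same address share the budget.

```javascript
// Hypothetical client-side limiter: allows at most `points` calls per `durationMs` window.
class SlidingWindowLimiter {
  constructor(points = 100, durationMs = 60000, now = () => Date.now()) {
    this.points = points;
    this.durationMs = durationMs;
    this.now = now;
    this.timestamps = [];
  }

  // Returns true and records the call if it fits in the window, otherwise false
  tryAcquire() {
    const current = this.now();
    const cutoff = current - this.durationMs;
    // Drop calls that have fallen out of the window
    this.timestamps = this.timestamps.filter(stamp => stamp > cutoff);
    if (this.timestamps.length >= this.points) return false;
    this.timestamps.push(current);
    return true;
  }
}

const limiter = new SlidingWindowLimiter();
if (limiter.tryAcquire()) {
  // fetch('http://localhost:3000/proxy?url=...') would go here
}
```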

Security Features

  • Helmet.js: Security headers protection
  • Rate Limiting: Prevents abuse
  • Request Validation: URL validation and sanitization
  • CORS Headers: Proper CORS configuration
  • Request Size Limits: Prevents large payload attacks
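To illustrate what URL validation for a proxy typically involves, the sketch below accepts only absolute http(s) URLs and rejects obvious local targets. It is an illustrative example, not this project's implementation; whether the service blocks local addresses is an assumption, and `validateTargetUrl` is a hypothetical name.

```javascript
// Illustrative validator, not the project's implementation: accepts only absolute
// http(s) URLs and rejects obvious local targets.
function validateTargetUrl(raw) {
  let parsed;
  try {
    parsed = new URL(raw);
  } catch {
    return { valid: false, reason: 'not an absolute URL' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { valid: false, reason: `unsupported protocol ${parsed.protocol}` };
  }
  // Assumption: a public proxy should not be usable to reach the host it runs on
  const localHosts = new Set(['localhost', '127.0.0.1', '[::1]']);
  if (localHosts.has(parsed.hostname)) {
    return { valid: false, reason: 'local addresses are not allowed' };
  }
  return { valid: true, url: parsed.toString() };
}

console.log(validateTargetUrl('https://api.github.com/users/octocat'));
console.log(validateTargetUrl('file:///etc/passwd'));
```

A production check would also need to cover private IP ranges and hostnames that resolve to them, which this sketch does not attempt.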

Error Handling

Errors are returned as JSON in a consistent shape:

{
  "error": "Error type",
  "message": "Detailed error message",
  "status": {
    "url": "requested-url",
    "http_code": 500
  }
}
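On the client, this body can be turned into a regular Error so that callers handle proxy failures like any other exception. The helper below is a hypothetical sketch that relies only on the fields shown above.

```javascript
// Hypothetical helper: converts the documented error body into an Error with extra fields.
function toProxyError(body) {
  const status = (body && body.status) || {};
  const error = new Error((body && body.message) || 'Proxy request failed');
  error.name = (body && body.error) || 'ProxyError';
  error.targetUrl = status.url;
  error.httpCode = status.http_code;
  return error;
}

const sampleError = toProxyError({
  error: 'Error type',
  message: 'Detailed error message',
  status: { url: 'requested-url', http_code: 500 },
});
console.log(sampleError.name, sampleError.message, sampleError.httpCode);
```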

Testing

Run the included test suite:

# Make sure the server is running first
npm start

# In another terminal, run tests
npm test

Deployment

Docker Deployment

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
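# --omit=dev supersedes the deprecated --only=production flag
# Note: Puppeteer-based extraction needs a Chromium binary, which this Alpine image does not ship by default
RUN npm ci --omit=dev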
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

Heroku Deployment

# Create Heroku app
heroku create your-cors-proxy

# Deploy
git push heroku main

# Set environment variables
heroku config:set NODE_ENV=production
heroku config:set RATE_LIMIT_POINTS=200

Railway/Render Deployment

  1. Connect your repository
  2. Set environment variables
  3. Deploy with the start command: npm start

Browser Extension Manifest

For Chrome extensions, add the proxy domain to your manifest.json:

{
  "manifest_version": 3,
  "permissions": [
    "activeTab"
  ],
  "host_permissions": [
    "https://your-proxy-domain.com/*"
  ]
}

Performance Tips

  1. Use the appropriate proxy type: /get wraps the response in JSON, while /proxy returns it as-is
  2. Implement fallback: Try a second endpoint format if the first one fails
  3. Cache responses: Cache frequently accessed data
  4. Monitor rate limits: Throttle requests on the client to stay under the server limit
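The fallback tip can be sketched as a loop over candidate URLs that returns the first successful response. The function below is a hypothetical example; the candidates may be different endpoint formats on one server or entirely different proxy deployments.

```javascript
// Hypothetical fallback: tries each candidate URL in order and returns the first 2xx response.
async function fetchWithFallback(candidateUrls, fetchImpl = fetch) {
  let lastError = new Error('No candidate URLs given');
  for (const candidate of candidateUrls) {
    try {
      const response = await fetchImpl(candidate);
      if (response.ok) return response;
      lastError = new Error(`${candidate} answered with status ${response.status}`);
    } catch (error) {
      // Network failure: remember it and move on to the next candidate
      lastError = error;
    }
  }
  throw lastError;
}

const encodedTarget = encodeURIComponent('https://api.github.com/users/octocat');
const candidates = [
  `http://localhost:3000/proxy?url=${encodedTarget}`,
  `http://localhost:3000/v1/proxy?quest=${encodedTarget}`,
];
// fetchWithFallback(candidates).then(response => response.json()).then(console.log);
```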

Troubleshooting

Common Issues

  1. CORS errors: Ensure the proxy domain is listed in host_permissions in manifest.json
  2. Rate limiting (429 responses): Retry with exponential backoff
  3. Timeout errors: Increase REQUEST_TIMEOUT for slow APIs
  4. Large responses: Check the MAX_REQUEST_SIZE setting
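Exponential backoff for rate-limited requests can be sketched as follows: after each 429 response the client waits twice as long as before, up to a cap, and then retries. The function names, the 500 ms base delay, and the 30 s cap are illustrative assumptions rather than values taken from this project.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs
function backoffDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Hypothetical wrapper: retries while the proxy answers 429, up to `retries` extra attempts
async function fetchWithBackoff(url, options = {}) {
  const {
    retries = 4,
    fetchImpl = fetch,
    sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  } = options;
  for (let attempt = 0; ; attempt++) {
    const response = await fetchImpl(url);
    // Return on any non-429 status, or once the retry budget is used up
    if (response.status !== 429 || attempt >= retries) return response;
    await sleep(backoffDelay(attempt));
  }
}

// fetchWithBackoff('http://localhost:3000/proxy?url=https%3A%2F%2Fexample.com')
//   .then(response => console.log(response.status));
```

Adding a small random jitter to each delay is a common refinement when many clients retry at the same time.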

Debug Mode

Enable debug logging:

ENABLE_LOGGING=true npm start

License

MIT License - see LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

Support

For issues and questions:

  • Create an issue on GitHub
  • Check the troubleshooting section
  • Review the Chrome extension example code
