RanBOT CORS Proxy

A CORS proxy service for Chrome extensions and web applications. It exposes several endpoint formats that mirror popular CORS proxy services such as AllOrigins, CorsProxy.io, HTMLDriven, and CodeTabs.

Features

🌐 Multiple Endpoint Formats: Compatible with various CORS proxy services
🔒 Security: Built-in rate limiting and security headers
🚀 Performance: Compression and optimized request handling
📱 Chrome Extension Ready: Specifically designed for browser extensions
🛡️ Error Handling: Comprehensive error handling and validation
📊 Health Monitoring: Built-in health check endpoint
🧩 Modular Architecture: Extensible page extraction system
🎯 Smart Extraction: Automatic extractor selection based on content type
📋 Multiple Extraction Methods: Support for different content extraction strategies
⚡ Configurable Presets: Fast, comprehensive, mobile, and custom configurations

Quick Start

Installation

# Clone the repository
git clone https://github.com/ranbot-ai/cors-proxy
cd cors-proxy

# Install dependencies
npm install

# Start the server
npm start

# For development with auto-reload
npm run dev

Basic Usage

Once the server is running, it is available at http://localhost:3000 (port 3000 is the PORT value used throughout this README).

API Endpoints

1. AllOrigins-style Proxy

GET /get?url=<target-url>

Returns a JSON response that wraps the target's body in a contents field, similar to AllOrigins.

Example:

fetch('http://localhost:3000/get?url=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data.contents));
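
When the target itself serves JSON, the wrapped body usually has to be parsed a second time. The helper below is a minimal sketch of that step. It assumes the wrapper has the shape { contents: string, ... }, which this README implies but does not spell out, and unwrapContents is a hypothetical name, not part of the service.

```javascript
// Assumption: the /get wrapper looks like { contents: "<body as a string>", ... }.
function unwrapContents(wrapper) {
  if (!wrapper || typeof wrapper.contents !== 'string') {
    throw new TypeError('Expected a wrapper object with a string "contents" field');
  }
  try {
    // JSON targets arrive as a JSON string nested inside the wrapper
    return JSON.parse(wrapper.contents);
  } catch (err) {
    // Not JSON (for example an HTML page): return the raw text unchanged
    return wrapper.contents;
  }
}
```

Pass the parsed response of /get to it, for example unwrapContents(data) inside the second .then() of the example above.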

2. Simple Proxy

GET|POST|PUT|DELETE /proxy?url=<target-url>

Direct proxy that returns the response as-is.

Example:

fetch('http://localhost:3000/proxy?url=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));

3. CodeTabs-style Proxy

GET|POST /v1/proxy?quest=<target-url>

Compatible with the CodeTabs proxy format.

Example:

fetch('http://localhost:3000/v1/proxy?quest=https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));

4. Page Content Extraction

GET /pagecontent?url=<target-url>&extractor=<extractor-name>&method=<extraction-method>&config=<preset-name>

Extracts text content from a webpage using a headless browser. Returns cleaned document.body.innerText with normalized whitespace, limited to 25,000 characters.

Parameters:

url (required): Target URL to extract content from
extractor (optional): Extractor instance to use (default, fast, comprehensive, mobile, etc.)
method (optional): Extraction method (innerText, textContent, custom)
config (optional): Configuration preset name
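
Because the target URL can carry its own query string, it is safer to let URLSearchParams do the encoding than to concatenate strings. The sketch below builds a /pagecontent request from the parameters listed above. buildPageContentUrl is a hypothetical helper, and it assumes the service is mounted at the root of baseUrl.

```javascript
// Hypothetical helper: builds a /pagecontent URL from the documented parameters.
// Assumes the service is mounted at the root of baseUrl.
function buildPageContentUrl(baseUrl, targetUrl, options = {}) {
  const endpoint = new URL('/pagecontent', baseUrl);
  // searchParams.set() percent-encodes the value, so "&" and "?" in the target survive
  endpoint.searchParams.set('url', targetUrl);
  for (const name of ['extractor', 'method', 'config']) {
    if (options[name] !== undefined) {
      endpoint.searchParams.set(name, String(options[name]));
    }
  }
  return endpoint.toString();
}
```

For example, buildPageContentUrl('http://localhost:3000', 'https://example.com', { extractor: 'comprehensive', method: 'innerText' }) produces a URL that can be passed straight to fetch().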

Example:

fetch('http://localhost:3000/pagecontent?url=https://example.com&extractor=comprehensive&method=innerText')
  .then(response => response.json())
  .then(data => {
    console.log('Page title:', data.title);
    console.log('Page content:', data.content);
    console.log('Content length:', data.contentLength);
  });

Response Format:

{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Example Domain This domain is for use in illustrative examples...",
  "contentLength": 1234,
  "description": "This domain is for use in illustrative examples...",
  "language": "en",
  "timestamp": "2024-01-01T00:00:00.000Z"
}
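
The cleanup described for this endpoint (normalized whitespace, capped at 25,000 characters) can be approximated as follows. This is a sketch, not the service's code: it assumes "normalized" means collapsing every run of whitespace into a single space, which matches the sample content above, and cleanText is a hypothetical name.

```javascript
// Assumption: "normalized whitespace" = collapse runs of whitespace to one space.
function cleanText(rawText, maxLength = 25000) {
  const normalized = String(rawText)
    .replace(/\s+/g, ' ') // newlines, tabs and repeated spaces become one space
    .trim();
  // Cut at the limit; shorter text is returned unchanged
  return normalized.length > maxLength ? normalized.slice(0, maxLength) : normalized;
}
```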

4.1. Smart Content Extraction

GET /smart-extract?url=<target-url>

Automatically selects the best extractor based on URL patterns and content type.

Example:

fetch('http://localhost:3000/smart-extract?url=https://news.example.com/article')
  .then(response => response.json())
  .then(data => {
    console.log('Extractor used:', data.extractorUsed);
    console.log('Content:', data.content);
  });
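
How the server chooses an extractor is not documented here. As an illustration of URL-pattern selection only, the sketch below maps a few made-up rules to preset names from this README. The rules and the pickExtractor name are assumptions, and content-type checks are left out because they need a response from the target.

```javascript
// Hypothetical rules, checked in order; the first match wins.
const EXTRACTOR_RULES = [
  { matches: (u) => u.hostname.startsWith('m.'), extractor: 'mobile' },
  {
    matches: (u) => u.hostname.startsWith('news.') || u.pathname.includes('/article'),
    extractor: 'comprehensive',
  },
  { matches: (u) => u.pathname.startsWith('/api/'), extractor: 'fast' },
];

function pickExtractor(targetUrl) {
  const parsed = new URL(targetUrl); // throws on malformed input
  const rule = EXTRACTOR_RULES.find((candidate) => candidate.matches(parsed));
  return rule ? rule.extractor : 'default';
}
```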

4.2. Multiple Extraction Methods

GET /extract-multiple-methods?url=<target-url>&methods=<method1,method2>

Extracts content using multiple methods for comparison.

Example:

fetch('http://localhost:3000/extract-multiple-methods?url=https://example.com&methods=innerText,textContent')
  .then(response => response.json())
  .then(data => {
    console.log('InnerText result:', data.methods.innerText);
    console.log('TextContent result:', data.methods.textContent);
  });

4.3. Extractor Status

GET /extractor-status

Returns the status of all page extractors.

Example:

fetch('http://localhost:3000/extractor-status')
  .then(response => response.json())
  .then(data => {
    console.log('Total extractors:', data.totalExtractors);
    console.log('Extractors:', data.extractors);
  });

5. Direct Proxy (CorsProxy.io style)

GET|POST|PUT|DELETE /?<target-url>

Pass the target URL directly as the query string (no parameter name), similar to CorsProxy.io.

Example:

fetch('http://localhost:3000/?https://api.github.com/users/octocat')
  .then(response => response.json())
  .then(data => console.log(data));
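
The four proxy styles differ only in how the target URL is attached, so a small helper can build any of them. This is a sketch: the style names and buildProxyUrl are hypothetical. The query-parameter styles percent-encode the target, which is standard for query values. The direct style is left unencoded to match the example above, since this README does not say whether the server decodes it.

```javascript
// Hypothetical style names mapped to the endpoint formats documented above.
const PROXY_STYLES = {
  allorigins: (base, target) => `${base}/get?url=${encodeURIComponent(target)}`,
  simple: (base, target) => `${base}/proxy?url=${encodeURIComponent(target)}`,
  codetabs: (base, target) => `${base}/v1/proxy?quest=${encodeURIComponent(target)}`,
  // Unencoded on purpose, mirroring the README example for this style
  direct: (base, target) => `${base}/?${target}`,
};

function buildProxyUrl(style, baseUrl, targetUrl) {
  if (!Object.prototype.hasOwnProperty.call(PROXY_STYLES, style)) {
    throw new Error(`Unknown proxy style: ${style}`);
  }
  // Drop trailing slashes so the path is not doubled
  return PROXY_STYLES[style](baseUrl.replace(/\/+$/, ''), targetUrl);
}
```

Mapping Object.keys(PROXY_STYLES) through this function yields an ordered list of candidate URLs, which is one way to implement a fallback across endpoint formats.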

6. Health Check

GET /health

Returns server health status and uptime.

Modular Page Extraction System

Page extraction is organized as a modular system so that extractors can be maintained and extended independently.

Architecture

lib/
├── pageExtractor.js     # Core extraction logic with Puppeteer
├── extractorConfig.js   # Configuration presets and custom extractors
└── extractorManager.js  # Manager for multiple extractor instances

Configuration Presets

default: Balanced extraction (30s timeout, 25k chars)
fast: Quick extraction (15s timeout, 10k chars)
comprehensive: Detailed extraction (45s timeout, 50k chars, metadata)
mobile: Mobile-optimized extraction (iPhone user agent)
bot: Bot-friendly extraction (custom user agent)
debug: Development mode (visible browser, dev tools)
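
To make the presets concrete, here is one way they could be represented and merged with per-request overrides. Only the timeouts and character limits come from the list above. The field names (timeoutMs, maxLength, includeMetadata) and resolvePreset are hypothetical and may not match lib/extractorConfig.js. The mobile, bot, and debug presets are omitted because their exact values are not given here.

```javascript
// Values from the preset list above; field names are assumptions.
const PRESETS = {
  default: { timeoutMs: 30000, maxLength: 25000 },
  fast: { timeoutMs: 15000, maxLength: 10000 },
  comprehensive: { timeoutMs: 45000, maxLength: 50000, includeMetadata: true },
};

function resolvePreset(name = 'default', overrides = {}) {
  const known = Object.prototype.hasOwnProperty.call(PRESETS, name);
  const preset = known ? PRESETS[name] : PRESETS.default;
  // Later spreads win: defaults < named preset < caller overrides
  return { ...PRESETS.default, ...preset, ...overrides };
}
```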

Custom Extraction Methods

article: Extracts main article content using semantic selectors
headings: Extracts only headings (h1-h6)
links: Extracts links with their URLs
structured: Extracts structured data (title, description, headings, content)

Usage Examples

// Direct module usage
const { ExtractorManager } = require('./lib/extractorManager');

// await is only valid inside an async function in a CommonJS module
async function main() {
  const manager = new ExtractorManager();

  // Create extractors with different configurations
  manager.createExtractor('fast', 'fast');
  manager.createExtractor('comprehensive', 'comprehensive');

  // Extract content
  const result = await manager.extractContent('https://example.com', {}, 'fast');

  // Smart extraction (automatic extractor selection)
  const smartResult = await manager.smartExtract('https://news.example.com');

  // Multiple methods
  const multiResult = await manager.extractWithMultipleMethods(
    'https://example.com',
    ['innerText', 'textContent']
  );
}

main().catch(console.error);

Chrome Extension Integration

Background Script Setup

// Include the CorsProxyClient class (from chrome-extension-example.js)
const proxyClient = new CorsProxyClient('https://your-proxy-domain.com');

// Handle messages from content scripts
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.action === 'fetchWithProxy') {
    proxyClient.fetchWithProxy(request.url, request.options)
      .then(response => sendResponse({ success: true, data: response }))
      .catch(error => sendResponse({ success: false, error: error.message }));
    return true; // Keep message channel open
  }
});

Content Script Usage

// Function to fetch data through the proxy
function fetchFromContentScript(url, options = {}) {
  return new Promise((resolve, reject) => {
    chrome.runtime.sendMessage({
      action: 'fetchWithProxy',
      url,
      options
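    }, response => {
      // lastError is set when the background script is unreachable or never responds
      if (chrome.runtime.lastError) {
        reject(new Error(chrome.runtime.lastError.message));
      } else if (response && response.success) {
        resolve(response.data);
      } else {
        reject(new Error(response ? response.error : 'No response from background script'));
      }
    });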
  });
}

// Usage example
fetchFromContentScript('https://api.example.com/data')
  .then(response => console.log(response.data))
  .catch(error => console.error(error));

Configuration

Environment Variables

Create a .env file in the root directory:

PORT=3000
NODE_ENV=development
RATE_LIMIT_POINTS=100
RATE_LIMIT_DURATION=60
REQUEST_TIMEOUT=10000
MAX_REQUEST_SIZE=10mb
ENABLE_LOGGING=true
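
Environment variables are always strings, so numeric settings need parsing before use. The sketch below shows one way to read the variables above with fallbacks. It assumes the sample values are also the defaults, and loadConfig and its field names are hypothetical (the repository's own config.js may differ).

```javascript
// Parse an integer setting, falling back when it is missing or malformed
function toInt(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Assumption: the sample .env values double as defaults.
function loadConfig(env = process.env) {
  return {
    port: toInt(env.PORT, 3000),
    nodeEnv: env.NODE_ENV || 'development',
    rateLimitPoints: toInt(env.RATE_LIMIT_POINTS, 100),
    rateLimitDuration: toInt(env.RATE_LIMIT_DURATION, 60),
    requestTimeout: toInt(env.REQUEST_TIMEOUT, 10000),
    maxRequestSize: env.MAX_REQUEST_SIZE || '10mb',
    // Only the literal string "true" enables logging
    enableLogging: env.ENABLE_LOGGING === 'true',
  };
}
```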

Rate Limiting

The service includes built-in rate limiting:

Default: 100 requests per minute per IP
Configurable: Adjust RATE_LIMIT_POINTS and RATE_LIMIT_DURATION (see Environment Variables)
Error Response: 429 status code when the limit is exceeded
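
For readers unfamiliar with points-per-duration limiting, the following fixed-window limiter illustrates the idea. It is not the service's implementation, which this README does not show. createRateLimiter is a hypothetical name, the duration is assumed to be in seconds, and the clock is injectable so the behavior can be tested.

```javascript
// Illustrative fixed-window limiter: each key gets `points` requests per window.
function createRateLimiter({ points = 100, durationSeconds = 60, now = Date.now } = {}) {
  const windows = new Map(); // key (e.g. client IP) -> { startedAt, used }
  const durationMs = durationSeconds * 1000;

  return function consume(key) {
    const currentTime = now();
    let entry = windows.get(key);
    // Start a fresh window on first use or once the old one has elapsed
    if (!entry || currentTime - entry.startedAt >= durationMs) {
      entry = { startedAt: currentTime, used: 0 };
      windows.set(key, entry);
    }
    if (entry.used >= points) {
      return { allowed: false, remaining: 0 }; // caller would answer with 429
    }
    entry.used += 1;
    return { allowed: true, remaining: points - entry.used };
  };
}
```

The map is never pruned in this sketch, so a long-running process would also need to evict idle keys.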

Security Features

Helmet.js: Security headers protection
Rate Limiting: Prevents abuse
Request Validation: URL validation and sanitization
CORS Headers: Proper CORS configuration
Request Size Limits: Prevents large payload attacks
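
As an illustration of the request validation item, a proxy typically rejects anything that is not an absolute http or https URL before fetching it. The sketch below shows that check. The exact rules the service applies are not documented here, and validateTargetUrl is a hypothetical name.

```javascript
// Assumed rule set: only absolute http(s) URLs are acceptable targets.
function validateTargetUrl(input) {
  let parsed;
  try {
    parsed = new URL(String(input).trim());
  } catch (err) {
    return { valid: false, reason: 'Not a valid absolute URL' };
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return { valid: false, reason: `Unsupported protocol: ${parsed.protocol}` };
  }
  // toString() returns the normalized form of the URL
  return { valid: true, url: parsed.toString() };
}
```

A public deployment would usually also block private and loopback addresses to limit server-side request forgery. That check is left out of this sketch.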

Error Handling

Errors are returned as JSON in the following shape:

{
  "error": "Error type",
  "message": "Detailed error message",
  "status": {
    "url": "requested-url",
    "http_code": 500
  }
}
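
On the client, the error shape above can be turned into a regular exception so that calling code handles it in one place. This is a sketch with hypothetical names (ProxyError, throwIfProxyError). It treats any object with a string error field as a proxy error, which would misclassify a proxied payload that happens to contain such a field.

```javascript
// Wraps the documented error body in an Error subclass
class ProxyError extends Error {
  constructor(body) {
    super(body.message || body.error || 'Unknown proxy error');
    this.name = 'ProxyError';
    this.type = body.error;
    this.targetUrl = body.status ? body.status.url : undefined;
    this.httpCode = body.status ? body.status.http_code : undefined;
  }
}

// Returns the body untouched unless it matches the error shape
function throwIfProxyError(body) {
  const looksLikeError =
    Boolean(body) && typeof body === 'object' && typeof body.error === 'string';
  if (looksLikeError) {
    throw new ProxyError(body);
  }
  return body;
}
```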

Testing

Run the included test suite:

# Make sure the server is running first
npm start

# In another terminal, run tests
npm test

Deployment

Docker Deployment

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

Heroku Deployment

# Create Heroku app
heroku create your-cors-proxy

# Deploy
git push heroku main

# Set environment variables
heroku config:set NODE_ENV=production
heroku config:set RATE_LIMIT_POINTS=200

Railway/Render Deployment

Connect your repository
Set environment variables
Deploy with the start command: npm start

Browser Extension Manifest

For Chrome extensions, add the proxy domain to your manifest.json:

{
  "manifest_version": 3,
  "permissions": [
    "activeTab"
  ],
  "host_permissions": [
    "https://your-proxy-domain.com/*"
  ]
}

Performance Tips

Use the appropriate proxy type: /get wraps the response in JSON, while /proxy and the direct style return it as-is
Implement fallback: Try a second endpoint format when the first one fails
Cache responses: Cache frequently accessed data on the client
Monitor rate limits: Throttle requests on the client to stay under the server's limit
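
The caching tip can be as simple as an in-memory map with an expiry time per entry. The sketch below is one such cache. createTtlCache is a hypothetical helper and the 60-second default is arbitrary.

```javascript
// Minimal in-memory cache with a time-to-live per entry
function createTtlCache({ ttlMs = 60000, now = Date.now } = {}) {
  const entries = new Map(); // key (e.g. target URL) -> { value, expiresAt }
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    get size() {
      return entries.size;
    },
  };
}
```

In a Manifest V3 extension the background service worker can be stopped while idle, so an in-memory cache like this one may be emptied between requests.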

Troubleshooting

Common Issues

CORS errors: Ensure the proxy domain is listed under host_permissions in manifest.json
Rate limiting (429 responses): Implement exponential backoff on the client
Timeout errors: Increase timeout values for slow APIs
Large responses: Check the MAX_REQUEST_SIZE setting
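
Exponential backoff means doubling the wait after each failed attempt, up to a cap, usually with some randomness so that clients do not retry in lockstep. The sketch below computes such a delay. backoffDelay and its default values are assumptions, not settings of this service.

```javascript
// Delay before retry number `attempt` (0-based), in milliseconds.
// The random source is injectable so the result can be tested.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000, random = Math.random } = {}) {
  // Double the ceiling each attempt, but never exceed maxMs
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // "Full jitter": pick a uniform delay between 0 and the ceiling
  return Math.floor(random() * ceiling);
}
```

A retry loop would wait backoffDelay(attempt) milliseconds after each 429 response and give up after a fixed number of attempts.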

Debug Mode

Enable debug logging:

ENABLE_LOGGING=true npm start

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions:

Create an issue on GitHub
Check the troubleshooting section
Review the Chrome extension example code
