deepscrape/crawlqueue

Scalable Web Scraping Service Queue

Overview

A high-performance, distributed web scraping service built with:

  • Python
  • FastAPI
  • Playwright
  • Upstash Redis
  • Fly.io (deployment)

Features

  • Concurrent scraping of multiple URLs
  • Distributed job queueing
  • Scalable architecture
  • Real-time job status tracking
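Concurrent scraping of multiple URLs usually means starting many page loads at once while capping how many run in parallel, so a worker does not open more browser pages than it can handle. The sketch below shows that pattern with `asyncio` and a semaphore. It is not the repository's code: `scrape_all`, `fake_fetch` and `max_concurrency` are hypothetical names, and `fetch_page` is injected so the example runs without a browser.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def scrape_all(
    urls: list[str],
    fetch_page: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> dict[str, str]:
    """Fetch every URL, keeping at most `max_concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        # Block here until one of the concurrency slots frees up.
        async with semaphore:
            return url, await fetch_page(url)

    # Launch all fetches; the semaphore, not gather(), limits parallelism.
    pairs = await asyncio.gather(*(bounded_fetch(url) for url in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a Playwright page load.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    pages = asyncio.run(
        scrape_all(["https://example.com/a", "https://example.com/b"], fake_fetch, max_concurrency=2)
    )
    print(pages)
```

In a real worker, `fetch_page` might wrap Playwright's async API (for example `await page.goto(url)` followed by `await page.content()`); how the repository actually does this is not documented here.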

Prerequisites

  • Python 3.11+
  • Upstash Redis Account
  • Fly.io Account

Local Development Setup

  1. Clone the repository
  2. Install dependencies:
     pip install -r requirements.txt
  3. Set the Upstash Redis environment variables:
     export UPSTASH_REDIS_REST_URL=your_redis_url
     export UPSTASH_REDIS_REST_TOKEN=your_redis_token
  4. Run the API server:
     uvicorn api:app --reload
  5. Run a worker (in a separate terminal):
     python worker.py
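Both processes rely on the two Upstash variables exported in the setup steps. Upstash's REST interface accepts a Redis command as a JSON array POSTed to the REST URL with a Bearer token. The sketch below builds such a request using only the standard library; `build_command_request`, `send` and the `jobs` key are hypothetical names, and the repository may use a client library instead.

```python
import json
import os
import urllib.request


def build_command_request(*command: str) -> urllib.request.Request:
    """Build (but do not send) an Upstash REST request for one Redis command."""
    base_url = os.environ["UPSTASH_REDIS_REST_URL"]
    token = os.environ["UPSTASH_REDIS_REST_TOKEN"]
    return urllib.request.Request(
        base_url,
        # The command travels as a JSON array, e.g. ["LPUSH", "jobs", "..."].
        data=json.dumps(list(command)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> object:
    """Send the request and return the "result" field of Upstash's JSON reply."""
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.loads(response.read())
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]
```

For example, `send(build_command_request("LPUSH", "jobs", payload))` would push a job payload onto a hypothetical `jobs` list in a live Upstash database.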

Deployment

fly launch

Scalable Service Queue

  • operation/{operation_id}: look up a job's status by its operation_id; status is tracked in the Redis queue per browser per machine

Configuration

Adjust concurrency and worker settings in worker.py
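If those settings should be adjustable per deployment without editing worker.py, one option is to read them from environment variables with in-code defaults. The sketch below is hypothetical throughout: the variable names (`SCRAPER_MAX_CONCURRENCY` and so on), the fields and the default values are illustrative, not the repository's actual settings.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSettings:
    max_concurrency: int = 5  # browser pages scraped in parallel per worker
    poll_interval_seconds: float = 1.0  # sleep between polls of an empty queue
    page_timeout_ms: int = 30_000  # per-page navigation timeout


def _require_positive(name: str, value: float) -> None:
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")


def load_settings(env: Mapping[str, str] = os.environ) -> WorkerSettings:
    """Build settings from environment variables, falling back to the defaults."""
    defaults = WorkerSettings()
    settings = WorkerSettings(
        max_concurrency=int(env.get("SCRAPER_MAX_CONCURRENCY", defaults.max_concurrency)),
        poll_interval_seconds=float(
            env.get("SCRAPER_POLL_INTERVAL_SECONDS", defaults.poll_interval_seconds)
        ),
        page_timeout_ms=int(env.get("SCRAPER_PAGE_TIMEOUT_MS", defaults.page_timeout_ms)),
    )
    # Reject zero or negative values early instead of failing mid-scrape.
    for name, value in vars(settings).items():
        _require_positive(name, value)
    return settings
```

On Fly.io, such variables could be set in the `[env]` section of fly.toml or with `fly secrets set`.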
