A high-performance, distributed web scraping service built with:
- Python
- FastAPI
- Playwright
- Upstash Redis
- Fly.io Deployment
Features:
- Concurrent scraping of multiple URLs
- Distributed job queueing
- Scalable architecture
- Real-time job status tracking
Prerequisites:
- Python 3.11+
- Upstash Redis Account
- Fly.io Account
Setup:
- Clone the repository
- Install dependencies:
    pip install -r requirements.txt
- Set the Upstash Redis environment variables:
    export UPSTASH_REDIS_REST_URL=your_redis_url
    export UPSTASH_REDIS_REST_TOKEN=your_redis_token
- Run the API server:
    uvicorn api:app --reload
- Run the worker (in a separate terminal):
    python worker.py
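
Both the API server and the worker depend on the two Upstash variables being set. The following is a minimal sketch of a fail-fast check that could run at startup; `load_upstash_config` and `MissingConfigError` are hypothetical names used for illustration and are not taken from this repository.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is not set."""


def load_upstash_config(env=None):
    """Return the Upstash REST URL and token, or raise if either is missing.

    `env` defaults to os.environ; passing a plain dict makes the check
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    required = ("UPSTASH_REDIS_REST_URL", "UPSTASH_REDIS_REST_TOKEN")
    # Treat unset and empty values the same way.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingConfigError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {
        "url": env["UPSTASH_REDIS_REST_URL"],
        "token": env["UPSTASH_REDIS_REST_TOKEN"],
    }
```

Calling `load_upstash_config()` before the server or worker starts surfaces a missing variable immediately, rather than on the first Redis call.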
Deployment:
- Deploy to Fly.io:
    fly launch
Redis queue:
- operation/{operation_id}: Check job status per browser per machine
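
A client might poll the status route until a job finishes. The sketch below is illustrative only: it assumes the route is served over HTTP GET at `/operation/{operation_id}` on the local uvicorn default address, and that the response is JSON with a `status` field whose terminal values are `"completed"` and `"failed"`. None of these details are confirmed by this README, and the function names are hypothetical.

```python
import json
import time
import urllib.request


def http_get_status(operation_id, base_url="http://localhost:8000"):
    """Fetch the status payload for one operation (assumed GET + JSON)."""
    url = f"{base_url}/operation/{operation_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_operation(get_status, operation_id, interval=1.0,
                       max_polls=30, sleep=time.sleep):
    """Poll get_status(operation_id) until the job reaches a terminal state.

    The terminal status names are assumptions, not the service's
    documented values.
    """
    for attempt in range(max_polls):
        payload = get_status(operation_id)
        if payload.get("status") in ("completed", "failed"):
            return payload
        # Skip the pause after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    raise TimeoutError(
        f"operation {operation_id} still pending after {max_polls} polls"
    )
```

With the API server running, `wait_for_operation(http_get_status, operation_id)` would block until the job reports a terminal status or the poll limit is reached.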
Configuration:
- Adjust concurrency and worker settings in worker.py
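
One common way to express a concurrency setting in an asyncio worker is a semaphore that caps how many pages are scraped at once. This is a generic sketch of that pattern, not the actual contents of worker.py: `MAX_CONCURRENT_PAGES`, `scrape_one`, and `scrape_all` are hypothetical names, and `scrape_one` is a placeholder standing in for a real Playwright page visit.

```python
import asyncio

# Hypothetical setting; the real name and default in worker.py may differ.
MAX_CONCURRENT_PAGES = 3


async def scrape_one(url):
    """Placeholder for a real page scrape (for example, a Playwright visit)."""
    await asyncio.sleep(0)
    return {"url": url, "ok": True}


async def scrape_all(urls, limit=MAX_CONCURRENT_PAGES, scrape=scrape_one):
    """Scrape all URLs with at most `limit` scrapes in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        # Each task waits here until a slot is free.
        async with semaphore:
            return await scrape(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


if __name__ == "__main__":
    print(asyncio.run(scrape_all(["https://example.com"])))
```

Raising the limit increases throughput at the cost of more memory per machine, since each in-flight page holds browser resources.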