A FastAPI-based web service that uses Playwright to fetch and process web content. It exposes an HTTP API for retrieving page HTML, with support for proxies, media blocking, and optional API key authentication.
- 🚀 Fast and async web scraping using Playwright
- 🔒 Optional API key authentication
- 🌐 Proxy support
- 🖼️ Media blocking capabilities
- 🐳 Docker support
- 🏗️ CI/CD with GitHub Actions
- 📚 Interactive API documentation (Swagger UI)
- Clone the repository:
git clone git@github.com:watercrawl/playwright.git
cd playwright
- Set up environment variables:
cp .env.example .env
- Edit the .env file with your settings:
AUTH_API_KEY=your-secret-api-key
PORT=8000
HOST=0.0.0.0
- Build and run with Docker Compose:
docker compose up --build
The service will be available at http://localhost:8000
Access the interactive API documentation at http://localhost:8000/docs
Alternatively, pull and run the prebuilt image from Docker Hub:
docker pull watercrawl/playwright:latest
docker run -p 8000:8000 -e AUTH_API_KEY=your-secret-key watercrawl/playwright
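If a script starts the container and then needs to wait before sending requests, it can poll the readiness probe (GET /health/readiness, listed with the API endpoints). The standard-library sketch below assumes the probe answers HTTP 200 once the service is ready; this README does not specify the probe's response, and `probe` and `wait_until_ready` are hypothetical helpers, not part of this project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:  # non-2xx responses still carry a status code
        return exc.code
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return 0


def wait_until_ready(
    base_url: str,
    attempts: int = 30,
    delay: float = 1.0,
    fetch: Callable[[str], int] = probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll GET /health/readiness until it returns 200 (assumed "ready") or attempts run out."""
    url = base_url.rstrip("/") + "/health/readiness"
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:  # no need to sleep after the final attempt
            sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_until_ready("http://localhost:8000")
    print("ready" if ok else "not ready")
```

The fetch and sleep parameters are injectable so the polling logic can be exercised without a running server.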
The API documentation is available through Swagger UI at the /docs endpoint, which provides:
- Interactive API documentation
- Request/response examples
- Try-it-out functionality
- OpenAPI specification
- GET /health/liveness: liveness probe
- GET /health/readiness: readiness probe
- POST /html: fetch HTML content from a URL
Example request body for POST /html:
{
"url": "https://example.com",
"proxy": {
"type": "http",
"host": "proxy.example.com",
"port": 8080,
"username": "user",
"password": "pass"
},
"block_media": true,
"user_agent": "custom-user-agent",
"locale": "en-US",
"extra_headers": {
"Custom-Header": "value"
}
}
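The fields in this body correspond closely to options in Playwright's Python API. The sketch below shows one plausible mapping; it is an illustration, not this service's actual implementation. Assumptions: the proxy is passed to Playwright as a `server` URL built from `type`, `host`, and `port`; `block_media` is taken to mean aborting requests whose resource type is image, media, or font; and `HtmlRequest`, `Proxy`, `launch_kwargs`, `context_kwargs`, and `fetch_html` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Assumption: "media" means these Playwright resource types; the service's list may differ.
MEDIA_TYPES = frozenset({"image", "media", "font"})


@dataclass
class Proxy:
    type: str  # e.g. "http"
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


@dataclass
class HtmlRequest:
    url: str
    proxy: Optional[Proxy] = None
    block_media: bool = False
    user_agent: Optional[str] = None
    locale: Optional[str] = None
    extra_headers: dict[str, str] = field(default_factory=dict)


def launch_kwargs(req: HtmlRequest) -> dict[str, Any]:
    """Keyword arguments for BrowserType.launch(): only the proxy, if any."""
    if req.proxy is None:
        return {}
    proxy = {"server": f"{req.proxy.type}://{req.proxy.host}:{req.proxy.port}"}
    if req.proxy.username:
        proxy["username"] = req.proxy.username
    if req.proxy.password:
        proxy["password"] = req.proxy.password
    return {"proxy": proxy}


def context_kwargs(req: HtmlRequest) -> dict[str, Any]:
    """Keyword arguments for Browser.new_context(); unset fields are left out."""
    kwargs: dict[str, Any] = {}
    if req.user_agent:
        kwargs["user_agent"] = req.user_agent
    if req.locale:
        kwargs["locale"] = req.locale
    if req.extra_headers:
        kwargs["extra_http_headers"] = dict(req.extra_headers)
    return kwargs


async def fetch_html(req: HtmlRequest) -> str:
    """Render req.url in headless Chromium and return the page HTML."""
    # Imported lazily so the helpers above can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(**launch_kwargs(req))
        try:
            context = await browser.new_context(**context_kwargs(req))
            page = await context.new_page()
            if req.block_media:
                async def abort_media(route):
                    # Abort media requests; let everything else through unchanged.
                    if route.request.resource_type in MEDIA_TYPES:
                        await route.abort()
                    else:
                        await route.continue_()

                await page.route("**/*", abort_media)
            await page.goto(req.url)
            return await page.content()
        finally:
            await browser.close()
```

Keeping the option-building functions separate from the browser code lets them be checked without launching Chromium. With Playwright and Chromium installed (as in the local setup steps), `asyncio.run(fetch_html(HtmlRequest(url="https://example.com")))` would return the rendered HTML.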
When AUTH_API_KEY is set in the environment, the API requires authentication using the X-API-Key header:
curl -X POST http://localhost:8000/html \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret-api-key" \
-d '{"url": "https://example.com"}'
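The same authenticated request can be made from Python with only the standard library. This is a sketch: `build_html_request` is a hypothetical helper, the key is read from the AUTH_API_KEY environment variable purely for convenience, and because this README does not document the response schema, the example just prints the status code and the first bytes of the body.

```python
import json
import os
import urllib.request
from typing import Any, Optional


def build_html_request(
    base_url: str, payload: dict[str, Any], api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /html request."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only send X-API-Key when a key is configured
        headers["X-API-Key"] = api_key
    # Note: urllib stores header names as "X-api-key"; HTTP header names are case-insensitive.
    return urllib.request.Request(
        base_url.rstrip("/") + "/html",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    req = build_html_request(
        "http://localhost:8000",
        {"url": "https://example.com"},
        api_key=os.environ.get("AUTH_API_KEY"),
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(resp.status, resp.read()[:200])
```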
To run the service locally without Docker:
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install Playwright browsers:
playwright install chromium
- Run the application:
uvicorn main:app --reload
- Access the API documentation:
  - Open http://localhost:8000/docs in your browser
  - Try out the endpoints directly from the Swagger UI
  - View the OpenAPI specification at /openapi.json
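Because the specification is served as JSON at /openapi.json, it can also be inspected programmatically. A minimal standard-library sketch that lists every (method, path) pair; `list_operations` and `fetch_spec` are hypothetical helpers, and the base URL assumes the default local address.

```python
import json
import urllib.request
from typing import Any

# Keys of an OpenAPI path item that denote operations (other keys, e.g. "parameters", are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs declared in an OpenAPI document, sorted by path."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            if method.lower() in HTTP_METHODS:
                ops.append((method.upper(), path))
    return sorted(ops, key=lambda op: (op[1], op[0]))


def fetch_spec(base_url: str = "http://localhost:8000") -> dict[str, Any]:
    """Download and parse the service's OpenAPI document."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/openapi.json", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    for method, path in list_operations(fetch_spec()):
        print(f"{method:6} {path}")
```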
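Configuration is read from environment variables:
| Variable | Description | Default |
|---|---|---|
| AUTH_API_KEY | API key for authentication | None (authentication disabled) |
| PORT | Server port | 8000 |
| HOST | Server host (bind address) | 0.0.0.0 |
| PYTHONUNBUFFERED | Disable Python output buffering so logs appear immediately | 1 |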
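A sketch of how the application variables in this table, with their documented defaults, could be loaded in Python. `Settings` and `load_settings` are hypothetical and not this project's configuration code; treating an empty AUTH_API_KEY the same as an unset one is an assumption. PYTHONUNBUFFERED is read by the Python interpreter itself, so it is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    auth_api_key: Optional[str]
    port: int
    host: str

    @property
    def auth_enabled(self) -> bool:
        # Authentication is only enforced when a key is configured.
        return self.auth_api_key is not None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read AUTH_API_KEY, PORT and HOST, falling back to the documented defaults."""
    return Settings(
        # Assumption: an empty AUTH_API_KEY is treated like an unset one.
        auth_api_key=env.get("AUTH_API_KEY") or None,
        port=int(env.get("PORT", "8000")),
        host=env.get("HOST", "0.0.0.0"),
    )


if __name__ == "__main__":
    s = load_settings()
    print(f"{s.host}:{s.port}, auth enabled: {s.auth_enabled}")
```

Passing a plain dict instead of os.environ makes the defaults easy to check in tests.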
To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.