Releases
1.0.0
1. Release Overview
Version: 1.0.0
Goal: First stable release with robust anti-bot scraping, Dockerized deployment, and automated testing.
2. Major Features
Playwright-based Scraper:
Async scraping with Playwright and BeautifulSoup.
Domain-specific and generic content extraction.
Anti-Bot Evasion:
Integrated playwright-stealth for fingerprint evasion.
Randomized user agent, viewport, and language per request.
Navigator property spoofing.
Rate limiting per domain.
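As a rough illustration of the per-request randomization and per-domain rate limiting listed above, the sketch below uses only the standard library. The pools of user agents, viewports, and languages, the names random_profile and DomainRateLimiter, and the one-second default interval are all assumptions made for this example; the release notes do not describe the project's actual values or class layout.

```python
import asyncio
import random
import time
from urllib.parse import urlparse

# Hypothetical pools: the real project may use different or larger sets.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]
VIEWPORTS = [
    {"width": 1920, "height": 1080},
    {"width": 1366, "height": 768},
]
LANGUAGES = ["en-US", "en-GB", "de-DE"]


def random_profile(rng=random):
    """Pick one user agent, viewport, and language for a single request."""
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": rng.choice(VIEWPORTS),
        "locale": rng.choice(LANGUAGES),
    }


class DomainRateLimiter:
    """Enforce a minimum delay between consecutive requests to one domain."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval  # seconds; assumed default
        self._last = {}   # domain -> monotonic time of the last request
        self._locks = {}  # domain -> lock serializing waiters per domain

    async def wait(self, url: str) -> None:
        domain = urlparse(url).netloc
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:
            last = self._last.get(domain)
            if last is not None:
                delay = self.min_interval - (time.monotonic() - last)
                if delay > 0:
                    await asyncio.sleep(delay)
            self._last[domain] = time.monotonic()
```

A caller would typically await limiter.wait(url) before each navigation. The profile keys were chosen to line up with the user_agent, viewport, and locale keyword arguments of Playwright's browser.new_context, so a profile could plausibly be passed as new_context(**random_profile()); whether the project wires it up this way is not stated in these notes.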
Robust Extraction Logic:
Fallback to <body> text for edge cases.
Handles redirects, 404s, and Cloudflare blocks gracefully.
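The <body> fallback could look roughly like the sketch below. The project itself uses BeautifulSoup; this example swaps in the standard library's html.parser so it runs without extra dependencies. Treating <article> as the preferred container and the function name extract_text are assumptions for illustration only, since the notes do not say which selectors the real extractor tries first.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, tracking <article> and <body> separately."""

    SKIP = {"script", "style", "noscript"}  # tags whose text is never kept

    def __init__(self):
        super().__init__()
        self._skip = 0
        self._in_article = 0
        self._in_body = 0
        self.article = []
        self.body = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "article":
            self._in_article += 1
        elif tag == "body":
            self._in_body += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag == "article" and self._in_article:
            self._in_article -= 1
        elif tag == "body" and self._in_body:
            self._in_body -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip or not text:
            return
        if self._in_article:
            self.article.append(text)
        if self._in_body:
            self.body.append(text)


def extract_text(html: str) -> str:
    """Return <article> text if present, otherwise fall back to <body> text."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    chunks = parser.article or parser.body  # the fallback step
    return " ".join(chunks)
```

With this shape, a page that has no recognizable main container still yields its visible body text instead of an empty result, and script or style contents are left out either way.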
Dockerized Workflow:
Dockerfile and docker-compose for reproducible builds and test runs.
Automated Testing:
Pytest suite covering extraction, error handling, and edge cases.
CI-ready test execution via Docker Compose.
Documentation:
Comprehensive README.md with setup, usage, and development workflow.
Changelog and release notes.