v1.0.0

@JustAzul JustAzul released this 09 Jun 18:30
· 119 commits to master since this release
077e662

1. Release Overview

  • Version: 1.0.0
  • Goal: First stable release with robust anti-bot scraping, Dockerized deployment, and automated testing.

2. Major Features

  • Playwright-based Scraper:
    • Async scraping with Playwright and BeautifulSoup.
    • Domain-specific and generic content extraction.
  • Anti-Bot Evasion:
    • Integrated playwright-stealth for fingerprint evasion.
    • Randomized user agent, viewport, and language per request.
    • Navigator property spoofing.
    • Rate limiting per domain.
  • Robust Extraction Logic:
    • Fallback to <body> text for edge cases.
    • Handles redirects, 404s, and Cloudflare blocks gracefully.
  • Dockerized Workflow:
    • Dockerfile and docker-compose for reproducible builds and test runs.
  • Automated Testing:
    • Pytest suite with coverage for extraction, error handling, and edge cases.
    • CI-ready test execution via Docker Compose.
  • Documentation:
    • Comprehensive README.md with setup, usage, and development workflow.
    • Changelog and release notes.
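
To illustrate two of the anti-bot items above (a randomized user agent, viewport, and language per request, and rate limiting per domain), here is a minimal sketch using only the Python standard library. It is not the project's actual code: `random_profile`, `DomainRateLimiter`, and the value pools are hypothetical names and placeholder values chosen for illustration. The dictionary keys are picked so that they could be passed as keyword arguments to Playwright's `browser.new_context(...)`.

```python
import asyncio
import random
import time
from urllib.parse import urlparse

# Hypothetical pools: the release notes do not list the real values.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]
VIEWPORTS = [(1280, 720), (1366, 768), (1920, 1080)]
LANGUAGES = ["en-US", "en-GB", "de-DE"]


def random_profile(rng=random):
    """Pick a user agent, viewport, and language for a single request."""
    width, height = rng.choice(VIEWPORTS)
    return {
        "user_agent": rng.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
        "locale": rng.choice(LANGUAGES),
    }


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._next_allowed = {}  # domain -> earliest monotonic time allowed
        self._locks = {}         # domain -> lock serializing its waiters

    async def wait(self, url):
        domain = urlparse(url).netloc.lower()
        lock = self._locks.setdefault(domain, asyncio.Lock())
        async with lock:
            delay = self._next_allowed.get(domain, 0.0) - time.monotonic()
            if delay > 0:
                await asyncio.sleep(delay)
            # Reserve the next slot for this domain only.
            self._next_allowed[domain] = time.monotonic() + self.min_interval
        return domain
```

A scraper built this way would call `await limiter.wait(url)` before each navigation and create a fresh browser context from `random_profile()` for each request. Keeping one lock per domain means that requests to different domains do not delay each other.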