14 Jun 22:10

JustAzul

70c668a

v1.3.0 Latest

Summary:
This release adds the custom_elements_to_remove and click_selector scraping options, reduces browser and HTML-parsing overhead, expands test coverage, and reworks the CI/CD workflows and README.

Features

Added support for custom_elements_to_remove in API scrape arguments and extraction
- Commit: 851e38d
Added filter_none_values utility with comprehensive tests
- Commit: 8c58e5a
Added click_selector support to extract_text_from_url
- Commit: 558f513
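
The release notes name the filter_none_values utility but do not show its signature. As a rough sketch only, assuming it takes a mapping of tool arguments and returns a copy without the None entries, it might look like this (the argument values below are illustrative, not taken from the project):

```python
from typing import Any, Mapping


def filter_none_values(arguments: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of ``arguments`` without the keys whose value is None.

    Assumed behaviour: only None is dropped; falsy values such as 0, "" or []
    are kept, because they can be meaningful arguments.
    """
    return {key: value for key, value in arguments.items() if value is not None}


# Hypothetical scrape arguments; optional ones a caller left unset arrive as None.
raw_arguments = {
    "url": "https://example.com/article",
    "click_selector": None,
    "custom_elements_to_remove": [".ads", "nav"],
}

cleaned = filter_none_values(raw_arguments)
print(sorted(cleaned))  # ['custom_elements_to_remove', 'url']
```

Dropping unset optional arguments before forwarding them lets the callee fall back to its own defaults instead of receiving an explicit None.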

Performance

Reuse singleton browser instance for all scrapes, reducing resource usage
- Commit: b905f3d
Avoid repeated BeautifulSoup parsing by reusing soup objects
- Commit: 096bb1b
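
The notes do not show how the shared browser instance is managed. The sketch below illustrates one common way to do it, a lazily created instance guarded by an asyncio lock. FakeBrowser and BrowserManager are hypothetical names used only for this illustration; a real implementation would launch an actual browser where the fake one is constructed.

```python
import asyncio


class FakeBrowser:
    """Stand-in for a real browser handle; counts how often one is launched."""

    launches = 0

    def __init__(self) -> None:
        FakeBrowser.launches += 1
        self.closed = False


class BrowserManager:
    """Hands out one shared browser, launching it only on first use."""

    def __init__(self) -> None:
        self._browser = None
        self._lock = None  # created lazily, inside the running event loop

    async def get_browser(self) -> FakeBrowser:
        if self._lock is None:
            self._lock = asyncio.Lock()
        # The lock keeps concurrent callers from launching several browsers.
        async with self._lock:
            if self._browser is None or self._browser.closed:
                await asyncio.sleep(0)  # simulates the awaitable launch step
                self._browser = FakeBrowser()
            return self._browser


async def main() -> int:
    manager = BrowserManager()
    browsers = await asyncio.gather(*(manager.get_browser() for _ in range(5)))
    assert all(browser is browsers[0] for browser in browsers)
    return FakeBrowser.launches


launch_count = asyncio.run(main())
print(launch_count)  # 1
```

Five concurrent requests end up sharing a single launch, which is where the reduced resource usage would come from.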

Refactors

Refactored mcp_server to use filter_none_values and support click_selector
- Commit: 045de45
Merged dynamic article extraction tests into a single random-domain test
- Commit: db7d1e0

Documentation

Updated README: removed roadmap/contact/robots.txt sections, improved badge clarity, and added usage examples
- Commits: 76cc302, 0258831, e3c8aad, 35b5e6a, 4653ecb, 1407f28, 5b35586

Chore

Added MIT license file
- Commit: 9791b2b
Ignored all files/directories under cursor/
- Commit: 8ed75e1
Removed empty __init__.py files from src and tests/helpers
- Commit: 48e723a
Ensured trailing newline in config files
- Commit: 3b5ddee
Updated and cleaned up workflows (build, test, release, coverage)
- Commits: 37dd86d, 2258c78, 950dc3e, 57bf6b1, d5d1463, 954ab47, 07018f1, 3230816, 0adc52c, 40d5aa9, 774dcf3, 51c6cd8, 60e95a7, c135d3c, c3545d4, af1234c, 1975d59
Standardized on master branch, removed legacy/duplicate workflow files
- Commits: 40d5aa9, 774dcf3, 51c6cd8

Tests

Improved JS delay and user agent tests for robustness and coverage
- Commits: 3a2b59a, 27108ae, dc15558, 483615d, c5182c1, db7d1e0, 0623313, 2af0a3b, d25f2c0
Added and expanded tests for new utilities and features
- Commits: 8c58e5a, d25f2c0, c5182c1

Fixes

Update badge Gist URLs and workflow gistIDs for build, test, and coverage
- Commit: 09e6060
Filter out None values from tool arguments to prevent type errors
- Commit: a4d6fed
Ensure output_format string is converted to OutputFormat enum in get_prompt
- Commit: d25f2c0
Refactor JS-delay test to use real demo page and improve reliability
- Commit: 0623313
Restore per-scrape browser launch and cleanup for test reliability
- Commit: 2af0a3b
Ensure browser is always closed using finally block in extract_text_from_url
- Commit: d779341
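
The last fix relies on a finally block to close the browser on every exit path. The following sketch shows the pattern in isolation. FakeBrowser and extract_text are hypothetical stand-ins, not the project's extract_text_from_url, and the failure is simulated.

```python
import asyncio


class FakeBrowser:
    """Stand-in for a browser that can fail while loading a page."""

    def __init__(self) -> None:
        self.closed = False

    async def fetch_text(self, url: str) -> str:
        if "broken" in url:
            raise RuntimeError("navigation failed")
        return f"text from {url}"

    async def close(self) -> None:
        self.closed = True


async def extract_text(url: str, browser: FakeBrowser) -> str:
    try:
        return await browser.fetch_text(url)
    finally:
        # Runs on success and on failure, so the browser is never leaked.
        await browser.close()


async def main():
    ok_browser, failing_browser = FakeBrowser(), FakeBrowser()
    text = await extract_text("https://example.com", ok_browser)
    try:
        await extract_text("https://example.com/broken", failing_browser)
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return text, error, ok_browser.closed, failing_browser.closed


text, error, ok_closed, failing_closed = asyncio.run(main())
print(ok_closed, failing_closed)  # True True
```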

Full Changelog: 1.2.0...1.3.0

12 Jun 16:11

JustAzul

1.2.0

2c3de28

v1.2.0

Release Notes: 1.2.0

Summary:
This release introduces a new grace_period_seconds feature for improved JavaScript rendering support, significant refactoring for configuration and test logic, improved documentation, and enhanced test coverage. The codebase is now more robust, with clearer configuration and more reliable extraction logic.

Features

Add grace_period_seconds parameter for JS rendering delay
- Allows fine-tuning of wait time for JavaScript-rendered content.
- Commit: eb3b841
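
To illustrate why a grace period helps with JavaScript-rendered pages, here is a small simulation. FakePage, fetch_html, and the 2.0 second default are assumptions made for this sketch. The notes give only the parameter name grace_period_seconds and do not state its default or how the scraper applies it.

```python
import asyncio
import time


class FakePage:
    """Simulates a page whose content is filled in by JavaScript after a delay."""

    def __init__(self, render_delay_seconds: float) -> None:
        self._render_delay_seconds = render_delay_seconds
        self._ready_at = 0.0

    async def goto(self, url: str) -> None:
        self._ready_at = time.monotonic() + self._render_delay_seconds

    def content(self) -> str:
        if time.monotonic() >= self._ready_at:
            return "<p>rendered by JavaScript</p>"
        return "<p>loading...</p>"


async def fetch_html(page: FakePage, url: str, grace_period_seconds: float = 2.0) -> str:
    if grace_period_seconds < 0:
        raise ValueError("grace_period_seconds must be non-negative")
    await page.goto(url)
    # Give client-side scripts time to finish before reading the DOM.
    await asyncio.sleep(grace_period_seconds)
    return page.content()


async def main():
    url = "https://example.com"
    too_early = await fetch_html(FakePage(0.1), url, grace_period_seconds=0.0)
    patient = await fetch_html(FakePage(0.1), url, grace_period_seconds=0.3)
    return too_early, patient


too_early, patient = asyncio.run(main())
print(too_early)
print(patient)
```

A longer grace period trades latency for completeness, so the right value depends on how slowly the target page renders.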

Refactors

Configuration and server logic improvements
- Removed unused content length and grace period options from config.
- Updated server and scraper logic to use new parameters and improve clarity.
- Commit: 61a9264

Documentation

README and usage updates
- Updated documentation to reflect new parameters and configuration changes.
- Commit: c80fcd6

Tests

Test updates and improvements
- Refactored tests for new result format and grace_period_seconds support.
- Improved test assertions and coverage for error handling and edge cases.
- Commit: 2c3de28

Affected Files

Modified: README.md, src/config.py, src/mcp_server.py, src/scraper/__init__.py, tests/test_mcp_server.py, tests/test_scraper.py

Full Changelog: 1.1.1...1.2.0

12 Jun 15:15

JustAzul

1.1.1

3f6a953

v1.1.1

Refactor

refactor(config): add env-based config and type-safe parsing helpers
refactor(mcp): adapt to new dict return type from extract_text_from_url
refactor(scraper): return dict with metadata and error from extract_text_from_url
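
The first refactor mentions env-based config with type-safe parsing helpers, without listing them. A minimal sketch of what such helpers could look like follows. The function names and the variable names DEFAULT_TIMEOUT_SECONDS and DEBUG_LOGS_ENABLED are hypothetical, chosen only for this example.

```python
import os
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def get_env_int(name: str, default: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer setting, falling back to ``default`` if unset or invalid."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw.strip())
    except ValueError:
        return default


def get_env_bool(name: str, default: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean setting, accepting common spellings such as "true" and "0"."""
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    return default


# Passing a plain dict instead of os.environ keeps the example deterministic.
fake_env = {"DEFAULT_TIMEOUT_SECONDS": "45", "DEBUG_LOGS_ENABLED": "yes"}
print(get_env_int("DEFAULT_TIMEOUT_SECONDS", 30, fake_env))  # 45
print(get_env_bool("DEBUG_LOGS_ENABLED", False, fake_env))   # True
```

Falling back to the default on malformed input is one possible policy; raising an error at startup is an equally reasonable alternative.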

Docs

docs(readme): rewrite and reorganize documentation for clarity, usage, and config

Test

test(tests): add test_mcp_server.py for MCP server testing

CI

ci(test): add separate test services for mcp and scraper in docker-compose

Fix

fix(docker): format environment variable assignments in Dockerfile
fix(mcp_server): remove hardcoded stdio to properly serve the mcp tool requests

11 Jun 20:29

JustAzul

1.1.0

dd9d284

v1.1.0

Release Notes: 1.1.0

Summary:
This release introduces a modular scraper architecture, significant refactoring for maintainability, improved documentation, and enhanced test coverage. Obsolete files and legacy code have been removed, and the codebase is now more consistent and easier to extend.

Features

New modular scraper implementation and helpers
- Introduced a new scraper module with improved structure and extensibility.
- Added helpers for browser automation, content selection, error handling, HTML utilities, and rate limiting.
- Commit: 74c7852

Refactors

Core and server logic improvements
- Centralized configuration, added a Logger, and improved debug/error handling.
- Reformatted and improved readability of server logic.
- Removed legacy and obsolete files (CLI, main, stdio_server, test runner, old scraper).
- Improved extraction logic and cleaned up dependencies.
- Commits: 3f6f876, fb95ddc, de5e0c7, c8167d9, 3752cf1, 01c145e

Style

Code formatting and consistency
- Reformatted logger and test files for PEP8 compliance and readability.
- Removed trailing whitespace from __init__.py files.
- Commits: cd753ac, d272992, c053c4d

Documentation

Documentation and configuration updates
- Updated README with new usage instructions, removed CLI references, and added Cursor IDE integration.
- Updated environment, Docker, and documentation for new config structure.
- Commits: dd9d284, 6fbba89

Chore

Cleanup and maintenance
- Removed obsolete files and updated project structure.
- Commit: c8167d9

Tests

Test updates and improvements
- Updated and added tests for new config, timing constants, and scraper logic.
- Reformatted test files for clarity.
- Commits: cd753ac, 47c70d0

Affected Files

Added: src/logger.py, src/mcp_server.py, src/scraper/__init__.py, src/scraper/helpers/*, tests/test_scraper.py
Modified: .env.example, Dockerfile, README.md, docker-compose.yml, requirements.txt, src/__init__.py, src/config.py, tests/__init__.py, tests/test_helpers.py
Deleted: CHANGELOG.md, src/main.py, src/scraper.py, src/stdio_server.py, tests/test_mcp.py

Full Changelog: 1.0.0...1.1.0

09 Jun 18:30

JustAzul

1.0.0

077e662

v1.0.0

1. Release Overview

Version: 1.0.0
Goal: First stable release with robust anti-bot scraping, Dockerized deployment, and automated testing.

2. Major Features

Playwright-based Scraper:
- Async scraping with Playwright and BeautifulSoup.
- Domain-specific and generic content extraction.
Anti-Bot Evasion:
- Integrated playwright-stealth for fingerprint evasion.
- Randomized user agent, viewport, and language per request.
- Navigator property spoofing.
- Rate limiting per domain.
Robust Extraction Logic:
- Fallback to <body> text for edge cases.
- Handles redirects, 404s, and Cloudflare blocks gracefully.
Dockerized Workflow:
- Dockerfile and docker-compose for reproducible builds and test runs.
Automated Testing:
- Pytest suite with coverage for extraction, error handling, and edge cases.
- CI-ready test execution via Docker Compose.
Documentation:
- Comprehensive README.md with setup, usage, and development workflow.
- Changelog and release notes.
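
As an illustration of the per-domain rate limiting listed above, the sketch below tracks the last request time for each domain and reports how long a caller should wait. DomainRateLimiter and its two-second interval are assumptions made for this example. The release notes do not describe the actual algorithm or its settings.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforces a minimum interval between requests to the same domain."""

    def __init__(self, min_interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._next_slot_base: Dict[str, float] = {}

    def seconds_until_allowed(self, url: str) -> float:
        """Reserve the next slot for the URL's domain and return the wait time."""
        domain = urlparse(url).netloc.lower()
        now = self._clock()
        last = self._next_slot_base.get(domain)
        wait = 0.0 if last is None else max(0.0, self._min_interval - (now - last))
        # Record when this request will actually run, so later callers queue behind it.
        self._next_slot_base[domain] = now + wait
        return wait


# A fake clock keeps the example deterministic.
current_time = [100.0]
limiter = DomainRateLimiter(2.0, clock=lambda: current_time[0])

first = limiter.seconds_until_allowed("https://example.com/a")
second = limiter.seconds_until_allowed("https://example.com/b")
other = limiter.seconds_until_allowed("https://example.org/")
current_time[0] = 105.0
later = limiter.seconds_until_allowed("https://example.com/c")
print(first, second, other, later)  # 0.0 2.0 0.0 0.0
```

An async scraper would typically pass the returned value to asyncio.sleep before navigating.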

Releases: JustAzul/web-scrapper-stdio