max_session_rotations not being respected in AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser #1484

@ericvg97

Description

Hi, it's me again, sorry for the number of issues I am opening :) I think in this case it is a bug. I am using this configuration:

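from crawlee import ConcurrencySettings
from crawlee.crawlers import AdaptivePlaywrightCrawler

# Run strictly one task at a time so the rotation behaviour is easy to follow in the logs.
concurrency_settings = ConcurrencySettings(
    min_concurrency=1,
    max_concurrency=1,
    desired_concurrency=1,
    max_tasks_per_minute=200,
)

crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    max_requests_per_crawl=10000,
    playwright_crawler_specific_kwargs={
        "browser_type": "chromium",
        "headless": True,
    },
    max_session_rotations=10,  # expected upper bound on session rotations per request
    retry_on_blocked=True,
    concurrency_settings=concurrency_settings,
    use_session_pool=True,
    max_request_retries=5,
    keep_alive=True,
)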

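and if I crawl the URL "http://httpbingo.org/status/429", it keeps rotating the session (the log shows "Assuming the session is blocked based on HTTP status code 429", followed by a message about rotating the session and retrying) indefinitely, instead of stopping after max_session_rotations=10 rotations.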
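
To make the expected behaviour concrete, here is a minimal, self-contained sketch of how I would expect the rotation cap to work. This is not Crawlee's actual implementation: `fetch_with_rotation`, `SessionRotationLimitError`, and the `fetch` callback are hypothetical names used only for illustration, and the set of "blocked" status codes is an assumption.

```python
from collections.abc import Callable


class SessionRotationLimitError(Exception):
    """Hypothetical error raised once the rotation budget is used up."""


def fetch_with_rotation(
    fetch: Callable[[int], int],
    max_session_rotations: int,
    # Assumed set of status codes that are treated as "session blocked".
    blocked_status_codes: frozenset[int] = frozenset({401, 403, 429}),
) -> int:
    """Call `fetch` until it returns a non-blocked status or the cap is hit.

    `fetch` receives the number of rotations performed so far and returns
    an HTTP status code; a real crawler would use a fresh session each time.
    """
    rotations = 0
    while True:
        status = fetch(rotations)
        if status not in blocked_status_codes:
            return status
        if rotations >= max_session_rotations:
            # The budget is exhausted: fail the request instead of looping forever.
            raise SessionRotationLimitError(
                f"still blocked (HTTP {status}) after {rotations} session rotations"
            )
        rotations += 1  # rotate to a new session and try again
```

With `max_session_rotations=10` and a URL that always answers 429, this sketch makes 11 attempts (the initial one plus 10 rotations) and then gives up, which is the behaviour I expected from the crawler.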

Metadata

Labels: t-tooling (issues with this label are in the ownership of the tooling team)
