
Improve sameDomainDelaySecs implementation #3148

@janbuchar

Description

  • The current implementation (added in feat: add support for sameDomainDelay #2003) essentially holds requests to recently accessed domains in memory and re-enqueues them after the configured delay
  • ⚠️ this may result in many unnecessary request queue writes
  • ⚠️ there may be unexpected results when this is used in conjunction with request locking
    • but resolving this for multiple request queue consumers would be quite an endeavor
  • we cannot simply reuse maxRequestsPerMinute - we don't know a request's URL (and thus its domain) before receiving it from the queue
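To make the trade-off concrete, here is a minimal, self-contained sketch of the approach described above: an in-memory map of domain → earliest allowed next access. The class and method names (`SameDomainDelayTracker`, `remainingDelayMs`) are illustrative only and are not Crawlee's actual internals; the point is that whenever a request arrives "too soon", the caller has to hold it back and re-enqueue it, which is where the extra queue writes come from.

```typescript
type Request = { url: string };

class SameDomainDelayTracker {
    // domain -> timestamp (ms) at which the domain may be accessed again
    private nextAllowedAt = new Map<string, number>();

    constructor(private delaySecs: number) {}

    /**
     * Returns 0 if the request may run now (and records the next allowed
     * access time for its domain), otherwise the remaining delay in ms.
     */
    remainingDelayMs(request: Request, now = Date.now()): number {
        const domain = new URL(request.url).hostname;
        const allowedAt = this.nextAllowedAt.get(domain) ?? 0;
        if (now >= allowedAt) {
            // Domain is free: reserve it for the configured delay and proceed.
            this.nextAllowedAt.set(domain, now + this.delaySecs * 1000);
            return 0;
        }
        // Too soon: the caller must delay and re-enqueue the request,
        // which in the current implementation costs a request queue write.
        return allowedAt - now;
    }
}
```

Note that because the tracker is purely in-memory, a second consumer of the same request queue has no visibility into these reservations, which is the source of the interaction problem with request locking mentioned above.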

Regardless of the resolution, the "final" version should be ported over to crawlee-python.

Labels: t-tooling