extraction-worker should parallelize different root domains #19

@TimDaub

  • Some HTTPS hosts don't allow us to download at maximum concurrency and maximum rate.
  • But generally, at least in neume's current music NFT crawl, many different HTTPS endpoints are queried over the life cycle of a crawl.
  • Rate limiting typically only kicks in when a single endpoint is hammered.
  • So it'd be great if the extraction-worker implemented an algorithm that maximizes the diversity of request targets (root domains), such that even while one endpoint is rate-limiting us, as many requests as possible complete in parallel.
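
To make the idea concrete, here is a rough sketch of one possible approach: group pending URLs by host and drain the groups round-robin, so that consecutive requests go to different hosts whenever possible. It is written in Python purely for illustration. The function name `interleave_by_host` is hypothetical and not part of extraction-worker, and the hostname is used as a stand-in for the root domain (a real implementation would probably need a public-suffix lookup).

```python
from collections import deque
from urllib.parse import urlsplit


def interleave_by_host(urls):
    """Reorder urls so consecutive entries target different hosts where possible.

    Hypothetical helper: the hostname approximates the "root domain" here.
    """
    # One FIFO queue per host; dicts keep insertion order, so hosts are
    # visited in the order they first appeared in the input.
    queues = {}
    for url in urls:
        host = urlsplit(url).hostname or ""
        queues.setdefault(host, deque()).append(url)

    ordered = []
    while queues:
        # Each pass takes at most one URL from every host that still has work.
        for host in list(queues):
            ordered.append(queues[host].popleft())
            if not queues[host]:
                del queues[host]
    return ordered


pending = [
    "https://a.example/1",
    "https://a.example/2",
    "https://a.example/3",
    "https://b.example/1",
    "https://c.example/1",
]
print(interleave_by_host(pending))
```

This only spreads requests across hosts in the queue order. It does not by itself cap concurrency per host or react to rate-limit responses, so those would have to be layered on top.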

Labels: enhancement (New feature or request)