Pydantic validation error in request queue after request retries #1472

Description

@Rigos0

Hello Apify devs!

This week, my scraper, which uses crawlee[parsel]==0.5.1 and apify==2.2.0, stopped working (returning 0 results; the logs indicate the requests are somehow getting blocked, but I don't have time to debug it in more depth). The scraper has been running every Monday since January with no problems, deployed on the Apify cloud.

Therefore, I've upgraded the libraries to the newest versions, apify==3.0.1 and crawlee[parsel]==1.0.2, which fixed the issue of not returning any results. But another bug now occurs, at a slightly different point during each run. It usually hits after ~3000 results, and I suspect it's triggered by request retries. This is the error message; the scraper afterwards finishes with only 300/700 requests handled.

[crawlee._autoscaling.autoscaled_pool] ERROR Exception in worker task orchestrator
      Traceback (most recent call last):
        File "/usr/local/lib/python3.12/site-packages/crawlee/_autoscaling/autoscaled_pool.py", line 221, in _worker_task_orchestrator
          while not (finished := await self._is_finished_function()) and not run.result.done():
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/crawlee/crawlers/_basic/_basic_crawler.py", line 1327, in __is_finished_function
          return await request_manager.is_finished()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/crawlee/storages/_request_queue.py", line 308, in is_finished
          if await self.is_empty():
             ^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/crawlee/storages/_request_queue.py", line 292, in is_empty
          return await self._client.is_empty()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_client.py", line 155, in is_empty
          return await self._implementation.is_empty()
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 353, in is_empty
          await self._ensure_head_is_non_empty()
        File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 221, in _ensure_head_is_non_empty
          await self._list_head()
        File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 250, in _list_head
          request = Request.model_validate(
                    ^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 705, in model_validate
          return cls.__pydantic_validator__.validate_python(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      pydantic_core._pydantic_core.ValidationError: 1 validation error for Request
        Input should be a valid dictionary or instance of Request [type=model_type, input_value=None, input_type=NoneType]
          For further information visit https://errors.pydantic.dev/2.11/v/model_type
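For what it's worth, the pydantic error itself is easy to reproduce in isolation: `model_validate(None)` on any pydantic v2 model raises exactly this `model_type` error, which suggests `_list_head()` is receiving a queue head item with no request payload. A minimal sketch (the `Request` model below is a hypothetical stand-in, not crawlee's actual class):

```python
from pydantic import BaseModel, ValidationError

# Hypothetical stand-in for crawlee's Request model; the real model has
# more fields, but the failure mode is identical.
class Request(BaseModel):
    url: str

# Validating None reproduces the same "model_type" error seen in the
# traceback, as if the request queue head returned an empty item.
try:
    Request.model_validate(None)
except ValidationError as e:
    print(e.errors()[0]["type"])  # → model_type
```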

Furthermore, downgrading back to the original crawlee[parsel]==0.5.1 and apify==2.2.0 and pushing again triggers new library compatibility issues, so I can't even get the scraper running with the original setup.

Any help or temporary fix is appreciated,
Richard

Labels

bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)
