Description
Hello Apify devs!
This week, my scraper, which has been using crawlee[parsel]==0.5.1 and apify==2.2.0, stopped working: it returns 0 results, and the logs indicate the requests are somehow getting blocked, though I haven't had time to debug it in more depth. The scraper has been running every Monday since January with no problems, deployed on the Apify cloud.
Therefore, I upgraded the libraries to the newest versions, apify==3.0.1 and crawlee[parsel]==1.0.2, which fixed the issue of not returning any results. However, a different bug now occurs at a slightly different point during each run, usually after ~3000 results; I suspect it's triggered by requests being retried. The scraper afterwards finishes with 300/700 requests handled. This is the error message:
```
[crawlee._autoscaling.autoscaled_pool] ERROR Exception in worker task orchestrator
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/crawlee/_autoscaling/autoscaled_pool.py", line 221, in _worker_task_orchestrator
    while not (finished := await self._is_finished_function()) and not run.result.done():
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/crawlee/crawlers/_basic/_basic_crawler.py", line 1327, in __is_finished_function
    return await request_manager.is_finished()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/crawlee/storages/_request_queue.py", line 308, in is_finished
    if await self.is_empty():
       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/crawlee/storages/_request_queue.py", line 292, in is_empty
    return await self._client.is_empty()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_client.py", line 155, in is_empty
    return await self._implementation.is_empty()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 353, in is_empty
    await self._ensure_head_is_non_empty()
  File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 221, in _ensure_head_is_non_empty
    await self._list_head()
  File "/usr/local/lib/python3.12/site-packages/apify/storage_clients/_apify/_request_queue_single_client.py", line 250, in _list_head
    request = Request.model_validate(
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pydantic/main.py", line 705, in model_validate
    return cls.__pydantic_validator__.validate_python(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for Request
  Input should be a valid dictionary or instance of Request [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.11/v/model_type
```
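In case it helps with triage: the final pydantic failure reproduces in isolation when `None` is validated against the model, so it looks like the queue head returned by the platform contains a `None` entry that `_list_head` passes straight to `Request.model_validate`. A minimal sketch (assuming `Request` is crawlee's top-level export):

```python
from crawlee import Request

# Validating None against the Request model raises the exact error from
# the traceback: "Input should be a valid dictionary or instance of
# Request [type=model_type, input_value=None, input_type=NoneType]".
Request.model_validate(None)
```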
Furthermore, downgrading back to the original crawlee[parsel]==0.5.1 and apify==2.2.0 and pushing again triggers new library compatibility issues, so I can't even get the scraper running with the original setup.
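For context, the crawler itself is the stock ParselCrawler-on-Apify pattern, nothing exotic. A simplified sketch of its structure (the handler body and start URL here are placeholders, not the real scraper):

```python
import asyncio

from apify import Actor
from crawlee.crawlers import ParselCrawler, ParselCrawlingContext


async def main() -> None:
    async with Actor:
        crawler = ParselCrawler()

        @crawler.router.default_handler
        async def default_handler(context: ParselCrawlingContext) -> None:
            # Parse the page, store results, enqueue follow-up links;
            # the crash above happens in the queue internals, not here.
            await context.push_data({'url': context.request.url})
            await context.enqueue_links()

        await crawler.run(['https://example.com'])


if __name__ == '__main__':
    asyncio.run(main())
```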
Any help or temporary fix is appreciated,
Richard