We are using Celery executors and had Redis connection issues last night. After Redis was back online, the workers were able to reconnect but did not execute any tasks anymore. All tasks that should have run timed out waiting in the queue.
Restarting the workers solves the problem immediately.
Log from before the restart is attached; this is the end of the log (copied at 9 am today).
...
[2025-05-25 18:05:00,485: INFO/ForkPoolWorker-15] Filling up the DagBag from /stackable/app/git/current/stages/int/apps/platform-core/dags/deadbeef.py
[2025-05-25 18:05:00,648: INFO/ForkPoolWorker-15] Running <TaskInstance: platform-deadbeef.deadbeef scheduled__2025-05-25T16:00:00+00:00 [queued]> on host airflow-worker-default-1.airflow-worker-default.mesh-platform-core.svc.cluster.local
[2025-05-25 18:05:02,045: WARNING/ForkPoolWorker-15] empty cryptography key - values will not be stored encrypted.
[2025-05-25 18:05:02,047: INFO/ForkPoolWorker-15] Using connection ID 'hpo_default' for task execution.
[2025-05-25 18:05:02,047: INFO/ForkPoolWorker-15] AWS Connection (conn_id='hpo_default', conn_type='aws') credentials retrieved from login and password.
[2025-05-25 18:05:02,605: INFO/ForkPoolWorker-15] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[7a457fda-8875-4b9d-add7-8b4711e2e43d] succeeded in 2.17235898389481s: None
[2025-05-25 18:12:38,465: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/stackable/app/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py", line 340, in start
blueprint.start(self)
File "/stackable/app/lib64/python3.9/site-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/stackable/app/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py", line 746, in start
c.loop(*c.loop_args())
File "/stackable/app/lib64/python3.9/site-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/stackable/app/lib64/python3.9/site-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/stackable/app/lib64/python3.9/site-packages/kombu/transport/redis.py", line 1344, in on_readable
self.cycle.on_readable(fileno)
File "/stackable/app/lib64/python3.9/site-packages/kombu/transport/redis.py", line 569, in on_readable
chan.handlers[type]()
File "/stackable/app/lib64/python3.9/site-packages/kombu/transport/redis.py", line 913, in _receive
ret.append(self._receive_one(c))
File "/stackable/app/lib64/python3.9/site-packages/kombu/transport/redis.py", line 923, in _receive_one
response = c.parse_response()
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 837, in parse_response
response = self._execute(conn, try_read)
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 813, in _execute
return conn.retry.call_with_retry(
File "/stackable/app/lib64/python3.9/site-packages/redis/retry.py", line 49, in call_with_retry
fail(error)
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 815, in <lambda>
lambda error: self._disconnect_raise_connect(conn, error),
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 802, in _disconnect_raise_connect
raise error
File "/stackable/app/lib64/python3.9/site-packages/redis/retry.py", line 46, in call_with_retry
return do()
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 814, in <lambda>
lambda: command(*args, **kwargs),
File "/stackable/app/lib64/python3.9/site-packages/redis/client.py", line 835, in try_read
return conn.read_response(disconnect_on_error=False, push_request=True)
File "/stackable/app/lib64/python3.9/site-packages/redis/connection.py", line 512, in read_response
response = self._parser.read_response(disable_decoding=disable_decoding)
File "/stackable/app/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 15, in read_response
result = self._read_response(disable_decoding=disable_decoding)
File "/stackable/app/lib64/python3.9/site-packages/redis/_parsers/resp2.py", line 25, in _read_response
raw = self._buffer.readline()
File "/stackable/app/lib64/python3.9/site-packages/redis/_parsers/socket.py", line 115, in readline
self._read_from_socket()
File "/stackable/app/lib64/python3.9/site-packages/redis/_parsers/socket.py", line 68, in _read_from_socket
raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
[2025-05-25 18:12:41,476: WARNING/MainProcess] /stackable/app/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py:391: CPendingDeprecationWarning:
In Celery 5.1 we introduced an optional breaking change which
on connection loss cancels all currently executed tasks with late acknowledgement enabled.
These tasks cannot be acknowledged as the connection is gone, and the tasks are automatically redelivered
back to the queue. You can enable this behavior using the worker_cancel_long_running_tasks_on_connection_loss
setting. In Celery 5.1 it is set to False by default. The setting will be set to True by default in Celery 6.0.
warnings.warn(CANCEL_TASKS_BY_DEFAULT, CPendingDeprecationWarning)
[2025-05-25 18:12:41,477: INFO/MainProcess] Temporarily reducing the prefetch count to 12 to avoid over-fetching since 4 tasks are currently being processed.
The prefetch count will be gradually restored to 16 as the tasks complete processing.
[2025-05-25 18:12:41,494: WARNING/MainProcess] /stackable/app/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
warnings.warn(
[2025-05-25 18:12:41,497: ERROR/MainProcess] consumer: Cannot connect to redis://airflow-redis-haproxy:6379//: Connection closed by server..
Trying again in 2.00 seconds... (1/100)
[2025-05-25 18:12:43,502: ERROR/MainProcess] consumer: Cannot connect to redis://airflow-redis-haproxy:6379//: Connection closed by server..
Trying again in 4.00 seconds... (2/100)
[2025-05-25 18:12:47,508: ERROR/MainProcess] consumer: Cannot connect to redis://airflow-redis-haproxy:6379//: Connection closed by server..
Trying again in 6.00 seconds... (3/100)
[2025-05-25 18:12:53,523: INFO/MainProcess] Connected to redis://airflow-redis-haproxy:6379//
[2025-05-25 18:12:53,524: WARNING/MainProcess] /stackable/app/lib64/python3.9/site-packages/celery/worker/consumer/consumer.py:508: CPendingDeprecationWarning: The broker_connection_retry configuration setting will no longer determine
whether broker connection retries are made during startup in Celery 6.0 and above.
If you wish to retain the existing behavior for retrying connections on startup,
you should set broker_connection_retry_on_startup to True.
warnings.warn(
[2025-05-25 18:12:53,530: INFO/MainProcess] mingle: searching for neighbors
[2025-05-25 18:12:54,540: INFO/MainProcess] mingle: all alone
[2025-05-25 18:12:59,561: INFO/MainProcess] missed heartbeat from celery@airflow-worker-default-0
[2025-05-25 18:30:46,097: WARNING/ForkPoolWorker-16] empty cryptography key - values will not be stored encrypted.
[2025-05-25 18:30:46,099: INFO/ForkPoolWorker-16] Using connection ID 'hpo_default' for task execution.
[2025-05-25 18:30:46,100: INFO/ForkPoolWorker-16] AWS Connection (conn_id='hpo_default', conn_type='aws') credentials retrieved from login and password.
[2025-05-25 18:30:46,962: INFO/ForkPoolWorker-16] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[0a9c0d57-fa15-41dd-87f3-257f05a5c04f] succeeded in 1846.0727213749196s: None
[2025-05-25 18:30:47,489: WARNING/ForkPoolWorker-14] empty cryptography key - values will not be stored encrypted.
[2025-05-25 18:30:47,492: INFO/ForkPoolWorker-14] Using connection ID 'hpo_default' for task execution.
[2025-05-25 18:30:47,492: INFO/ForkPoolWorker-14] AWS Connection (conn_id='hpo_default', conn_type='aws') credentials retrieved from login and password.
[2025-05-25 18:30:48,363: INFO/ForkPoolWorker-14] Task airflow.providers.celery.executors.celery_executor_utils.execute_command[17fafc80-ab1b-45d9-b46b-ca71f46a3f98] succeeded in 1847.4771845629439s: None
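For what it's worth, the two CPendingDeprecationWarnings in the log above point at Celery settings that govern exactly this connection-loss behavior. Below is a minimal sketch of how those could be overridden, assuming the stock Airflow Celery provider; the module name `custom_celery_config.py` is hypothetical, and whether these settings actually prevent the stuck-after-reconnect state is untested:

```python
# custom_celery_config.py -- hypothetical module name; must be importable
# by the Airflow workers.
from airflow.providers.celery.executors.default_celery import DEFAULT_CELERY_CONFIG

CELERY_CONFIG = {
    **DEFAULT_CELERY_CONFIG,
    # On broker connection loss, cancel late-ack tasks so they are
    # redelivered instead of left dangling (the behavior the first warning
    # above describes; defaults to False until Celery 6.0).
    "worker_cancel_long_running_tasks_on_connection_loss": True,
    # Keep retrying the broker connection on startup under Celery 6.0
    # semantics, as the second warning suggests.
    "broker_connection_retry_on_startup": True,
}
```

Airflow can then be pointed at this dict via `celery_config_options = custom_celery_config.CELERY_CONFIG` in the `[celery]` section of airflow.cfg. Until something like this is confirmed to help, restarting the workers remains the only verified workaround.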