🐛 redis.exceptions.LockNotOwnedError: Cannot extend a lock that's no longer owned #119
-
**Bug Description**

Redis container: redis:8.0.1
Python version: 3.12.10
Python dependencies: settings.py

Error occurs every 0-10 minutes until the container is stopped.

**Version**

1.0.10

**Stack trace**

Docker compose logs:

**Steps to Reproduce**

No response

**Expected Behavior**

Error should not occur.
-
Hi @hihosilvers. Thanks for reporting your issue. Package versions are the same for us.
We saw a spike of this error at some point in the past, but I have not dug further yet. Perhaps our config would give you something to get started. If not, perhaps @amureki would know what to try.
-
Hi @hihosilvers 👋,

Thanks for reaching out. We switched to a dead man's snitch approach to guard against deadlocks: a scheduler process needs to check in frequently with a heartbeat. If it doesn't, the lock expires and a new process can assume the role of the scheduler.

If you are encountering this issue, your scheduler process didn't renew the lock, meaning it didn't schedule jobs reliably. We are happy to help you debug this, but we would need additional information about your environment. How are you running your application?

Generally, if you are automatically restarting the worker, you shouldn't experience any major discrepancies. However, it might still be worth looking into this.

I hope our response already helps you a little. Please don't hesitate to reach out again.

Best
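For readers unfamiliar with the heartbeat pattern described above, here is a minimal in-memory sketch. The `Lease` class is a hypothetical stand-in for a Redis lock with a TTL (the real scheduler uses redis-py's `Lock`, whose `extend()` raises `LockNotOwnedError` once the TTL has lapsed); all names and numbers here are illustrative:

```python
# Hypothetical in-memory stand-in for a Redis lock with a TTL.
# In production this role is played by redis.lock.Lock(timeout=...).
class Lease:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.owner = None
        self.expires_at = 0.0

    def acquire(self, owner: str, now: float) -> bool:
        # A lease can be taken if it is unowned or its previous owner
        # let it expire by missing heartbeats.
        if self.owner is None or now >= self.expires_at:
            self.owner = owner
            self.expires_at = now + self.ttl
            return True
        return False

    def heartbeat(self, owner: str, now: float) -> bool:
        # Only the current owner may extend, and only while the lease is
        # still live -- an expired lease cannot be extended, which is what
        # redis-py signals with LockNotOwnedError.
        if self.owner == owner and now < self.expires_at:
            self.expires_at = now + self.ttl
            return True
        return False

lease = Lease(ttl=5.0)
assert lease.acquire("scheduler-1", now=0.0)
assert lease.heartbeat("scheduler-1", now=3.0)      # renewed in time
assert not lease.heartbeat("scheduler-1", now=9.0)  # heartbeat window missed
assert lease.acquire("scheduler-2", now=9.0)        # another process takes over
```

The key property is the last two lines: once the heartbeat is missed, the old scheduler loses the lease and any other process can assume the role.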
-
I am running into this problem as well, and I believe there is a bug in how the lock extension is handled.

**Steps to reproduce**

**Current behavior**

The scheduler remains active and continues attempting to extend a lock it no longer owns, potentially resulting in multiple scheduler instances running concurrently.

**Expected behavior**

If the lock extension fails due to `LockNotOwnedError`, the scheduler should stop treating itself as the active scheduler.

**Suggested fix**

Catch `LockNotOwnedError` when extending the lock, and have the scheduler step down (or attempt to re-acquire the lock) instead of retrying indefinitely.
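A rough sketch of that kind of fix, assuming the scheduler extends its lock from a periodic heartbeat. `LockNotOwnedError` is stubbed locally so the example is self-contained; in the real code it comes from `redis.exceptions`, and the `Scheduler`/`ExpiredLock` classes are hypothetical:

```python
# Stubbed here for a self-contained example; the real exception is
# redis.exceptions.LockNotOwnedError.
class LockNotOwnedError(Exception):
    pass

class Scheduler:
    def __init__(self, lock):
        self.lock = lock
        self.active = True

    def maintain_lock(self):
        # Called periodically by the heartbeat. On a failed extension the
        # scheduler demotes itself instead of retrying forever, so a fresh
        # process can take over the lock.
        try:
            self.lock.extend(30)
        except LockNotOwnedError:
            self.active = False

class ExpiredLock:
    # Stand-in for a redis-py Lock whose TTL has already lapsed.
    def extend(self, additional_time):
        raise LockNotOwnedError("Cannot extend a lock that's no longer owned")

scheduler = Scheduler(ExpiredLock())
scheduler.maintain_lock()
assert scheduler.active is False  # scheduler steps down rather than looping
```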
Hi,
Thanks, you are welcome, we love to help!
The lock extension task is not run on dramatiq, but directly via the scheduler. Unless your CPU has no time to pick up the thread, that shouldn't be a likely cause. In most cases, it will be the Redis setup that is losing data, which isn't great. If your service restarts frequently, I would consider altering the `appendfsync` setting to increase durability.

Anyhow, the pending patch should resolve the issue as long as your orchestration restarts the crontab process.
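For reference, durability is tuned in `redis.conf` (or at runtime via `CONFIG SET`). `appendfsync always` trades write throughput for the strongest persistence; `everysec` is the default compromise:

```
# redis.conf -- append-only-file persistence
appendonly yes        # enable the AOF log
appendfsync always    # fsync on every write; default is "everysec"
```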
Best,
Joe