Commit c326e1b
browniebroke, nicwolff, and Suor authored
Add command to clear stale cacheops keys (#434)
* Add conj-reaper management command
* Move main command logic into a function and call it from the command

  Rework #323 into a reusable function that can be wrapped in a Celery task by
* Rename back to reaper and set default values in logic

  Co-Authored-By: Alexander Schepanovski <suor.web@gmail.com>
* Don't hardcode default DB alias
* Expose reap_conjs on the package's top level API
* Update documentation
* Remove f-string as we're still supporting Python 3.5
* Fix comments in documentation
* Pluralize management command

Co-authored-by: Nic Wolff <nwolff@hearst.com>
Co-authored-by: Alexander Schepanovski <suor.web@gmail.com>
1 parent 917a0fb commit c326e1b

4 files changed, with 113 additions and 0 deletions

README.rst

Lines changed: 26 additions & 0 deletions
@@ -725,6 +725,32 @@ Here is a simple stats implementation:
 Cache invalidation signal is emitted after object, model or global invalidation passing ``sender`` and ``obj_dict`` args. Note that during normal operation cacheops only uses object invalidation, calling it once for each model create/delete and twice for update: passing old and new object dictionary.


+Memory usage cleanup
+--------------------
+
+In some cases, cacheops may leave references to already expired cache keys in conjunction sets in redis
+without being able to invalidate them. Cacheops ships with a ``cacheops.reap_conjs`` function that cleans
+up these references, skipping conjunction sets smaller than a configurable minimum size.
+
+It can be called using the ``reapconjs`` management command::
+
+    ./manage.py reapconjs --chunk-size=100 --min-conj-set-size=10000  # with custom values
+    ./manage.py reapconjs                                              # with default values (chunk size=1000, min conj set size=1000)
+
+The command is a small wrapper around a function that holds the main logic. You can also call that function from your own code, for example from a Celery task:
+
+.. code:: python
+
+    from cacheops import reap_conjs
+
+    @app.task
+    def reap_conjs_task():
+        reap_conjs(
+            chunk_size=2000,
+            min_conj_set_size=100,
+        )
+
+
 CAVEATS
 -------
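Review note: the README paragraph describes a build-up of stale references without showing how it happens. The toy model below is a rough sketch of that mechanism, not cacheops code. `ToyCache`, its methods, and the key names are all hypothetical, and a logical clock stands in for Redis TTLs. It only illustrates that cached results expire on their own while their entries in a conj set do not.

```python
class ToyCache:
    """Hypothetical in-memory stand-in for Redis: values with a TTL plus plain sets."""

    def __init__(self):
        self.now = 0         # logical clock, in seconds
        self.values = {}     # cache_key -> (value, expires_at)
        self.conj_sets = {}  # conj_key -> set of cache keys; these never expire here

    def cache_result(self, conj_key, cache_key, value, timeout):
        # Store the result with a TTL and register its key in the conj set.
        self.values[cache_key] = (value, self.now + timeout)
        self.conj_sets.setdefault(conj_key, set()).add(cache_key)

    def get(self, cache_key):
        entry = self.values.get(cache_key)
        if entry is None or entry[1] <= self.now:
            self.values.pop(cache_key, None)  # expired values simply disappear
            return None
        return entry[0]

    def stale_members(self, conj_key):
        # Members of the conj set whose cached value no longer exists.
        return {k for k in self.conj_sets.get(conj_key, ()) if self.get(k) is None}


CONJ = "conj:app.model:id=1"  # illustrative key name only

cache = ToyCache()
for i in range(5):
    cache.cache_result(CONJ, "q:%d" % i, b"rows", timeout=60)

cache.now = 30
cache.cache_result(CONJ, "q:fresh", b"rows", timeout=60)

cache.now = 61  # the first five results have expired; the model was never saved
stale = cache.stale_members(CONJ)
```

In this model the conj set still has six members at the end, five of which point at nothing. Those are the entries a reaper has to remove, because no invalidation ever touched the set.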

cacheops/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 from .simple import *  # noqa
 from .query import *  # noqa
 from .invalidation import *  # noqa
+from .reaper import *  # noqa
 from .templatetags.cacheops import *  # noqa


Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+from argparse import ArgumentParser
+
+from django.core.management.base import BaseCommand
+
+from cacheops.reaper import reap_conjs
+
+
+class Command(BaseCommand):
+    help = 'Removes expired conjunction keys from cacheops.'
+
+    def add_arguments(self, parser: ArgumentParser):
+        parser.add_argument('--chunk-size', type=int, default=1000)
+        parser.add_argument('--min-conj-set-size', type=int, default=1000)
+        parser.add_argument('--dry-run', action='store_true')
+
+    def handle(self, chunk_size: int, min_conj_set_size: int, dry_run: bool, **kwargs):
+        reap_conjs(
+            chunk_size=chunk_size,
+            min_conj_set_size=min_conj_set_size,
+            dry_run=dry_run,
+        )
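
Review note: `handle()` receives the parsed options as keyword arguments, which is why its parameter names use underscores while the flags use dashes. The sketch below reproduces only the argument definitions with plain `argparse`, outside Django, to show that mapping. The `prog` name and the variable names are illustrative.

```python
from argparse import ArgumentParser

# Same three options as the command above, on a bare argparse parser.
parser = ArgumentParser(prog="reapconjs")
parser.add_argument('--chunk-size', type=int, default=1000)
parser.add_argument('--min-conj-set-size', type=int, default=1000)
parser.add_argument('--dry-run', action='store_true')

# argparse derives the destination name by replacing dashes with underscores.
custom = vars(parser.parse_args(['--chunk-size=100', '--min-conj-set-size=10000']))
defaults = vars(parser.parse_args([]))
dry = vars(parser.parse_args(['--dry-run']))
```

Django adds its own common options (such as verbosity) to the same namespace, which is what the `**kwargs` catch-all in `handle()` absorbs.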

cacheops/reaper.py

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+import logging
+
+from django.db import DEFAULT_DB_ALIAS
+
+from .redis import redis_client
+from .sharding import get_prefix
+
+logger = logging.getLogger(__name__)
+
+
+def reap_conjs(
+    chunk_size: int = 1000,
+    min_conj_set_size: int = 1000,
+    using=DEFAULT_DB_ALIAS,
+    dry_run: bool = False,
+):
+    """
+    Remove expired cache keys from invalidation sets.
+
+    Cacheops saves each DB resultset cache key in a "conj set" so it can delete it later if it
+    thinks it should be invalidated due to a saved record with matching values. But the resultset
+    caches time out after 30 minutes, and their cache keys live in those conj sets forever!
+
+    So conj sets for frequent queries on tables that aren't updated often end up containing
+    millions of already-expired cache keys and maybe a few thousand actually useful ones,
+    and block Redis for multiple - or many - seconds when cacheops finally decides
+    to invalidate them.
+
+    This function scans cacheops' conj keys for already-expired cache keys and removes them.
+    """
+    logger.info('Starting scan for large conj sets')
+    prefix = get_prefix(dbs=[using])
+    for conj_key in redis_client.scan_iter(prefix + 'conj:*', count=chunk_size):
+        total = redis_client.scard(conj_key)
+        if total < min_conj_set_size:
+            continue
+        logger.info('Found %s cache keys in %s, scanning for expired keys', total, conj_key)
+        _clear_conj_key(conj_key, chunk_size, dry_run)
+    logger.info('Done scan for large conj sets')
+
+
+def _clear_conj_key(conj_key: bytes, chunk_size: int, dry_run: bool):
+    """Scan the cache keys in a conj set in batches and remove any that have expired."""
+    count, removed = 0, 0
+    for keys in _iter_keys_chunk(chunk_size, conj_key):
+        count += len(keys)
+        values = redis_client.mget(keys)
+        expired = [k for k, v in zip(keys, values) if not v]
+        if expired:
+            if not dry_run:
+                redis_client.srem(conj_key, *expired)
+            removed += len(expired)
+    logger.info('Removed %s/%s cache keys from %s', removed, count, conj_key)
+    if removed and not dry_run:
+        redis_client.execute_command('MEMORY PURGE')
+
+
+def _iter_keys_chunk(chunk_size, key):
+    cursor = 0
+    while True:
+        cursor, items = redis_client.sscan(key, cursor, count=chunk_size)
+        if items:
+            yield items
+        if cursor == 0:
+            break
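
Review note: the control flow above (skip small sets, walk large ones in cursor-sized chunks, drop members whose value is gone, honour `dry_run`) can be exercised without a Redis server. The sketch below is a simplified restatement under stated assumptions, not the module itself. `FakeRedis` is a hypothetical in-memory stand-in that mimics only the four calls used here, its cursor is a plain list offset rather than a real SSCAN cursor, and `iter_chunks` and `reap` are illustrative names that return counts instead of logging.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in; not a real Redis client."""

    def __init__(self, values, sets):
        self.values = dict(values)                        # live cache keys
        self.sets = {k: set(v) for k, v in sets.items()}  # conj sets
        self._snapshot = {}

    def scard(self, key):
        return len(self.sets.get(key, ()))

    def sscan(self, key, cursor=0, count=10):
        # Snapshot on the first call so removals during the walk skip nothing.
        if cursor == 0:
            self._snapshot[key] = sorted(self.sets.get(key, ()))
        members = self._snapshot[key]
        chunk = members[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(members):
            next_cursor = 0  # as in Redis, cursor 0 signals the end of the scan
        return next_cursor, chunk

    def mget(self, keys):
        return [self.values.get(k) for k in keys]

    def srem(self, key, *members):
        self.sets[key] -= set(members)


def iter_chunks(client, key, chunk_size):
    """Yield the members of a set in batches until the cursor wraps to 0."""
    cursor = 0
    while True:
        cursor, items = client.sscan(key, cursor, count=chunk_size)
        if items:
            yield items
        if cursor == 0:
            break


def reap(client, min_conj_set_size, chunk_size, dry_run=False):
    """Return {conj_key: number of expired members found} for large sets only."""
    removed = {}
    for conj_key in sorted(client.sets):
        if client.scard(conj_key) < min_conj_set_size:
            continue  # small sets are cheap to invalidate, so leave them alone
        removed[conj_key] = 0
        for keys in iter_chunks(client, conj_key, chunk_size):
            expired = [k for k, v in zip(keys, client.mget(keys)) if not v]
            if expired and not dry_run:
                client.srem(conj_key, *expired)
            removed[conj_key] += len(expired)
    return removed


client = FakeRedis(
    values={"q:live1": b"x", "q:live2": b"y"},
    sets={
        "conj:big": {"q:live1", "q:live2", "q:old1", "q:old2", "q:old3"},
        "conj:small": {"q:old4"},
    },
)
preview = reap(client, min_conj_set_size=3, chunk_size=2, dry_run=True)
size_after_preview = client.scard("conj:big")
result = reap(client, min_conj_set_size=3, chunk_size=2)
```

A dry run reports the same counts as a real run but leaves the sets untouched, so it can be used to estimate how much a real pass would remove.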
