fix(app): reduce peak memory usage

psychedelicious · psychedelicious · commit 35c7c59455c0 · 2025-06-11T12:56:16.000+10:00
We've long suspected there is a memory leak in Invoke, but that may not be true. What looks like a memory leak may in fact be the expected behaviour for our allocation patterns.

We observe an increase of ~20 to ~30 MB in memory usage per session executed. I ran some prolonged tests in which I measured the process's RSS in bytes over 200 SDXL generations. RSS eventually leveled off at around 100 generations, by which point it had climbed ~900MB above its starting value.
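A minimal sketch of this kind of soak test follows. It assumes a Unix-like system (the stdlib `resource` module is unavailable on Windows) and a hypothetical `run_session` callable standing in for one generation. Note that `ru_maxrss` reports the process's *peak* RSS so far, not the current RSS measured above, so the samples can only rise or stay flat. A plateau in the series corresponds to the leveling off described above.

```python
import resource
import sys
from typing import Callable, List


def peak_rss_bytes() -> int:
    """Peak resident set size of the current process so far, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS but in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def track_peak_rss(run_session: Callable[[], object], n_sessions: int) -> List[int]:
    """Run the hypothetical `run_session` repeatedly, recording peak RSS after each run."""
    samples: List[int] = []
    for _ in range(n_sessions):
        run_session()
        samples.append(peak_rss_bytes())
    return samples
```

`samples[-1] - samples[0]` then gives the total growth over the run.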

I used tracemalloc to diff the allocations of individual session executions and found that we allocate ~20MB per session in `ModelPatcher.apply_ti()`.
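A per-session diff like this can be taken with the stdlib `tracemalloc` module. The sketch below snapshots allocations before and after a single call of a hypothetical `fn` (standing in for one session) and returns the source lines whose net allocated size changed the most. Anything `fn` leaves reachable, such as objects it appends to a long-lived list, shows up as a positive `size_diff`.

```python
import tracemalloc
from typing import Callable, List


def diff_allocations(fn: Callable[[], object], top: int = 10) -> List[tracemalloc.StatisticDiff]:
    """Trace one call of `fn` and return the lines whose net allocations changed most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        fn()  # return value is discarded; only allocations kept reachable elsewhere remain
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # compare_to groups traces by source line and sorts by absolute size difference.
    return after.compare_to(before, "lineno")[:top]
```

Printing each returned `StatisticDiff` shows `file:line` with the size and count deltas, which is one way a hotspot such as `apply_ti()` can surface.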

In `ModelPatcher.apply_ti()` we add tokens to the tokenizer when handling TIs. The added tokens should be scoped to only the current invocation, but there is no simple way to remove the tokens afterwards.

As a workaround, we clone the tokenizer, add the TI tokens to the clone, and use the clone when running compel. Afterwards, the cloned tokenizer is discarded.
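The scoping trick can be sketched with a toy tokenizer. `ToyTokenizer`, its `add_tokens` method and `scoped_ti_tokens` are hypothetical illustrations, not Invoke's or transformers' API, and the clone here is made with `copy.deepcopy`, which may differ from how Invoke actually clones the tokenizer. The point is that the base tokenizer's vocabulary never sees the TI tokens, and the clone becomes garbage once every reference to it is dropped.

```python
import copy
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: just a token -> id vocabulary."""

    def __init__(self, vocab: Dict[str, int]) -> None:
        self.vocab = dict(vocab)

    def add_tokens(self, tokens: List[str]) -> int:
        """Append unseen tokens with fresh ids (assumes existing ids are 0..len-1)."""
        added = 0
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)
                added += 1
        return added


@contextmanager
def scoped_ti_tokens(tokenizer: ToyTokenizer, ti_tokens: List[str]) -> Iterator[ToyTokenizer]:
    """Yield a clone of `tokenizer` with TI tokens added; the original is never modified."""
    clone = copy.deepcopy(tokenizer)
    clone.add_tokens(ti_tokens)
    try:
        yield clone
    finally:
        # Drop this generator's reference; the caller must drop its own too
        # (e.g. with `del`) for the clone to become collectable.
        del clone
```

Usage: `with scoped_ti_tokens(base, ["<ti-0>"]) as tok: ...` followed by `del tok` once encoding is done.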

The tokenizer uses ~20MB of memory, and it both references and is referenced by other compel objects. This is what causes the observed per-session increase in memory!

We'd expect these objects to be garbage collected, but Python doesn't do it immediately. After creating the conditioning tensors, we move straight on to denoising, so the GC never gets a chance to free memory in Python's existing arenas/blocks for reuse. Instead, Python has to request more memory from the OS.
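Some background on why collection is not immediate in CPython: an object is freed as soon as its reference count drops to zero, but objects that reference each other in a cycle keep each other's counts above zero and are only reclaimed when the cyclic garbage collector runs, which is triggered by allocation counts (see `gc.get_threshold()`) rather than on demand. The sketch below illustrates plain CPython behaviour with a hypothetical `Node` class, using a weak reference to observe when an object actually dies.

```python
import gc
import weakref
from typing import Tuple


class Node:
    """Hypothetical object holding a payload and an optional reference to a peer."""

    def __init__(self) -> None:
        self.payload = bytearray(1024)
        self.peer = None


def freed_by_del(make_cycle: bool) -> Tuple[bool, bool]:
    """Return (freed right after `del`, freed after an explicit gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        a, b = Node(), Node()
        if make_cycle:
            a.peer, b.peer = b, a  # a <-> b reference cycle
        probe = weakref.ref(a)  # observes `a` without keeping it alive
        del a, b
        freed_immediately = probe() is None
        gc.collect()
        freed_after_collect = probe() is None
        return freed_immediately, freed_after_collect
    finally:
        if was_enabled:
            gc.enable()
```

On CPython, `freed_by_del(make_cycle=False)` returns `(True, True)` and `freed_by_del(make_cycle=True)` returns `(False, True)`.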

We can improve the situation by immediately calling `del` on the tokenizer clone and related objects. In fact, we already had some code in the compel nodes to `del` some of these objects, but not all.
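The effect of the `del`s can be sketched in isolation: a local variable keeps its object alive until the function returns, so dropping it before the next phase frees the object (if nothing else references it) before that phase starts allocating. `BigHelper` and `run_session` are hypothetical stand-ins for the tokenizer clone/compel objects and the invocation; the weak reference only observes whether the helper is still alive when "denoising" would begin.

```python
import weakref
from typing import List, Tuple


class BigHelper:
    """Hypothetical stand-in for the tokenizer clone and related compel objects."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)

    def encode(self, text: str) -> List[int]:
        return [len(text)]  # placeholder "conditioning"; holds no reference to self


def run_session(text: str, release_early: bool) -> Tuple[List[int], bool]:
    """Return (conditioning, whether the helper was still alive when denoising began)."""
    helper = BigHelper(20 * 1024 * 1024)  # ~20MB, mirroring the tokenizer clone
    probe = weakref.ref(helper)
    cond = helper.encode(text)
    if release_early:
        del helper  # last reference gone: CPython frees it right here
    # The "denoising" phase would start here and make its own large allocations.
    alive_during_denoise = probe() is not None
    return cond, alive_during_denoise
```

Without the early `del`, the helper is only released when `run_session` returns, i.e. after the denoising phase has already had to find memory for its own allocations.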

Adding the `del`s vastly improves things. RSS now peaks within half as many sessions (~50 or fewer), and the peak is only ~100MB above the starting value. Memory usage still increases gradually until it levels off.
diff --git a/invokeai/app/invocations/compel.py b/invokeai/app/invocations/compel.py
@@ -114,6 +114,13 @@ def _lora_loader() -> Iterator[Tuple[ModelPatchRaw, float]]:
 
             c, _options = compel.build_conditioning_tensor_for_conjunction(conjunction)
 
+        del compel
+        del patched_tokenizer
+        del tokenizer
+        del ti_manager
+        del text_encoder
+        del text_encoder_info
+
         c = c.detach().to("cpu")
 
         conditioning_data = ConditioningFieldData(conditionings=[BasicConditioningInfo(embeds=c)])
@@ -222,7 +229,10 @@ def _lora_loader() -> Iterator[Tuple[ModelPatchRaw, float]]:
             else:
                 c_pooled = None
 
+        del compel
+        del patched_tokenizer
         del tokenizer
+        del ti_manager
         del text_encoder
         del text_encoder_info