We run out of memory with MACE+LAMMPS for large structures. @keceli has some ideas, including:

- [ ] `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`
- [ ] Garbage collectors
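
For reference, a minimal sketch of what trying both ideas could look like from Python. It is untested against our runs, and it assumes the model is evaluated from a Python process. If the run goes through the LAMMPS pair style in C++, the Python calls probably do not apply and only the environment variable is relevant. The helper name `free_gpu_memory` is hypothetical. `gc.collect()`, `torch.cuda.empty_cache()` and `torch.cuda.memory_reserved()` are standard Python/PyTorch calls.

```python
import gc
import os

# Set before the first CUDA allocation so the caching allocator picks it up.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def free_gpu_memory():
    """Collect unreachable Python objects, then release cached CUDA blocks.

    Returns the number of bytes still reserved by PyTorch's caching
    allocator, or 0 when PyTorch or CUDA is unavailable.
    """
    gc.collect()  # drop unreachable objects that may still hold tensors
    if torch is None or not torch.cuda.is_available():
        return 0
    torch.cuda.empty_cache()  # hand unused cached blocks back to the driver
    return torch.cuda.memory_reserved()
```

Calling `free_gpu_memory()` between evaluations and logging the returned value would show whether the reserved memory actually drops. Note that `empty_cache()` only releases blocks that no live tensor is using, so it does not help if the peak allocation of a single large structure is what exceeds the GPU.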