peak memory usage higher than expected when loading catalogs #155

@lgarrison

Description

As first noted in #7, the peak RSS when loading catalogs is much higher than we'd like. For example, loading 227 GB of (unpacked) AbacusSummit catalog data uses nearly 500 GB of memory:

Example code

from pathlib import Path

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

suitedir = Path('/mnt/home/lgarrison/ceph/AbacusSummit')
catpath = suitedir / 'AbacusSummit_base_c000_ph000/halos/z0.100'

fields = [
    'N',
    'x_L2com',
    'v_L2com',
    'r90_L2com',
    'r25_L2com',
    'r98_L2com',
    'npstartA',
    'npoutA',
    'id',
    'sigmav3d_L2com',
]

CompaSOHaloCatalog(
    catpath,
    fields=fields,
    # load subsample A positions/velocities and particle IDs
    subsamples=dict(A=True, rv=True, pid=True),
    # unpack_bits=['pid', 'tagged'],
    unpack_bits=True,  # unpack all bit-packed PID fields
    cleaned=True,      # use the cleaned halo catalogs
)

Output:

❯ /usr/bin/time python issues/gh7/time_load.py
CompaSO Halo Catalog
====================
AbacusSummit_base_c000_ph000 @ z=0.1
------------------------------------
     Halos: 3.82e+08 halos,      10 fields,    24.5 GB
Subsamples:  3.7e+09 particles,   7 fields,     203 GB
Cleaned halos: True
Halo light cone: False
Total time: 194.38 s
389.83user 125.57system 10:07.45elapsed 84%CPU (0avgtext+0avgdata 494417824maxresident)k
0inputs+0outputs (0major+1439721minor)pagefaults 0swaps
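
The same numbers can also be collected from inside the script, which makes it easier to instrument individual loading stages. A small sketch (reusing `catpath` and `fields` from the example above; on Linux, `ru_maxrss` is reported in kilobytes):

import resource
import time

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

t0 = time.perf_counter()
cat = CompaSOHaloCatalog(
    catpath,
    fields=fields,
    subsamples=dict(A=True, rv=True, pid=True),
    unpack_bits=True,
    cleaned=True,
)
elapsed = time.perf_counter() - t0

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KB on Linux
print(f'Load time: {elapsed:.1f} s, peak RSS: {peak_kb / 1024**2:.1f} GB')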

We expect the peak to be somewhat higher than the final usage, since we have to unpack data in memory, but not this much. The overage should be pretty small since all the unpacking is done superslab-by-superslab.
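
For comparison, here is a minimal, self-contained illustration (plain numpy, not abacusutils internals) of the memory behavior we'd expect from filling a pre-allocated array chunk by chunk: the peak should only exceed the final size by about one chunk, as long as nothing holds onto the transient buffers.

import tracemalloc

import numpy as np

n_chunks, chunk = 8, 5_000_000

tracemalloc.start()
out = np.empty(n_chunks * chunk, dtype=np.float64)  # the "final" table (~320 MB)

for i in range(n_chunks):
    buf = np.random.random(chunk)          # transient per-chunk buffer (~40 MB)
    out[i * chunk:(i + 1) * chunk] = buf   # copy into the pre-allocated table
    del buf                                # nothing should keep this alive

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f'final: {current / 1e6:.0f} MB, peak: {peak / 1e6:.0f} MB')

If the loader followed this pattern exactly, the peak would sit roughly one superslab's worth of buffers above the ~228 GB of loaded data, rather than the ~470 GB observed.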

I spent a little bit of time with a memory profiler trying to figure out why this is happening, but I didn't get very far. Part of the problem is that we can't read ASDF files directly into pre-allocated rows of the table, so there is one copy that gets read/decompressed from disk, and then a second when we fill the table. But I'm pretty sure even that shouldn't result in this much memory usage, so maybe something is keeping references to buffers that we want to be garbage collected, causing leak-like behavior...
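
One way to test that hypothesis is to enumerate the large arrays that are still reachable right after the load and look at what refers to them. A rough diagnostic sketch (the helper below is hypothetical, not part of abacusutils):

import gc

import numpy as np

def find_large_arrays(min_bytes=1 << 30):
    # yield every ndarray of at least min_bytes that the GC can still reach
    for obj in gc.get_objects():
        if isinstance(obj, np.ndarray) and obj.nbytes >= min_bytes:
            yield obj

gc.collect()
for arr in find_large_arrays():
    referrers = [type(r).__name__ for r in gc.get_referrers(arr)]
    print(f'{arr.nbytes / 1e9:6.1f} GB  dtype={arr.dtype}  referrers={referrers}')

The table's own columns will of course show up here; anything else of comparable size that survives the load would be a candidate for the leak-like behavior (though buffers that are only retained during the load, and dropped before it returns, would still need an in-process profiler to catch).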
