peak memory usage higher than expected when loading catalogs #155

@lgarrison

Description

As first noted in #7, the peak RSS when loading catalogs is much higher than we'd like. For example, loading 227 GB of (unpacked) AbacusSummit catalog data uses nearly 500 GB of memory:

Example code

from pathlib import Path

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

suitedir = Path('/mnt/home/lgarrison/ceph/AbacusSummit')
catpath = suitedir / 'AbacusSummit_base_c000_ph000/halos/z0.100'

fields = [
    'N',
    'x_L2com',
    'v_L2com',
    'r90_L2com',
    'r25_L2com',
    'r98_L2com',
    'npstartA',
    'npoutA',
    'id',
    'sigmav3d_L2com',
]

CompaSOHaloCatalog(
    catpath,
    fields=fields,
    # load subsample A positions/velocities and particle IDs
    subsamples=dict(A=True, rv=True, pid=True),
    # unpack_bits=['pid', 'tagged'],
    unpack_bits=True,  # unpack all bit-packed PID fields
    cleaned=True,      # use the cleaned halo catalogs
)

Output:

❯ /usr/bin/time python issues/gh7/time_load.py
CompaSO Halo Catalog
====================
AbacusSummit_base_c000_ph000 @ z=0.1
------------------------------------
     Halos: 3.82e+08 halos,      10 fields,    24.5 GB
Subsamples:  3.7e+09 particles,   7 fields,     203 GB
Cleaned halos: True
Halo light cone: False
Total time: 194.38 s
389.83user 125.57system 10:07.45elapsed 84%CPU (0avgtext+0avgdata 494417824maxresident)k
0inputs+0outputs (0major+1439721minor)pagefaults 0swaps
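
The same numbers can also be collected from inside the script, which makes it easier to instrument individual loading stages. A small sketch (reusing `catpath` and `fields` from the example above; on Linux, `ru_maxrss` is reported in kilobytes):

import resource
import time

from abacusnbody.data.compaso_halo_catalog import CompaSOHaloCatalog

t0 = time.perf_counter()
cat = CompaSOHaloCatalog(
    catpath,
    fields=fields,
    subsamples=dict(A=True, rv=True, pid=True),
    unpack_bits=True,
    cleaned=True,
)
elapsed = time.perf_counter() - t0

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KB on Linux
print(f'Load time: {elapsed:.1f} s, peak RSS: {peak_kb / 1024**2:.1f} GB')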

We expect the peak to be somewhat higher than the final usage, since we have to unpack data in memory, but not this much. The overage should be pretty small since all the unpacking is done superslab-by-superslab.
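
For comparison, here is a minimal, self-contained illustration (plain numpy, not abacusutils internals) of the memory behavior we'd expect from filling a pre-allocated array chunk by chunk: the peak should only exceed the final size by about one chunk, as long as nothing holds onto the transient buffers.

import tracemalloc

import numpy as np

n_chunks, chunk = 8, 5_000_000

tracemalloc.start()
out = np.empty(n_chunks * chunk, dtype=np.float64)  # the "final" table (~320 MB)

for i in range(n_chunks):
    buf = np.random.random(chunk)          # transient per-chunk buffer (~40 MB)
    out[i * chunk:(i + 1) * chunk] = buf   # copy into the pre-allocated table
    del buf                                # nothing should keep this alive

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f'final: {current / 1e6:.0f} MB, peak: {peak / 1e6:.0f} MB')

If the loader followed this pattern exactly, the peak would sit roughly one superslab's worth of buffers above the ~228 GB of loaded data, rather than the ~470 GB observed.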

I spent a little bit of time with a memory profiler trying to figure out why this is happening, but I didn't get very far. Part of the problem is that we can't read ASDF files directly into pre-allocated rows of the table, so there is one copy that gets read/decompressed from disk, and then a second when we fill the table. But I'm pretty sure even that shouldn't result in this much memory usage, so maybe something is keeping references to buffers that we want to be garbage collected, causing leak-like behavior...
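
One way to test that hypothesis is to enumerate the large arrays that are still reachable right after the load and look at what refers to them. A rough diagnostic sketch (the helper below is hypothetical, not part of abacusutils):

import gc

import numpy as np

def find_large_arrays(min_bytes=1 << 30):
    # yield every ndarray of at least min_bytes that the GC can still reach
    for obj in gc.get_objects():
        if isinstance(obj, np.ndarray) and obj.nbytes >= min_bytes:
            yield obj

gc.collect()
for arr in find_large_arrays():
    referrers = [type(r).__name__ for r in gc.get_referrers(arr)]
    print(f'{arr.nbytes / 1e9:6.1f} GB  dtype={arr.dtype}  referrers={referrers}')

The table's own columns will of course show up here; anything else of comparable size that survives the load would be a candidate for the leak-like behavior (though buffers that are only retained during the load, and dropped before it returns, would still need an in-process profiler to catch).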
