Support for mmap / RssPrivate distinction in memray #778

@maxstr

Description

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

I'm currently using memray to track down where my application is consuming too much memory. My application currently reads very large files into memory, so my mitigation is to mmap each file and iterate over the mmap'd object instead.

However, the result of my fix is not being made clear in memray. There are two issues:

  1. The mmap itself is shown as an allocation of the size of the file (I understand that virtual address space is being reserved, so this may technically be considered an allocation, but it's not memory reserved for my application)
  2. In the graph for "Resident size", what is graphed appears to be VmRSS, which includes RssFile. RssFile is data that is currently resident for the process, but is backed by a file and thus can be reclaimed at any time by the OS.

In other words, I cannot see the positive impact of my changes in memray and have to look to procfs to verify my fix is working.
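
For reference, the procfs check mentioned above can be sketched roughly as follows. This is only an illustration of reading `/proc/<pid>/status` on Linux, not anything memray does internally; the helper names `parse_status_kb` and `rss_breakdown` are made up for this example. Note that the status file exposes `RssAnon`, `RssFile` and `RssShmem`; it has no field literally named `RssPrivate`, so `RssAnon` is assumed here to be the closest match.

```python
import re
from typing import Dict

# Matches lines such as "VmRSS:\t 5164776 kB" in /proc/<pid>/status.
_KB_FIELD = re.compile(r"^(\w+):[ \t]+(\d+)[ \t]+kB$", re.MULTILINE)


def parse_status_kb(text: str) -> Dict[str, int]:
    """Return every kB-valued field of a status dump as {name: kB}."""
    return {name: int(value) for name, value in _KB_FIELD.findall(text)}


def rss_breakdown(pid: str = "self") -> Dict[str, int]:
    """Read the resident-set fields for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        fields = parse_status_kb(status.read())
    wanted = ("VmRSS", "RssAnon", "RssFile", "RssShmem")
    # Older kernels may not report the Rss* split, so skip missing keys.
    return {key: fields[key] for key in wanted if key in fields}


if __name__ == "__main__":
    print(rss_breakdown())
```

Sampling `rss_breakdown()` periodically while the application runs would show whether growth in `VmRSS` comes from anonymous memory or from file-backed pages.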

Describe the solution you'd like

If the memray flamegraph charted RssPrivate (in addition to, or instead of, VmRSS), I would be able to verify my fix more easily. Other users would also be able to see the distinction between allocations that are dedicated to their process and allocations that are reclaimable by the OS.

Additionally, if in the flame graph, file-backed allocations could be optionally shown / not shown, that would be very helpful as well.
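
To illustrate the file-backed versus anonymous distinction, here is a minimal sketch that classifies the lines of `/proc/<pid>/maps`. It is not how memray tracks allocations; `Mapping` and `parse_maps` are hypothetical names, and treating a non-zero inode as "file-backed" is an assumption that should hold for ordinary files but has not been checked against every mapping type.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    inode: int
    path: str

    @property
    def size(self) -> int:
        return self.end - self.start

    @property
    def file_backed(self) -> bool:
        # Assumption: anonymous regions ([heap], [stack], unnamed) have inode 0.
        return self.inode != 0


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of /proc/<pid>/maps into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], int(fields[4]), path))
    return mappings


if __name__ == "__main__":
    with open("/proc/self/maps") as maps:
        for m in parse_maps(maps.read()):
            kind = "file" if m.file_backed else "anon"
            print(f"{kind:4} {m.size:>12} {m.path}")
```

A filter along these lines is what would let a report show or hide file-backed regions on request.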

Alternatives you considered

No response

Sample code for reproduction of the issue

First:

dd if=/dev/zero of=$(pwd)/test_file bs=1M count=5000

memray.py

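import mmap
from hashlib import md5
from time import sleep


def hasher(mmap_):
    # Hash the mapping in 8 KiB chunks instead of reading the whole file at once.
    hash_ = md5()
    while values := mmap_.read(8192):
        hash_.update(values)
    return hash_.hexdigest()


with open("test_file", "rb") as f:
    # length=0 maps the entire file; PROT_READ makes the mapping read-only.
    y = mmap.mmap(f.fileno(), length=0, prot=mmap.PROT_READ)
    hashed = hasher(y)
    print("done reading")
    # Dump the kernel's memory accounting for this process.
    with open("/proc/self/status") as status:
        print(status.read())
    # Keep the process alive so its memory usage can be inspected.
    while True:
        sleep(10)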

Relevant output:

VmPeak:	 5231640 kB
VmSize:	 5231640 kB
VmLck:	       0 kB
VmPin:	       0 kB
VmHWM:	 5164776 kB
VmRSS:	 5164776 kB
RssAnon:	   27772 kB
RssFile:	 5137004 kB
RssShmem:	       0 kB
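
As a quick check of the numbers above, the following sketch parses the relevant lines and confirms that VmRSS is the sum of RssAnon, RssFile and RssShmem, which is how proc(5) describes the field. The values are copied from the output above; only the variable names are invented for this example.

```python
sample = """\
VmRSS:\t 5164776 kB
RssAnon:\t   27772 kB
RssFile:\t 5137004 kB
RssShmem:\t       0 kB
"""

# Build {field name: value in kB} from "Name:<whitespace>value kB" lines.
fields = {}
for line in sample.splitlines():
    name, _, rest = line.partition(":")
    fields[name] = int(rest.split()[0])

components = fields["RssAnon"] + fields["RssFile"] + fields["RssShmem"]
assert components == fields["VmRSS"]  # 27772 + 5137004 + 0 == 5164776

file_share = fields["RssFile"] / fields["VmRSS"]
print(f"RssAnon: {fields['RssAnon'] / 1024:.0f} MiB")      # RssAnon: 27 MiB
print(f"file-backed share of VmRSS: {file_share:.1%}")     # 99.5%
```

In other words, roughly 27 MiB of the reported resident size is anonymous memory; the rest is the mapped file's page cache.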

And the graphs shown:

[Images: two screenshots of the memray graphs]

Labels: enhancement (New feature or request)
