Replies: 1 comment
-
This doesn't answer your question, but have you considered using PyKeOps to handle such large matrices in a smart way? https://www.kernel-operations.io/keops/index.html Although it states that bindings are only available for NumPy and PyTorch, you can easily use https://github.com/rdyro/torch2jax as the go-between. This all supports autodiff, vmap, etc. The extra overhead is minimal; in fact, thanks to the highly efficient way KeOps works, it can even be faster despite going through an intermediate step.
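A rough sketch of what that pipeline could look like, assuming the workload can be phrased as a KeOps symbolic reduction (a Gaussian-kernel sum is used here purely as an example) and assuming the torch2jax wrapper API matches its README; all names below (gaussian_kernel_sum, x, y, b) are illustrative and not from the original post:

```python
# Sketch: a KeOps symbolic reduction defined on the PyTorch side, exposed to JAX via torch2jax.
# Assumes pykeops and torch2jax are installed; the workload and all names are illustrative.
import torch
import jax.numpy as jnp
from pykeops.torch import LazyTensor
from torch2jax import torch2jax_with_vjp  # wrapper with gradient support, per the torch2jax README

def gaussian_kernel_sum(x, y, b):
    # x: (M, D), y: (N, D), b: (N, 1) torch tensors
    x_i = LazyTensor(x[:, None, :])      # (M, 1, D) symbolic variable
    y_j = LazyTensor(y[None, :, :])      # (1, N, D) symbolic variable
    D_ij = ((x_i - y_j) ** 2).sum(-1)    # (M, N) squared distances, never materialized in memory
    K_ij = (-D_ij).exp()                 # Gaussian kernel, still symbolic
    return K_ij @ b                      # (M, 1) reduction, computed by the KeOps engine

# Small example tensors used only to trace shapes/dtypes for the wrapper.
M, N, D = 10_000, 4_000, 3
ex_x, ex_y, ex_b = torch.randn(M, D), torch.randn(N, D), torch.randn(N, 1)

# Wrap the torch function so it can be called (and differentiated) from JAX.
jax_kernel_sum = torch2jax_with_vjp(gaussian_kernel_sum, ex_x, ex_y, ex_b, depth=2)

out = jax_kernel_sum(jnp.array(ex_x.numpy()), jnp.array(ex_y.numpy()), jnp.array(ex_b.numpy()))
print(out.shape)  # (M, 1)
```

Whether this actually beats a chunked vmap depends on whether the per-row computation can be expressed as a KeOps reduction; if it can, the (M, N) intermediate is never allocated at all.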
-
Hi,
I'm encountering continuously increasing memory use with JAX that doesn't quite make sense to me (see code below).
I have a very large array, 500,000 x 4,000, that essentially needs to be vmap'ed to produce 500,000 numbers (using func0 given below).
When I tried doing that directly I ran out of memory, so I thought I'd split the vmap into chunks, i.e. apply vmap to array[0:5000, :], then array[5000:10000, :], and so forth, and concatenate the results.
To my surprise, when I did that and watched the memory use, I could still see it increasing by 3 GB per iteration (and in the end I run out of memory). I understand that each vmap will need to keep information for the gradients, but that should be nowhere near 3 GB.
So I'm wondering whether this is a bug or whether I'm missing something here.
Thanks in advance!
The test code illustrating the issue is given below (it requires about 70 GB of RAM to run with the current values of the nspec and npix parameters).
Also, I'm using the CPU backend and JAX 0.4.35.
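The original test code is not preserved in this capture, so here is a minimal sketch of the chunked-vmap pattern described above; nspec, npix and func0 keep the names from the post, but the body of func0 and the chunking helper are placeholders, not the author's code:

```python
# Minimal sketch of the chunked vmap described above (placeholder func0, not the original test code).
import jax
import jax.numpy as jnp

nspec, npix = 500_000, 4_000   # full problem size from the post; shrink these to experiment
chunk = 5_000                   # rows processed per vmap call

def func0(row):
    # Placeholder per-row reduction: (npix,) -> scalar.
    return jnp.sum(row ** 2)

chunked_func = jax.jit(jax.vmap(func0))   # jit once; every equal-sized chunk reuses the compilation

def apply_in_chunks(big):
    outs = []
    for start in range(0, big.shape[0], chunk):
        res = chunked_func(big[start:start + chunk, :])
        # Block so each chunk's device buffers can be freed before the next chunk is dispatched.
        outs.append(res.block_until_ready())
    return jnp.concatenate(outs)

big = jnp.ones((nspec, npix))   # the 500,000 x 4,000 array (already several GB in float32)
result = apply_in_chunks(big)   # (nspec,) numbers
print(result.shape)
```

In this sketch each chunk has the same shape, so chunked_func compiles only once and the loop's steady-state memory should stay roughly at the size of the input array plus one chunk's worth of intermediates.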