Vectorization of mask generation for a custom dataset #18449

mohamad-amin · 2023-11-09T02:45:57Z

Hey,

I have an input dataset in the form of (i, j, k) triples such that each $i,j,k \in [m]$.
My goal is to construct a mask matrix of size $n \times m$, where $n$ is the number of unique (i, j) pairs in the dataset (hence $n \le m^2$), and each entry satisfies mask((i, j), k) = 1 if (i, j, k) is in the dataset and 0 otherwise.
So far I've come up with this snippet, which works but requires a lot of memory and fails with an out-of-memory (OOM) error when $m$ grows above roughly 300:

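# Given: dataset, an N x 3 matrix of input (i, j, k) triples.
# Note: p is not defined in this snippet; presumably it is the number of mask columns (m above).

def maskup(data, row):
    # k values of the rows matching this (i, j) pair; -1 marks non-matching rows.
    vals = jnp.where((data[:, 0] == row[0]) & (data[:, 1] == row[1]), data[:, 2], -1)
    # Index -1 writes to the extra last slot, which the [:p] slice then discards.
    return jnp.zeros(p+1).at[vals].set(jnp.ones(len(vals)))[:p]

v_maskup = jax.jit(jax.vmap(maskup, in_axes=(None, 0)))
dataset_ij = jnp.unique(dataset[:, :2], axis=0)  # Find unique (i, j) pairs
mask = v_maskup(dataset, dataset_ij)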

I'm wondering how this can be improved to avoid the OOM, but I'm not sure how to approach it. I'd be grateful for any feedback or comments. Thanks!
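To make the target concrete, the mask definition above can be written as a brute-force loop. This is only a reference sketch in plain Python, not a scalable solution; the name `reference_mask` and the choice to order rows by sorted (i, j) pair are assumptions made for illustration.

```python
def reference_mask(triples, m):
    """Brute-force mask: one row per unique (i, j) pair, one column per k in [0, m)."""
    # Unique (i, j) pairs; sorting is an arbitrary but deterministic row order.
    pairs = sorted({(i, j) for i, j, _ in triples})
    row_of = {pair: r for r, pair in enumerate(pairs)}

    # Start from an all-zero n x m mask and set one entry per triple.
    mask = [[0] * m for _ in pairs]
    for i, j, k in triples:
        mask[row_of[(i, j)]][k] = 1
    return pairs, mask


pairs, mask = reference_mask([(1, 2, 3), (1, 2, 4), (1, 3, 2), (2, 3, 0)], m=5)
print(pairs)  # [(1, 2), (1, 3), (2, 3)]
for row in mask:
    print(row)
# [0, 0, 0, 1, 1]
# [0, 0, 1, 0, 0]
# [1, 0, 0, 0, 0]
```

Each triple touches exactly one mask entry, so the work is linear in the number of triples. A vectorized version only needs the row index of each triple's (i, j) pair.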

Answered by jakevdp

jakevdp (Maintainer) · 2023-11-09T03:19:13Z

I think you can compute the result you're after more efficiently using the return_inverse argument of jnp.unique:

import jax
import jax.numpy as jnp

def make_mask(data):
  # Number of mask columns: one per possible k value.
  m = data[:, 2].max() + 1
  # idx[r] is the row of the unique (i, j) pair that data[r] belongs to.
  pairs, idx = jnp.unique(data[:, :2], axis=0, return_inverse=True)
  # Set mask[idx[r], k_r] = 1 for every row r of data.
  return jnp.zeros_like(data, shape=(len(pairs), m)).at[idx, data[:, 2]].set(1)

m = 1000
N = 10000

key = jax.random.key(0)
data = jax.random.randint(key, shape=(N, 3), minval=0, maxval=m)

print(make_mask(data).shape)
# (9952, 1000)

With a smaller dataset, I think you can see that it gives the result you described:

data = jnp.array([[1, 2, 3],
                  [1, 2, 4],
                  [1, 3, 2],
                  [2, 3, 0]])
print(make_mask(data))
# [[0 0 0 1 1]
#  [0 0 1 0 0]
#  [1 0 0 0 0]]
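The original question wrapped its computation in jax.jit, whereas make_mask as written has data-dependent output shapes (both len(pairs) and m come from the data), so it would not trace under jit. A minimal sketch of one way around that, assuming static upper bounds are known, uses the size and fill_value arguments of jnp.unique. The name make_mask_jit and the parameters m and max_pairs are hypothetical, introduced here for illustration only; max_pairs should be an upper bound on the number of unique pairs (for example min(N, m * m)).

```python
from functools import partial

import jax
import jax.numpy as jnp


@partial(jax.jit, static_argnames=("m", "max_pairs"))
def make_mask_jit(data, *, m, max_pairs):
    # size= pads the unique pairs to a static length so the call can be traced.
    # Padding rows are filled with -1, assumed here not to be a valid index.
    pairs, idx = jnp.unique(
        data[:, :2], axis=0, return_inverse=True,
        size=max_pairs, fill_value=-1,
    )
    mask = jnp.zeros((max_pairs, m), dtype=jnp.int32)
    # ravel() keeps idx one-dimensional regardless of the shape convention
    # used for the inverse indices.
    mask = mask.at[idx.ravel(), data[:, 2]].set(1)
    return pairs, mask


data = jnp.array([[1, 2, 3],
                  [1, 2, 4],
                  [1, 3, 2],
                  [2, 3, 0]])
pairs, mask = make_mask_jit(data, m=5, max_pairs=4)
print(mask)
# [[0 0 0 1 1]
#  [0 0 1 0 0]
#  [1 0 0 0 0]
#  [0 0 0 0 0]]
```

The trade-off is that the result carries all-zero padding rows after the real pairs, so the caller has to track how many rows are valid, for instance by checking which rows of pairs differ from the fill value.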
