To avoid materialising the output during staging, I wrote a version of `triu_indices`:

```python
from functools import partial

import numpy as np
from jax import jit, lax
import jax.numpy as jnp

@partial(jit, static_argnames=('n', 'k'))
def triu_indices(n: int, k: int = 0):
    assert n >= 0 and k >= 0
    N = max(n - k, 0)
    l = N * (N + 1) // 2
    iota = lax.iota(np.int32, N)
    idx = lax.cumsum(1 + iota, reverse=True)
    I = lax.cumsum(jnp.zeros(l, dtype=np.int32).at[idx].set(1))
    J = lax.iota(np.int32, l) - lax.cumsum(jnp.zeros(l, dtype=np.int32).at[idx].set(iota))
    return I, J + k
```
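The same cumsum construction can be sketched in plain NumPy (a self-contained illustration of the idea, not the JAX code itself), which makes it easy to check against `np.triu_indices`. Note that the JAX version relies on out-of-bounds scatter indices being dropped silently; in NumPy that has to be done with an explicit mask:

```python
import numpy as np

def triu_indices_cumsum(n: int, k: int = 0):
    """Row/col indices of the upper triangle via cumulative sums, no 2D temporaries."""
    N = max(n - k, 0)
    l = N * (N + 1) // 2               # number of upper-triangular entries
    iota = np.arange(N, dtype=np.int32)
    # Suffix sums of the row lengths mark where each new row begins.
    idx = np.cumsum((1 + iota)[::-1])[::-1]
    ones = np.zeros(l, dtype=np.int32)
    vals = np.zeros(l, dtype=np.int32)
    # Keep only in-bounds scatter targets (JAX drops out-of-bounds indices implicitly).
    mask = idx < l
    ones[idx[mask]] = 1
    vals[idx[mask]] = iota[mask]
    I = np.cumsum(ones)                               # row index bumps at each row start
    J = np.arange(l, dtype=np.int32) - np.cumsum(vals)  # column index resets per row
    return I, J + k
```

For example, `triu_indices_cumsum(3)` returns `([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2])`, matching `np.triu_indices(3)`.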
Answered by **jakevdp** (Feb 7, 2023):
When I try your code on a Colab CPU runtime, I find that your JAX version is pretty fast. Can you say more about how you benchmarked this?

```python
_ = jax.block_until_ready(triu_indices(10))
%timeit jax.block_until_ready(triu_indices(10))
# 8.37 µs ± 79.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit np.triu_indices(10)
# 23.3 µs ± 4.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
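The same warm-up-then-time pattern can be reproduced outside IPython with the stdlib `timeit` module (a sketch of the methodology, not the original measurement; for the JAX version you would first call the function once to trigger compilation and wrap each timed call in `jax.block_until_ready`):

```python
import timeit

import numpy as np

# Time the NumPy baseline: average over many calls, as %timeit does.
n_calls = 10_000
total = timeit.timeit(lambda: np.triu_indices(10), number=n_calls)
print(f"np.triu_indices(10): {total / n_calls * 1e6:.2f} µs per call")
```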
Answer selected by soraros.