I'm implementing Givens rotations, which require a lot of matrix multiplications of same-shape matrices. Here is the code:

```python
from functools import reduce
import time

import jax
import jax.numpy as jnp


def gr() -> None:
    """Run Givens rotations and time two ways of multiplying them out."""
    size = 30  # change size here: 30, 40, 50, 90
    # batch of 4; one angle per upper-triangular pair,
    # e.g. size 90 gives 90 * 89 // 2 = 4005 angles
    thetas = jax.random.uniform(jax.random.key(42),
                                (4, size * (size - 1) // 2))

    # batched Givens rotations
    ix, iy = jnp.triu_indices(size, 1)
    g = jnp.eye(size) * jnp.ones_like(thetas)[..., None, None]
    c, s = jnp.cos(thetas), jnp.sin(thetas)
    cs = jnp.stack([c, -s], axis=1)
    sc = jnp.stack([s, c], axis=1)
    g = (g.at[..., jnp.arange(len(ix)), ix, [ix, iy]].set(cs)
          .at[..., jnp.arange(len(ix)), iy, [ix, iy]].set(sc))
    # now g is (4, 4005, 90, 90) for size 90

    # multiply out one batch of rotations with multi_dot
    t1 = time.time()
    multi_dot = jax.jit(jnp.linalg.multi_dot)
    ng = multi_dot(g[0]).block_until_ready()
    t2 = time.time()
    print("multi dot 1st run:", t2 - t1)

    t1 = time.time()
    ng = multi_dot(g[1]).block_until_ready()
    t2 = time.time()
    print("multi dot:", t2 - t1)

    # same product as a left fold of jnp.dot
    t1 = time.time()

    @jax.jit
    def rdot(xx: jax.Array) -> jax.Array:
        return reduce(jnp.dot, xx)

    ng = rdot(g[0]).block_until_ready()
    t2 = time.time()
    print("reduce dot 1st run:", t2 - t1)

    t1 = time.time()
    ng = rdot(g[1]).block_until_ready()
    t2 = time.time()
    print("reduce dot:", t2 - t1)
```

I have run the function with several sizes; timings are in seconds:

# 30
multi dot 1st run: 6.373845100402832
multi dot: 0.0009920597076416016
reduce dot 1st run: 1.6028120517730713
reduce dot: 0.0012731552124023438
# 40
multi dot 1st run: 31.00104808807373
multi dot: 0.002560138702392578
reduce dot 1st run: 3.654721975326538
reduce dot: 0.0026671886444091797
# 50
multi dot 1st run: 117.09066390991211
multi dot: 0.0068149566650390625
reduce dot 1st run: 8.86375379562378
reduce dot: 0.0063228607177734375
# 90 (multi dot skipped; its compilation took too long)
reduce dot 1st run: 93.46390914916992
reduce dot: 0.10265707969665527
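As a side note, the batched construction above can be sanity-checked at a small size: every Givens rotation is orthogonal, so each constructed slice must satisfy GᵀG = I. A minimal unbatched sketch (illustrative only; `size = 4` and the PRNG key are arbitrary choices, not part of the timing question):

```python
import jax
import jax.numpy as jnp

# Illustrative sketch: build one Givens rotation per upper-triangular
# index pair (no batch dimension), then check each matrix is orthogonal.
size = 4
ix, iy = jnp.triu_indices(size, 1)          # pairs (i, j) with i < j
n = len(ix)                                 # size * (size - 1) // 2 = 6
thetas = jax.random.uniform(jax.random.key(0), (n,))
g = jnp.eye(size) * jnp.ones((n, 1, 1))     # (n, size, size) stack of identities
c, s = jnp.cos(thetas), jnp.sin(thetas)
g = g.at[jnp.arange(n), ix, [ix, iy]].set(jnp.stack([c, -s]))  # row i: [c, -s]
g = g.at[jnp.arange(n), iy, [ix, iy]].set(jnp.stack([s, c]))   # row j: [s,  c]
# every Givens rotation is orthogonal: G^T @ G == I
all_orthogonal = all(
    jnp.allclose(g[k].T @ g[k], jnp.eye(size), atol=1e-5) for k in range(n)
)
print(all_orthogonal)  # prints True
```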
Replies: 1 comment 1 reply
The slow compilation times here come from the fact that in either case your operation relies on Python control flow to dispatch a large number of JAX operations: Python loops are flattened, leading to large programs, and large programs have slow compilation (compilation time grows roughly as the number of operations squared). In situations like this, you can often do better by re-expressing your loop in terms of a control flow primitive like `lax.scan`. For example:

```python
from jax import lax

@jax.jit
def scan_dot(xx: jax.Array) -> jax.Array:
    # carry the running product; the traced program contains a single dot
    return lax.scan(lambda y, x: (jnp.dot(y, x), None),
                    jnp.eye(xx.shape[-1], dtype=xx.dtype), xx)[0]
```
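A quick way to convince yourself that the scanned product matches the unrolled fold (a small standalone sketch; the shapes and PRNG key are arbitrary choices):

```python
from functools import reduce

import jax
import jax.numpy as jnp
from jax import lax


@jax.jit
def scan_dot(xx: jax.Array) -> jax.Array:
    # One jnp.dot in the traced program, repeated by lax.scan at run time,
    # so compilation cost no longer grows with the number of matrices.
    return lax.scan(lambda y, x: (jnp.dot(y, x), None),
                    jnp.eye(xx.shape[-1], dtype=xx.dtype), xx)[0]


# 10 random 8x8 matrices, scaled down to keep the product well-conditioned
mats = jax.random.normal(jax.random.key(0), (10, 8, 8)) / 8.0
unrolled = reduce(jnp.dot, mats)   # left fold, same order as the scan
scanned = scan_dot(mats)
agree = bool(jnp.allclose(scanned, unrolled, atol=1e-6))
print(agree)
```

Both compute the same left-to-right product, so they agree to floating-point precision; only the compiled program size differs.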
`multi_dot` is not much help here – its main purpose is to find an optimal ordering of a sequence of dot products given the sizes of the inputs. In your case, all inputs are the same size, so the order of operations doesn't matter.
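For contrast, a small sketch of the case `multi_dot` is designed for, where the input shapes differ and the association order changes the cost (the shapes below are arbitrary illustrative choices):

```python
import jax
import jax.numpy as jnp

ka, kb, kc = jax.random.split(jax.random.key(0), 3)
A = jax.random.normal(ka, (10, 100))
B = jax.random.normal(kb, (100, 5))
C = jax.random.normal(kc, (5, 50))
# (A @ B) @ C costs 10*100*5 + 10*5*50   =  7,500 scalar multiplies
# A @ (B @ C) costs 100*5*50 + 10*100*50 = 75,000 scalar multiplies
# multi_dot picks the cheaper association; the value is the same either way.
out = jnp.linalg.multi_dot([A, B, C])
same = bool(jnp.allclose(out, (A @ B) @ C, atol=1e-3))
print(same)
```

With equal square shapes, as in your Givens product, every association costs the same, so this search buys nothing and only adds trace-time work.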