What you're describing is inconsistent with how `jax.lax.cond` works. Here's an example of microbenchmarks showing that the runtime of the jitted `cond` does depend on which branch is selected:

```python
import jax.numpy as jnp
from jax import nn
import jax

@jax.jit
def f1(x):
    return x / x.shape[2]

@jax.jit
def f2(x):
    temp = nn.relu(x)
    return temp / (jnp.sum(temp, axis=-1, keepdims=True) + 1e-5)

@jax.jit
def choose_attention(alpha, x):
    return jax.lax.cond(alpha[0, 0, 0, 0], lambda _: f2(x), lambda _: f1(x), operand=None)

x = jnp.zeros((10, 10, 1000))

_ = f1(x)  # warm up the compilation cache before timing
%timeit f1(x).block_until_ready()
# 25.1 µs ± 784 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

_ = f2(x)
%timeit f2(x).block_until_ready()
# 119 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

alpha = jnp.zeros((1, 1, 1, 1))  # zero predicate -> false branch (f1)
_ = choose_attention(alpha, x)
%timeit choose_attention(alpha, x).block_until_ready()
# 26.9 µs ± 1.45 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

alpha = jnp.ones((1, 1, 1, 1))   # nonzero predicate -> true branch (f2)
_ = choose_attention(alpha, x)
%timeit choose_attention(alpha, x).block_until_ready()
# 121 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
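If your JAX version complains about the `operand=None` keyword, the same check can be written with the positional-operand form of `jax.lax.cond`; this is a minimal sketch added for reference, not code from the original reply:

```python
import jax
import jax.numpy as jnp
from jax import nn

def f1(x):
    return x / x.shape[2]

def f2(x):
    temp = nn.relu(x)
    return temp / (jnp.sum(temp, axis=-1, keepdims=True) + 1e-5)

@jax.jit
def choose_attention(alpha, x):
    # Both branches are traced and compiled, but only the selected one is
    # executed at run time, which is why the measured cost above follows
    # whichever branch the predicate picks.
    return jax.lax.cond(alpha[0, 0, 0, 0] != 0, f2, f1, x)

x = jnp.zeros((10, 10, 1000))
print(choose_attention(jnp.zeros((1, 1, 1, 1)), x).shape)  # f1 branch
print(choose_attention(jnp.ones((1, 1, 1, 1)), x).shape)   # f2 branch
```

Either form should show per-call runtimes that track the chosen branch, matching the timings above.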
-
I am looping through each attention head and applying either function f1 or f2, depending on the value of the parameter self.alpha. f1 is slower than f2, but my implementation always gives the same runtime when I run it with different values of alpha.
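For reference, here is a hedged sketch of the pattern described above, looping over heads and selecting f1 or f2 per head with `jax.lax.cond`; the head dimension, shapes, and the way alpha is indexed are assumptions, not code from the question:

```python
import jax
import jax.numpy as jnp
from jax import nn

def f1(x):
    return x / x.shape[2]

def f2(x):
    temp = nn.relu(x)
    return temp / (jnp.sum(temp, axis=-1, keepdims=True) + 1e-5)

@jax.jit
def per_head(alpha, x):
    # Assumed shapes: x is (num_heads, 10, 10, 1000); alpha is (num_heads,),
    # one gate per head. The Python loop is unrolled during tracing; each cond
    # still executes only its selected branch at run time, so the measured
    # runtime should vary with alpha.
    outputs = []
    for h in range(x.shape[0]):
        outputs.append(jax.lax.cond(alpha[h] != 0, f2, f1, x[h]))
    return jnp.stack(outputs)

x = jnp.zeros((4, 10, 10, 1000))
print(per_head(jnp.array([0.0, 1.0, 0.0, 1.0]), x).shape)  # (4, 10, 10, 1000)
```

With this structure, the per-call cost should track how many heads take the f2 branch.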