Efficient calculation of gradient for binary tree traversal #26452

ChrisBoettner · 2025-02-11T09:49:22Z

Hey everyone,

First of all, thanks for all the work! I've been learning JAX more closely lately, and it's a lot of fun. I am currently working on a Python implementation of the Julia AutoGP.jl package using JAX.
A big part of this work is evaluating algebraic expressions that are defined via a binary tree (leaves are kernel operations, and internal nodes are additions or multiplications). These expressions evolve dynamically over time, so I had to find a way to represent them statically for JAX. In the end, I decided to encode the tree structure as arrays (a post-order expression plus a map to level-order indices) that get traversed using a stack. The final function looks like the following:

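from functools import partial
import typing as tp

import jax.numpy as jnp
from jax import jit, lax
from jaxtyping import Bool, Float, Int

# Not shown in this post: bounded_while_loop (the equinox loop mentioned below)
# and the project-specific ScalarInt, AbstractAtom, AbstractOperator and
# get_parameter_leaf_idx.


@partial(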
    jit,
    static_argnames=(
        "atoms",
        "operators",
        "num_atoms",
        "max_depth",
        "max_nodes",
    ),
)
def _evaluate_tree(
    x: Float[jnp.ndarray, " "] | Float[jnp.ndarray, " D"] | Float[jnp.ndarray, "D 1"],
    y: Float[jnp.ndarray, " "] | Float[jnp.ndarray, " D"] | Float[jnp.ndarray, "D 1"],
    initial_state: tuple[Float[jnp.ndarray, "..."], ScalarInt],
    post_order_expression: Int[jnp.ndarray, " D"],
    post_level_map: Int[jnp.ndarray, " D"],
    is_operator: Bool[jnp.ndarray, " D"],
    parameters: Float[jnp.ndarray, " M N"],
    atoms: tuple[AbstractAtom | tp.Callable, ...],
    operators: tuple[AbstractOperator, ...],
    num_atoms: int,
    max_depth: int,
    max_nodes: int,
) -> Float[jnp.ndarray, "..."]:
    """
    Evaluate a tree expression by traversing the tree in post-order
    and applying the kernel atom functions and operators.

    Parameters
    ----------
    x : Float[jnp.ndarray, " "] | Float[jnp.ndarray, " D"] | Float[jnp.ndarray, "D 1"]
        Input x data (0D or 1D array of shape (D, ) filled with floats).
    y : Float[jnp.ndarray, " "] | Float[jnp.ndarray, " D"] | Float[jnp.ndarray, "D 1"]
        Input y data (0D or 1D array of shape (D, ) filled with floats).
    initial_state : tuple[Float[jnp.ndarray, "..."], ScalarInt]
        The initial state of the tree evaluation. First element is the stack,
        second element is the pointer.
    post_order_expression : Int[jnp.ndarray, " D"]
        The post-order expression of the tree.
    post_level_map : Int[jnp.ndarray, " D"]
        The map from post order index to level order index.
    is_operator : Bool[jnp.ndarray, " D"]
        A boolean array indicating whether an item in the atom library is an operator.
    parameters : Float[jnp.ndarray, " M N"]
        A jnp array containing the parameters of the kernel functions in the tree
        kernel. The shape of the array is (M, N), where M is the number of nodes in
        the tree and N is the maximum number of parameters of the kernel functions.
    atoms : tuple[AbstractAtom, ...] | tuple[tp.Callable, ...]
        A tuple of kernel atom functions.
    operators : tuple[AbstractOperator, ...]
        A tuple of kernel operators.
    num_atoms : int
        The number of atoms in the atom library.
    max_depth : int
        The maximum depth of the tree.
    max_nodes : int
        The maximum number of nodes in the tree (upper bound on the loop steps).

    Returns
    -------
    Float[jnp.ndarray, "..."]
        The result of evaluating the tree expression.
    """

    def evaluate(state: tuple[jnp.ndarray, ScalarInt], idx: ScalarInt) -> tuple:
        """Evaluate the tree expression at a given node index."""

        tree_level_idx = post_level_map[idx]
        node_value = post_order_expression[idx]
        is_op = is_operator[node_value]

        def eval_leaf_node(state: tuple) -> tuple:
            """Evaluate a leaf node."""
            stack, pointer = state
            kernel_evaluation = lax.switch(
                node_value,
                atoms,
                x,
                y,
                parameters[get_parameter_leaf_idx(tree_level_idx, max_depth)],
            )
            new_stack = stack.at[pointer].set(kernel_evaluation)
            return new_stack, pointer + 1

        def eval_operator_node(state: tuple) -> tuple:
            """Evaluate an operator node."""
            stack, pointer = state
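            # In post-order the right child is pushed last, so it sits on top
            # of the stack (irrelevant for commutative operators such as + and *).
            right_child = jnp.array([stack[pointer - 1]])
            left_child = jnp.array([stack[pointer - 2]])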
            kernel_evaluation = lax.switch(
                node_value - num_atoms,
                operators,
                left_child,
                right_child,
            )
            new_stack = stack.at[pointer - 2].set(kernel_evaluation)
            return new_stack, pointer - 1

        new_state = lax.cond(
            is_op,
            eval_operator_node,
            eval_leaf_node,
            operand=state,
        )
        return new_state

    def loop_body(loop_state: tuple) -> tuple:
        """Loop body: Evaluate the tree expression, update state and
        iterate post level pointer."""
        idx, state = loop_state
        new_state = evaluate(state, idx)
        return idx + 1, new_state

    def condition(loop_state: tuple) -> Bool[jnp.ndarray, " "]:
        """Loop condition: Continue until the end of the tree expression (by
        encountering empty node, which is given by -1 value in
        post_order_expression).
        """
        idx, _ = loop_state
        return post_order_expression[idx] >= 0

    _, final_state = bounded_while_loop(
        condition,
        loop_body,
        (0, initial_state),
        max_steps=max_nodes,
        kind="checkpointed",
        checkpoints=5,  # from the checkpoint equation in equinox, assuming
        # kernels will only ever have up to ~20 nodes;
        # might need to be adjusted for larger trees
    )
    return final_state[0][0]
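
To make the stack mechanics of the function above easier to follow, here is a minimal plain-Python sketch of the same idea (no JAX). Everything in it is a stand-in chosen for illustration: the two toy atoms, the id layout (ids 0-1 are atoms, ids 2-3 are operators), and the name `evaluate_post_order` are assumptions, not part of the actual implementation. Only the `-1` padding convention is taken from the loop condition above.

```python
import operator

# Hypothetical stand-ins for the kernel atoms and the operator nodes.
ATOMS = (lambda x, y: x * y, lambda x, y: abs(x - y))
OPERATORS = (operator.add, operator.mul)
NUM_ATOMS = len(ATOMS)


def evaluate_post_order(expression, x, y):
    """Evaluate a post-order encoded expression with a fixed-size stack."""
    stack = [0.0] * len(expression)  # fixed-size stack, like the array version
    pointer = 0                      # index of the next free stack slot
    for node in expression:
        if node < 0:                 # -1 marks padding: the expression has ended
            break
        if node < NUM_ATOMS:         # leaf: push the atom's value
            stack[pointer] = ATOMS[node](x, y)
            pointer += 1
        else:                        # operator: combine the two topmost values
            first = stack[pointer - 2]   # pushed earlier
            second = stack[pointer - 1]  # pushed last
            stack[pointer - 2] = OPERATORS[node - NUM_ATOMS](first, second)
            pointer -= 1
    return stack[0]                  # the root value ends up at the bottom


# (atom0 + atom1) * atom0 in post-order: atom0, atom1, +, atom0, *
expression = [0, 1, 2, 0, 3, -1, -1]
result = evaluate_post_order(expression, 2.0, 5.0)
```

Each leaf grows the stack by one and each operator shrinks it by one, so a well-formed expression always leaves exactly one value in slot 0, which is what `final_state[0][0]` reads out in the JAX version.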

I am using an equinox bounded_while_loop because the tree expression array can be very long (its length is the maximum number of nodes of a binary tree of depth d), while usually only about the first 10 entries are used, so looping over the entire array would be wasteful.
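
For comparison, the fixed-length alternative (looping over the entire array and treating padding as a no-op) could look roughly like the sketch below. It uses the same toy encoding as a stand-in for the real atoms and operators: the id layout, the toy atoms, and the name `evaluate_fixed_length` are all assumptions made for illustration. It relies only on `lax.scan` and `lax.switch`, which support reverse-mode differentiation directly; whether it is actually faster for this use case is not something the sketch shows.

```python
import jax
import jax.numpy as jnp
from jax import lax

NUM_ATOMS = 2  # hypothetical layout: ids 0-1 atoms, ids 2-3 operators, -1 padding

atoms = (lambda x, y: x * y, lambda x, y: jnp.abs(x - y))  # toy stand-ins
operators = (jnp.add, jnp.multiply)


def evaluate_fixed_length(expression, x, y):
    x = jnp.asarray(x, dtype=jnp.float32)
    y = jnp.asarray(y, dtype=jnp.float32)

    def skip(state, node):
        return state  # padding slot: leave the stack untouched

    def push_leaf(state, node):
        stack, pointer = state
        value = lax.switch(node, atoms, x, y)
        return stack.at[pointer].set(value), pointer + 1

    def apply_operator(state, node):
        stack, pointer = state
        value = lax.switch(
            node - NUM_ATOMS, operators, stack[pointer - 2], stack[pointer - 1]
        )
        return stack.at[pointer - 2].set(value), pointer - 1

    def step(state, node):
        # Branch index: 0 = padding, 1 = leaf, 2 = operator.
        branch = jnp.where(node < 0, 0, jnp.where(node < NUM_ATOMS, 1, 2))
        new_state = lax.switch(branch, (skip, push_leaf, apply_operator), state, node)
        return new_state, None

    init = (jnp.zeros(expression.shape[0], dtype=jnp.float32), jnp.int32(0))
    (stack, _), _ = lax.scan(step, init, expression)
    return stack[0]


# (atom0 + atom1) * atom0 in post-order, padded with -1.
expression = jnp.array([0, 1, 2, 0, 3, -1, -1])
value = evaluate_fixed_length(expression, 2.0, 5.0)
grad_x = jax.grad(evaluate_fixed_length, argnums=1)(expression, 2.0, 5.0)
```

The trade-off is that every padded slot still costs one loop iteration, which is exactly the waste the bounded loop is meant to avoid.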

Now to my problem. The expression is reasonably fast to evaluate for two inputs and can easily be vmap-ed (e.g. to compute the cross-covariance and Gram matrices). Calculating the gradient, however, is much slower. For single inputs x and y, the kernel evaluation and the gradient calculation take about the same time. But when vmap-ed, the computations that involve the gradient are about 10x slower than the direct evaluation of the kernel. I am struggling to figure out why. Do you have any ideas?
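
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It does not use the tree kernel: `toy_kernel`, `gram`, `loss`, and `time_call` are hypothetical names, and the squared-exponential kernel is only a stand-in for `_evaluate_tree`. The point is the measurement pattern: a warm-up call so that compilation is excluded, and `block_until_ready()` so that JAX's asynchronous dispatch does not distort the timings.

```python
import time

import jax
import jax.numpy as jnp


def toy_kernel(x, y, params):
    # Stand-in for the tree kernel: squared-exponential on scalar inputs.
    lengthscale, variance = params[0], params[1]
    return variance * jnp.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def gram(params, xs):
    # Nested vmap over both inputs builds the full Gram matrix.
    return jax.vmap(lambda a: jax.vmap(lambda b: toy_kernel(a, b, params))(xs))(xs)


def loss(params, xs):
    # Any scalar function of the Gram matrix works for the comparison.
    return jnp.sum(gram(params, xs))


forward = jax.jit(loss)
backward = jax.jit(jax.grad(loss))


def time_call(fn, *args, repeats=10):
    fn(*args).block_until_ready()  # warm-up: triggers and excludes compilation
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args).block_until_ready()  # wait until the result is computed
    return (time.perf_counter() - start) / repeats


params = jnp.array([1.0, 2.0])
xs = jnp.linspace(0.0, 1.0, 64)
forward_time = time_call(forward, params, xs)
backward_time = time_call(backward, params, xs)
grads = backward(params, xs)
```

Comparing `forward_time` and `backward_time` for several sizes of `xs` shows how the forward and gradient costs scale with the batch size.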

Replies: 0 comments
