Understanding transpose rules #18346

dionhaefner · 2023-11-01T15:47:50Z

I've gone over How JAX primitives work and Autodidax, but still don't have a good intuition for what a transpose rule actually represents mathematically.

I understand that a JVP rule provides a recipe to compute $(f(x), J(x) \cdot x_\text{tan})$ where $J(x)$ is the Jacobian matrix $J_{ij} = \partial f_i / \partial x_j$ at point $x$. The JVP is clearly linear in $x_\text{tan}$.

Now the goal is to compute the VJP, $x_\text{tan} \cdot J(x)$. Is it valid to say that the transpose rule defines a recipe to compute $J^T$, so we can do something like this?

$$x_\text{tan} \cdot J(x) = (J^T \cdot x_\text{tan}^T)^T$$

froystig · 2023-11-03T23:30:22Z
Maintainer

The transposition rule for a linear function $f$ computes its transpose $f^T$ at a particular point. Transposition is indeed used by jax's implementation of VJPs, where the linear function is the Jacobian map $J$ you mention.
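As a rough illustration of that relationship using only public APIs (a sketch, not a description of the internal implementation): the snippet below builds the Jacobian map of a made-up nonlinear function `g` at a fixed point with `jax.jvp`, transposes it with `jax.linear_transpose`, and compares the result against `jax.vjp` and against the explicit matrix $J^T$. The function `g` and the values of `x` and `ct` are assumptions chosen for the example.

```python
import jax
import jax.numpy as jnp

def g(x):
    # A nonlinear function R^2 -> R^2, chosen only for illustration.
    return jnp.sin(x) * jnp.sum(x)

x = jnp.array([0.5, 2.0])    # point at which the Jacobian is taken
ct = jnp.array([1.0, -2.0])  # cotangent (the vector in the VJP)

def jacobian_map(t):
    # The linear map t -> J(x) @ t, with the primal x held fixed.
    return jax.jvp(g, (x,), (t,))[1]

# Transpose the linear map. The second argument is an example input
# that describes the shape and dtype of t.
(via_transpose,) = jax.linear_transpose(jacobian_map, x)(ct)

# Reference 1: JAX's own VJP.
_, vjp_fn = jax.vjp(g, x)
(via_vjp,) = vjp_fn(ct)

# Reference 2: the explicit Jacobian matrix, transposed.
J = jax.jacobian(g)(x)
via_matrix = J.T @ ct
```

All three results should agree up to floating-point error, which matches the identity $x_\text{tan} \cdot J(x) = (J^T \cdot x_\text{tan}^T)^T$ from the question.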

An extra bit of complexity is that some primitives are only linear in some of their inputs. We sometimes use the term "conditionally linear" to describe this. In this case the transpose rule only transposes the "linear part" of the function.

To be a bit more precise: let's say $f$ takes two arguments, and is linear only in the second, for any value of the first. The transposition rule for $f$ then computes "the transpose of the function $f$, restricted to a particular value of its first argument," at a point.
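One way to write this down, in notation that is not part of the original answer ($a$ for the non-linear first argument, $x$ for the linear second argument, $c$ for a cotangent, and $\langle \cdot, \cdot \rangle$ for the standard inner product), is:

$$f_a(x) := f(a, x), \qquad \langle f_a(x),\, c \rangle = \langle x,\, f_a^T(c) \rangle \quad \text{for all } x, c.$$

For instance, assuming $f(a, x) = x / a$ elementwise, the restricted map $f_a$ is a diagonal scaling by $1/a$, so $f_a^T(c) = c / a$. Nothing is transposed with respect to $a$, which only selects which linear map is being transposed.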

Internally, we have a helper function (is_undefined_primal) to identify the linear vs. non-linear operands of a primitive. I find that a useful internal example to follow is the div primitive's rule, since division ((x, y) -> x / y) is linear in its first argument, x, but not in its second, y:

https://github.com/google/jax/blob/953f4670d88d2a1c168a4ad0b44ed940f6c58829/jax/_src/lax/lax.py#L2186-L2191
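The same behaviour can be observed from the outside without touching internals. The sketch below fixes the denominator `y` (the non-linear operand) and transposes the map `x -> x / y` with the public `jax.linear_transpose`. The helper name `div_by_y` and the array values are made up for the example, and this is not a copy of the linked rule.

```python
import jax
import jax.numpy as jnp

# The non-linear operand, held fixed. Values are arbitrary.
y = jnp.array([2.0, 4.0, 5.0])

def div_by_y(x):
    # Linear in x for any fixed y.
    return x / y

# An example input that describes the shape and dtype of x.
x_example = jnp.zeros(3)
div_by_y_t = jax.linear_transpose(div_by_y, x_example)

ct = jnp.array([1.0, 1.0, 1.0])  # cotangent of the output
(x_ct,) = div_by_y_t(ct)         # cotangent of the linear input x

# Elementwise division by y is a diagonal linear map, so its transpose
# is again division by y.
expected = ct / y
```

Here `x_ct` should match `expected`: the transpose is taken only with respect to `x`, and `y` simply appears as a constant in the result.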

3 replies

dionhaefner Nov 6, 2023
Author

Thanks! Two follow-up questions to that:

1. What exactly does $f^T$ mean? Is this a transposition of a linear map (matrix transpose), or is it a transposition from primal to tangent space?
2. Regarding "An extra bit of complexity is that some primitives are only linear in some of their inputs. We sometimes use the term "conditionally linear" to describe this. In this case the transpose rule only transposes the "linear part" of the function.": as I take it, the JVP of a function is always linear in the tangent values. Why do we need this additional complexity?

froystig Nov 6, 2023
Maintainer

1. It's the transposition of a linear map, which corresponds to a matrix transpose when mapping between finite-dimensional vector spaces over the same field.
2. Yes, a Jacobian map is always linear in its input (tangents). But in JAX internals, the term "JVP" typically refers to something that is not a function only of tangents. Rather, it's a function from (primals, tangents) to (primals, tangents), and it is linear in the tangents but not necessarily in the primals. In other words, it is conditionally linear in tangents, given primals.
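A small numerical check of the second point, as a sketch with a made-up function `f` and arbitrary values: `jax.jvp` takes both primals and tangents, and the tangent output is linear in the tangent input for a fixed primal, but not linear in the primal.

```python
import jax
import jax.numpy as jnp

def f(x):
    # An arbitrary nonlinear function, for illustration only.
    return jnp.sin(x) * x

def tangent_out(x, t):
    # jax.jvp maps (primals, tangents) to (primal_out, tangent_out).
    _, t_out = jax.jvp(f, (x,), (t,))
    return t_out

x = jnp.array([0.3, 1.2])
t1 = jnp.array([1.0, 0.0])
t2 = jnp.array([0.5, -2.0])
a, b = 2.0, -3.0

# Linearity in the tangents, with the primal x held fixed.
lhs = tangent_out(x, a * t1 + b * t2)
rhs = a * tangent_out(x, t1) + b * tangent_out(x, t2)
linear_in_tangents = bool(jnp.allclose(lhs, rhs, rtol=1e-5, atol=1e-6))

# No such linearity in the primal: scaling x does not scale the output.
scaled_primal = tangent_out(2.0 * x, t1)
scaled_output = 2.0 * tangent_out(x, t1)
linear_in_primals = bool(jnp.allclose(scaled_primal, scaled_output))
```

With these values, `linear_in_tangents` should come out true and `linear_in_primals` false, which is the sense in which the JVP is conditionally linear in the tangents, given the primals.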

dionhaefner Nov 7, 2023
Author

Thanks again! I think that makes it clearer.

I got confused because, to me, the outputs of the transpose rules looked more like partial derivatives of the primitive's JVP rule with respect to the tangents. But if the function being transposed is always linear in the tangents, these definitions should be identical, so I guess it's just a matter of perspective.
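That equivalence can be checked numerically. For a linear function, the Jacobian is the function itself at every point, so differentiating it (VJP) and transposing it give the same answer. The sketch below uses a made-up linear map `lin` and arbitrary values.

```python
import jax
import jax.numpy as jnp

y = jnp.array([2.0, 4.0])  # fixed constant, arbitrary values

def lin(t):
    # A function that is linear in t.
    return 3.0 * t / y

ct = jnp.array([1.0, -1.0])

# View 1: transpose the linear map directly.
t_example = jnp.zeros(2)  # example input describing the shape and dtype of t
(via_transpose,) = jax.linear_transpose(lin, t_example)(ct)

# View 2: differentiate it. The linearization point does not matter
# for a linear function, so any value works here.
_, vjp_fn = jax.vjp(lin, jnp.array([7.0, -3.0]))
(via_vjp,) = vjp_fn(ct)
```

Both views should yield `3.0 * ct / y`.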
