Addition of generic / introductory glossary #222

Open
wants to merge 6 commits into
base: main
Changes from 3 commits
81 changes: 81 additions & 0 deletions docs/src/glossary.md
@@ -0,0 +1,81 @@
# ChainRules Glossary

This glossary serves as a quick reference for common terms used in the field of Automatic Differentiation, as well as those used throughout the documentation relating specifically to ChainRules.

## Definitions:

### Adjoint:

The conjugate transpose of the Jacobian for a given function `f`.
Contributor

Is the adjoint always the conjugate transpose of the Jacobian specifically? In the below definitions which reference the adjoint, it's always the "adjoint of the Jacobian".

This makes me think the adjoint definition should be "The conjugate transpose of a matrix" and the subsequent definitions can refer to "the adjoint of the Jacobian".

Alternatively if, when we mention the adjoint, we're always talking about the adjoint of a Jacobian, this definition can stay as it is and in subsequent definitions we can just say "the adjoint".

Member

We probably want to title this "Adjoint of a function".
Start by saying that the adjoint of a matrix is another word for its conjugate transpose.
Then mention that the term also applies to a linear operator: every linear operator can be described as `y = Jx`, and it has an adjoint linear operator, `y' = x'J'`.
Then say that people use "the adjoint of a function" as a shorthand (an abuse of terminology), when what they actually mean is a function that is the adjoint of the pushforward linear operator.
The pushforward is the linear operator that has the same Jacobian as the function at that point, i.e. the linearization of the function at that point.
The pullback is its adjoint.
So when people occasionally say "the adjoint of a function", what they really mean is either the adjoint of the Jacobian of the function, or the pullback.

Contributor Author

@thomasgudjonwright, Dec 9, 2020

I totally agree this def isn't sufficient. "Adjoint" is a super broad term, so I am having a hard time figuring out how much or how little to include.


### Derivative:

The derivative of a function `y = f(x)` with respect to the independent variable `x`, denoted `f'(x)` or `dy/dx`, is the rate of change of the dependent variable `y` with respect to changes in `x`. In multiple dimensions, we may refer to the gradient of a function.
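
As a rough illustration of "rate of change", the derivative can be approximated numerically. The sketch below is plain Python rather than ChainRules code, and the names `central_difference` and `cube` are made up for this example.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def cube(x):
    return x ** 3


# The exact derivative of x**3 is 3 * x**2, which equals 12 at x = 2.
approx = central_difference(cube, 2.0)
exact = 3 * 2.0 ** 2
```

Automatic differentiation avoids the truncation error of this approximation by propagating exact derivative rules instead.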

### Differential:

The differential of a given function `y = f(x)`, denoted `dy`, is the product of the derivative `f'(x)` and the increment of the independent variable, `dx`. In multiple dimensions, it is the sum of these products across each dimension (using the partial derivative and the given independent variable's increment).

In ChainRules, differentials are types ("differential types") and correspond to primal types. A differential should represent a difference between two primal values.

#### Natural Differential:

A natural differential type for a given primal type is the type people would intuitively associate with representing the difference between two values of the primal type.

#### Structural Differential:

If a given primal type `P` does not have a natural differential, we need to come up with one that makes sense. These are called structural differentials and are represented as `Composite{P, <:NamedTuple}`.

#### Semi-Structural Differential:

A structural differential that contains at least one natural differential field.

#### Thunk:

An "unnatural" differential type. If we wish to delay the computation of a derivative for whatever reason, we wrap it in a `Thunk` or `InplaceableThunk`. The thunk holds off on computing the wrapped derivative until it is needed.
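
The idea of delaying a computation can be sketched in a few lines. This is a language-agnostic illustration in Python, not the ChainRules `Thunk` type; the class `LazyValue`, its `force` method, and `expensive_derivative` are hypothetical names.

```python
class LazyValue:
    """Hypothetical stand-in for a thunk: wraps a zero-argument function."""

    def __init__(self, compute):
        self._compute = compute

    def force(self):
        # The wrapped computation only runs when the value is requested.
        return self._compute()


calls = []


def expensive_derivative():
    calls.append("ran")  # record that the computation actually happened
    return 2 * 21.0


lazy = LazyValue(expensive_derivative)
calls_before = len(calls)  # wrapping alone computes nothing
value = lazy.force()
calls_after = len(calls)
```

If the value is never requested, the wrapped computation never runs, which is the point of wrapping a derivative this way.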

#### Zero:

`Zero()` can also be a differential type. If you have trouble understanding the rules enforced upon differential types, consider this one first, as `Zero()` is the trivial vector space.
Member

this sentence is unclear

Contributor Author

Updated with more fundamental info


### Directional Derivative:

The directional derivative of a function `f` at a given point along a given unit direction is the dot product of the gradient of `f` at that point and the direction. It represents the rate of change of `f` in that direction.
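
A small worked case may help. The sketch below, in plain Python with hypothetical helper names, uses `f(x, y) = x**2 + 3*y` with its gradient written out by hand, and normalises the direction before taking the dot product.

```python
import math


def gradient(x, y):
    # Gradient of f(x, y) = x**2 + 3*y, derived by hand: (2x, 3).
    return (2 * x, 3.0)


def directional_derivative(grad, direction):
    # Normalise so that the result is a rate of change per unit distance.
    norm = math.hypot(*direction)
    unit = tuple(d / norm for d in direction)
    return sum(g * u for g, u in zip(grad, unit))


# At (1, 2) the gradient is (2, 3); the direction (3, 4) normalises to (0.6, 0.8).
rate = directional_derivative(gradient(1.0, 2.0), (3.0, 4.0))
```

Here `rate` is `2 * 0.6 + 3 * 0.8`, i.e. 3.6 up to floating-point rounding.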

### F-rule:

A function used in forward-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pushforward.
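
The shape of such a rule, as described in this entry, can be sketched for `sin`. This is an illustration in Python, not the ChainRules API: `frule_sin` and `pushforward` are hypothetical names, and the exact signature in ChainRules may differ.

```python
import math


def frule_sin(x):
    """Hypothetical forward rule for sin, following the definition above."""
    y = math.sin(x)  # primal result

    def pushforward(dx):
        # Maps a perturbation of the input to a perturbation of the output.
        return math.cos(x) * dx

    return y, pushforward


y, pushforward = frule_sin(0.0)
dy = pushforward(2.0)  # cos(0) * 2
```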

### Gradient:

The gradient of a scalar function `f`, represented by `∇f`, is a vector function whose components are the partial derivatives of `f` with respect to each dimension of the domain of `f`.

### Jacobian:

The Jacobian of a vector-valued function `f` is the matrix of `f`'s first-order partial derivatives.
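
To make the layout of the matrix concrete, the sketch below approximates a Jacobian numerically in plain Python. The function `f` and the helper `numerical_jacobian` are made up for this example; row `i`, column `j` holds the partial derivative of output `i` with respect to input `j`.

```python
def f(x, y):
    return (x * y, x + y)


def numerical_jacobian(func, point, h=1e-6):
    """Approximate the Jacobian column by column with central differences."""
    n_in = len(point)
    n_out = len(func(*point))
    jac = [[0.0] * n_in for _ in range(n_out)]
    for j in range(n_in):
        plus = list(point)
        minus = list(point)
        plus[j] += h
        minus[j] -= h
        f_plus = func(*plus)
        f_minus = func(*minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


# Analytically, the Jacobian is [[y, x], [1, 1]], i.e. [[3, 2], [1, 1]] at (2, 3).
J = numerical_jacobian(f, (2.0, 3.0))
```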

### Jacobian Transpose Vector Product (j'vp):

The product of the adjoint of the Jacobian and the vector in question. It is a description of the pullback in terms of its Jacobian.

### Jacobian Vector Product (jvp):

The product of the Jacobian and the vector in question. It is a description of the pushforward in terms of its Jacobian.
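
The two products can be compared side by side on a small real-valued example. The sketch below is plain Python with hypothetical helper names (`jvp`, `vjp`, `dot`), and the Jacobian of `f(x, y) = (x*y, x + y)` at `(2, 3)` is written out by hand. Because the entries are real, the adjoint reduces to the transpose.

```python
# Jacobian of f(x, y) = (x*y, x + y) at (2, 3), derived by hand.
J = [[3.0, 2.0],
     [1.0, 1.0]]


def jvp(jac, v):
    """Jacobian-vector product: J applied to an input-space vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in jac]


def vjp(jac, w):
    """Jacobian-transpose-vector product: J' applied to an output-space vector."""
    n_in = len(jac[0])
    return [sum(jac[i][j] * w[i] for i in range(len(w))) for j in range(n_in)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


v = [1.0, 0.0]  # perturbation of the input
w = [1.0, 1.0]  # sensitivity of the output
forward = jvp(J, v)
reverse = vjp(J, w)
```

The two are tied together by the identity `dot(w, jvp(J, v)) == dot(vjp(J, w), v)`, which is what makes one the adjoint of the other.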

### Primal:

Something relating to the original problem, as opposed to relating to the derivative. In ChainRules, primals are types ("primal types").

### Pullback:

`Pullback(f)` describes the sensitivity of the input of `f` as a function of the sensitivity of the output of `f`. It can be represented as the product of a vector (left) and the adjoint Jacobian (right).

### Pushforward:

`Pushforward(f)` describes the sensitivity of the output of `f` as a function of the sensitivity of the input of `f`. It can be represented as the product of the Jacobian (left) and a vector (right).

### R-rule:

A function used in reverse-mode differentiation. For a given function `f`, it takes in the positional and keyword arguments of `f` and returns the primal result and the pullback.
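
The shape of such a rule, as described in this entry, can be sketched for scalar multiplication. This is an illustration in Python, not the ChainRules API: `rrule_mul` and `pullback` are hypothetical names, and the exact signature in ChainRules may differ.

```python
def rrule_mul(x, y):
    """Hypothetical reverse rule for z = x * y, following the definition above."""
    z = x * y  # primal result

    def pullback(z_bar):
        # Maps the sensitivity of the output back to sensitivities of each input:
        # dz/dx = y and dz/dy = x.
        return (z_bar * y, z_bar * x)

    return z, pullback


z, pullback = rrule_mul(2.0, 3.0)
x_bar, y_bar = pullback(1.0)
```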