
Commit d8512a3

Added script for comparing pullbacks and added an explanation for why we implemented the pullback the way we did.
1 parent ac49890 commit d8512a3

2 files changed: +108 −2 lines changed

scripts/pullback_comparison.jl

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
using SymbolicNeuralNetworks
using AbstractNeuralNetworks
using GeometricMachineLearning
using AbstractNeuralNetworks: FeedForwardLoss
using GeometricMachineLearning: ZygotePullback
import Random
Random.seed!(123)

c = Chain(Dense(2, 3, tanh), Dense(3, 1, tanh))
nn = SymbolicNeuralNetwork(c)
nn_cpu = NeuralNetwork(c, CPU())
loss = FeedForwardLoss()
spb = SymbolicPullback(nn, loss)
zpb = ZygotePullback(loss)

batch_size = 10000
input = rand(2, batch_size)
output = rand(1, batch_size)
# output sensitivities
_do = 1.0

# call both pullbacks once so that compilation time is not included in the timings below
spb(nn_cpu.params, nn.model, (input, output))[2](_do)
zpb(nn_cpu.params, nn.model, (input, output))[2](_do)
# time the symbolic pullback against the Zygote-based one
@time spb_evaluated = spb(nn_cpu.params, nn.model, (input, output))[2](_do)
@time zpb_evaluated = zpb(nn_cpu.params, nn.model, (input, output))[2](_do)[1].params
# @assert values(spb_evaluated) .≈ values(zpb_evaluated)
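# A possible explicit check (a sketch; it assumes both `spb_evaluated` and
# `zpb_evaluated` are nested `NamedTuple`s with the same keys):
for layer in keys(spb_evaluated)
    for arr in keys(spb_evaluated[layer])
        @assert spb_evaluated[layer][arr] ≈ zpb_evaluated[layer][arr]
    end
end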

src/pullback.jl

Lines changed: 82 additions & 2 deletions
@@ -1,4 +1,4 @@
-"""
+@doc raw"""
     SymbolicPullback <: AbstractPullback

`SymbolicPullback` computes the *symbolic pullback* of a loss function.

@@ -22,6 +22,63 @@ pv_values = pb(ps, nn.model, (rand(2), rand(1)))[2](1) |> typeof

@NamedTuple{L1::@NamedTuple{W::Matrix{Float64}, b::Vector{Float64}}}
```

# Implementation

An instance of `SymbolicPullback` stores
- `loss`: an instance of a `NetworkLoss`,
- `fun`: a function that is used to compute the pullback.

If we call the functor of an instance of `SymbolicPullback` on `ps`, `model` and `input`, it returns:
```julia
_pullback.loss(model, ps, input...), _pullback.fun(input..., ps)
```
where the second output argument is again a function.
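Schematically, the two return values can be used as follows (a sketch continuing the example above):
```julia
loss_value, pullback_fun = pb(ps, nn.model, (rand(2), rand(1)))
grads = pullback_fun(1) # the argument of `pullback_fun` is the output sensitivity
```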

# Extended help

We note the following apparent peculiarity:

```jldoctest
using SymbolicNeuralNetworks
using AbstractNeuralNetworks
using Symbolics
import Random
Random.seed!(123)

c = Chain(Dense(2, 1, tanh))
nn = SymbolicNeuralNetwork(c)
loss = FeedForwardLoss()
pb = SymbolicPullback(nn, loss)
ps = initialparameters(c) |> NeuralNetworkParameters
input_output = (rand(2), rand(1))
loss_and_pullback = pb(ps, nn.model, input_output)
pv_values = loss_and_pullback[2](1)

@variables soutput[1:SymbolicNeuralNetworks.output_dimension(nn.model)]
symbolic_pullbacks = SymbolicNeuralNetworks.symbolic_pullback(loss(nn.model, nn.params, nn.input, soutput), nn)
pv_values2 = build_nn_function(symbolic_pullbacks, nn.params, nn.input, soutput)(input_output[1], input_output[2], ps)

pv_values == (pv_values2 |> SymbolicNeuralNetworks._get_params |> SymbolicNeuralNetworks._get_contents)

# output

true
```

See the docstrings for [`symbolic_pullback`](@ref), [`build_nn_function`](@ref), [`_get_params`](@ref) and [`_get_contents`](@ref) for more information on the functions used here.

The noteworthy thing in the expression above is that the functor of `SymbolicPullback` returns two objects: the first is the loss value evaluated for the relevant parameters and input; the second is a function that takes a further input argument and only then returns the partial derivatives. But why do we need this extra step with another function?

!!! info "Reverse Accumulation"
    In machine learning we typically use [reverse accumulation](https://en.wikipedia.org/wiki/Automatic_differentiation#Forward_and_reverse_accumulation) to perform automatic differentiation (AD).
    Assuming we are given a function that is the composition of simpler functions ``f = f_n\circ\cdots\circ{}f_2\circ{}f_1:\mathbb{R}^n\to\mathbb{R}^m``, *reverse differentiation* starts with *output sensitivities* and then successively feeds them through ``f_n``, ``f_{n-1}`` etc., i.e. it computes:
    ```math
    (\nabla_xf)^T do = (\nabla_{x}f_1)^T(\nabla_{f_1(x)}f_2)^T\cdots(\nabla_{f_{n-1}(\cdots{}f_1(x))}f_n)^T do,
    ```
    where ``do\in\mathbb{R}^m`` are the *output sensitivities* and the Jacobians are multiplied onto them stepwise from the left, so we propagate from the output back to the input. If ``m = 1``, i.e. if the output is one-dimensional, the *output sensitivities* may simply be taken to be ``do = 1``.
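
As a small hand-worked illustration of the formula above (a sketch with hand-written Jacobians, independent of this package; here ``f_1`` is an elementwise `tanh` and ``f_2`` sums the components, so ``m = 1``):
```julia
using LinearAlgebra

f₁(x) = tanh.(x)                    # ℝ² → ℝ²
f₂(y) = sum(y)                      # ℝ² → ℝ, i.e. m = 1
x = [0.5, -0.25]
J₁ᵀ = Diagonal(1 .- tanh.(x) .^ 2)  # (∇ₓf₁)ᵀ, written out by hand
J₂ᵀ = ones(2)                       # (∇_{f₁(x)}f₂)ᵀ, written out by hand
d_o = 1.0                           # output sensitivity; do = 1 since m = 1
J₁ᵀ * (J₂ᵀ * d_o)                   # (∇ₓ(f₂∘f₁))ᵀ do, i.e. the gradient of f₂∘f₁ at x
```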

In theory we could therefore leave out this extra step: returning a function (which is stored in `pb.fun`) may seem unnecessary, since we could simply store the equivalent of `pb.fun(1.)` in an instance of `SymbolicPullback`. It is however customary for a pullback to return a callable function that depends on the *output sensitivities*, which is why we also do so here, even though the *output sensitivities* are a scalar quantity in our case.
"""
struct SymbolicPullback{NNLT, FT} <: AbstractPullback{NNLT}
    loss::NNLT
@@ -38,17 +95,40 @@ function SymbolicPullback(nn::SymbolicNeuralNetwork, loss::NetworkLoss)
    symbolic_pullbacks = symbolic_pullback(symbolic_loss, nn)
    pbs_executable = build_nn_function(symbolic_pullbacks, nn.params, nn.input, soutput)
    function pbs(input, output, params)
-       _ -> (pbs_executable(input, output, params) |> _get_params |> _get_contents)
+       pullback(::Union{Real, AbstractArray{<:Real}}) = _get_contents(_get_params(pbs_executable(input, output, params)))
+       pullback
    end
    SymbolicPullback(loss, pbs)
end

SymbolicPullback(nn::SymbolicNeuralNetwork) = SymbolicPullback(nn, AbstractNeuralNetworks.FeedForwardLoss())
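# Schematic use of `pbs` (a sketch, not package code): the returned closure has to be
# called on the output sensitivities before it returns the partial derivatives, e.g.
#   pb = SymbolicPullback(nn)            # uses `FeedForwardLoss` by default
#   pb.fun(input, output, ps)(1.0)       # returns the gradients as a `NamedTuple`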

"""
    _get_params(ps::NeuralNetworkParameters)

Return the `NamedTuple` that's equivalent to the `NeuralNetworkParameters`.
"""
_get_params(nt::NamedTuple) = nt
_get_params(ps::NeuralNetworkParameters) = ps.params
_get_params(ps::AbstractArray{<:Union{NamedTuple, NeuralNetworkParameters}}) = [_get_params(nt) for nt in ps]
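# For instance (a sketch): `_get_params(NeuralNetworkParameters((L1 = (W = rand(1, 2), b = rand(1)),)))`
# returns the underlying `NamedTuple` `(L1 = (W = …, b = …),)`.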

"""
    _get_contents(nt::AbstractArray{<:NamedTuple})

Return the contents of a one-dimensional vector. If the vector contains a single `NamedTuple`, that `NamedTuple` itself is returned.

# Examples

```jldoctest
using SymbolicNeuralNetworks: _get_contents

_get_contents([(a = "element_contained_in_vector", )])

# output

(a = "element_contained_in_vector",)
```
"""
_get_contents(nt::NamedTuple) = nt
function _get_contents(nt::AbstractVector{<:NamedTuple})
    length(nt) == 1 ? nt[1] : __get_contents(nt)
