
Commit 081654c

kellertuer and TomRiem authored
Introduce Quasi Newton Methods (#48)
* Introduces a large variety of quasi Newton methods
* Introduces two new line search methods for Wolfe conditions

Co-authored-by: Tom-Christian Riemer <riemertom720@googlemail.com>
1 parent 871e25b commit 081654c

24 files changed: +1789 −58 lines changed

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        julia-version: [1.3, 1.5]
+        julia-version: [1.4, 1.5]
         os: [ubuntu-latest, macOS-latest, windows-latest]
     steps:
       - uses: actions/checkout@v2

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -13,3 +13,4 @@ examples/**/*.mp4
 examples/**/*.jld2
 docs/src/tutorials/*.md
 .vscode
+Manifest.toml

Project.toml

Lines changed: 5 additions & 3 deletions
@@ -1,12 +1,13 @@
 name = "Manopt"
 uuid = "0fc0a36d-df90-57f3-8f93-d78a9fc72bb5"
 authors = ["Ronny Bergmann <manopt@ronnybergmann.net>"]
-version = "0.2.14"
+version = "0.2.15"

 [deps]
 ColorSchemes = "35d6a980-a343-548e-a6ea-1d62b119f2f4"
 ColorTypes = "3da002f7-5984-5a60-b8a6-cbb66c0b333f"
 Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
+DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
 Manifolds = "1cead3c2-87b3-11e9-0ccd-23c62b72b94e"
@@ -22,10 +23,11 @@ Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 ColorSchemes = "3.5.0"
 ColorTypes = "0.9.1, 0.10"
 Colors = "0.11.2, 0.12"
-Manifolds = "0.4.11"
+DataStructures = "0.17, 0.18"
+Manifolds = "0.4.14"
 ManifoldsBase = "0.10.0"
 StaticArrays = "0.12, 1.0"
-julia = "1.3"
+julia = "1.4"

 [extras]
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

docs/make.jl

Lines changed: 2 additions & 1 deletion
@@ -44,11 +44,12 @@ makedocs(;
         "Gradient Descent" => "solvers/gradient_descent.md",
         "Nelder–Mead" => "solvers/NelderMead.md",
         "Particle Swarm Optimization" => "solvers/particle_swarm.md",
+        "Quasi-Newton" => "solvers/quasi_Newton.md",
         "Stochastic Gradient Descent" => "solvers/stochastic_gradient_descent.md",
         "Subgradient method" => "solvers/subgradient.md",
         "Steihaug-Toint TCG Method" =>
             "solvers/truncated_conjugate_gradient_descent.md",
-        "Riemannian Trust-Regions Solver" => "solvers/trust_regions.md",
+        "Trust-Regions Solver" => "solvers/trust_regions.md",
     ],
     "Functions" => [
         "Introduction" => "functions/index.md",

docs/src/solvers/ChambollePock.md

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ CurrentModule = Manopt

 ```@docs
 ChambollePock
+ChambollePock!
 ```

 ## Problem & Options

docs/src/solvers/gradient_descent.md

Lines changed: 2 additions & 2 deletions
@@ -18,11 +18,11 @@ GradientDescentOptions

 ## Direction Update Rules

-A field of the options is the `direction`, a [`DirectionUpdateRule`](@ref), which by default [`Gradient`](@ref) just evaluates the gradient but can be enhanced for example to
+A field of the options is the `direction`, a [`DirectionUpdateRule`](@ref), which by default [`IdentityUpdateRule`](@ref) just evaluates the gradient but can be enhanced for example to

 ```@docs
 DirectionUpdateRule
-Gradient
+IdentityUpdateRule
 MomentumGradient
 AverageGradient
 Nesterov

docs/src/solvers/quasi_Newton.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# [Riemannian quasi-Newton methods](@id quasiNewton)

```@meta
CurrentModule = Manopt
```

```@docs
quasi_Newton
quasi_Newton!
```
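
A minimal usage sketch, mirroring the Rayleigh quotient benchmark added in this commit: it minimizes ``x^{\mathrm{T}} A x`` on the unit sphere. The test matrix `A` and all parameter choices are illustrative.

```julia
using Manopt, Manifolds, LinearAlgebra, Random

Random.seed!(42)
n = 100
A = randn(n, n)
A = (A + A') / 2                      # a symmetric test matrix
M = Sphere(n - 1)
F(X) = X' * A * X                     # Rayleigh quotient
∇F(X) = 2 * (A * X - X * X' * A * X)  # its Riemannian gradient on the sphere
x = random_point(M)
x_opt = quasi_Newton(
    M, F, ∇F, x;
    stopping_criterion=StopWhenGradientNormLess(10^(-6)),
)
```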

## Background

The aim is to minimize a real-valued function on a Riemannian manifold, i.e.

```math
\min f(x), \quad x \in \mathcal{M}.
```

Riemannian quasi-Newton methods are, as generalizations of their Euclidean counterparts, Riemannian line search methods: they determine a search direction ``η_k ∈ T_{x_k} \mathcal{M}`` at the current iterate ``x_k`` and a suitable stepsize ``α_k`` along ``\gamma(α) = R_{x_k}(α η_k)``, where ``R \colon T \mathcal{M} \to \mathcal{M}`` is a retraction. The next iterate is obtained by

```math
x_{k+1} = R_{x_k}(α_k η_k).
```

In quasi-Newton methods, the search direction is given by

```math
η_k = -{\mathcal{H}_k}^{-1}[∇ f (x_k)] = -\mathcal{B}_k [∇f (x_k)],
```

where ``\mathcal{H}_k \colon T_{x_k} \mathcal{M} \to T_{x_k} \mathcal{M}`` is a positive definite self-adjoint operator, which approximates the action of the Hessian ``\operatorname{Hess} f (x_k)[\cdot]``, and ``\mathcal{B}_k = {\mathcal{H}_k}^{-1}``. The idea of quasi-Newton methods is that, instead of creating a completely new approximation of the Hessian operator ``\operatorname{Hess} f(x_{k+1})`` or its inverse at every iteration, the previous operator ``\mathcal{H}_k`` or ``\mathcal{B}_k`` is updated by a convenient formula using the information about the curvature of the objective function obtained during the iteration. The resulting operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` acts on the tangent space ``T_{x_{k+1}} \mathcal{M}`` of the freshly computed iterate ``x_{k+1}``.
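
In pseudocode, the resulting scheme looks as follows. This is a minimal sketch using `retract` from Manifolds.jl; `direction_update` and `linesearch` are illustrative stand-ins for applying ``-\mathcal{B}_k`` and for a (Wolfe) line search, not Manopt.jl internals.

```julia
using Manifolds

# Skeleton of the generic quasi-Newton iteration described above.
function quasi_newton_sketch(M, ∇F, x, direction_update, linesearch; iterations=100)
    for _ in 1:iterations
        g = ∇F(x)                      # Riemannian gradient ∇f(x_k)
        η = direction_update(M, x, g)  # η_k = -B_k[∇f(x_k)]
        α = linesearch(M, x, η)        # stepsize along γ(α) = R_{x_k}(α η_k)
        x = retract(M, x, α * η)       # x_{k+1} = R_{x_k}(α_k η_k)
    end
    return x
end
```
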
In order to obtain a well-defined method, the following requirements are placed on the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` that is created by an update. Since the Hessian ``\operatorname{Hess} f(x_{k+1})`` is a self-adjoint operator on the tangent space ``T_{x_{k+1}} \mathcal{M}``, and ``\mathcal{H}_{k+1}`` approximates it, we require that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is also self-adjoint on ``T_{x_{k+1}} \mathcal{M}``. In order to achieve a steady descent, we want ``η_k`` to be a descent direction in each iteration. Therefore we require that ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` is a positive definite operator on ``T_{x_{k+1}} \mathcal{M}``. In order to get information about the curvature of the objective function into the new operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``, we require that it satisfies a form of a Riemannian quasi-Newton equation:

```math
\mathcal{H}_{k+1} [T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1}))] = ∇f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(∇f(x_k))
```

or

```math
\mathcal{B}_{k+1} [∇f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(∇f(x_k))] = T_{x_k \rightarrow x_{k+1}}({R_{x_k}}^{-1}(x_{k+1})),
```

where ``T_{x_k \rightarrow x_{k+1}} \colon T_{x_k} \mathcal{M} \to T_{x_{k+1}} \mathcal{M}`` is a vector transport and the chosen retraction ``R`` is its associated retraction. We note that, of course, not all updates in all situations will meet these conditions in every iteration.
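
For intuition, in the Euclidean case ``\mathcal{M} = \mathbb{R}^n`` with ``R_x(η) = x + η`` and the identity as vector transport, the first equation reduces to the classical secant equation

```math
\mathcal{H}_{k+1} [x_{k+1} - x_k] = ∇f(x_{k+1}) - ∇f(x_k).
```
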
For specific quasi-Newton updates, the fulfillment of the Riemannian curvature condition, which requires that

```math
g_{x_{k+1}}(s_k, y_k) > 0
```

holds with ``s_k = T_{x_k \rightarrow x_{k+1}}(α_k η_k)`` and ``y_k = ∇f(x_{k+1}) - T_{x_k \rightarrow x_{k+1}}(∇f(x_k))``, is a requirement for the inheritance of the self-adjointness and positive definiteness of ``\mathcal{H}_k`` or ``\mathcal{B}_k`` by the operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}``. Unfortunately, the fulfillment of the Riemannian curvature condition is not guaranteed by a step size ``α_k > 0`` that satisfies the generalized Wolfe conditions. However, in order to create a positive definite operator ``\mathcal{H}_{k+1}`` or ``\mathcal{B}_{k+1}`` in each iteration, the so-called locking condition was introduced in [^HuangGallivanAbsil2015], which requires that the isometric vector transport ``T^S``, which is used in the update formula, and its associated retraction ``R`` fulfill

```math
T^{S}_{x, \xi_x}(\xi_x) = \beta T^{R}_{x, \xi_x}(\xi_x), \quad \beta = \frac{\lVert \xi_x \rVert_x}{\lVert T^{R}_{x, \xi_x}(\xi_x) \rVert_{R_{x}(\xi_x)}},
```

where ``T^R`` is the vector transport by differentiated retraction. With the requirement that the isometric vector transport ``T^S`` and its associated retraction ``R`` satisfy the locking condition, and using the tangent vector

```math
y_k = {\beta_k}^{-1} ∇f(x_{k+1}) - T^{S}_{x_k, α_k η_k}(∇f(x_k)),
```

where

```math
\beta_k = \frac{\lVert α_k η_k \rVert_{x_k}}{\lVert T^{R}_{x_k, α_k η_k}(α_k η_k) \rVert_{x_{k+1}}},
```

in the update, it can be shown that choosing a stepsize ``α_k > 0`` that satisfies the Riemannian Wolfe conditions leads to the fulfillment of the Riemannian curvature condition, which in turn implies that the operator generated by the updates is positive definite.
In the following we denote the specific operators in matrix notation and hence use ``H_k`` and ``B_k``, respectively.

## Direction Updates

There are different ways to compute a fixed [`AbstractQuasiNewtonUpdateRule`](@ref).
In general these are represented by

```@docs
AbstractQuasiNewtonDirectionUpdate
QuasiNewtonMatrixDirectionUpdate
QuasiNewtonLimitedMemoryDirectionUpdate
QuasiNewtonCautiousDirectionUpdate
```
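
To illustrate what a limited-memory update computes, the following is a minimal sketch of the classical two-loop recursion that [`QuasiNewtonLimitedMemoryDirectionUpdate`](@ref) builds on, assuming the memory pairs ``(s_i, y_i)`` have already been transported into the current tangent space and tangent vectors are represented as arrays; the function name is illustrative, not a Manopt.jl internal.

```julia
using Manifolds

# Computes η_k = -B_k[grad] from the m stored pairs in S and Y
# (vectors of tangent vectors in T_x M, oldest first).
function two_loop_recursion(M, x, grad, S, Y)
    m = length(S)
    q = copy(grad)
    ρ = [1 / inner(M, x, Y[i], S[i]) for i in 1:m]
    ξ = zeros(m)
    for i in m:-1:1                     # backward pass
        ξ[i] = ρ[i] * inner(M, x, S[i], q)
        q .-= ξ[i] .* Y[i]
    end
    if m > 0                            # scaled initial operator B_0 = γ_k id
        q .*= inner(M, x, S[m], Y[m]) / inner(M, x, Y[m], Y[m])
    end
    for i in 1:m                        # forward pass
        β = ρ[i] * inner(M, x, Y[i], q)
        q .+= (ξ[i] - β) .* S[i]
    end
    return -q
end
```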

## Hessian Update Rules

Using

```@docs
update_hessian!
```

the following update formulae for either ``H_{k+1}`` or ``B_{k+1}`` are available.

```@docs
AbstractQuasiNewtonUpdateRule
BFGS
DFP
Broyden
SR1
InverseBFGS
InverseDFP
InverseBroyden
InverseSR1
```

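For orientation, [`BFGS`](@ref) generalizes the classical BFGS update, which in the matrix notation introduced above reads

```math
H_{k+1} = \widetilde{H}_k + \frac{y_k y^{\mathrm{T}}_k}{s^{\mathrm{T}}_k y_k} - \frac{\widetilde{H}_k s_k s^{\mathrm{T}}_k \widetilde{H}_k}{s^{\mathrm{T}}_k \widetilde{H}_k s_k},
```

where ``\widetilde{H}_k`` denotes the previous operator transported into ``T_{x_{k+1}} \mathcal{M}``; the other rules are analogous rank-one or rank-two corrections.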

## Options

The quasi-Newton algorithm is based on a [`GradientProblem`](@ref).

```@docs
QuasiNewtonOptions
```

## Literature

[^HuangGallivanAbsil2015]:
    > Huang, W., Gallivan, K. A., Absil, P.-A.:
    > _A Broyden class of quasi-Newton methods for Riemannian optimization_,
    > SIAM Journal on Optimization, 25(3), pp. 1660–1685, 2015.

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
#
# The Brockett cost function on the Stiefel manifold St(n,k) (benchmark)
#
using Manifolds, Manopt, Random, LinearAlgebra, BenchmarkTools, Profile
import Manifolds: vector_transport_to!
vector_transport_to!(M::Stiefel, Y, p, X, q, ::ProjectionTransport) = project!(M, Y, q, X)

struct GradF
    A::Matrix{Float64}
    N::Diagonal{Float64,Vector{Float64}}
end
function (∇F::GradF)(X::Array{Float64,2})
    AX = ∇F.A * X
    XpAX = X' * AX
    return 2 .* AX * ∇F.N .- X * XpAX * ∇F.N .- X * ∇F.N * XpAX
end

function run_brocket_experiment(n::Int, k::Int, m::Int; seed=42)
    Random.seed!(seed)
    M = Stiefel(n, k)
    A = randn(n, n)
    A = (A + A') / 2
    F(X::Array{Float64,2}) = tr((X' * A * X) * Diagonal(k:-1:1))
    ∇F = GradF(A, Diagonal(Float64.(collect(k:-1:1))))
    x = random_point(M)
    return quasi_Newton(
        M,
        F,
        ∇F,
        x;
        memory_size=m,
        vector_transport_method=ProjectionTransport(),
        retraction_method=QRRetraction(),
        stopping_criterion=StopWhenGradientNormLess(norm(M, x, ∇F(x)) * 10^(-6)),
        cautious_update=true,
        # debug = [:Iteration," ", :Cost, " ", DebugGradientNorm(), "\n", 10],
    )
end

io = IOBuffer()

for e in [
    (32, 32, 1),
    (32, 32, 2),
    (32, 32, 4),
    (32, 32, 8),
    (32, 32, 16),
    (32, 32, 32),
    (1000, 2, 4),
    (1000, 3, 4),
    (1000, 4, 4),
    (1000, 5, 4),
]
    println("Benchmarking $(e):")
    b = @benchmark run_brocket_experiment($(e[1]), $(e[2]), $(e[3])) samples = 50
    #run_brocket_experiment(e[1], e[2], e[3])
    show(io, "text/plain", b)
    s = String(take!(io))
    println(s, "\n\n")
end
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
#
# The joint diagonalization problem on the Stiefel manifold St(n,k)
#
using Manopt, Manifolds, ManifoldsBase, LinearAlgebra, Random
import Manifolds: vector_transport_to!
struct IdentityTransport <: AbstractVectorTransportMethod end
vector_transport_to!(M::Stiefel, Y, p, X, q, ::IdentityTransport) = (Y .= project(M, q, X))
Random.seed!(42)
n = 12
k = 8
m = 512
A = randn(n, n, m)

for i in 1:m
    A[:, :, i] = diagm(n:-1:1) + 0.1 * (transpose(A[:, :, i]) + A[:, :, i])
end

M = Stiefel(n, k)
F(X::Array{Float64,2}) = -sum([norm(diag(X' * A[:, :, i] * X))^2 for i in 1:m])
function ∇F(X::Array{Float64,2})
    return project(
        M, X, -4 * sum([A[:, :, i] * X * norm(diag(X' * A[:, :, i] * X)) for i in 1:m])
    )
end
x = random_point(M)
@time quasi_Newton(
    M,
    F,
    ∇F,
    x;
    memory_size=32,
    cautious_update=true,
    vector_transport_method=IdentityTransport(),
    stopping_criterion=StopWhenGradientNormLess(norm(M, x, ∇F(x)) * 10^(-6)),
    debug=[:Iteration, " ", :Cost, "\n", 1, :Stop],
)
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
#
# Positive Definite Karcher Mean (Matlab Manopt)
#
using Manopt, Manifolds, ManifoldsBase, LinearAlgebra, Random
Random.seed!(42)
n = 5
m = 5
M = SymmetricPositiveDefinite(n)
x = random_point(M)
A = [random_point(M) for _ in 1:m]
A = [Symmetric(a) for a in A]
F(X::Array{Float64,2}) = sum([distance(M, X, B)^2 for B in A]) / (2 * m)
∇F(X::Array{Float64,2}) = -sum([log(M, X, B) for B in A]) / m

@time quasi_Newton(
    M,
    F,
    ∇F,
    x;
    memory_size=100,
    stopping_criterion=StopWhenGradientNormLess(norm(M, x, ∇F(x)) * 10^(-6)),
    debug=[:Iteration, " ", :Cost, "\n", 1, :Stop],
)

# B1 = quasi_Newton(M,F,∇F,x; memory_size = 100, debug = [:Iteration, " ", :Cost, "\n", 1, :Stop])
# B2 = mean(M,A)
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
#
# Rayleigh quotient minimization on the sphere (benchmark)
#
using Manopt, Manifolds, ManifoldsBase, Random, LinearAlgebra, BenchmarkTools
Random.seed!(42)

function run_rayleigh_experiment(n::Int)
    A = randn(n, n)
    A = (A + A') / 2
    F(X::Array{Float64,1}) = X' * A * X
    ∇F(X::Array{Float64,1}) = 2 * (A * X - X * X' * A * X)
    M = Sphere(n - 1)
    x = random_point(M)
    return quasi_Newton(
        M,
        F,
        ∇F,
        x;
        #memory_size=-1,
        stopping_criterion=StopWhenAny(
            StopAfterIteration(1000), StopWhenGradientNormLess(10^(-6))
        ),
        debug=[:Iteration, " ", :Cost, "\n", 1, :Stop],
    )
end
io = IOBuffer()

for n in [100]
    b = @benchmark run_rayleigh_experiment($n) samples = 30
    show(io, "text/plain", b)
    s = String(take!(io))
    println("Benchmarking $(n):\n", s, "\n\n")
end
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
#
# Thomson Problem on Oblique(n,m)
#
using Manopt, Manifolds, ManifoldsBase, LinearAlgebra, Random
Random.seed!(42)
n = 50
m = 20
M = Oblique(n, m)

function F(X::Array{Float64,2})
    f = 0
    for i in 1:m
        for j in 1:m
            if i != j
                f = f + 1 / (norm(X[:, i] - X[:, j])^2)
            end
        end
    end
    return f
end

function ∇F(X::Array{Float64,2})
    g = zeros(n, m)
    Id = Matrix(I, n, n)
    for i in 1:m
        f = zeros(n, 1)
        for j in 1:m
            if i != j
                f = f + 1 / (1.0 - X[:, i]' * X[:, j]) * X[:, j]
            end
        end
        g[:, i] = (Id - X[:, i] * X[:, i]') * f
    end
    return g
end

x = random_point(M)

@time quasi_Newton(
    M,
    F,
    ∇F,
    x;
    memory_size=100,
    vector_transport_method=PowerVectorTransport(ParallelTransport()),
    stopping_criterion=StopWhenGradientNormLess(norm(M, x, ∇F(x)) * 10^(-6)),
    debug=[:Iteration, " ", :Cost, "\n", 1, :Stop],
)
