Skip to content

Commit 13fc61f

Browse files
author
Andy Ferris
committed
Fixed 3x3 eigendecomposition
Now using a more robust eigenvector solver
1 parent cb5552a commit 13fc61f

File tree

5 files changed

+267
-91
lines changed

5 files changed

+267
-91
lines changed

LICENSE.md

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
1-
The StaticArrays.jl package is licensed under the MIT "Expat" License:
1+
The majority of the StaticArrays.jl package is licensed under the MIT "Expat"
2+
License:
23

34
> Copyright (c) 2016: Andy Ferris.
45
>
@@ -20,3 +21,6 @@ The StaticArrays.jl package is licensed under the MIT "Expat" License:
2021
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
2122
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
2223
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
24+
25+
However, a small part of the 3x3 eigenvalue solver is under the Boost Software
26+
License - see src/eigen.jl

README.md

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -32,14 +32,14 @@ than `Base.Array`. See this simplified benchmark (or see the full results [here]
3232
Benchmarks for 3×3 Float64 matrices
3333
============================================
3434
35-
Matrix multiplication -> 8.2x speedup
36-
Matrix multiplication (mutating) -> 3.1x speedup
37-
Matrix addition -> 45x speedup
38-
Matrix addition (mutating) -> 5.1x speedup
39-
Matrix determinant -> 170x speedup
40-
Matrix inverse -> 125x speedup
41-
Matrix symmetric eigenvalue -> 105x speedup
42-
Matrix Cholesky decomposition -> 23.6x speedup
35+
Matrix multiplication -> 8.2x speedup
36+
Matrix multiplication (mutating) -> 3.1x speedup
37+
Matrix addition -> 45x speedup
38+
Matrix addition (mutating) -> 5.1x speedup
39+
Matrix determinant -> 170x speedup
40+
Matrix inverse -> 125x speedup
41+
Matrix symmetric eigendecomposition -> 82x speedup
42+
Matrix Cholesky decomposition -> 23.6x speedup
4343
```
4444

4545
(Run with `julia -O3` for even faster SIMD code with immutable static arrays!)

perf/bench11.txt

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
1+
2 December 2016
2+
3+
=====================================
4+
Benchmarks for 3×3 matrices
5+
=====================================
6+
7+
Matrix multiplication
8+
---------------------
9+
Array -> 1.871689 seconds (74.07 M allocations: 6.623 GB, 8.33% gc time)
10+
SArray -> 0.245292 seconds (5 allocations: 240 bytes)
11+
MArray -> 0.975676 seconds (37.04 M allocations: 2.759 GB, 5.06% gc time)
12+
SizedArray -> 2.117939 seconds (74.07 M allocations: 6.071 GB, 8.77% gc time)
13+
14+
Matrix multiplication (mutating)
15+
--------------------------------
16+
Array -> 1.275434 seconds (6 allocations: 480 bytes)
17+
MArray -> 0.437265 seconds (7 allocations: 400 bytes)
18+
SizedArray -> 0.679027 seconds (7 allocations: 416 bytes)
19+
20+
Matrix addition
21+
---------------
22+
Array -> 1.447606 seconds (44.44 M allocations: 3.974 GB, 7.38% gc time)
23+
SArray -> 0.030649 seconds (5 allocations: 240 bytes)
24+
MArray -> 0.311798 seconds (22.22 M allocations: 1.656 GB, 8.91% gc time)
25+
SizedArray -> 1.089546 seconds (44.44 M allocations: 3.643 GB, 10.15% gc time)
26+
27+
Matrix addition (mutating)
28+
--------------------------
29+
Array -> 0.492890 seconds (5 allocations: 320 bytes)
30+
MArray -> 0.093319 seconds (5 allocations: 240 bytes)
31+
SizedArray -> 0.145942 seconds (6 allocations: 336 bytes)
32+
33+
Matrix determinant
34+
------------------
35+
Array -> 10.574721 seconds (222.22 M allocations: 12.694 GB, 7.38% gc time)
36+
SArray -> 0.088204 seconds (4 allocations: 160 bytes)
37+
MArray -> 0.087665 seconds (4 allocations: 160 bytes)
38+
SizedArray -> 0.109605 seconds (4 allocations: 160 bytes)
39+
40+
Matrix inverse
41+
--------------
42+
Array -> 41.495036 seconds (407.41 M allocations: 82.232 GB, 14.90% gc time)
43+
SArray -> 0.371094 seconds (4 allocations: 160 bytes)
44+
MArray -> 0.671434 seconds (37.04 M allocations: 2.759 GB, 6.75% gc time)
45+
SizedArray -> 2.078264 seconds (74.07 M allocations: 6.071 GB, 8.45% gc time)
46+
47+
Matrix symmetric eigenvalue (82x speedup)
48+
-----------------------------------------
49+
Array ->444.241336 seconds (740.74 M allocations: 89.407 GB, 1.45% gc time)
50+
SArray -> 5.857789 seconds (5 allocations: 256 bytes)
51+
MArray -> 5.658499 seconds (6 allocations: 272 bytes)
52+
SizedArray -> 5.448629 seconds (6 allocations: 208 bytes)
53+
54+
Matrix Cholesky
55+
---------------
56+
Array -> 8.080100 seconds (222.22 M allocations: 9.934 GB, 6.10% gc time)
57+
SArray -> 0.341636 seconds (5 allocations: 256 bytes)
58+
MArray -> 0.814622 seconds (37.04 M allocations: 2.759 GB, 5.70% gc time)
59+
SizedArray -> 2.131756 seconds (74.07 M allocations: 6.071 GB, 10.03% gc time)

src/eigen.jl

Lines changed: 170 additions & 82 deletions
Original file line numberDiff line numberDiff line change
@@ -83,105 +83,193 @@ end
8383
end
8484
end
8585

86-
# TODO fix for complex case
86+
# A small part of the code in the following method was inspired by works of David
87+
# Eberly, Geometric Tools LLC, in code released under the Boost Software
88+
# License (included at the end of this file).
8789
@inline function _eig{T<:Real}(::Size{(3,3)}, A::Base.LinAlg.RealHermSymComplexHerm{T}, permute, scale)
8890
S = typeof((one(T)*zero(T) + zero(T))/one(T))
91+
Sreal = real(S)
8992

90-
uplo = A.uplo
91-
data = A.data
92-
if uplo == 'U'
93-
@inbounds Afull = SMatrix{3,3}(data[1], data[4], data[7], data[4], data[5], data[8], data[7], data[8], data[9])
93+
@inbounds a11 = convert(Sreal, A.data[1])
94+
@inbounds a22 = convert(Sreal, A.data[5])
95+
@inbounds a33 = convert(Sreal, A.data[9])
96+
if A.uplo == 'U'
97+
@inbounds a12 = convert(S, A.data[4])
98+
@inbounds a13 = convert(S, A.data[7])
99+
@inbounds a23 = convert(S, A.data[8])
94100
else
95-
@inbounds Afull = SMatrix{3,3}(data[1], data[2], data[3], data[2], data[5], data[6], data[3], data[6], data[9])
101+
@inbounds a12 = conj(convert(S, A.data[2]))
102+
@inbounds a13 = conj(convert(S, A.data[3]))
103+
@inbounds a23 = conj(convert(S, A.data[6]))
96104
end
97105

98-
# Adapted from Wikipedia
99-
@inbounds p1 = Afull[4]*Afull[4] + Afull[7]*Afull[7] + Afull[8]*Afull[8]
106+
p1 = abs2(a12) + abs2(a13) + abs2(a23)
100107
if (p1 == 0)
101-
# Afull is diagonal.
102-
@inbounds eig1 = Afull[1]
103-
@inbounds eig2 = Afull[5]
104-
@inbounds eig3 = Afull[9]
108+
# Matrix is diagonal
109+
# TODO need to sort the eigenvalues
110+
return (SVector(a11, a22, a33), eye(SMatrix{3,3,S}))
111+
end
112+
113+
q = (a11 + a22 + a33) / 3
114+
p2 = abs2(a11 - q) + abs2(a22 - q) + abs2(a33 - q) + 2 * p1
115+
p = sqrt(p2 / 6)
116+
invp = inv(p)
117+
b11 = (a11 - q) * invp
118+
b22 = (a22 - q) * invp
119+
b33 = (a33 - q) * invp
120+
b12 = a12 * invp
121+
b13 = a13 * invp
122+
b23 = a23 * invp
123+
B = SMatrix{3,3,S}((b11, conj(b12), conj(b13), b12, b22, conj(b23), b13, b23, b33))
124+
r = real(det(B)) / 2
125+
126+
# In exact arithmetic for a symmetric matrix -1 <= r <= 1
127+
# but computation error can leave it slightly outside this range.
128+
if (r <= -1)
129+
phi = Sreal(pi) / 3
130+
elseif (r >= 1)
131+
phi = zero(Sreal)
132+
else
133+
phi = acos(r) / 3
134+
end
135+
136+
eig3 = q + 2 * p * cos(phi)
137+
eig1 = q + 2 * p * cos(phi + (2*Sreal(pi)/3))
138+
eig2 = 3 * q - eig1 - eig3 # since trace(A) = eig1 + eig2 + eig3
139+
140+
if r > 0 # Helps with conditioning the eigenvector calculation
141+
(eig1, eig3) = (eig3, eig1)
142+
end
105143

106-
return (SVector{3,S}(eig1, eig2, eig3), eye(SMatrix{3,3,S}))
144+
# Calculate the first eigenvector
145+
# This should be orthogonal to these three rows of A - eig1*I
146+
# Use all combinations of cross products and choose the "best" one
147+
r₁ = SVector(a11 - eig1, a12, a13)
148+
r₂ = SVector(conj(a12), a22 - eig1, a23)
149+
r₃ = SVector(conj(a13), conj(a23), a33 - eig1)
150+
n₁ = sumabs2(r₁)
151+
n₂ = sumabs2(r₂)
152+
n₃ = sumabs2(r₃)
153+
154+
r₁₂ = r₁ × r₂
155+
r₂₃ = r₂ × r₃
156+
r₃₁ = r₃ × r₁
157+
n₁₂ = sumabs2(r₁₂)
158+
n₂₃ = sumabs2(r₂₃)
159+
n₃₁ = sumabs2(r₃₁)
160+
161+
# we want best angle so we put all norms on same footing
162+
# (cheaper to multiply by third nᵢ rather than divide by the two involved)
163+
if n₁₂ * n₃ > n₂₃ * n₁
164+
if n₁₂ * n₃ > n₃₁ * n₂
165+
eigvec1 = r₁₂ / sqrt(n₁₂)
166+
else
167+
eigvec1 = r₃₁ / sqrt(n₃₁)
168+
end
107169
else
108-
q = trace(Afull)/3
109-
@inbounds p2 = (Afull[1] - q)^2 + (Afull[5] - q)^2 + (Afull[9] - q)^2 + 2 * p1
110-
p = sqrt(p2 / 6)
111-
B = (1 / p) * (Afull - UniformScaling(q)) # q*I
112-
r = det(B) / 2
113-
114-
# In exact arithmetic for a symmetric matrix -1 <= r <= 1
115-
# but computation error can leave it slightly outside this range.
116-
if (r <= -1)
117-
phi = S(pi) / 3
118-
elseif (r >= 1)
119-
phi = zero(S)
170+
if n₂₃ * n₁ > n₃₁ * n₂
171+
eigvec1 = r₂₃ / sqrt(n₂₃)
120172
else
121-
phi = acos(r) / 3
173+
eigvec1 = r₃₁ / sqrt(n₃₁)
122174
end
175+
end
123176

124-
# the eigenvalues satisfy eig1 <= eig2 <= eig3
125-
eig3 = q + 2 * p * cos(phi)
126-
eig1 = q + 2 * p * cos(phi + (2*S(pi)/3))
127-
eig2 = 3 * q - eig1 - eig3 # since trace(Afull) = eig1 + eig2 + eig3
128-
129-
# Now get the eigenvectors
130-
131-
# To avoid problems with double degeneracies, we tackle the most distinct
132-
# eigenvalue first
133-
if eig2 - eig1 > eig3 - eig2
134-
# The first eigenvalue is "most distinct"
135-
@inbounds tmp1 = SVector(Afull[1] - eig3, Afull[2], Afull[3])
136-
@inbounds tmp2 = SVector(Afull[4], Afull[5] - eig3, Afull[6])
137-
v3 = cross(tmp1, tmp2)
138-
n3 = vecnorm(v3)
139-
v3 = v3 / n3
140-
141-
# Find the second one from this one
142-
@inbounds tmp3 = normalize(SVector(Afull[1] - eig2, Afull[2], Afull[3]))
143-
@inbounds tmp4 = normalize(SVector(Afull[4], Afull[5] - eig2, Afull[6]))
144-
v2_1 = cross(tmp3, v3)
145-
v2_2 = cross(tmp4, v3)
146-
n2_1 = vecnorm(v2_1)
147-
n2_2 = vecnorm(v2_2)
148-
if n2_1 > n2_2
149-
v2 = v2_1 / n2_1
150-
else
151-
v2 = v2_2 / n2_2
152-
end
177+
# Calculate the second eigenvector
178+
# This should be orthogonal to the previous eigenvector and the three
179+
# rows of A - eig2*I. However, we need to "solve" the remaining 2x2 subspace
180+
# problem in case the cross products are identically or nearly zero
153181

154-
# The third is easy
155-
v1 = cross(v2, v3) # should be normalized already
182+
# The remaing 2x2 subspace is:
183+
@inbounds if abs(eigvec1[1]) < abs(eigvec1[2]) # safe to set one component to zero, depending on this
184+
orthogonal1 = SVector(-eigvec1[3], zero(S), eigvec1[1]) / sqrt(abs2(eigvec1[1]) + abs2(eigvec1[3]))
185+
else
186+
orthogonal1 = SVector(zero(S), eigvec1[3], -eigvec1[2]) / sqrt(abs2(eigvec1[2]) + abs2(eigvec1[3]))
187+
end
188+
orthogonal2 = eigvec1 × orthogonal1
156189

157-
@inbounds return (SVector((eig1, eig2, eig3)), SMatrix{3,3}((v1[1], v1[2], v1[3], v2[1], v2[2], v2[3], v3[1], v3[2], v3[3])))
158-
else
159-
# The third eigenvalue is "most distinct"
160-
@inbounds tmp1 = SVector(Afull[1] - eig1, Afull[2], Afull[3])
161-
@inbounds tmp2 = SVector(Afull[4], Afull[5] - eig1, Afull[6])
162-
v1 = cross(tmp1, tmp2)
163-
n1 = vecnorm(v1)
164-
v1 = v1 / n1
165-
166-
# Find the second one from this one
167-
@inbounds tmp3 = normalize(SVector(Afull[1] - eig2, Afull[2], Afull[3]))
168-
@inbounds tmp4 = normalize(SVector(Afull[4], Afull[5] - eig2, Afull[6]))
169-
v2_1 = cross(tmp3, v1)
170-
v2_2 = cross(tmp4, v1)
171-
n2_1 = vecnorm(v2_1)
172-
n2_2 = vecnorm(v2_2)
173-
if n2_1 > n2_2
174-
v2 = v2_1 / n2_1
175-
else
176-
v2 = v2_2 / n2_2
177-
end
190+
# The projected 2x2 eigenvalue problem is C x = 0 where C is the projection
191+
# of (A - eig2*I) onto the subspace {orthogonal1, orthogonal2}
192+
@inbounds a_orth1_1 = a11 * orthogonal1[1] + a12 * orthogonal1[2] + a13 * orthogonal1[3]
193+
@inbounds a_orth1_2 = conj(a12) * orthogonal1[1] + a22 * orthogonal1[2] + a23 * orthogonal1[3]
194+
@inbounds a_orth1_3 = conj(a13) * orthogonal1[1] + conj(a23) * orthogonal1[2] + a33 * orthogonal1[3]
178195

179-
# The third is easy
180-
v3 = cross(v1, v2) # should be normalized already
196+
@inbounds a_orth2_1 = a11 * orthogonal2[1] + a12 * orthogonal2[2] + a13 * orthogonal2[3]
197+
@inbounds a_orth2_2 = conj(a12) * orthogonal2[1] + a22 * orthogonal2[2] + a23 * orthogonal2[3]
198+
@inbounds a_orth2_3 = conj(a13) * orthogonal2[1] + conj(a23) * orthogonal2[2] + a33 * orthogonal2[3]
181199

182-
@inbounds return (SVector((eig1, eig2, eig3)), SMatrix{3,3}((v1[1], v1[2], v1[3], v2[1], v2[2], v2[3], v3[1], v3[2], v3[3])))
200+
@inbounds c11 = conj(orthogonal1[1])*a_orth1_1 + conj(orthogonal1[2])*a_orth1_2 + conj(orthogonal1[3])*a_orth1_3 - eig2
201+
@inbounds c12 = conj(orthogonal1[1])*a_orth2_1 + conj(orthogonal1[2])*a_orth2_2 + conj(orthogonal1[3])*a_orth2_3
202+
@inbounds c22 = conj(orthogonal2[1])*a_orth2_1 + conj(orthogonal2[2])*a_orth2_2 + conj(orthogonal2[3])*a_orth2_3 - eig2
203+
204+
# Solve this robustly (some values might be small or zero)
205+
c11² = abs2(c11)
206+
c12² = abs2(c12)
207+
c22² = abs2(c22)
208+
if c11² >= c22²
209+
if c11² > 0 || c12² > 0
210+
if c11² >= c12²
211+
tmp = c12 / c11 # TODO check for compex input
212+
p2 = inv(sqrt(1 + abs2(tmp)))
213+
p1 = tmp * p2
214+
else
215+
tmp = c11 / c12 # TODO check for compex input
216+
p1 = inv(sqrt(1 + abs2(tmp)))
217+
p2 = tmp * p1
218+
end
219+
eigvec2 = p1*orthogonal1 - p2*orthogonal2
220+
else # c11 == 0 && c12 == 0 && c22 == 0 (smaller than c11)
221+
eigvec2 = orthogonal1
222+
end
223+
else
224+
if c22² >= c12²
225+
tmp = c12 / c22 # TODO check for compex input
226+
p1 = inv(sqrt(1 + abs2(tmp)))
227+
p2 = tmp * p1
228+
else
229+
tmp = c22 / c12 # TODO check for compex input
230+
p2 = inv(sqrt(1 + abs2(tmp)))
231+
p1 = tmp * p2
183232
end
233+
eigvec2 = p1*orthogonal1 - p2*orthogonal2
234+
end
235+
236+
# The third eigenvector is a simple cross product of the other two
237+
eigvec3 = eigvec1 × eigvec2 # should be normalized already
184238

185-
@inbounds return (SVector((eig1, eig2, eig3)), SMatrix{3,3}((v1[1], v1[2], v1[3], v2[1], v2[2], v2[3], v3[1], v3[2], v3[3])))
239+
# Sort them back to the original ordering, if necessary
240+
if r > 0
241+
(eig1, eig3) = (eig3, eig1)
242+
(eigvec1, eigvec3) = (eigvec3, eigvec1)
186243
end
244+
245+
return (SVector(eig1, eig2, eig3), hcat(eigvec1, eigvec2, eigvec3))
187246
end
247+
248+
# NOTE: The following Boost Software License applies to parts of the method:
249+
# _eig{T<:Real}(::Size{(3,3)}, A::Base.LinAlg.RealHermSymComplexHerm{T}, permute, scale)
250+
251+
#=
252+
Boost Software License - Version 1.0 - August 17th, 2003
253+
254+
Permission is hereby granted, free of charge, to any person or organization
255+
obtaining a copy of the software and accompanying documentation covered by
256+
this license (the "Software") to use, reproduce, display, distribute,
257+
execute, and transmit the Software, and to prepare derivative works of the
258+
Software, and to permit third-parties to whom the Software is furnished to
259+
do so, all subject to the following:
260+
261+
The copyright notices in the Software and this entire statement, including
262+
the above license grant, this restriction and the following disclaimer,
263+
must be included in all copies of the Software, in whole or in part, and
264+
all derivative works of the Software, unless such copies or derivative
265+
works are solely in the form of machine-executable object code generated by
266+
a source language processor.
267+
268+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
269+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
270+
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
271+
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
272+
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
273+
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
274+
DEALINGS IN THE SOFTWARE.
275+
=#

0 commit comments

Comments
 (0)