Add Two-sided Arnoldi method #124
Conversation
Thanks for this; it seems like an impressive amount of work. I will have to look deeper into this, since I am on my way out of the office right now. Feel free to remind me if you haven't heard back in a few days. A first comment: look at Related to this: since this algorithm cannot have the exact same arguments and outputs as the other
Great to hear that this may find its way into the library! I've now updated the function signature to be closer to However, in my tests (which I copied from the eigsolve tests), I get failures because Also, the return type of
Codecov Report

❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##           master     #124      +/-   ##
==========================================
- Coverage   88.90%   88.70%   -0.20%
==========================================
  Files          34       36       +2
  Lines        3685     3930     +245
==========================================
+ Hits         3276     3486     +210
- Misses        409      444      +35
==========================================
```
@tipfom, are you ok with me pushing some changes directly to your branch?
@Jutho
@Jutho, sorry, I misread your message. Of course you're very much welcome to push to this branch :)
Ok, I've simplified the BiArnoldi factorization and iteration to simply reuse everything from Arnoldi, as I think that is what it amounted to. It's easier to maintain the code and make improvements in the future if there is less code duplication. In
The eigenvalues are now properly sorted and the eigenvectors properly biorthonormalized.
I quickly browsed through this, definitely a very nice PR, thank you for this work!
I left some minor remarks, mostly just cosmetic and minor details about allocations, which I don't think should matter too much. Feel free to ignore if you disagree with anything.
src/eigsolve/biarnoldi.jl (Outdated)

```julia
firstunusedT += 1
end
for j in firstunusedT:length(valuesT)
    if !usedvaluesT[j] && isapprox(valuesS[i], conj(valuesT[j]))
```
Do we have an idea about what the tolerance should be for this isapprox?
I just went with the default, but I'm open to suggestions! I think if they are reasonably converged then sqrt(eps) is sufficient. Maybe one could also implement a match-as-good-as-possible logic, but the isapprox approach worked fine for me. I'll do more tests on how this behaves for larger tolerances.
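As an illustration of making that tolerance explicit, here is a minimal standalone sketch of such a greedy matching loop in the spirit of the snippet above; the function name and signature are hypothetical and not part of this PR:

```julia
# Hypothetical sketch: greedily match each value in valuesS to the conjugate of an
# unused value in valuesT, with the matching tolerance exposed as a keyword argument.
function matchconjugatepairs(valuesS, valuesT;
                             rtol=sqrt(eps(real(eltype(valuesS)))))
    used = falses(length(valuesT))
    pairs = Tuple{Int,Int}[]
    for i in eachindex(valuesS)
        for j in eachindex(valuesT)
            if !used[j] && isapprox(valuesS[i], conj(valuesT[j]); rtol=rtol)
                used[j] = true
                push!(pairs, (i, j))
                break
            end
        end
    end
    return pairs  # entries of valuesT left unmatched hint that rtol is too tight
end
```

Exposing rtol like this would also address the later point about making the required precision a parameter.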
src/eigsolve/biarnoldi.jl (Outdated)

```julia
# For the first case (1.), we use the Ritz values instead of the Rayleigh quotients
# as suggested by the authors

# This is Eq. 10 in the paper
```
I am a bit confused by the following, as the paper in Eq. 10 defines this quantity for the Ritz vectors, which are the approximate eigenvectors obtained within the subspace. Here, however, you only have the approximate Schur vectors; eigenvectors are computed later. Maybe that is good enough; I do the same in the regular Arnoldi.
However, another question is that the expressions for xh and xk are divided by M[converged+1,converged+1], but that is the overlap of the original v and w vectors before doing the Schur decomposition; M was not transformed with Q and Z.
I used the rightmost expressions to determine
More generally, I think we may also drop this more complex approach and just use the plain residuals. This worked in my earlier tests, and I simply adjusted this function to be more in line with the paper.
I am certainly perfectly fine with operating solely on the Schur vectors, which is what I also do in the Arnoldi case. Am I correct that you would rather use something like Z' * M * Q instead of just M for the denominators of xh and xk? I am happy to make these changes myself, but I did not want to do it before consulting you.
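For concreteness, this is what I understand the transformed overlap to be (a sketch, assuming M denotes the overlap of the untransformed bases, i.e. $M = W^\dagger V$): if the reordered Schur bases are $\tilde V = V Q$ and $\tilde W = W Z$, then

$$\tilde W^\dagger \tilde V = (W Z)^\dagger (V Q) = Z^\dagger M Q,$$

so the denominators of xh and xk for column $i$ would be $(Z^\dagger M Q)_{ii}$ rather than $M_{ii}$.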
I've read your comment once more. The Ritz values (eigenvalues of H and K) are indeed obtained from the Schur decomposition. But it is about the corresponding "Ritz vector", i.e. the vector c_j. That is the corresponding eigenvector of H, not the Schur vector. But also for single-sided Arnoldi I only use the Schur vector, although I then recompute the norm of the residual for the eigenvectors in the final eigsolve routine. I will implement a similar strategy here and then get back to you for comments.
I've updated the residuals computation and now the returned residuals are a bit closer to what they actually are (I compared
However, I also noticed that in order to use the formula proposed above Eq. (10), we need _schursolve to use the already computed values. Feel free to let me know what you think! I'm also very open to using a different method to return the correct residuals :)
I am still a bit confused. We start with rV and rW, which are respectively orthogonal to V and W and have respectively norm βv and βw. Now you first take out these norms, such that rV and rW have unit norm, before applying Step 2 where rV and rW are redefined such that they are orthogonal to W and V respectively, i.e. at that point W' * rV = 0 (approximately). This of course changes the norm, such that rV and rW no longer have norm one.
We also have the corresponding h and k row vectors, which initially are just the last basis vector, i.e. unit vectors along the last coordinate axis, except that we have now absorbed the factors βv and βw in them. By doing the Schur decomposition (and reordering), they are now also transformed, and become general vectors corresponding to the last row of Q, but you need to take the non-unit norm into account, which you do in h = mul!(h, view(Q, L, :), βv).
Furthermore, these vectors still appear as rV * h' and rW * k' in the Krylov / partial Schur factorization, and rV and rW are no longer unit vectors. So when looking at a given column (corresponding to a specific Schur vector), isn't the norm of the residual at that point xh = norm(rV) * h[columnindex], or thus xh = βrV * h[columnindex] (and analogously for xk)?
I was wondering why you take out the initial norm βv and βw before doing step 2 & 3, as you need to compensate for this by scaling with that norm again in add!(view(H, :, L), MWv, βv) and in the aforementioned h = mul!(h, view(Q, L, :), βv). Is that for numerical stability, because the norm of rV might be small? It does make it a bit more complicated to reason about the code mathematically.
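To spell out the bookkeeping in the comment above (a sketch of the reasoning, not code from the PR): before the Schur step $h = \beta_v e_L$, and

$$A V = V H + r_V h^\dagger, \qquad H = Q T Q^\dagger \;\Longrightarrow\; A (V Q) = (V Q)\, T + r_V\, \tilde h^\dagger, \qquad \tilde h = Q^\dagger h = \beta_v\, \overline{Q[L, :]},$$

so the residual norm attached to the $j$-th Schur vector is $\lVert r_V \rVert \, \lvert \tilde h_j \rvert$, which is where the extra factor $\beta_{r_V} = \lVert r_V \rVert$ enters (and analogously for $r_W$ and $\tilde k$).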
I did this because (as far as I understand) the authors of the original papers follow the convention that the residual is normalized and its norm is absorbed in h and k respectively.
I do however agree that this is not necessary and thus may be inconsistent with the rest of KrylovKit.
I'll add a commit which follows a different convention!
Is it ok that I work on the code for a bit now and you don't push further commits at this point? I think extracting the norms βv and βw as it was before the last commits is actually fine; I do like it when implementations follow the reference paper. I am just adding some comments to clarify things, and I was mostly concerned that xh requires the extra factor βrV. If it is ok, I will do a force push to overwrite the last commits and add some changes of my own.
Yeah, that's totally fine on my end! I just wanted to help as much as possible :)
Regarding the norms: I've noticed that the updated version with the norms in rV and rW converges a bit faster (400 iterations vs 600 in an example on my disk) than the one in which I manually kept track of the norms. Thus, the original version may contain a bug in which I forgot to update the norm somewhere; this may however be a fluke for this one example. I won't investigate this further for now and may look at it once you've pushed your version, if that's fine!
My apologies for the slow progress from my side; it's a busy week. I've left another question in the comments.
src/eigsolve/biarnoldi.jl (Outdated)

```julia
keep = div(3 * krylovdim + 2 * converged, 5) # strictly smaller than krylovdim since converged < howmany <= krylovdim, at least equal to converged
while keep < krylovdim &&
      (isapprox(valuesH[pH[keep]], conj(valuesH[pH[keep + 1]]); rtol=1e-3) ||
       isapprox(valuesK[pK[keep]], conj(valuesK[pK[keep + 1]]); rtol=1e-3))
```
Is there a specific reason not to simply check !iszero(H[keep+1,keep]) as in arnoldi.jl? Also, I think this is only a problem if H and K are real, i.e. when working in real arithmetic?
This would work as well. However, I have the following question regarding this approach: in which sense is it reasonable to only run this check if eltype(H) <: Real? Can't I trick this mechanism by supplying a real matrix A as ComplexF64.(A), thus changing the eltype but not the fundamental properties of the matrix?
Regardless, you may replace this code block with the following or something similar if you like it better:

```julia
while H[keep + 1, keep] != 0 || K[keep + 1, keep] != 0
    # we are in the middle of a 2x2 block; this cannot happen if keep == converged,
    # so we can decrease keep; however, we have to make sure that we do not end up with keep = 0
    if keep > 1
        keep -= 1 # conservative choice
    else
        keep += 1
        if krylovdim == 2
            alg.verbosity >= WARN_LEVEL &&
                @warn "Arnoldi iteration got stuck in a 2x2 block, consider increasing the Krylov dimension"
            break
        end
    end
end
```
The way this works is that, if the vectors and the linear map are real, the whole computation will happen in real arithmetic except for the final step. It is only in the real arithmetic case that the Schur factorisation will return a matrix that is not strictly upper triangular, but quasi-upper triangular, i.e. where complex eigenvalues (which necessarily appear in complex conjugate pairs) give rise to 2x2 diagonal blocks. So the real Schur decomposition returns a matrix that is block upper triangular with only 1x1 and 2x2 diagonal blocks.
If the computation happens in complex arithmetic, H and K will be strictly upper triangular. Even if everything was real but it was manually converted to complex, the Schur decomposition result will still be different, i.e.
complex.(schur(some_real_matrix)) != schur(complex(some_real_matrix))
So when working in complex arithmetic, the problem will never pose itself. (Of course, in that case, you might be splitting in between a pair of complex conjugate eigenvalues in some arbitrary way, but that is not something the algorithm can control).
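A quick standalone REPL illustration of this difference (not code from the PR):

```julia
using LinearAlgebra

A = [0.0 -1.0; 1.0 0.0]  # real matrix with eigenvalues ±im

schur(A).T               # quasi-upper triangular: a single 2x2 diagonal block
schur(complex.(A)).T     # strictly upper triangular, with ±im on the diagonal
```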
That's really nice to know! I think my observation regarding the "splitting" of eigenvalue pairs boils down to what you mentioned and was only a symptom. Hence, your suggestion works as well and I'll change the corresponding line.
My apologies for the long silence; I was on holiday last week and didn't really get to continue on this. I will try to finish my review / changes from my side this week.
Looking at this once again, I am making some more changes locally and will push them asap. However, I have two more questions:
Actually, looking more into the computation of the convergence criterion, I don't understand why the authors want to use a relative precision where they divide by rho_j, being the eigenvalue. This doesn't make sense in my opinion. It can be perfectly valid to target eigenvalues that happen to be close to zero, as long as they are extremal. Furthermore, adding identity to an operator changes nothing in the Krylov subspace and thus in the method, but arbitrarily shifts eigenvalues and thus the meaning of relative errors. If you agree with this, I would rather opt for a convergence criterion that only uses absolute norms of residuals in comparison to
Yes, that is correct. I'm however fine with dropping this prefactor altogether as you suggested in your other comment. :)
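To make the two criteria concrete, a minimal sketch (the function names are illustrative, not KrylovKit API); β is the residual norm and ρ the Ritz value:

```julia
# Relative criterion from the paper: fragile when ρ is close to zero,
# and not invariant under shifting the operator by a multiple of the identity.
relative_converged(β, ρ, tol) = β <= tol * abs(ρ)

# Shift-invariant alternative discussed above: compare the residual norm to tol directly.
absolute_converged(β, tol) = β <= tol
```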
I've only encountered this problem when working with (mathematically) real matrices, i.e., independent of the arithmetic type (complex or real), but those that have the property that both That's as much as I know. Sadly, I did not find any information on this in the paper. When I tried enforcing this explicit ordering in the Krylov decomposition by adjusting
Ok, thanks for the comments; that is very helpful. When working with real arithmetic, a valid
Yes. Exactly due to this fact, the matching logic currently boils down to a linear loop which also normalizes the vectors. I'm also very open to simplifying it more :)
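For illustration, a sketch of what the normalization for one matched pair could look like; the helper name is hypothetical, and the sketch assumes complex arithmetic (a real pair with negative overlap needs extra care):

```julia
using LinearAlgebra

# Rescale a matched left/right pair (w, v) so that dot(w, v) == 1,
# distributing the overlap symmetrically over both vectors.
function binormalize!(w, v)
    s = sqrt(dot(w, v))    # overlap ⟨w, v⟩; complex sqrt in general
    rmul!(v, 1 / s)
    rmul!(w, 1 / conj(s))  # so that conj(1 / conj(s)) * (1 / s) * ⟨w, v⟩ == 1
    return w, v
end
```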
I've now gotten around to using this method some more and implemented non-Hermitian DMRG using it. In this context, I noticed that the concerns raised by @lkdvos about the precision used in isapprox are very important, i.e., for early sweeps the left/right eigenvalues may not match perfectly, as is the case with the more constructed examples in the test suite. At a minimum, the method should (i) allow the required precision as a parameter and (ii) not return an error if the algorithm does not converge. Have you @Jutho implemented a different way to return the "correctly" matched eigenvalue pairs? Otherwise, I will think some more about it and commit a suggested fix.
I did some more digging, and SLEPc has also implemented the algorithm. However, they also need to run a match loop for the eigenvalues to be sorted properly when using the same LAPACK functionality, cf. here.
My apologies for the silence; I was unavailable all of last week. I will try to pick it up again this week and finally push this over the finish line.
Is there something else I can contribute to help get this finished, @Jutho? 😊
My apologies for the silence here; I got a bit distracted by other code reviews. This week I am also not actively at work very much, but I will still try to find some time. If not, then definitely early next week I will make time for finishing this.
I've now updated the residual norm convention to be with
src/eigsolve/biarnoldi.jl (Outdated)

```julia
# Partially Step 7 & 8 - Correction of hm and km
h = mul!(h, view(Q, L, :), 1.0)
k = mul!(k, view(Z, L, :), 1.0)
```
I think that actually this part is wrong in the complex case. The rule is h̃ = Q' * h, where h was a vector along the last coordinate axis, either with unit length or in the old case with length βv. Therefore, h̃ will indeed correspond to the last column of Q', or thus, the last row of Q, but with an extra complex conjugation. The same applies to k. I will also fix this myself in a next commit, but want to make sure that you agree with this reasoning?
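A quick standalone check of that conjugation (illustrative, not code from the PR):

```julia
using LinearAlgebra

Q = Matrix(qr(randn(ComplexF64, 4, 4)).Q)  # unitary stand-in for the Schur rotation
βv = 0.3
L = size(Q, 1)
e_L = zeros(L); e_L[end] = 1               # unit vector along the last coordinate axis
h̃ = Q' * (βv * e_L)                        # the rule h̃ = Q' * h with h = βv * e_L
h̃ ≈ βv .* conj.(Q[L, :])                   # true: the conjugated last row of Q, not the row itself
```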
@tipfom, I finally made some time to look at this again, and would like to get this merged before tagging a new release of KrylovKit. However, currently, I get completely wrong results as soon as I rebase this PR on the current master. Apparently I also had some uncommitted changes about the role of the
It seems to be the
The
Alternatively, we may go with commit 35fa97e, which achieves the same result by fixing the return type of
Yes, I also noticed this last night but don't understand why. The
Thanks for the very clear commit history, and for fixing the
Ok, CI is looking good so far, fingers crossed. I'll hit merge when it finishes and then tag a version of KrylovKit later tonight. Thanks for this very nice contribution @tipfom, and my apologies for the long time span over which my review materialized.
Very nice! I think it turned out pretty nicely, thanks for your work :) |
I've added an implementation of the Two-sided Krylov-Schur Restarted Arnoldi method in Ref. [1, Alg. 2] to KrylovKit.jl. The implementation is not done yet and needs some improvements (e.g., unit tests and an update to the convergence criterion).
It is currently only tested for finding the smallest real eigenvalue (:SR) for dense Julia matrices. The interface is as close as possible to the other methods, i.e., for a dense matrix A, calling

```julia
eigsolve(A, collect(adjoint(A)), v0, w0, 4, :SR, BiArnoldi(; krylovdim=20, verbosity=10, maxiter=100))
```

yields the 4 eigenvalues with smallest real part, the corresponding left and right eigenvectors, and convergence information.
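For completeness, a self-contained sketch of such a call (matrix size and starting vectors are arbitrary; since the exact layout of the returned tuple is not spelled out in this thread, the result is left unpacked):

```julia
using KrylovKit

A = randn(50, 50)
v0 = rand(50)  # right starting vector
w0 = rand(50)  # left starting vector

res = eigsolve(A, collect(adjoint(A)), v0, w0, 4, :SR,
               BiArnoldi(; krylovdim=20, verbosity=10, maxiter=100))
```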
Are you interested in merging this into KrylovKit.jl? If so, what should I additionally add (next to tests and the improved convergence criterion)?
I'm interested in using this method downstream (in ITensors.jl) and hence incorporating this into the package would be very much appreciated. I'm willing to do the extra work required, if it is not too much.
Thanks for your work!
[1] I. N. Zwaan and M. E. Hochstenbach, Krylov-Schur-Type Restarts for the Two-Sided Arnoldi Method.