Description
Hello developers!
I found that the functions `dsyev` and `dsyevx` do not seem to be fully parallelized when:
- compiled with gcc (11.4/12.3)
- run on AMD CPUs (16 cores @ Ryzen 7945HX laptop / 2 cores @ EPYC 7763 GitHub Actions runner)
- built with make options `TARGET=ZEN USE_64BITINT=1 DYNAMIC_ARCH=1 NO_CBLAS=0 NO_LAPACK=0 NO_LAPACKE=0 NO_AFFINITY=1 USE_OPENMP=1`
Preliminary testing on an Intel CPU suggests a similar problem.
I'm not sure whether this is a problem with my make configuration, or whether OpenBLAS does not currently implement a fully parallel version of `dsyev` and `dsyevx`.
I hope to hear any thoughts or advice, and thanks in advance!
I guess that `dsyevr` and `dsyevd` could be better replacements for `dsyev`. `dsyevd` is the fastest but consumes more memory, while `dsyevr` uses much less temporary memory.
Additionally, as a programmer not very familiar with low-level BLAS/LAPACK, I wonder whether it is common to use `dsyevr` and `dsyevd` as eigensolvers instead of `dsyev`. If so, this may not be such an important issue.
Benchmark results (16 cores @ Ryzen 7945HX)
- CPU ratio refers to (CPU time) / (elapsed time)
- Eigenvalue problem for a 2048 x 2048 matrix filled with values from 1 to 2048, with eigenvectors requested
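
For readers who want to reproduce the CPU ratio metric, here is a minimal sketch of how (CPU time) / (elapsed time) could be measured around any call. It is only an illustration using the Python standard library; the helper name `cpu_ratio` and the `busy_loop` stand-in workload are hypothetical, and the scripts in the linked repository may measure this differently.

```python
import time


def cpu_ratio(fn, *args, **kwargs):
    """Run fn once and return (elapsed_seconds, cpu_seconds / elapsed_seconds)."""
    wall_start = time.perf_counter()
    # process_time() is the user + system CPU time of the whole process,
    # so time spent in worker threads is included in the sum.
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu / elapsed


def busy_loop(n):
    """Hypothetical single-threaded stand-in for the eigensolver call."""
    total = 0
    for i in range(n):
        total += i * i
    return total


elapsed, ratio = cpu_ratio(busy_loop, 2_000_000)
print(f"elapsed {elapsed * 1e3:.1f} msec, CPU ratio {ratio:.2f}")
```

A single-threaded workload like the stand-in above should give a ratio close to 1, while a routine that keeps all 16 cores busy would approach 16. In a real measurement, `busy_loop` would be replaced by a call into the LAPACK routine under test.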
| function | OpenBLAS 0.3.27 elapsed time | OpenBLAS 0.3.27 CPU ratio | MKL 2024.1 elapsed time | MKL 2024.1 CPU ratio |
|---|---|---|---|---|
| `dsyev` | 4861.5 msec | 1.65 | 588.7 msec | 15.75 |
| `dsyevd` | 392.2 msec | 13.69 | 225.9 msec | 15.42 |
| `dsyevr` | 805.6 msec | 6.30 | 698.6 msec | 4.69 |
| `dsyevx` | 3969.8 msec | 1.82 | 542.2 msec | 15.76 |
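
To make the gap easier to compare across routines, the elapsed times above can be turned into a slowdown factor (OpenBLAS elapsed time divided by MKL elapsed time). The short sketch below only re-uses the numbers from the table; the dictionary layout and the `slowdown` helper are illustrative names, not part of the benchmark scripts.

```python
# Elapsed times in msec, copied from the benchmark table (16 cores @ Ryzen 7945HX).
elapsed_msec = {
    "dsyev":  {"openblas": 4861.5, "mkl": 588.7},
    "dsyevd": {"openblas": 392.2,  "mkl": 225.9},
    "dsyevr": {"openblas": 805.6,  "mkl": 698.6},
    "dsyevx": {"openblas": 3969.8, "mkl": 542.2},
}


def slowdown(routine):
    """Return OpenBLAS elapsed time divided by MKL elapsed time for one routine."""
    times = elapsed_msec[routine]
    return times["openblas"] / times["mkl"]


for routine in elapsed_msec:
    print(f"{routine}: OpenBLAS is {slowdown(routine):.2f}x slower than MKL")
```

By this measure `dsyev` and `dsyevx` are roughly 7 to 8 times slower under OpenBLAS, while `dsyevd` and `dsyevr` stay within a factor of 2, which matches the low CPU ratios reported for the first two routines.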
A reproduction of this issue can be found in a GitHub Actions CI run (2 physical cores @ EPYC 7763): https://github.com/ajz34/issue_openblas_dsyev/actions/runs/9578584638/job/26409147609
For the scripts used on the 16-core Ryzen 7945HX, also see https://github.com/ajz34/issue_openblas_dsyev/tree/16-cores-Ryzen-7945HX.