1 | 1 | OpenBLAS ChangeLog
| 2 | +==================================================================== |
| 3 | +Version 0.3.30 |
| 4 | +19-Jun-2025 |
| 5 | + |
| 6 | +general: |
| 7 | + - fixed an installation problem with the thread safety test in gmake builds |
| 8 | + - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR |
| 9 | + - fixed naming of GEMMTR in error messages from XERBLA |
| 10 | + - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds |
| 11 | + - fixed the implementation of ?NRM2 to handle INCX=0 correctly |
| 12 | + - removed tests for CSROT and ZDROT that relied on unspecified behavior |
| 13 | + - fixed a performance regression in multithreaded GEMM that was particularly |
| 14 | + serious on POWER targets |
| 15 | + - fixed linking issues when using LLVM's flang-new with gmake |
| 16 | + - fixed a potential thread safety problem with C11 atomic operations |
| 17 | + - further improved the workload partitioning in parallel GEMM |
| 18 | + - fixed omission of LAPACKE interfaces for CGESVDQ, CTRSYL3 and ?GEQPF in |
| 19 | + CMake builds |
| 20 | + - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies |
| 21 | + for LAPACK function SPMV in CMake builds |
| 22 | + - added explicit CMake options for building LAPACKE and shared libraries |
| 23 | + - simplified and improved handling of OpenMP options in CMake builds |
| 24 | + - reworked Windows DLL generation in CMake builds to ensure correct symbol |
| 25 | + renaming (pre/postfixing) and optional generation of PDB files for debugging |
| 26 | + - updated the Perl script version of the gensymbol utility for use with |
| 27 | + Windows-on-Arm |
| 28 | + - fixed building with (MinGW) gmake on Windows to ensure completeness of the |
| 29 | + LAPACK included in the static library (potential race condition due to the |
| 30 | + Windows version of the "ln" utility creating snapshot copies rather than links) |
| 31 | + - fixed unwanted deletion of the lapacke_mangling.h file by "make clean" |
| 32 | + - fixed potential duplication of a _64 suffix on library names in CMake builds |
| 33 | + - fixed compilation of the C fallback copies of the LAPACK code with GCC 15 |
| 34 | + - included fixes from the Reference-LAPACK project: |
| 35 | + - fixed a truncated error message in the EIG part of the testsuite |
| 36 | + (Reference-LAPACK PR #1119) |
| 37 | + - fixed an overly strict check in LAPACKE_?gesdd_work (PR #1126) |
| 38 | + - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128) |
| 39 | + - fixed missing initialization of a variable in C/GEQP3RK (PR #1131) |
| 40 | + - fixed the 2nd dimension chosen in the C/ZUNMLQ transposition operation (PR #1135) |
| 41 | + |
| 42 | +x86_64: |
| 43 | + - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids |
| 44 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 45 | + - improved the compiler identification code for flang-new |
| 46 | + - fixed a potential build issue in the ZSUM kernel |
| 47 | + - fixed "argument list too long" errors when building on MacOS |
| 48 | + - added cpu autodetection support for several new Arrow Lake models |
| 49 | + - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH |
| 50 | + - fixed compilation with the MinGW build of GCC 15 |
| 51 | + |
| 52 | +arm64: |
| 53 | + - fixed cpu type detection of A64FX and some ThunderX models (broken in 0.3.29) |
| 54 | + - added support for the AmpereOne/1A cpus in DYNAMIC_ARCH builds |
| 55 | + - added an optimized SBGEMM kernel for NEOVERSEV1 |
| 56 | + - improved 1xN SBGEMM performance by forwarding to SBGEMV |
| 57 | + - introduced a stepwise increase of the thread count used for |
| 58 | + SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size |
| 59 | + - introduced a stepwise increase of the thread count used for |
| 60 | + DGEMV on NEOVERSEV1 in relation to problem size |
| 61 | + - introduced a stepwise increase of the thread count used for |
| 62 | + SDOT and DDOT on NEOVERSEV1 in relation to problem size |
| 63 | + - worked around assembler limitations in LLVM for Windows-on-Arm |
| 64 | + - enabled cpu type autodetection from the registry on Windows-on-Arm |
| 65 | + - improved multithreading threshold for GEMV and GESV on Windows-on-Arm |
| 66 | + - fixed overoptimization issues with LLVM's flang on Windows-on-Arm |
| 67 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 68 | + - added a fast path SGEMM kernel for small workloads on SME capable targets |
| 69 | + - improved performance of SGEMM and DGEMM kernels for small workloads |
| 70 | + - improved performance of SGEMV and DGEMV on SVE-capable targets |
| 71 | + - improved performance of SGEMV on NEOVERSEN1 and Apple M |
| 72 | + - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all |
| 73 | + SVE capable targets |
| 74 | + - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2 |
| 75 | + - improved performance of SGEMM through faster NCOPY kernels |
| 76 | + - added compiler options for the NVIDIA HPC Compiler Suite |
| 77 | + - fixed compilation on OSX with XCode 16.3 and later |
| 78 | + - fixed cpu core type and cache size detection on Apple M4 |
| 79 | + - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake |
| 80 | + - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds |
| 81 | + - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH |
| 82 | + - fixed potential miscompilation of the non-SVE SDOT kernel |
| 83 | + |
| 84 | +riscv64: |
| 85 | + - added optimized SROTM and DROTM kernels for x280 |
| 86 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 87 | + - improved performance of GEMM_TCOPY on RVV1.0 targets with |
| 88 | + VLEN of 128 or 256 |
| 89 | + - improved performance of OMATCOPY on targets with VLEN 256 |
| 90 | + - greatly improved performance of SGEMV/DGEMV |
| 91 | + - improved performance of CGEMV and ZGEMV on C910V and all RVV targets |
| 92 | + with VLEN 256 |
| 93 | + - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets |
| 94 | + with VLEN 256 |
| 95 | + - improved performance of AXPY and DOT on C910V and ZVL256B targets by |
| 96 | + falling back to non-vectorized code for very small N (thereby fixing |
| 97 | + poor performance of CHBMV/ZHBMV for very small K) |
| 98 | + - fixed CMake build failures of the TRMM kernels |
| 99 | + |
| 100 | +loongarch64: |
| 101 | + - improved performance of the LSX versions of SSYMV/DSYMV |
| 102 | + - made the LASX versions of the DSYMV and SSYMV kernels |
| 103 | + compatible with hardware changes in LA664 and future targets |
| 104 | + - fixed inaccuracies in several LASX kernels |
| 105 | + - improved compatibility of LSX kernels with LA264 targets |
| 106 | + - fixed handling of deprecated target names in CMake builds |
| 107 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 108 | + |
| 109 | +power: |
| 110 | + - fixed building for PPCG4 with CMake |
| 111 | + - fixed SSCAL/DSCAL on PPC970 running FreeBSD |
| 112 | + - fixed a potential alignment issue in the POWER8 SGEMV kernel |
| 113 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 114 | + |
| 115 | +zarch: |
| 116 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 117 | + - fixed unwanted generation of object files with a writable stack |
| 118 | + |
| 119 | +x86: |
| 120 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 121 | + - worked around potential miscompilation of CDOT with very old binutils |
| 122 | + |
| 123 | +arm: |
| 124 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 125 | + - fixed unwanted generation of object files with a writable stack |
| 126 | + |
| 127 | +sparc: |
| 128 | + - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL |
| 129 | + |
| 130 | +alpha: |
| 131 | + - fixed build failure caused by spurious Windows-only typecasts |
| 132 | + |
| 133 | +cell: |
| 134 | + - fixed probable build issue caused by spurious Windows-only typecasts |
| 135 | + |
2 | 136 | ====================================================================
3 | 137 | Version 0.3.29
4 | 138 | 12-Jan-2025
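Nearly every architecture section of the 0.3.30 entry lists a fix for "corner cases of NAN and INF input handling in CSCAL and ZSCAL". The entry does not say which corner cases were involved, so the C below is only a generic sketch of why such inputs need care, not OpenBLAS kernel code; `cscal_naive` and `cscal_zero_shortcut` are hypothetical names. It scales interleaved single-precision complex data (real, imaginary pairs, as Fortran COMPLEX arrays are laid out) by alpha. Under IEEE 754, 0 * NaN and 0 * Inf both yield NaN, so the plain complex product propagates such inputs, whereas a shortcut that writes zeros whenever alpha == 0 silently discards them.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenBLAS kernel): scale n complex values,
   stored as interleaved (re, im) pairs in x, by alpha = ar + ai*i using
   (a+bi)(c+di) = (ac - bd) + (ad + bc)i. Every element goes through the
   arithmetic, so IEEE 754 rules apply: 0 * NaN = NaN, 0 * Inf = NaN. */
static void cscal_naive(size_t n, float ar, float ai, float *x)
{
    for (size_t i = 0; i < n; i++) {
        float xr = x[2 * i];
        float xi = x[2 * i + 1];
        x[2 * i]     = ar * xr - ai * xi;
        x[2 * i + 1] = ar * xi + ai * xr;
    }
}

/* Hypothetical "fast path": when alpha == 0 it skips the arithmetic and
   writes zeros, which is cheaper but turns NaN or Inf inputs into 0. */
static void cscal_zero_shortcut(size_t n, float ar, float ai, float *x)
{
    if (ar == 0.0f && ai == 0.0f) {
        for (size_t i = 0; i < 2 * n; i++)
            x[i] = 0.0f;
        return;
    }
    cscal_naive(n, ar, ai, x);
}
```

For example, with x = {NAN, 1.0f} and alpha = 0, `cscal_naive` leaves both components as NaN, while `cscal_zero_shortcut` leaves {0.0f, 0.0f}. For finite data the two agree, e.g. alpha = i applied to 1 + 2i gives -2 + i.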