Commit d339bd5

Merge pull request #5308 from martin-frbg/changelog0330
Update the Changelog for version 0.3.30
2 parents 1546599 + 157273f commit d339bd5

1 file changed: +134 -0 lines changed

Changelog.txt

Lines changed: 134 additions & 0 deletions
@@ -1,4 +1,138 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.30
+19-Jun-2025
+
+general:
+  - fixed an installation problem with the thread safety test in gmake builds
+  - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR
+  - fixed naming of GEMMTR in error messages from XERBLA
+  - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds
+  - fixed the implementation of ?NRM2 to handle INCX=0 correctly
+  - removed tests for CSROT and ZDROT that relied on unspecified behavior
+  - fixed a performance regression in multithreaded GEMM that was particularly
+    serious on POWER targets
+  - fixed linking issues when using LLVM's flang-new with gmake
+  - fixed a potential thread safety problem with C11 atomic operations
+  - further improved the workload partitioning in parallel GEMM
+  - fixed omission of LAPACKE interfaces for CGESVDQ, CTRSYL3 and ?GEQPF in
+    CMake builds
+  - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies
+    for LAPACK function SPMV in CMake builds
+  - added explicit CMake options for building LAPACKE and shared libraries
+  - simplified and improved handling of OpenMP options in CMake builds
+  - reworked Windows DLL generation in CMake builds to ensure correct symbol
+    renaming (pre/postfixing) and optional generation of PDB files for debugging
+  - updated the Perl script version of the gensymbol utility for use with
+    Windows-on-Arm
+  - fixed building with (Mingw) gmake on Windows to ensure completeness of the
+    LAPACK included in the static library (potential race condition due to the
+    Windows version of the "ln" utility creating snapshot copies rather than links)
+  - fixed unwanted deletion of the lapacke_mangling.h file by "make clean"
+  - fixed potential duplication of a _64 suffix on library names in CMake builds
+  - fixed compilation of the C fallback copies of the LAPACK code with GCC 15
+  - included fixes from the Reference-LAPACK project:
+    - fixed a truncated error message in the EIG part of the testsuite
+      (Reference-LAPACK PR 1119)
+    - fixed too strict check in LAPACKE_?gesdd_work (PR #1126)
+    - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128)
+    - fixed missing initialization of a variable in C/GEQP3RK (PR #1131)
+    - fixed 2nd dimension chosen in C/ZUNMLQ transposition operation (PR #1135)
+
+x86_64:
+  - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - improved the compiler identification code for flang-new
+  - fixed a potential build issue in the ZSUM kernel
+  - fixed "argument list too long" errors when building on MacOS
+  - added cpu autodetection support for several new Arrow Lake models
+  - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH
+  - fixed compilation with the MinGW build of GCC 15
+
+arm64:
+  - fixed cpu type detection of A64FX and some ThunderX models (broken in 0.3.29)
+  - added support for the AmpereOne/1A cpus in DYNAMIC_ARCH builds
+  - added an optimized SBGEMM kernel for NEOVERSEV1
+  - improved 1xN SBGEMM performance by forwarding to SBGEMV
+  - introduced a stepwise increase of the thread count used for
+    SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size
+  - introduced a stepwise increase of the thread count used for
+    DGEMV on NEOVERSEV1 in relation to problem size
+  - introduced a stepwise increase of the thread count used for
+    SDOT and DDOT on NEOVERSEV1 in relation to problem size
+  - worked around assembler limitations in LLVM for Windows-on-Arm
+  - enabled cpu type autodetection from the registry on Windows-on-Arm
+  - improved multithreading threshold for GEMV and GESV on Windows-on-Arm
+  - fixed overoptimization issues with LLVM's flang in Windows-on-Arm
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - added a fast path SGEMM kernel for small workloads on SME capable targets
+  - improved performance of SGEMM and DGEMM kernels for small workloads
+  - improved performance of SGEMV and DGEMV on SVE-capable targets
+  - improved performance of SGEMV on NEOVERSEN1 and Apple M
+  - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all
+    SVE capable targets
+  - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2
+  - improved performance of SGEMM through faster NCOPY kernels
+  - added compiler options for the NVIDIA HPC Compiler Suite
+  - fixed compilation on OSX with XCode 16.3 and later
+  - fixed cpu core type and cache size detection on Apple M4
+  - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake
+  - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds
+  - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH
+  - fixed potential miscompilation of the non-SVE SDOT kernel
+
+riscv64:
+  - added optimized SROTM and DROTM kernels for x280
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - improved performance of GEMM_TCOPY on RVV1.0 targets with
+    VLEN of 128 or 256
+  - improved performance of OMATCOPY on targets with VLEN 256
+  - greatly improved performance of SGEMV/DGEMV
+  - improved performance of CGEMV and ZGEMV on C910V and all RVV targets
+    with VLEN 256
+  - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets
+    with VLEN 256
+  - improved performance of AXPY and DOT on C910V and ZVL256B targets by
+    falling back to non-vectorized code for very small N. (Thereby fixing
+    poor performance of CHBMV/ZHBMV for very small K)
+  - fixed CMake build failures of the TRMM kernels
+
+loongarch64:
+  - improved performance of the LSX versions of SSYMV/DSYMV
+  - made the LASX versions of the DSYMV and SSYMV kernels
+    compatible with hardware changes in LA664 and future targets
+  - fixed inaccuracies in several LASX kernels
+  - improved compatibility of LSX kernels with LA264 targets
+  - fixed handling of deprecated target names in CMake builds
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+power:
+  - fixed building for PPCG4 with CMake
+  - fixed SSCAL/DSCAL on PPC970 running FreeBSD
+  - fixed a potential alignment issue in the POWER8 SGEMV kernel
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+zarch:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - fixed unwanted generation of object files with a writable stack
+
+x86:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - worked around potential miscompilation of CDOT with very old binutils
+
+arm:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - fixed unwanted generation of object files with a writable stack
+
+sparc:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+alpha:
+  - fixed build failure caused by spurious Windows-only typecasts
+
+cell:
+  - fixed probable build issue caused by spurious Windows-only typecasts
+
 ====================================================================
 Version 0.3.29
 12-Jan-2025
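
Illustration for the 0.3.30 "general" entry on ?NRM2 and INCX=0: the changelog does not say which result is now returned for a zero stride. The sketch below assumes the convention of the strided loop in current Reference BLAS, where a zero increment re-reads the first element N times, so the norm becomes sqrt(N)*|x[0]|. The name `nrm2_ref` is hypothetical; this is a plain Python model, not OpenBLAS code.

```python
import math

def nrm2_ref(n, x, incx):
    """Euclidean norm of n elements of x read with stride incx (model only)."""
    if n <= 0:
        return 0.0
    # Assumption: a negative stride walks the vector backwards from its end,
    # and a zero stride visits x[0] on every iteration.
    start = 0 if incx >= 0 else (n - 1) * (-incx)
    visited = [x[start + i * incx] for i in range(n)]
    # math.hypot accepts any number of coordinates (Python 3.8+) and scales
    # internally to avoid overflow.
    return math.hypot(*visited)

zero_stride = nrm2_ref(4, [3.0], 0)            # four reads of x[0]
unit_stride = nrm2_ref(2, [3.0, 4.0], 1)       # ordinary 3-4-5 case
```

Under this assumed convention the zero-stride call yields sqrt(4)*3; an implementation that instead returned 0 for INCX=0 would differ exactly on such inputs.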

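Illustration for the repeated 0.3.30 entry "fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL": the changelog does not spell out which corner cases were involved. One common candidate in SCAL-type routines is a shortcut that writes zeros when alpha is zero, which hides NaN or Inf values present in the input. The sketch below contrasts that shortcut with a full multiply; all function names are hypothetical and nothing here is taken from the OpenBLAS kernels.

```python
import math

def cmul(a, b):
    """Complex product written out in real arithmetic, with no special cases."""
    return complex(a.real * b.real - a.imag * b.imag,
                   a.real * b.imag + a.imag * b.real)

def cscal_shortcut(alpha, x):
    """Scale x by alpha, but write plain zeros whenever alpha is zero."""
    if alpha == 0:
        return [complex(0.0, 0.0) for _ in x]
    return [cmul(alpha, v) for v in x]

def cscal_propagating(alpha, x):
    """Scale x by alpha with a full multiply, so NaN and Inf inputs propagate."""
    return [cmul(alpha, v) for v in x]

x = [complex(1.0, 2.0), complex(math.nan, 0.0), complex(math.inf, 1.0)]
zero = complex(0.0, 0.0)
shortcut = cscal_shortcut(zero, x)      # every entry becomes 0
full = cscal_propagating(zero, x)       # 0*NaN and 0*Inf give NaN
```

The two variants agree on finite inputs and differ only where the input holds NaN or Inf, which is why such cases are easy to miss without dedicated tests.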

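Illustration for the 0.3.30 arm64 entries that introduce "a stepwise increase of the thread count" in relation to problem size: the changelog gives neither thresholds nor thread counts, so every number below is invented for the example. The sketch only shows the general shape of such a heuristic, with a hypothetical function `pick_threads`.

```python
def pick_threads(problem_size, max_threads,
                 steps=((1 << 14, 1), (1 << 18, 4), (1 << 22, 16))):
    """Return a thread count that grows in steps with the problem size.

    steps is a sequence of (size_limit, threads) pairs in ascending order;
    the values used as defaults are made up for illustration.
    """
    for limit, threads in steps:
        if problem_size < limit:
            # Never ask for more threads than the caller allows.
            return min(threads, max_threads)
    # Beyond the last step, use everything that is available.
    return max_threads

small = pick_threads(1_000, 64)         # below the first limit
medium = pick_threads(100_000, 64)      # between the first and second limit
capped = pick_threads(1_000_000, 8)     # step asks for 16, caller allows 8
large = pick_threads(10_000_000, 64)    # past the last limit
```

A table of steps keeps small problems on one thread, where the cost of waking extra threads would dominate, while still letting large problems use the full machine.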