Commit 3fe7f19

Update the Changelog for version 0.3.30
1 parent bad47bd commit 3fe7f19

1 file changed: +130 -0 lines changed

Changelog.txt

Lines changed: 130 additions & 0 deletions
@@ -1,4 +1,134 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.30
+16-Jun-2025
+
+general:
+  - fixed an installation problem with the thread safety test in gmake builds
+  - fixed spurious overwriting of an input array in complex GEMMT/GEMMTR
+  - fixed naming of GEMMTR in error messages from XERBLA
+  - fixed compilation of SBGEMMT/SBGEMMTR in CMake builds
+  - fixed the implementation of ?NRM2 to handle INCX=0 correctly
+  - removed tests for CSROT and ZDROT that relied on unspecified behavior
+  - fixed a performance regression in multithreaded GEMM that was particularly
+    serious on POWER targets
+  - fixed linking issues when using LLVM's flang-new with gmake
+  - fixed a potential thread safety problem with C11 atomic operations
+  - further improved the workload partitioning in parallel GEMM
+  - fixed omission of LAPACKE interfaces for CGESVDQ, CTRSYL3 and ?GEQPF in
+    CMake builds
+  - fixed mishandling of setting NO_LAPACK to FALSE, and incorrect dependencies
+    for LAPACK function SPMV in CMake builds
+  - added explicit CMake options for building LAPACKE and shared libraries
+  - simplified and improved handling of OpenMP options in CMake builds
+  - reworked Windows DLL generation in CMake builds to ensure correct symbol
+    renaming (pre/postfixing) and optional generation of PDB files for debugging
+  - updated the Perl script version of the gensymbol utility for use with
+    Windows-on-Arm
+  - fixed building with (Mingw) gmake on Windows to ensure completeness of the
+    LAPACK included in the static library (potential race condition due to the
+    Windows version of the "ln" utility creating snapshot copies rather than links)
+  - fixed unwanted deletion of the lapacke_mangling.h file by "make clean"
+  - fixed potential duplication of a _64 suffix on library names in CMake builds
+  - fixed compilation of the C fallback copies of the LAPACK code with GCC 15
+  - included fixes from the Reference-LAPACK project:
+    - fixed a truncated error message in the EIG part of the testsuite
+      (Reference-LAPACK PR 1119)
+    - fixed too strict check in LAPACKE_?gesdd_work (PR #1126)
+    - fixed memory corruption when calling ?GEEV with non-finite data (PR #1128)
+    - fixed missing initialization of a variable in C/GEQP3RK (PR #1131)
+    - fixed 2nd dimension chosen in C/ZUNMLQ transposition operation (PR #1135)
+
+x86_64:
+  - fixed an error in the SBGEMV kernel for Cooper Lake/Sapphire Rapids
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - improved the compiler identification code for flang-new
+  - fixed a potential build issue in the ZSUM kernel
+  - fixed "argument list too long" errors when building on MacOS
+  - added cpu autodetection support for several new Arrow Lake models
+  - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH
+  - fixed compilation with the MinGW build of GCC 15
+
+arm64:
+  - added an optimized SBGEMM kernel for NEOVERSEV1
+  - improved 1xN SBGEMM performance by forwarding to SBGEMV
+  - introduced a stepwise increase of the thread count used for
+    SGEMM and SGEMV on NEOVERSEV1/V2 in relation to problem size
+  - introduced a stepwise increase of the thread count used for
+    DGEMV on NEOVERSEV1 in relation to problem size
+  - introduced a stepwise increase of the thread count used for
+    SDOT and DDOT on NEOVERSEV1 in relation to problem size
+  - worked around assembler limitations in LLVM for Windows-on-Arm
+  - enabled cpu type autodetection from the registry on Windows-on-Arm
+  - improved multithreading threshold for GEMV and GESV on Windows-on-Arm
+  - fixed overoptimization issues with LLVM's flang in Windows-on-Arm
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - added a fast path SGEMM kernel for small workloads on SME capable targets
+  - improved performance of SGEMM and DGEMM kernels for small workloads
+  - improved performance of SGEMV and DGEMV on SVE-capable targets
+  - improved performance of SGEMV on NEOVERSEN1 and Apple M
+  - added optimized SSYMV and DSYMV kernels for NEOVERSEN1, Apple M and all
+    SVE capable targets
+  - added optimized SBGEMV kernels for NEOVERSEV1/V2/N2
+  - improved performance of SGEMM through faster NCOPY kernels
+  - added compiler options for the NVIDIA HPC Compiler Suite
+  - fixed compilation on OSX with XCode 16.3 and later
+  - fixed cpu core type and cache size detection on Apple M4
+  - updated GEMM parameter settings for Neoverse cpus in cross-builds with CMake
+  - fixed default compiler options for NEOVERSEN1 and CORTEXX2 in CMake builds
+  - fixed conditional inclusion of the fast path SGEMM kernel in DYNAMIC_ARCH
+  - fixed potential miscompilation of the non-SVE SDOT kernel
+
+riscv64:
+  - added optimized SROTM and DROTM kernels for x280
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - improved performance of GEMM_TCOPY on RVV1.0 targets with
+    VLEN of 128 or 256
+  - improved performance of OMATCOPY on targets with VLEN 256
+  - greatly improved performance of SGEMV/DGEMV
+  - improved performance of CGEMV and ZGEMV on C910V and all RVV targets
+    with VLEN 256
+  - improved performance of SAXPBY and DAXPBY on C910V and all RVV targets
+    with VLEN 256
+  - improved performance of AXPY and DOT on C910V and ZVL256B targets by
+    falling back to non-vectorized code for very small N. (Thereby fixing
+    poor performance of CHBMV/ZHBMV for very small K)
+  - fixed CMake build failures of the TRMM kernels
+
+loongarch64:
+  - improved performance of the LSX versions of SSYMV/DSYMV
+  - made the LASX versions of the DSYMV and SSYMV kernels
+    compatible with hardware changes in LA664 and future targets
+  - fixed inaccuracies in several LASX kernels
+  - improved compatibility of LSX kernels with LA264 targets
+  - fixed handling of deprecated target names in CMake builds
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+power:
+  - fixed building for PPCG4 with CMake
+  - fixed SSCAL/DSCAL on PPC970 running FreeBSD
+  - fixed a potential alignment issue in the POWER8 SGEMV kernel
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+zarch:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+  - fixed unwanted generation of object files with a writable stack
+
+x86:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+arm:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+sparc:
+  - fixed corner cases of NAN and INF input handling in CSCAL and ZSCAL
+
+alpha:
+  - fixed build failure caused by spurious Windows-only typecasts
+
+cell:
+  - fixed probable build issue caused by spurious Windows-only typecasts
+
 ====================================================================
 Version 0.3.29
 12-Jan-2025
