Commit 2c8e001

Merge pull request #4853 from martin-frbg/changelog0328
Update Changelog.txt for 0.3.28
2 parents 1c2bfea + 1df95bb commit 2c8e001

1 file changed: +123 -0 lines changed

Changelog.txt

Lines changed: 123 additions & 0 deletions
@@ -1,4 +1,127 @@
OpenBLAS ChangeLog
====================================================================
Version 0.3.28
8-Aug-2024

general:
  - Reworked the unfinished implementation of HUGETLB from GotoBLAS
    for allocating huge memory pages as buffers on suitable systems
  - Changed the unfinished implementation of GEMM3M for the generic
    target on all architectures to at least forward to regular GEMM
  - Improved multithreaded GEMM performance for large non-skinny matrices
  - Improved BLAS3 performance on larger multicore systems through improved
    parallelism
  - Improved performance of the initial memory allocation by reducing
    locking overhead
  - Improved performance of GBMV at small problem sizes by introducing
    a size threshold for the switch to multithreading
  - Added an implementation of the CBLAS_GEMM_BATCH extension
  - Fixed miscompilation of CAXPYC and ZAXPYC on all architectures in
    CMAKE builds (error introduced in 0.3.27)
  - Fixed corner cases involving the handling of NAN and INFINITY
    arguments in ?SCAL on all architectures
  - Added support for cross-compiling to WebAssembly (WASM) with CMAKE
    (in addition to the already present makefile support)
  - Fixed NAN handling and potential accuracy issues in compilations with
    Intel ICX by supplying a suitable fp-model option by default
  - The contents of the GitHub project wiki have been converted into
    a new set of documentation included with the source code
  - It is now possible to register a callback function that replaces
    the built-in support for multithreading with an external backend
    like TBB (openblas_set_threads_callback_function)
  - Fixed potential duplication of suffixes in shared library naming
  - Improved C compiler detection by the build system to tolerate more
    naming variants for gcc builds
  - Fixed an unnecessary dependency of the utest on CBLAS
  - Fixed spurious error reports from the BLAS extensions utest
  - Fixed unwanted invocation of the GEMM3M tests in cross-compilation
  - Fixed a flaw in the makefile build that could lead to the pkgconfig
    file containing an entry of UNKNOWN for the target cpu after installing
  - Integrated fixes from the Reference-LAPACK project:
    - Fixed uninitialized variables in the LAPACK tests for ?QP3RK (PR 961)
    - Fixed potential bounds error in ?UNHR_COL/?ORHR_COL (PR 1018)
    - Fixed potential infinite loop in the LAPACK testsuite (PR 1024)
    - Made the variable type used for hidden length arguments configurable (PR 1025)
    - Fixed SYTRD workspace computation and various typos (PR 1030)
    - Prevented compiler use of FMA that could increase numerical error in ?GEEVX (PR 1033)
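
The ?SCAL entry above concerns NAN and INFINITY arguments. The changelog does not spell out the intended behavior, so the following is only a sketch of why such inputs are corner cases: under IEEE 754 arithmetic, multiplying NaN or infinity by zero yields NaN, while a kernel that short-circuits alpha == 0 by writing zeros silently discards them. Both function names are hypothetical and neither is OpenBLAS code.

```python
import math


def scal_zero_shortcut(alpha, x):
    """Hypothetical kernel: writes zeros when alpha == 0, skipping the multiply."""
    if alpha == 0.0:
        return [0.0] * len(x)
    return [alpha * xi for xi in x]


def scal_plain_multiply(alpha, x):
    """Hypothetical kernel: always multiplies, so NaN and Inf inputs propagate."""
    return [alpha * xi for xi in x]


x = [1.0, math.nan, math.inf]
shortcut = scal_zero_shortcut(0.0, x)    # every element becomes 0.0
multiplied = scal_plain_multiply(0.0, x)  # NaN and Inf entries become NaN
```

The two variants agree for finite inputs and differ only when alpha is zero and x contains NaN or infinity, which is the kind of case a regression test for this entry would have to cover.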

x86-64:
  - reverted thread management under Windows to its state before 0.3.26
    due to signs of race conditions in some circumstances now under study
  - fixed accidental selection of the unoptimized generic SBGEMM kernel
    in CMAKE builds for CooperLake and SapphireRapids targets
  - fixed a potential thread buffer overrun in SBSTOBF16 on small systems
  - fixed an accuracy issue in ZSCAL introduced in 0.3.26
  - fixed compilation with CMAKE and recent releases of LLVM
  - added support for Intel Emerald Rapids and Meteor Lake cpus
  - added autodetection support for the Zhaoxin KX-7000 cpu
  - fixed autodetection of Intel Prescott (probably broken since 0.3.19)
  - fixed compilation for older targets with the Yocto SDK
  - fixed compilation of the converter-generated C versions
    of the LAPACK sources with gcc-14
  - improved compiler options when building with CMAKE and LLVM for
    AVX512-capable targets
  - added support for supplying the L2 cache size via an environment
    variable (OPENBLAS_L2_SIZE) in case it is not correctly reported
    (as in some VM configurations)
  - improved the error message shown when thread creation fails on startup
  - fixed setting the rpath entry of the dylib in CMAKE builds on macOS
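
The OPENBLAS_L2_SIZE entry above describes an environment override for a hardware value that may be misreported. As a rough sketch of that pattern, the helper below prefers the environment variable when it holds a positive integer and otherwise falls back to the detected value. The function name `effective_l2_size`, the validation rules, and the unit of the value are all assumptions; the changelog does not state how OpenBLAS parses the variable.

```python
import os


def effective_l2_size(detected_size, environ=os.environ):
    """Return the override from OPENBLAS_L2_SIZE if usable, else detected_size.

    Hypothetical helper: the parsing rules and units are assumptions,
    not a description of OpenBLAS internals.
    """
    raw = environ.get("OPENBLAS_L2_SIZE")
    if raw is None:
        return detected_size
    try:
        value = int(raw)
    except ValueError:
        # Ignore values that are not integers and keep the detected size.
        return detected_size
    return value if value > 0 else detected_size
```

Passing the environment as a parameter keeps the helper testable without mutating the process environment.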

arm:
  - fixed building for baremetal targets with make

arm64:
  - added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
    matrix to the corresponding GEMV kernel
  - added optimized SGEMV and DGEMV kernels for A64FX
  - added optimized SVE kernels for small-matrix GEMM
  - added A64FX to the cpu list for DYNAMIC_ARCH
  - fixed building with support for cpu affinity
  - worked around accuracy problems with C/ZNRM2 on NeoverseN1 and
    Apple M targets
  - improved GEMM performance on Neoverse V1
  - fixed compilation for NEOVERSEN2 with older compilers
  - fixed potential miscompilation of the SVE SDOT and DDOT kernels
  - fixed potential miscompilation of the non-SVE CDOT and ZDOT kernels
  - fixed a potential overflow when using very large user-defined BUFFERSIZE
  - fixed setting the rpath entry of the dylib in CMAKE builds on macOS
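
The GEMM fast path listed above relies on the fact that a matrix product whose result is a single column or a single row is a matrix-vector product. The sketch below illustrates that equivalence in plain Python with row-major lists; it ignores alpha, beta and the transpose flags, and all function names are hypothetical rather than OpenBLAS interfaces.

```python
def gemv(A, x):
    """Return y = A * x for a list-of-rows matrix A and a vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gemm(A, B):
    """Return C = A * B using a plain triple loop (stand-in for the general kernel)."""
    m, k, n = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def gemm_with_fast_path(A, B):
    """Forward to gemv when the result is M x 1 or 1 x N, else call gemm."""
    if len(B[0]) == 1:
        # B is a single column, so C = A * B is one matrix-vector product.
        y = gemv(A, [row[0] for row in B])
        return [[v] for v in y]
    if len(A) == 1:
        # A is a single row, so C = (B^T * A^T)^T is a product with B transposed.
        B_t = [list(col) for col in zip(*B)]
        return [gemv(B_t, A[0])]
    return gemm(A, B)
```

In both special cases the dispatcher returns the same values as the general routine, so the choice only affects which kernel does the work.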

power:
  - added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
    matrix to the corresponding GEMV kernel
  - significantly improved performance of SBGEMM on POWER10
  - fixed compilation with OpenMP and the XLF compiler
  - fixed building of the BLAS extension utests under AIX
  - fixed building of parts of the LAPACK testsuite with XLF
  - fixed CSWAP/ZSWAP on big-endian POWER10 targets
  - fixed a performance regression in SAXPY on POWER10 with OpenXL
  - fixed accuracy issues in CSCAL/ZSCAL when compiled with LLVM
  - fixed building for POWER9 under FreeBSD
  - fixed a potential overflow when using very large user-defined BUFFERSIZE
  - fixed an accuracy issue in the POWER6 kernels for GEMM and GEMV

riscv64:
  - added a fast path forwarding SGEMM and DGEMM calls with a 1xN or Mx1
    matrix to the corresponding GEMV kernel
  - fixed building for RISCV64_GENERIC with OpenMP enabled
  - added DYNAMIC_ARCH support (comprising RISCV64_GENERIC and the two
    RVV 1.0 targets with vector length of 128 and 256)
  - worked around the ZVL128B kernels for AXPBY mishandling the special
    case of zero Y increment
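
The AXPBY entry above mentions a zero Y increment. The changelog does not describe how the ZVL128B kernels went wrong, so the sketch below only shows one way a vector-style implementation can diverge from an element-by-element loop in that case: with incy == 0 every step targets the same element of y, so a loop that reads y fresh on each step accumulates, while a version that loads all operands before storing does not. Both functions are hypothetical and assume non-negative increments.

```python
def axpby_sequential(n, alpha, x, incx, beta, y, incy):
    """Compute y = alpha*x + beta*y one element at a time, re-reading y each step."""
    y = list(y)
    ix = iy = 0
    for _ in range(n):
        y[iy] = alpha * x[ix] + beta * y[iy]
        ix += incx
        iy += incy
    return y


def axpby_batched(n, alpha, x, incx, beta, y, incy):
    """Load every operand first, then store, as a lane-parallel update might."""
    y = list(y)
    loaded = [y[i * incy] for i in range(n)]
    results = [alpha * x[i * incx] + beta * loaded[i] for i in range(n)]
    for i in range(n):
        y[i * incy] = results[i]
    return y
```

With a unit increment the two versions agree; with incy == 0 they produce different values in y[0], which is why such a case needs explicit handling or a fallback to a scalar loop.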

loongarch64:
  - improved GEMM performance on servers of the 3C5000 generation
  - improved performance and stability of DGEMM
  - improved GEMV and TRSM kernels for LSX and LASX vector ABIs
  - fixed CMAKE compilation with the INTERFACE64 option set
  - fixed compilation with CMAKE
  - worked around spurious errors flagged by the BLAS3 tests
  - worked around a miscompilation of the POTRS utest by gcc 14.1

mips64:
  - fixed ASUM and SUM kernels to accept negative step sizes in X
  - fixed complex GEMV kernels for MSA

====================================================================
Version 0.3.27
4-Apr-2024
