Commit 7f5b703

Merge pull request #5065 from martin-frbg/changelog0329
Update the Changelog for version 0.3.29
2 parents f422845 + 20f6114 commit 7f5b703

1 file changed: +95 -0 lines changed

Changelog.txt

Lines changed: 95 additions & 0 deletions
@@ -1,4 +1,99 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.29
+12-Jan-2025
+
+general:
+- fixed a potential NULL pointer dereference in multithreaded builds
+- added function aliases for GEMMT using its new name GEMMTR adopted by Reference-BLAS
+- fixed a build failure when building without LAPACK_DEPRECATED functions
+- the minimum required CMake version for CMake-based builds was raised to 3.16.0 in order
+  to remove many compatibility and deprecation warnings
+- added more detailed CMake rules for OpenMP builds (mainly to support recent LLVM)
+- fixed the behavior of the recently added CBLAS_?GEMMT functions with row-major data
+- improved thread scaling of multithreaded SBGEMV
+- improved thread scaling of multithreaded TRTRI
+- fixed compilation of the CBLAS testsuite with gcc14 (and no Fortran compiler)
+- added support for option handling changes in flang-new from LLVM18 onwards
+- added support for recent calling convention changes in Cray and NVIDIA compilers
+- added support for compilation with the NAG Fortran compiler
+- fixed placement of the -fopenmp flag and libsuffix in the generated pkgconfig file
+- improved the CMakeConfig file generated by the Makefile build
+- fixed const-correctness of cblas_?geadd in cblas.h
+- fixed a potential inaccuracy in multithreaded BLAS3 calls
+- added empty implementations of get/set_affinity that print a warning in OpenMP builds
+- fixed function signatures for TRTRS in the converted C version of LAPACK
+- fixed omission of several single-precision LAPACK symbols in the shared library
+- improved build instructions for the provided "pybench" benchmarks
+- improved documentation, including added build instructions for WoA and HarmonyOS
+  as well as descriptions of environment variables that affect build and runtime behavior
+- added a separate "make install_tests" target for use with cross-compilations
+- integrated improvements and corrections from Reference-LAPACK:
+  - removed a comparison in LAPACKE ?tpmqrt that is always false (LAPACK PR 1062)
+  - fixed the leading dimension for B in tests for GGEV (LAPACK PR 1064)
+  - replaced the ?LARFT functions with a recursive implementation (LAPACK PR 1080)
+
+arm:
+- fixed build with recent versions of the NDK (missing .type declaration of symbols)
+
+arm64:
+- fixed a long-standing bug in the (generic) c/zgemm_beta kernel that could lead to
+  reads and writes outside the array bounds in some circumstances
+- rewrote cpu autodetection to scan all cores and return the highest performing type
+- improved the DGEMM performance for SVE targets and small matrix sizes
+- improved dimension criteria for forwarding from GEMM to GEMV kernels
+- added SVE kernels for ROT and SWAP
+- improved SVE kernels for SGEMV and DGEMV on A64FX and NEOVERSEV1
+- added support for using the "small matrix" kernels with CMake as well
+- fixed compilation on Windows on Arm
+- improved compile-time detection of SVE capability
+- added cpu autodetection and initial support for Apple M4
+- added support for compilation on systems running iOS
+- added support for compilation on NetBSD ("evbarm" architecture)
+- fixed NRM2 implementations for generic SVE targets and the Neoverse N2
+- fixed compilation for SVE-capable targets with the NVIDIA compiler
+
+x86_64:
+- fixed a wrong storage size in the SBGEMV kernel for Cooper Lake
+- added cpu autodetection for Intel Granite Rapids
+- added cpu autodetection for AMD Ryzen 5 series
+- added optimized SOMATCOPY_CT for AVX-capable targets
+- fixed the fallback implementation of GEMM3M in GENERIC builds
+- tentatively re-enabled builds with the EXPRECISION option
+- worked around a miscompilation of tests with mingw32-gfortran14
+- added support for compilation with the Intel oneAPI 2025.0 compiler on Windows
+
+power:
+- fixed multithreaded SBGEMM
+- fixed a CMake build problem on POWER10
+- improved the performance of SGEMV
+- added vectorized implementations of SBGEMV and support for forwarding 1xN SBGEMM to them
+- fixed illegal instructions and potential memory overflow in SGEMM on PPCG4
+- fixed handling of NaN and Inf arguments in SSCAL and DSCAL on PPC440, G4 and 970
+- added improved CGEMM and ZGEMM kernels for POWER10
+- added Makefile logic to remove all optimization flags in DEBUG builds
+
+mips64:
+- fixed compilation with gcc14
+- fixed GEMM parameter selection for the MIPS64_GENERIC target
+- fixed a potential build failure when compiling with OpenMP
+
+loongarch64:
+- fixed compilation for Loongson3 with recent versions of gmake
+- fixed a potential loss of precision in Loongson3A GEMM
+- fixed a potential build failure when compiling with OpenMP
+- added optimized SOMATCOPY for LASX-capable targets
+- introduced a new cpu naming scheme while retaining compatibility
+- added support for cross-compiling LoongArch64 targets with CMake
+- added support for compilation with LLVM
+
+riscv64:
+- removed thread yielding overhead caused by sched_yield
+- replaced some non-standard intrinsics with their official names
+- fixed and sped up the implementations of CGEMM/ZGEMM TCOPY for vector lengths 128 and 256
+- improved the performance of SNRM2/DNRM2 for RVV1.0 targets
+- added optimized ?OMATCOPY_CN kernels for RVV1.0 targets
+
 ====================================================================
 Version 0.3.28
 8-Aug-2024
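
The "general" entries above mention GEMMT and its new alias GEMMTR. For readers unfamiliar with the routine, the following is a minimal sketch of what it computes, assuming the usual description of GEMMT as a general matrix product that updates only the upper or lower triangle of a square result matrix. The function name `gemmt_reference`, the list-of-rows layout, and the omission of the transpose options are simplifications made for illustration; this is not OpenBLAS code.

```python
def gemmt_reference(uplo, alpha, a, b, beta, c):
    """Update one triangle of the square matrix c with alpha*(a @ b) + beta*c.

    Hypothetical illustration only. Matrices are lists of rows:
    a is n x k, b is k x n, c is n x n and is modified in place.
    """
    n = len(c)
    k = len(b)
    for i in range(n):
        for j in range(n):
            in_triangle = (j >= i) if uplo == "U" else (j <= i)
            if not in_triangle:
                continue  # entries outside the selected triangle stay untouched
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            c[i][j] = alpha * acc + beta * c[i][j]
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
gemmt_reference("U", 1.0, a, b, 0.0, c)
```

With `uplo` set to `"U"`, the strictly lower entry `c[1][0]` keeps its old value while the diagonal and upper entries receive the product.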

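
The "power" entry on NaN and Inf handling in SSCAL and DSCAL does not say what the defect was. One common way a SCAL kernel can mishandle such inputs, shown here purely as an assumption, is a fast path that writes zeros when alpha is zero instead of multiplying, which hides NaN and Inf values that IEEE 754 arithmetic would propagate. Both function names below are hypothetical and the code is a sketch, not the OpenBLAS kernel.

```python
import math


def scal_with_shortcut(alpha, x):
    # Hypothetical kernel with an alpha == 0 fast path that overwrites x with zeros.
    if alpha == 0.0:
        return [0.0] * len(x)
    return [alpha * v for v in x]


def scal_ieee(alpha, x):
    # Always multiply, so NaN and Inf propagate according to IEEE 754 rules.
    return [alpha * v for v in x]


x = [1.0, float("nan"), float("inf")]
shortcut = scal_with_shortcut(0.0, x)
ieee = scal_ieee(0.0, x)
```

Here `shortcut` contains only zeros, whereas `ieee` keeps NaN in the second and third positions because `0 * NaN` and `0 * Inf` are both NaN.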