OpenBLAS ChangeLog
====================================================================
Version 0.3.25
 12-Nov-2023

general:
- improved the error message shown on exceeding the maximum thread count
- improved the code to add supplementary thread buffers in case of overflow
- fixed a potential division by zero in ?ROTG
- improved the ?MATCOPY functions to accept zero-sized rows or columns
- corrected empty prototypes in function declarations
- cleaned up unused declarations in the f2c-converted versions of the LAPACK sources
- fixed compilation with the Cray CCE compiler suite
- improved link line rewriting to avoid mixed libgomp/libomp builds with clang & gfortran
- worked around a hang in OPENMP builds using LLVM 14's libomp on FreeBSD
- improved the Makefiles to require less option duplication on "make install"
- imported the following changes from the upcoming release 3.12 of Reference-LAPACK
 - deprecate utility functions ?GELQS and ?GEQRS (LAPACK PR 900)
 - apply rounding up to workspace calculations done in floating point (LAPACK PR 904)
 - avoid overflow in STGEX2/DTGEX2 (LAPACK PR 907)
 - fix accumulation in ?LASSQ (LAPACK PR 909)
 - fix handling of NaN values in ?GECON (LAPACK PR 926)
 - avoid overflow in CBDSQR/ZBDSQR (LAPACK PR 927)
 - fix poor vector orthogonalizations in ?ORBDB5/?UNBDB5 (LAPACK PR 928 & 930)

x86-64:
- fixed compile-time autodetection of AMD Ryzen3 and Ryzen4 cpus
- fixed capability-based fallback selection for unknown cpus in DYNAMIC_ARCH
- added AVX512 optimizations for ?ASUM on Sapphire Rapids and Cooper Lake

ARM64:
- fixed building on Apple with homebrew gcc
- fixed building with XCODE 15
- fixed building on A64FX and Cortex A710/X1/X2
- increased the default buffer size for recent ARM server cpus

POWER:
- fixed building with the IBM xlf 16.1.1 compiler
- fixed building with IBM XL C
- added support for DYNAMIC_ARCH builds with clang
- fixed union declaration in the BFLOAT16 test case
- enabled optimizations for the AIX assembler on POWER10

LOONGARCH64:
- added an optimized SGEMV kernel
- added an optimized DTRSM kernel

====================================================================
Version 0.3.24
 03-Sep-2023