| 1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.24 |
| 4 | + 03-Sep-2023 |
| 5 | + |
| 6 | +general: |
| 7 | + - declared the arguments of cblas_xerbla as const (in accordance with the reference implementation |
| 8 | + and others, the previous discrepancy appears to have dated back to GotoBLAS) |
| 9 | + - fixed the implementation of ?GEMMT that was added in 0.3.23 |
| 10 | + - made cpu-specific SWITCH_RATIO parameters for GEMM available to DYNAMIC_ARCH builds |
| 11 | + - fixed application of SYMBOLSUFFIX in CMAKE builds |
| 12 | + - fixed missing SSYCONVF function in the shared library |
| 13 | + - fixed parallel build logic used with gmake |
| 14 | + - added support for compilation with LLVM17, in particular its new Fortran compiler |
| 15 | + - added support for CMAKE builds using the NVIDIA HPC compiler |
| 16 | + - fixed INTERFACE64 builds with CMAKE and the f95 Fortran compiler |
| 17 | + - fixed cross-build detection and management in c_check |
| 18 | + - disabled building of the tests with CMAKE when ONLY_CBLAS is defined |
| 19 | + - fixed several issues with the handling of runtime limits on the number of OPENMP threads |
| 20 | + - corrected the error code returned by SGEADD/DGEADD when LDA is too small |
| 21 | + - corrected the error code returned by IMATCOPY when LDB is too small |
| 22 | + - updated ?NRM2 to support negative increment values (as introduced in release 3.10 |
| 23 | + of the reference BLAS) |
| 24 | + - fixed OpenMP builds with CLANG for the case where libomp is not in a standard location |
| 25 | + - fixed a potential overwrite of unrelated memory during thread initialisation on startup |
| 26 | + - fixed a potential integer overflow in the multithreading threshold for ?SYMM/?SYRK |
| 27 | + - fixed build of the LAPACKE interfaces for the LAPACK 3.11.0 ?TRSYL functions added in 0.3.22 |
| 28 | + - fixed installation of .cmake files in concurrent 32 and 64bit builds with CMAKE |
| 29 | + - applied additions and corrections from the development branch of Reference-LAPACK: |
| 30 | + - fixed actual arguments passed to a number of LAPACK functions (from Reference-LAPACK PR 885) |
| 31 | + - fixed workspace query results in LAPACK ?SYTRF/?TREVC3 (from Reference-LAPACK PR 883) |
| 32 | + - fixed derivation of the UPLO parameter in LAPACKE_?larfb (from Reference-LAPACK PR 878) |
| 33 | + - fixed a crash in LAPACK ?GELSD on NRHS=0 (from Reference-LAPACK PR 876) |
| 34 | + - added new LAPACK utility functions CRSCL and ZRSCL (from Reference-LAPACK PR 839) |
| 35 | + - corrected the order of eigenvalues for 2x2 matrices in ?STEMR (Reference-LAPACK PR 867) |
| 36 | + - removed spurious reference to OpenMP variables outside OpenMP contexts (Reference-LAPACK PR 860) |
| 37 | + - updated file comments on use of LAMBDA variable in LAPACK (Reference-LAPACK PR 852) |
| 38 | + - fixed documentation of LAPACK SLASD0/DLASD0 (Reference-LAPACK PR 855) |
| 39 | + - fixed confusing use of "minor" in LAPACK documentation (Reference-LAPACK PR 849) |
| 40 | + - added new LAPACK functions ?GEDMD for dynamic mode decomposition (Reference-LAPACK PR 736) |
| 41 | + - fixed potential stack overflows in the EIG part of the LAPACK testsuite (Reference-LAPACK PR 854) |
| 42 | + - applied small improvements to the variants of Cholesky and QR functions (Reference-LAPACK PR 847) |
| 43 | + - removed unused variables from LAPACK ?BDSQR (Reference-LAPACK PR 832) |
| 44 | + - fixed a potential crash on allocation failure in LAPACKE SGEESX/DGEESX (Reference-LAPACK PR 836) |
| 45 | + - added a quick return from SLARUV/DLARUV for N < 1 (Reference-LAPACK PR 837) |
| 46 | + - updated function descriptions in LAPACK ?GEGS/?GEGV (Reference-LAPACK PR 831) |
| 47 | + - improved algorithm description in ?GELSY (Reference-LAPACK PR 833) |
| 48 | + - fixed scaling in LAPACK STGSNA/DTGSNA (Reference-LAPACK PR 830) |
| 49 | + - fixed crash in LAPACKE_?geqrt with row-major data (Reference-LAPACK PR 768) |
| 50 | + - added LAPACKE interfaces for C/ZUNHR_COL and S/DORHR_COL (Reference-LAPACK PR 827) |
| 51 | + - added error exit tests for SYSV/SYTD2/GEHD2 to the testsuite (Reference-LAPACK PR 795) |
| 52 | + - fixed typos in LAPACK source and comments (Reference-LAPACK PRs 809,811,812,814,820) |
| 53 | + - adopted the refactored ?GEBAL implementation (Reference-LAPACK PR 808) |
| 54 | + |
| 55 | +x86_64: |
| 56 | + - added cpu model autodetection for Intel Alder Lake N |
| 57 | + - added activation of the AMX tile to the Sapphire Rapids SBGEMM kernel |
| 58 | + - worked around miscompilations of GEMV/SYMV kernels by gcc's tree-vectorizer |
| 59 | + - fixed compilation of Cooperlake and Sapphire Rapids kernels with CLANG |
| 60 | + - fixed runtime detection of Cooperlake and Sapphire Rapids in DYNAMIC_ARCH |
| 61 | + - fixed feature-based cputype fallback in DYNAMIC_ARCH |
| 62 | + - added support for building the AVX512 kernels with the NVIDIA HPC compiler |
| 63 | + - corrected ZAXPY result on old pre-AVX hardware for the INCX=0 case |
| 64 | + - fixed a potential use of uninitialized variables in ZTRSM |
| 65 | + |
| 66 | +ARM64: |
| 67 | + - added cpu model autodetection for Apple M2 |
| 68 | + - fixed wrong results of CGEMM/CTRMM/DNRM2 under OSX (use of reserved register) |
| 69 | + - added support for building the SVE kernels with the NVIDIA HPC compiler |
| 70 | + - added support for building the SVE kernels with the Apple Clang compiler |
| 71 | + - fixed compiler option handling for building the SVE kernels with LLVM |
| 72 | + - implemented SWITCH_RATIO parameter for improved GEMM performance on Neoverse |
| 73 | + - activated SVE SGEMM and DGEMM kernels for Neoverse V1 |
| 74 | + - improved performance of the SVE CGEMM and ZGEMM kernels on Neoverse V1 |
| 75 | + - improved kernel selection for the ARMV8SVE target and added it to DYNAMIC_ARCH |
| 76 | + - fixed runtime check for SVE availability in DYNAMIC_ARCH builds to take OS or |
| 77 | + container restrictions into account |
| 78 | + - fixed a potential use of uninitialized variables in ZTRSM |
| 79 | + - fixed a potential misdetection of ARMV8 hardware as 32bit in CMAKE builds |
| 80 | + |
| 81 | +LOONGARCH64: |
| 82 | + - added ABI detection |
| 83 | + - added support for cpu affinity handling |
| 84 | + - fixed compilation with early versions of the Loongson toolchain |
| 85 | + - added an optimized SGEMM kernel for 3A5000 |
| 86 | + - added optimized DGEMV kernels for 3A5000 |
| 87 | + - improved the performance of the DGEMM kernel for 3A5000 |
| 88 | + |
| 89 | +MIPS64: |
| 90 | + - fixed miscompilation of TRMM kernels for the MIPS64_GENERIC target |
| 91 | + |
| 92 | +POWER: |
| 93 | + - fixed compiler warnings in the POWER10 SBGEMM kernel |
| 94 | + |
| 95 | +RISCV: |
| 96 | + - fixed application of the INTERFACE64 option when building with CMAKE |
| 97 | + - fixed a potential misdetection of RISCV hardware as 32bit in CMAKE builds |
| 98 | + - fixed IDAMAX and DOT kernels for C910V |
| 99 | + - fixed corner cases in the ROT and SWAP kernels for C910V |
| 100 | + - fixed compilation of the C910V target with recent vendor compilers |
| 101 | + |
| 2 | 102 | ==================================================================== |
| 3 | 103 | Version 0.3.23 |
| 4 | 104 | 01-Apr-2023 |