| 1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.22 |
| 4 | + 26-Mar-2023 |
| 5 | + |
| 6 | +general: |
| 7 | + - Updated the included LAPACK to Reference-LAPACK release 3.11.0 |
| 8 | + plus post-release corrections and improvements |
| 9 | + - Added initial support for processing with the EMSCRIPTEN javascript |
| 10 | + converter (yielding a single-threaded build only) |
| 11 | + - Added a threshold for multithreading in SYMM, SYMV and SYR2K |
| 12 | + - Increased the threshold for multithreading in SYRK |
| 13 | + - OpenBLAS no longer decreases the global OMP_NUM_THREADS when it |
| 14 | + exceeds the maximum thread count the library was compiled for. |
| 15 | + - fixed ?GETF2 potentially returning NaN with tiny matrix elements |
| 16 | + - fixed openblas_set_num_threads to work in USE_OPENMP builds |
| 17 | + - fixed cpu core counting in USE_OPENMP builds returning the number |
| 18 | + of OMP "places" rather than cores |
| 19 | + - fixed interpretation of USE_PERL=0 in build scripts |
| 20 | + - fixed linking of the library with libm in CMAKE builds |
| 21 | + - fixed startup delays resulting from a wrong default setting of |
| 22 | + NO_WARMUP in CMAKE builds |
| 23 | + - fixed inconsistent defaults for overriding of LAPACK SPMV, SPR, |
| 24 | + SYMV, SYR functions in gmake and CMAKE builds |
| 25 | + - fixed stride calculation in the optimized small-matrix path of |
| 26 | + complex SYR |
| 27 | + - fixed compilation of ReLAPACK with CMAKE |
| 28 | + - fixed pkgconfig file contents for INTERFACE64 builds |
| 29 | + - fixed building of Reference-LAPACK with recent gfortran |
| 30 | + - fixed building with only a subset of precision types on Windows |
| 31 | + - added new environment variable OPENBLAS_DEFAULT_NUM_THREADS |
| 32 | + - added a GEMV-based implementation of GEMMT |
| 33 | + - added support for building under QNX |
| 34 | + - updated support for (cross-)building for ALPHA targets |
| 35 | + |
| 36 | +x86_64: |
| 37 | + - added autodetection of Intel Raptor Lake cpu models |
| 38 | + - added SSCAL microkernels for Haswell and newer targets |
| 39 | + - improved the performance of the Haswell DSCAL microkernel |
| 40 | + - added CSCAL and ZSCAL microkernels for SkylakeX targets |
| 41 | + - fixed detection of gfortran and Cray CCE compilers |
| 42 | + - fixed detection of recent versions of the Intel Fortran compiler |
| 43 | + - fixed compilation with LLVM to no longer run out of AVX512 registers |
| 44 | + - fixed cpu type option setting with recent NVIDIA HPC compiler versions |
| 45 | + - fixed compilation for/on AMD Ryzen 4 cpus |
| 46 | + - fixed compilation of AVX2-capable targets with Apple Clang |
| 47 | + - fixed runtime selection of COOPERLAKE in DYNAMIC_ARCH builds |
| 48 | + - worked around gcc/llvm using risky FMA operations in CSCAL/ZSCAL |
| 49 | + - worked around miscompilations of GEMV, SYMV and ZDOT kernels |
| 50 | + by gcc12's tree-vectorizer on OSX and Windows |
| 51 | + |
| 52 | +ARM: |
| 53 | + - fixed cross-compilation to ARMV5 and ARMV6 targets with CMAKE |
| 54 | + |
| 55 | +ARMV8: |
| 56 | + - fixed cross-compilation to CortexA53 with CMAKE |
| 57 | + - fixed compilation with CMAKE and "Arm Compiler for Linux 22.1" |
| 58 | + - added cpu autodetection for Cortex X3 and A715 |
| 59 | + - fixed conditional compilation of SVE-capable targets in DYNAMIC_ARCH |
| 60 | + - sped up SVE kernels by removing unnecessary prefetches |
| 61 | + - improved the GEMM performance of Neoverse V1 |
| 62 | + - added SVE kernels for SDOT and DDOT |
| 63 | + - added an SBGEMM kernel for Neoverse N2 |
| 64 | + - improved cpu-specific compiler option selection for Neoverse cpus |
| 65 | + - added support for setting CONSISTENT_FPCSR |
| 66 | + |
| 67 | +MIPS64: |
| 68 | + - improved MSA capability detection and handling |
| 69 | + - added a MIPS64_GENERIC build target |
| 70 | + - fixed corner cases in DNRM2 |
| 71 | + |
| 72 | +LOONGARCH64: |
| 73 | + - fixed handling of the INTERFACE64 option |
| 74 | + |
| 75 | +RISCV: |
| 76 | + - fixed handling of the INTERFACE64 option |
| 77 | + |
| 2 | 78 | ==================================================================== |
| 3 | 79 | Version 0.3.21 |
| 4 | 80 | 07-Aug-2022 |