1 | 1 | OpenBLAS ChangeLog
| 2 | +==================================================================== |
| 3 | +Version 0.3.6 |
| 4 | +29-Apr-2019 |
| 5 | + |
| 6 | +common: |
| 7 | + * the build tools now check that a given cpu TARGET is actually valid |
| 8 | + * the build-time check of system features (c_check) has been made |
| 9 | + less dependent on particular perl features (this should mainly |
| 10 | + benefit building on Windows) |
| 11 | + * several problems with the ReLAPACK integration were fixed, |
| 12 | + including INTERFACE64 support and building a shared library |
| 13 | + * building with CMAKE on BSD systems was improved |
| 14 | + * a non-absolute SUM function was added based on the |
| 15 | + existing optimized code for ASUM |
| 16 | + * CBLAS interfaces to the IxMIN and IxMAX functions were added |
| 17 | + * a name clash between LAPACKE and BOOST headers was resolved |
| 18 | + * a bug that made CMAKE builds with OpenMP omit the appropriate |
| 19 | + getrf_parallel kernels was fixed |
| 20 | + * a crash on thread (key) deletion with the USE_TLS=1 memory management |
| 21 | + option was fixed |
| 22 | + * restored several earlier fixes, in particular for OpenMP performance, |
| 23 | + building on BSD, and calling fork on CYGWIN, which had inadvertently |
| 24 | + been dropped in the 0.3.3 rewrite of the memory management code. |
| 25 | + |
| 26 | +x86_64: |
| 27 | + * the AVX512 DGEMM kernel has been disabled again due to unsolved problems |
| 28 | + * building with old versions of MSVC was fixed |
| 29 | + * it is now possible to build a static library on Windows with CMAKE |
| 30 | + * accessing environment variables on CYGWIN at run time was fixed |
| 31 | + * the CMAKE build system now recognizes 32bit userspace on 64bit hardware |
| 32 | + * Intel "Denverton" Atom and Hygon "Dhyana" Zen CPUs are now autodetected |
| 33 | + * building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported |
| 34 | + with CMAKE as well |
| 35 | + * building for DYNAMIC_ARCH with GENERIC as the default target is now supported |
| 36 | + * a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed |
| 37 | + * assembly bugs involving undeclared modification of input operands were fixed |
| 38 | + in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem, |
| 39 | + Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause |
| 40 | + test failures or segfaults when compiled with gcc 8 or later. |
| 41 | + * a similar bug was fixed in the blas_quickdivide code used to split workloads |
| 42 | + in most functions |
| 43 | + * a bug in the IxMIN implementation for the GENERIC target, which made it return the result of IxMAX, was fixed |
| 44 | + * fixed building on SkylakeX systems when either the compiler or the (emulated) operating |
| 45 | + environment does not support AVX512 |
| 46 | + * improved GEMM performance on ZEN targets |
| 47 | + |
| 48 | +x86: |
| 49 | + * build failures caused by the recently added checks for AVX512 were fixed |
| 50 | + * an inline assembly bug involving undeclared modification of an input argument was |
| 51 | + fixed in the blas_quickdivide code used to split workloads in most functions |
| 52 | + * a bug in the IMIN implementation for the GENERIC target, which made it return the result of IMAX, was fixed |
| 53 | + |
| 54 | +MIPS32: |
| 55 | + * a bug in the IMIN implementation, which made it return the result of IMAX, was fixed |
| 56 | + |
| 57 | +POWER: |
| 58 | + * single precision BLAS1/2 functions have received optimized POWER8 kernels |
| 59 | + * POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel |
| 60 | + * building on PPC970 systems under OSX Leopard or Tiger is now supported |
| 61 | + * out-of-bounds memory accesses in the gemm_beta microkernels were fixed |
| 62 | + * building a shared library on AIX is now supported for POWER6 |
| 63 | + * DYNAMIC_ARCH support has been added for POWER6 and newer |
| 64 | + |
| 65 | +ARMv7: |
| 66 | + * corrected xDOT behaviour with zero INC_X or INC_Y |
| 67 | + * a bug in the IMIN implementation, which made it return the result of IMAX, was fixed |
| 68 | + |
| 69 | +ARMv8: |
| 70 | + * added support for HiSilicon TSV110 CPUs |
| 71 | + * the CMAKE build system now recognizes 32bit userspace on 64bit hardware |
| 72 | + * cross-compilation with CMAKE now works again |
| 73 | + * a bug in the IMIN implementation, which made it return the result of IMAX, was fixed |
| 74 | + * ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7 |
| 75 | + |
| 76 | +IBM Z: |
| 77 | + * optimized microkernels for single precision BLAS1/2 functions have been added |
| 78 | + for both Z13 and Z14 |
| 79 | + |
2 | 80 | ====================================================================
3 | 81 | Version 0.3.5
4 | 82 | 31-Dec-2018