
Commit 9763f87

Update Changelog with changes from 0.3.6
1 parent 9c4edd3 commit 9763f87

1 file changed, +78 -0 lines changed

Changelog.txt

Lines changed: 78 additions & 0 deletions
@@ -1,4 +1,82 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.6
+29-Apr-2019
+
+common:
+* the build tools now check that a given cpu TARGET is actually valid
+* the build-time check of system features (c_check) has been made
+  less dependent on particular perl features (this should mainly
+  benefit building on Windows)
+* several problems with the ReLAPACK integration were fixed,
+  including INTERFACE64 support and building a shared library
+* building with CMAKE on BSD systems was improved
+* a non-absolute SUM function was added based on the
+  existing optimized code for ASUM
+* CBLAS interfaces to the IxMIN and IxMAX functions were added
+* a name clash between LAPACKE and BOOST headers was resolved
+* CMAKE builds with OpenMP failed to include the appropriate getrf_parallel
+  kernels
+* a crash on thread (key) deletion with the USE_TLS=1 memory management
+  option was fixed
+* restored several earlier fixes, in particular for OpenMP performance,
+  building on BSD, and calling fork on CYGWIN, which had inadvertently
+  been dropped in the 0.3.3 rewrite of the memory management code.
+
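
The difference between the new SUM function and the existing ASUM can be illustrated with a small reference sketch for real-valued vectors. This is not OpenBLAS code: the names `ref_asum` and `ref_sum` are hypothetical, and the early return for non-positive `n` or `incx` follows the reference BLAS convention for ASUM, which is assumed here to carry over to SUM.

```python
def ref_asum(n, x, incx=1):
    """ASUM: sum of the absolute values of n elements of x, stride incx."""
    # Assumption: mirror the reference BLAS guard for ASUM.
    if n <= 0 or incx <= 0:
        return 0.0
    return sum(abs(x[i * incx]) for i in range(n))


def ref_sum(n, x, incx=1):
    """SUM: plain signed sum over the same elements (no absolute value)."""
    # Assumption: same guard as ASUM.
    if n <= 0 or incx <= 0:
        return 0.0
    return sum(x[i * incx] for i in range(n))


x = [1.0, -2.0, 3.0, -4.0]
print(ref_asum(4, x))    # 10.0
print(ref_sum(4, x))     # -2.0
print(ref_sum(2, x, 2))  # 4.0 (elements 0 and 2)
```

The two functions walk the vector identically and differ only in whether `abs` is applied, which is presumably why the optimized ASUM kernels could serve as the basis for SUM.
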
+x86_64:
+* the AVX512 DGEMM kernel has been disabled again due to unsolved problems
+* building with old versions of MSVC was fixed
+* it is now possible to build a static library on Windows with CMAKE
+* accessing environment variables on CYGWIN at run time was fixed
+* the CMAKE build system now recognizes 32bit userspace on 64bit hardware
+* Intel "Denverton" atom and Hygon "Dhyana" zen CPUs are now autodetected
+* building for DYNAMIC_ARCH with a DYNAMIC_LIST of targets is now supported
+  with CMAKE as well
+* building for DYNAMIC_ARCH with GENERIC as the default target is now supported
+* a buffer overflow in the SSE GEMM kernel for Intel Nano targets was fixed
+* assembly bugs involving undeclared modification of input operands were fixed
+  in the AXPY, DOT, GEMV, GER, SCAL, SYMV and TRSM microkernels for Nehalem,
+  Sandybridge, Haswell, Bulldozer and Piledriver. These would typically cause
+  test failures or segfaults when compiled with recent versions of gcc from 8 onward.
+* a similar bug was fixed in the blas_quickdivide code used to split workloads
+  in most functions
+* a bug in the IxMIN implementation for the GENERIC target made it return the result of IxMAX
+* fixed building on SkylakeX systems when either the compiler or the (emulated) operating
+  environment does not support AVX512
+* improved GEMM performance on ZEN targets
+
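
The IxMIN bug listed above (and repeated for several other targets below) can be made concrete with a reference sketch of what the two index functions are expected to return. This is an illustration only, under stated assumptions: IxMIN and IxMAX are taken to mean the index of the smallest and largest signed value, the index is 1-based as in the Fortran BLAS index functions, ties resolve to the first occurrence, and 0 is returned for non-positive `n` or `incx`. The names `ref_imax` and `ref_imin` are hypothetical.

```python
def ref_imax(n, x, incx=1):
    """1-based index of the largest value among n elements of x, stride incx."""
    if n <= 0 or incx <= 0:
        return 0
    best = 0
    for i in range(1, n):
        if x[i * incx] > x[best * incx]:
            best = i
    return best + 1


def ref_imin(n, x, incx=1):
    """1-based index of the smallest value among n elements of x, stride incx."""
    if n <= 0 or incx <= 0:
        return 0
    best = 0
    for i in range(1, n):
        if x[i * incx] < x[best * incx]:
            best = i
    return best + 1


x = [3.0, -1.0, 7.0, 2.0]
print(ref_imax(4, x))  # 3
print(ref_imin(4, x))  # 2
```

An implementation with the bug described in the changelog would return 3 for both calls, so any vector whose minimum and maximum sit at different positions is enough to detect it.
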
+x86:
+* build failures caused by the recently added checks for AVX512 were fixed
+* an inline assembly bug involving undeclared modification of an input argument was
+  fixed in the blas_quickdivide code used to split workloads in most functions
+* a bug in the IMIN implementation for the GENERIC target made it return the result of IMAX
+
+MIPS32:
+* a bug in the IMIN implementation made it return the result of IMAX
+
+POWER:
+* single precision BLAS1/2 functions have received optimized POWER8 kernels
+* POWER9 is now a separate target, with an optimized DGEMM/DTRMM kernel
+* building on PPC970 systems under OSX Leopard or Tiger is now supported
+* out-of-bounds memory accesses in the gemm_beta microkernels were fixed
+* building a shared library on AIX is now supported for POWER6
+* DYNAMIC_ARCH support has been added for POWER6 and newer
+
+ARMv7:
+* corrected xDOT behaviour with zero INC_X or INC_Y
+* a bug in the IMIN implementation made it return the result of IMAX
+
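
What a zero increment means for xDOT can be sketched with a loop modelled on the reference (netlib) BLAS. The changelog does not spell out the corrected behaviour, so this assumes it matches the reference loop: with a zero increment the index never advances, and the same element is used for every term. The name `ref_dot` is hypothetical.

```python
def ref_dot(n, x, incx, y, incy):
    """Dot product over n element pairs, stepping by incx and incy."""
    if n <= 0:
        return 0.0
    # Negative increments start from the far end, as in reference BLAS.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    acc = 0.0
    for _ in range(n):
        acc += x[ix] * y[iy]
        ix += incx  # stays put when incx == 0
        iy += incy  # stays put when incy == 0
    return acc


print(ref_dot(3, [1.0, 2.0, 3.0], 1, [4.0, 5.0, 6.0], 1))  # 32.0
print(ref_dot(3, [2.0], 0, [1.0, 2.0, 3.0], 1))            # 12.0
```

In the second call `x[0]` is reused for all three terms, giving 2 * (1 + 2 + 3). An optimized kernel that assumes a non-zero stride can easily get this case wrong, which is one plausible reading of the fix.
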
+ARMv8:
+* added support for HiSilicon TSV110 cpus
+* the CMAKE build system now recognizes 32bit userspace on 64bit hardware
+* cross-compilation with CMAKE now works again
+* a bug in the IMIN implementation made it return the result of IMAX
+* ARMV8 builds with the BINARY=32 option are now automatically handled as ARMV7
+
+IBM Z:
+* optimized microkernels for single precision BLAS1/2 functions have been added
+  for both Z13 and Z14
+
 ====================================================================
 Version 0.3.5
 31-Dec-2018