Releases: marekandreas/elpa
ELPA_2019.11.001_release
- solve a bug when using parallel make builds
- check the cpuid set during build time
- add experimental feature "heterogenous-cluster-support"
- add experimental feature for 64bit integer BLAS/LAPACK/SCALAPACK support
- add experimental feature for 64bit integer MPI support
- support of ELPA for real-valued skew-symmetric matrices, please cite:
https://arxiv.org/abs/1912.04062
- cleanup of the GPU version
- bugfix in the OpenMP version
- bugfix on the Power8/9 kernels
- bugfix on ARM aarch64 FMA kernels
ELPA_2019.05.002_release
- repackaging of the source, since the legacy interface was forgotten in the
2019.05.001 release
- elpa_print_kernels supports GPU usage
- fix an error if PAPI measurements are activated
- new simple real kernels: block4 and block6
- C functions can be built with optional arguments if the compiler supports it
(configure option)
- allow measurements with the likwid tool
- users can define the default-kernel at build time
- ELPA versioning number is provided in the C header files
- as announced a year ago, the following deprecated routines have finally been
removed; see DEPRECATED_FEATURES for the replacement routines, which were
introduced a year ago. Removed routines:
-> mult_at_b_real
-> mult_ah_b_complex
-> invert_trm_real
-> invert_trm_complex
-> cholesky_real
-> cholesky_complex
-> solve_tridi
- new kernels for ARM aarch64 added
- fix an out-of-bound-error in elpa2
ELPA_2018.11.001_release
- improved autotuning
- improved performance of generalized problem via Cannon's algorithm
- checkpointing functionality of ELPA objects
- store/read/resume of autotuning
- Python interface for ELPA
- more ELPA functions have an optional error argument (Fortran) or required
error argument (C) => ABI and API change
ELPA_2018.05.001_release
- significantly improved performance on the K computer
- added interface for the generalized eigenvalue problem
- extended autotuning functionality
ELPA_2017.11.001_release
- significant improvement of performance of the GPU version
- added new compute kernels for IBM Power8 and Fujitsu Sparc64
processors
- a first implementation of autotuning capability
- correct some type statements in Fortran
- correct detection of PAPI in configure step
ELPA_2017.05.003_release
- remove bug in invert_triangular, which had been introduced
in ELPA 2017.05.002
ELPA_2017.05.002_release
Mainly bugfixes for ELPA 2017.05.001:
- fix memory leak of MPI communicators
- tests for hermitian_multiply, cholesky decomposition and
- deal with a problem on Debian (mawk)
ELPA_2017.05.001_release
- faster GPU implementation, especially for ELPA 1stage
- the restriction of the block-cyclic distribution blocksize = 128 in the GPU
case is relaxed
- faster CPU implementation due to better blocking
- support of already banded matrices (new API only!)
- improved KNL support
- add missing script "manual_cpp"
- cleanup of code
ELPA_2016.05.004_release
- fix a problem with the private state of module precision
- distribute test_project with dist tarball
- generic driver routine for ELPA 1stage and 2stage
- test case for elpa_mult_at_b_real
- test case for elpa_mult_ah_b_complex
- test case for elpa_cholesky_real
- test case for elpa_cholesky_complex
- test case for elpa_invert_trm_real
- test case for elpa_invert_trm_complex
- fix building of static library
- better choice of AVX, AVX2, AVX512 kernels
- make assumed size Fortran arrays default
ELPA_2016.05.003_release
- fix a problem with the build of SSE kernels
- make some (internal) functions public, such that they
can be used outside of ELPA
- add documentation and interfaces for new public functions
- shorten file names and directory names for test programs
in order to bypass the "make argument list too long" error