Releases: marekandreas/elpa
ELPA 2024.05.001 release
new_release_2024.05.001 Merge branch 'master_pre_stage'
ELPA 2023.11.001 release
- enable gpu-streams per default for NVIDIA and AMD GPUs
- Updated / improved documentation and man pages
- Fixed compilation error on AMD GPUs
- Fixed SVE 256 compute kernels
- Allow (currently in parts of ELPA) the use of NVIDIA NCCL for device-to-device communication
- Speed up of GPU version of hermitian_multiply by up to a factor of 4
- significantly faster full-to-tridiagonal step in ELPA 1stage GPU
- significantly faster ELPA 2stage solver on Intel GPUs
- Consistent enabling/disabling of SKEW_SYMMETRIC in header files
- new setup_gpu API function
ELPA 2023.05.001
- added CITATION.cff file
- allow test programs to be run with 1 MPI task
- correct a memory leak in the gpu stream setup
- better handling of GPU BLAS handles
- implement the execution of the AMD HIP code path on NVIDIA GPUs
- implement the execution of the SYCL GPU code path on CPUs (debugging)
- port generalized routines to SYCL GPU
- PoC to use NVIDIA NCCL instead of MPI (not production ready)
- some cleanup of documentation
ELPA 2023.05.001.rc1
- added CITATION.cff file
- allow test programs to be run with 1 MPI task
- correct a memory leak in the gpu stream setup
- better handling of GPU BLAS handles
- implement the execution of the AMD HIP code path on NVIDIA GPUs
- implement the execution of the SYCL GPU code path on CPUs (debugging)
- port generalized routines to SYCL GPU
- PoC to use NVIDIA NCCL instead of MPI (not production ready)
- some cleanup of documentation
ELPA 2022.11.001
Merge branch 'master_pre_stage' into 'master' Master pre stage See merge request elpa/elpa!128
ELPA_2022.05.001_release
- implement OpenMP offloading to GPU for Intel GPU for ELPA 1 and 2 stage (except for "step tridi_to_band")
- implement SYCL offloading to Intel GPUs for ELPA 1 and 2 stage
- AMD GPU offload has been tested on Mi200 (also with MPI)
- can use ELPA with one individual "gpu stream" per MPI task (Nvidia and AMD only)
- allow steps "cholesky", "invert_trm", and "multiply_ab" to be called directly with GPU device pointers
- on error ELPA returns rather than aborting, to give control to the calling application and to allow for error recovery and/or graceful abort
- allow ELPA to build with OpenMP and GPU
- fix an FPE with the Intel compiler and AVX-512 instructions and optimization level > -O2
- better checking of user defined options in configure
ELPA_2021.11.002_release
- fix an error when choosing the Nvidia GPU kernel (fallback to CPU might have been selected)
- support of Nvidia cusolver library to accelerate some routines (needs CUDA >= 11.4)
- experimental Nvidia GPU versions for "elpa_invert_trm" and "elpa_cholesky" can be tested by setting elpa_set("gpu_invert_trm",1) and elpa_set("gpu_cholesky",1). They are not used otherwise
- BUGFIX: error in resort_ev (also backported to 2021.05.002 and 2020.11.001)
- allow to call ELPA eigenvectors and eigenvalues also with GPU device pointers for the input matrix, the vectors of eigenvalues and the output matrix for the eigenvectors
- BUGFIX: error in resort_ev
- EXPERIMENTAL feature: new real GPU kernel for Nvidia A100 (provided by Nvidia): can show a performance boost if number of vectors per MPI task is > 20000. Most likely most benefit in non-MPI version
- as announced, dropping the legacy interface
- more autotuning features, for example using non blocking MPI collectives
- new version of autotuning avoiding a combinatorial growth of possibilities (the old autotune version can still be used if elpa%autotune_set_api_version(API_VERSION, error) is set to API_VERSION < 20211125)
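To make the "combinatorial" growth mentioned in the last item concrete, the sketch below compares how many trial configurations an exhaustive search over all option combinations needs with how many a search that sweeps one option at a time needs. The option names and value counts are hypothetical, and these notes do not state which search strategy the new autotuning actually uses; the sketch only illustrates why the number of possibilities can explode.

```python
from math import prod

# Hypothetical tunable options and the number of values each can take.
# These names and counts are illustrative only, not ELPA's actual option list.
options = {
    "solver": 2,
    "kernel": 8,
    "nonblocking_collectives": 2,
    "blocking_factor": 4,
}


def exhaustive_trials(opts):
    """Trials needed to test every combination of option values."""
    return prod(opts.values())


def one_at_a_time_trials(opts):
    """Trials needed when each option is swept separately, others held fixed."""
    return sum(opts.values())


print(exhaustive_trials(options))     # 2 * 8 * 2 * 4 = 128
print(one_at_a_time_trials(options))  # 2 + 8 + 2 + 4 = 16
```

Adding one more option with n values multiplies the first count by n but only adds n to the second.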
ELPA_2021.05.002_release
- no feature changes
- correct the SO version which was wrong in ELPA 2021.05.001
- allow the user to set the mapping of MPI tasks to GPU id per set/get
- experimental feature: port to AMD GPUs, works correctly, performance yet unclear; only tested --with-mpi=0
- On request, ELPA can print the pinning of MPI tasks and OpenMP threads
- support for FUGAKU: some minor issues still have to be fixed due to compiler issues
- BUG FIX: if matrix is already banded, check whether bandwidth >= 2. DO NOT ALLOW a bandwidth = 1, since this would imply that the input matrix is already diagonal which the ELPA algorithms do not support
- BUG FIX in internal test programs: do not consider a residual of 0.0 to be an error
- support for skew-symmetric matrices now enabled by default
- BUG FIX in generalized case: in setups like "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45`
- ELPA_SETUPS does now (in case of MPI-runs) check whether the user-provided BLACSGRID is reasonable (i.e. ELPA no longer relies on the user checking, prior to calling ELPA, whether the BLACSGRID is ok). If this check fails, ELPA returns with an error
- limit number of OpenMP threads to one, if MPI thread level is not at least MPI_THREAD_SERIALIZED
- allow checking of the supported threading level of the MPI library at build time
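
The thread-limiting rule in the list above can be sketched as follows. This is a minimal illustration, not ELPA's implementation: the helper name `effective_omp_threads` and the enum are hypothetical. The only fact relied on is that the MPI standard orders the thread support levels as SINGLE < FUNNELED < SERIALIZED < MULTIPLE; the integer values used here are placeholders, since the actual constants are implementation-defined.

```python
from enum import IntEnum


class MPIThreadLevel(IntEnum):
    """MPI thread support levels in increasing order.

    The numeric values are placeholders; real MPI libraries only
    guarantee the ordering, not these particular integers.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def effective_omp_threads(provided, requested_threads):
    """Hypothetical helper: return the number of OpenMP threads to use.

    If the MPI library provides less than SERIALIZED support,
    fall back to a single thread; otherwise honor the request.
    """
    if provided < MPIThreadLevel.SERIALIZED:
        return 1
    return requested_threads


print(effective_omp_threads(MPIThreadLevel.FUNNELED, 8))    # 1
print(effective_omp_threads(MPIThreadLevel.SERIALIZED, 8))  # 8
```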
ELPA_2020.11.001_release
- this release contains mostly bugfixes:
  - fix determination whether a _ is needed to link Fortran to C
  - fix an error in the real block4 kernel for arch64 NEON
  - add missing test_scalapack_template.F90 to EXTRA_DIST list
  - fix error in the GPU kernel
  - do not use MPI_COMM_WORLD but mpi_parent instead
- switch from python2 to python3
- experimental feature: complex kernels for arch64 NEON
- experimental feature: kernels for ARM SVE
ELPA_2020.05.001_release
- Enable compilation with gcc v10
- Fix a bug in elpa_multiply_a_b (GPU)
- improved documentation, including fixing of typos and errors in markdown
- Fix a bug in the calling of Cannon's algorithm which might lead to crashes for a square process grid
- improvements and bugfixes of the ELPA2 stage GPU version, see https://arxiv.org/abs/2002.10991
- bugfix for the build of AVX-512 KNL kernels
- clean separation of SIMD instructions for AVX and AVX2 kernels
- better error checking for allocations / deallocations of CPU and GPU memory
- experimental feature of matrix redistribution
- bugfix in the cpuid tests
- bugfix in elpa2_print_kernels
- bugfix when configuring --with-gpu-support-only