Releases · marekandreas/elpa

27 Aug 06:58

new_release_2024.05.001

d4996b9

ELPA 2024.05.001 release Latest

Merge branch 'master_pre_stage'

18 Dec 11:10

marekandreas

new_release_2023.11.001

ddd2a33

ELPA 2023.11.001 release

enable gpu-streams by default for NVIDIA and AMD GPUs
Updated / improved documentation and man pages
Fixed compilation error on AMD GPUs
Fixed SVE 256 compute kernels
Allow (currently in parts of ELPA) the use of NVIDIA NCCL for device-to-device
communication
Speed-up of the GPU version of hermitian_multiply by up to a factor of 4
significantly faster full-to-tridiagonal step in ELPA 1stage GPU
significantly faster ELPA 2stage solver on Intel GPUs
Consistent enabling/disabling of SKEW_SYMMETRIC in header files
new setup_gpu API function

19 Jun 06:25

marekandreas

new_release_2023.05.001

c394aed

ELPA 2023.05.001

added CITATION.cff file
allow test programs to be run with 1 MPI task
correct a memory leak in the gpu stream setup
better handling of GPU BLAS handles
implement the execution of the AMD HIP code path on NVIDIA GPUs
implement the execution of the SYCL GPU code path on CPUs (debugging)
port generalized routines to SYCL GPU
PoC to use NVIDIA NCCL instead of MPI (not production ready)
some cleanup of documentation

22 May 09:45

marekandreas

new_release_2023.05.001.rc1

7855518

ELPA 2023.05.001.rc1

added CITATION.cff file
allow test programs to be run with 1 MPI task
correct a memory leak in the gpu stream setup
better handling of GPU BLAS handles
implement the execution of the AMD HIP code path on NVIDIA GPUs
implement the execution of the SYCL GPU code path on CPUs (debugging)
port generalized routines to SYCL GPU
PoC to use NVIDIA NCCL instead of MPI (not production ready)
some cleanup of documentation

24 Dec 12:55

marekandreas

new_release_2022.11.001

bea1f0f

ELPA 2022.11.001

Merge branch 'master_pre_stage' into 'master'

Master pre stage

See merge request elpa/elpa!128

08 Sep 10:59

marekandreas

new_release_2022.05.001

df470e0

ELPA_2022.05.001_release

implement OpenMP offloading to Intel GPUs for ELPA 1 and 2 stage (except
for step "tridi_to_band")
implement SYCL offloading to Intel GPUs for ELPA 1 and 2 stage
AMD GPU offload has been tested on MI200 (also with MPI)
can use ELPA with one individual "gpu stream" per MPI task (Nvidia and AMD
only)
allow steps "cholesky", "invert_trm", and "multiply_ab" to be called
directly with GPU device pointers
on error, ELPA returns rather than aborting, to give control to the calling
application and to allow for error recovery and/or graceful abort
allow ELPA to build with OpenMP and GPU
fix an FPE with the Intel compiler and AVX-512 instructions and optimization
level > -O2
better checking of user defined options in configure

08 Sep 10:56

marekandreas

new_release_2021.11.002

fc9e3b5

ELPA_2021.11.002_release

fix an error when choosing the Nvidia GPU kernel (fallback to CPU might have
been selected)
support of Nvidia cusolver library to accelerate some routines (needs CUDA >= 11.4)
experimental Nvidia GPU versions of "elpa_invert_trm" and "elpa_cholesky"
can be tested by setting elpa_set("gpu_invert_trm",1) and
elpa_set("gpu_cholesky",1); they are not used otherwise
BUGFIX: error in resort_ev (also backported to 2021.05.002 and 2020.11.001)
allow to call ELPA eigenvectors and eigenvalues also with GPU device
pointers for the input matrix, the vectors of eigenvalues and the output
matrix for the eigenvectors
EXPERIMENTAL feature: new real GPU kernel for Nvidia A100 (provided by Nvidia): can show a
performance boost if the number of vectors per MPI task is > 20000. Most likely
of most benefit in the non-MPI version
as announced, dropping the legacy interface
more autotuning features, for example using non blocking MPI collectives
new version of autotuning avoiding a combinatorial growth of possibilities
(the old autotune version can still be used if
elpa%autotune_set_api_version(API_VERSION, error) is set to API_VERSION <
20211125)
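
The API-version switch described in the last item amounts to a simple threshold rule. The following Python sketch is illustrative only and is not ELPA code: `select_autotune_scheme` and its return labels are hypothetical names, and only the cutoff value 20211125 is taken from the release note.

```python
# Illustrative sketch only; not part of ELPA.
NEW_AUTOTUNE_API_VERSION = 20211125  # cutoff quoted in the release note


def select_autotune_scheme(api_version: int) -> str:
    """Return which autotuning scheme a given API version would select.

    Hypothetical helper: versions below the cutoff keep the old scheme,
    all other versions get the new one.
    """
    if api_version < NEW_AUTOTUNE_API_VERSION:
        return "old"
    return "new"
```

Under these assumptions, a caller that passes any API version earlier than 20211125 keeps the old autotuning behaviour.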

08 Sep 11:05

marekandreas

new_release_2021.05.002

3c58e20

ELPA_2021.05.002_release

no feature changes
correct the SO version which was wrong in ELPA 2021.05.001
allow the user to set the mapping of MPI tasks to GPU id per set/get
experimental feature: port to AMD GPUs; works correctly, performance not yet
clear; only tested with --with-mpi=0
On request, ELPA can print the pinning of MPI tasks and OpenMP threads
support for FUGAKU: some minor problems still have to be fixed due to compiler
issues
BUG FIX: if matrix is already banded, check whether bandwidth >= 2. DO NOT
ALLOW a bandwidth = 1, since this would imply that the input matrix is
already diagonal which the ELPA algorithms do not support
BUG FIX in internal test programs: do not consider a residual of 0.0 to be
an error
support for skew-symmetric matrices now enabled by default
BUG FIX in generalized case: in setups like "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45"
ELPA_SETUPS now checks (in case of MPI runs) whether the user-provided BLACSGRID is reasonable (i.e. ELPA no longer
relies on the user checking, prior to calling ELPA, whether the BLACSGRID is ok); if this check fails,
ELPA returns with an error
limit number of OpenMP threads to one, if MPI thread level is not at least MPI_THREAD_SERIALIZED
allow checking of the supported threading level of the MPI library at build time
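
The thread-limit rule in the list above can be sketched as follows. This is a minimal Python illustration, not ELPA code: `MpiThreadLevel` and `effective_openmp_threads` are hypothetical names, and the numeric enum values are placeholders. The only property relied on is that the MPI standard orders the levels as SINGLE < FUNNELED < SERIALIZED < MULTIPLE.

```python
from enum import IntEnum


class MpiThreadLevel(IntEnum):
    """MPI thread support levels, ordered from least to most permissive.

    The integer values are placeholders; real MPI libraries define their
    own constants but preserve this ordering.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def effective_openmp_threads(provided: MpiThreadLevel, requested: int) -> int:
    """Return the number of OpenMP threads to use (hypothetical helper).

    Falls back to a single thread when the MPI library provides less
    than MPI_THREAD_SERIALIZED; otherwise keeps the requested count.
    """
    if provided < MpiThreadLevel.SERIALIZED:
        return 1
    return requested
```

For example, a request for 8 threads with only FUNNELED support would be reduced to 1 under this rule, while SERIALIZED or MULTIPLE would leave it at 8.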

17 Dec 08:51

marekandreas

new_release_2020.11.001

e676706

ELPA_2020.11.001_release

this release contains mostly bugfixes:

fix determination whether a _ is needed to link Fortran to C
fix an error in the real block4 kernel for arch64 NEON
add missing test_scalapack_template.F90 to EXTRA_DIST list
fix error in the GPU kernel
do not use MPI_COMM_WORLD but mpi_parent instead
switch from python2 to python3
experimental feature: complex kernels for arch64 NEON
experimental feature: kernels for ARM SVE

08 Sep 11:02

marekandreas

new_release_2020.05.001

907b033

ELPA_2020.05.001_release

Enable compilation with gcc v10
Fix a bug in elpa_multiply_a_b (GPU)
improved documentation, including fixing of typos and errors in markdown
Fix a bug in the calling of Cannon's algorithm which might lead to crashes
for a square process grid
improvements and bugfixes of the ELPA 2stage GPU version, see
https://arxiv.org/abs/2002.10991
bugfix for the build of AVX-512 KNL kernels
clean separation of SIMD instructions for AVX and AVX2 kernels
better error checking for allocations / deallocations of CPU and GPU memory
experimental feature of matrix redistribution
bugfix in the cpuid tests
bugfix in elpa2_print_kernels
bugfix when configuring --with-gpu-support-only
