Releases: marekandreas/elpa

ELPA 2024.05.001 release

27 Aug 06:58

new_release_2024.05.001

Merge branch 'master_pre_stage'

ELPA 2023.11.001 release

18 Dec 11:10

  • enable gpu-streams by default for NVIDIA and AMD GPUs
  • Updated / improved documentation and man pages
  • Fixed compilation error on AMD GPUs
  • Fixed SVE 256 compute kernels
  • Allow the use of NVIDIA NCCL for device-to-device communication
    (currently only in parts of ELPA)
  • Speed-up of the GPU version of hermitian_multiply by up to a factor of 4
  • significantly faster full-to-tridiagonal step in ELPA 1stage GPU
  • significantly faster ELPA 2stage solver on Intel GPUs
  • Consistent enabling/disabling of SKEW_SYMMETRIC in header files
  • new setup_gpu API function

ELPA 2023.05.001

19 Jun 06:25

  • added CITATION.cff file
  • allow test programs to be run with 1 MPI task
  • correct a memory leak in the gpu stream setup
  • better handling of GPU BLAS handles
  • implement the execution of the AMD HIP code path on NVIDIA GPUs
  • implement the execution of the SYCL GPU code path on CPUs (debugging)
  • port generalized routines to SYCL GPU
  • PoC to use NVIDIA NCCL instead of MPI (not production ready)
  • some cleanup of the documentation

ELPA 2023.05.001.rc1

22 May 09:45

  • added CITATION.cff file
  • allow test programs to be run with 1 MPI task
  • correct a memory leak in the gpu stream setup
  • better handling of GPU BLAS handles
  • implement the execution of the AMD HIP code path on NVIDIA GPUs
  • implement the execution of the SYCL GPU code path on CPUs (debugging)
  • port generalized routines to SYCL GPU
  • PoC to use NVIDIA NCCL instead of MPI (not production ready)
  • some cleanup of the documentation

ELPA 2022.11.001

24 Dec 12:55

Merge branch 'master_pre_stage' into 'master'

Master pre stage

See merge request elpa/elpa!128

ELPA_2022.05.001_release

08 Sep 10:59

  • implement OpenMP offloading to Intel GPUs for ELPA 1 and 2 stage
    (except for step "tridi_to_band")
  • implement SYCL offloading to Intel GPUs for ELPA 1 and 2 stage
  • AMD GPU offload has been tested on Mi200 (also with MPI)
  • can use ELPA with one individual "gpu stream" per MPI task (Nvidia and AMD
    only)
  • allow steps "cholesky", "invert_trm", and "multiply_ab" to be called
    directly with GPU device pointers
  • on error, ELPA returns rather than aborting, to give control to the calling
    application and to allow for error recovery and/or graceful abort
  • allow ELPA to build with OpenMP and GPU
  • fix an FPE with the Intel compiler and AVX-512 instructions and optimization
    level > -O2
  • better checking of user defined options in configure

ELPA_2021.11.002_release

08 Sep 10:56

  • fix an error when choosing the Nvidia GPU kernel (fallback to CPU might have
    been selected)
  • support of Nvidia cusolver library to accelerate some routines (needs CUDA >= 11.4)
  • experimental Nvidia GPU versions of "elpa_invert_trm" and "elpa_cholesky"
    can be tested by setting elpa_set("gpu_invert_trm",1) and
    elpa_set("gpu_cholesky",1). They are not used otherwise
  • BUGFIX: error in resort_ev (also backported to 2021.05.002 and 2020.11.001)
  • allow calling ELPA eigenvectors and eigenvalues with GPU device pointers
    as well, for the input matrix, the vector of eigenvalues and the output
    matrix for the eigenvectors
  • EXPERIMENTAL feature: new real GPU kernel for Nvidia A100 (provided by
    Nvidia): can show a performance boost if the number of vectors per MPI task
    is > 20000. Most likely of most benefit in the non-MPI version
  • as announced, dropping the legacy interface
  • more autotuning features, for example using non-blocking MPI collectives
  • new version of autotuning, avoiding a combinatorial growth of possibilities
    (the old autotune version can still be used if
    elpa%autotune_set_api_version(API_VERSION, error) is set to API_VERSION <
    20211125)

ELPA_2021.05.002_release

08 Sep 11:05

  • no feature changes
  • correct the SO version which was wrong in ELPA 2021.05.001
  • allow the user to set the mapping of MPI tasks to GPU id per set/get
  • experimental feature: port to AMD GPUs, works correctly, performance still
    unclear; only tested with --with-mpi=0
  • On request, ELPA can print the pinning of MPI tasks and OpenMP threads
  • support for FUGAKU: some minor issues still have to be fixed due to compiler
    issues
  • BUG FIX: if matrix is already banded, check whether bandwidth >= 2. DO NOT
    ALLOW a bandwidth = 1, since this would imply that the input matrix is
    already diagonal which the ELPA algorithms do not support
  • BUG FIX in internal test programs: do not consider a residual of 0.0 to be
    an error
  • support for skew-symmetric matrices now enabled by default
  • BUG FIX in generalized case: in setups like
    "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45"
  • ELPA_SETUPS now checks (in case of MPI runs) whether the user-provided
    BLACSGRID is reasonable (i.e. ELPA no longer relies on the user checking,
    prior to calling ELPA, whether the BLACSGRID is ok); if this check fails,
    ELPA returns with an error
  • limit number of OpenMP threads to one, if MPI thread level is not at least MPI_THREAD_SERIALIZED
  • allow checking of the supported threading level of the MPI library at build time
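
The OpenMP thread limit described in the list above can be illustrated with a small sketch. This is not ELPA's code; the names MpiThreadLevel and effective_omp_threads are hypothetical, and the numeric values of the levels are chosen only for illustration. The one property taken from the MPI standard is the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE.

```python
from enum import IntEnum


class MpiThreadLevel(IntEnum):
    """MPI thread support levels (hypothetical stand-in for the MPI constants).

    The MPI standard only guarantees the ordering
    SINGLE < FUNNELED < SERIALIZED < MULTIPLE; the values here are illustrative.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def effective_omp_threads(requested: int, provided: MpiThreadLevel) -> int:
    """Return the number of OpenMP threads to use (illustrative helper).

    If the MPI library provides less than SERIALIZED, fall back to a single
    thread; otherwise honour the requested thread count.
    """
    if requested < 1:
        raise ValueError("requested thread count must be >= 1")
    if provided < MpiThreadLevel.SERIALIZED:
        return 1
    return requested


if __name__ == "__main__":
    for level in MpiThreadLevel:
        print(level.name, effective_omp_threads(8, level))
```

In a real application the provided level would come from the MPI library at initialization; here it is passed in explicitly so the rule can be checked in isolation.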

ELPA_2020.11.001_release

17 Dec 08:51

  • this release contains mostly bugfixes:
  • fix determination whether a _ is needed to link Fortran to C
  • fix an error in the real block4 kernel for aarch64 NEON
  • add missing test_scalapack_template.F90 to EXTRA_DIST list
  • fix error in the GPU kernel
  • do not use MPI_COMM_WORLD but mpi_parent instead
  • switch from python2 to python3
  • experimental feature: complex kernels for aarch64 NEON
  • experimental feature: kernels for ARM SVE

ELPA_2020.05.001_release

08 Sep 11:02

  • Enable compilation with gcc v10
  • Fix a bug in elpa_multiply_a_b (GPU)
  • improved documentation, including fixing of typos and errors in markdown
  • Fix a bug in the calling of Cannon's algorithm which might lead to crashes
    for a square process grid
  • improvements and bugfixes of the ELPA 2stage GPU version, see
    https://arxiv.org/abs/2002.10991
  • bugfix for the build of AVX-512 KNL kernels
  • clean separation of SIMD instructions for AVX and AVX2 kernels
  • better error checking for allocations / deallocations of CPU and GPU memory
  • experimental feature of matrix redistribution
  • bugfix in the cpuid tests
  • bugfix in elpa2_print_kernels
  • bugfix when configuring --with-gpu-support-only