Merge pull request #16 from codeplaysoftware/new-research-papers

scottstraughan · web-flow · commit 3768f63c2fda · 2024-08-13T12:11:40.000+01:00
New Research Papers & More
diff --git a/content/news/2024/2024-07-30-oneapi-construction-kit-4.0-brings-risc-v-host-cpu-support.md b/content/news/2024/2024-07-30-oneapi-construction-kit-4.0-brings-risc-v-host-cpu-support.md
@@ -0,0 +1,16 @@
+---
+contributor: max
+date: '2024-07-30T14:10:22.153253'
+external_url: https://www.phoronix.com/news/oneAPI-Construction-Kit-4.0
+title: 'oneAPI Construction Kit 4.0 Brings RISC-V Host CPU Support'
+image: ../../../static/images/news/2024-07-30-oneapi-construction-kit.webp
+tags:
+  - oneapi
+  - sycl
+  - hpc
+  - portability
+---
+
+Last year the oneAPI Construction Kit was introduced by Intel-owned Codeplay Software for bringing SYCL to new hardware
+even for hardware outside of Intel's offerings. One of the early targets of this oneAPI Construction Kit support was for
+RISC-V processors and now with today's release of oneAPI Construction Kit 4.0 there is finally RISC-V host CPU support.
diff --git a/content/research_papers/2023/2023-10-01-open-sycl-on-heterogeneous-gpu-systems-a-case-of-study.md b/content/research_papers/2023/2023-10-01-open-sycl-on-heterogeneous-gpu-systems-a-case-of-study.md
@@ -0,0 +1,34 @@
+---
+contributor: max
+date: '2023-10-01T08:08:10.490000'
+title: 'Open SYCL on heterogeneous GPU systems: A case of study'
+external_url: https://arxiv.org/ftp/arxiv/papers/2310/2310.06947.pdf
+authors:
+  - name: Rocío Carratalá-Sáez
+  - name: Francisco J. Andújar
+  - name: Yuri Torres
+  - name: Arturo Gonzalez-Escribano
+  - name: Diego R. Llanos
+tags:
+  - gpu
+  - cuda
+  - hpc
+  - hip
+---
+
+Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware
+accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful
+management of the computational resources of this type of hardware to obtain the best possible performance. However,
+there are currently different GPU vendors, architectures and families that can be found in heterogeneous clusters or
+machines. Programming with the vendor provided languages or frameworks, and optimizing for specific devices, may become
+cumbersome and compromise portability to other systems. To overcome this problem, several proposals for high-level
+heterogeneous programming have appeared, trying to reduce the development effort and increase functional and performance
+portability, specifically when using GPU hardware accelerators. This paper evaluates the SYCL programming model, using
+the Open SYCL compiler, from two different perspectives: The performance it offers when dealing with single or multiple
+GPU devices from the same or different vendors, and the development effort required to implement the code. We use as
+case of study the Finite Time Lyapunov Exponent calculation over two real-world scenarios and compare the performance
+and the development effort of its Open SYCL-based version against the equivalent versions that use CUDA or HIP. Based on
+the experimental results, we observe that the use of SYCL does not lead to a remarkable overhead in terms of the GPU
+kernels execution time. In general terms, the Open SYCL development effort for the host code is lower than that observed
+with CUDA or HIP. Moreover, the SYCL version can take advantage of both CUDA and AMD GPU devices simultaneously much
+easier than directly using the vendor-specific programming solutions.
diff --git a/content/research_papers/2023/2023-10-24-a-performance-portable-sycl-implementation-of-crk-hacc-for-exascale.md b/content/research_papers/2023/2023-10-24-a-performance-portable-sycl-implementation-of-crk-hacc-for-exascale.md
@@ -0,0 +1,31 @@
+---
+contributor: max
+date: '2023-10-24T08:08:10.490000'
+title: 'A Performance-Portable SYCL Implementation of CRK-HACC for Exascale'
+external_url: https://arxiv.org/pdf/2310.16122
+authors:
+  - name: Esteban M. Rangel
+  - name: S. John Pennycook
+  - name: Adrian Pope
+  - name: Nicholas Frontiere
+  - name: Zhiqiang Ma
+  - name: Varsha Madananth
+tags:
+  - sycl
+  - hpc
+  - cuda
+  - hip
+  - heterogeneous-programming
+---
+
+The first generation of exascale systems will include a variety of machine architectures, featuring GPUs from multiple
+vendors. As a result, many developers are interested in adopting portable programming models to avoid maintaining
+multiple versions of their code. It is necessary to document experiences with such programming models to assist
+developers in understanding the advantages and disadvantages of different approaches.
+
+To this end, this paper evaluates the performance portability of a SYCL implementation of a large-scale cosmology
+application (CRK-HACC) running on GPUs from three different vendors: AMD, Intel, and NVIDIA. We detail the process of
+migrating the original code from CUDA to SYCL and show that specializing kernels for specific targets can greatly
+improve performance portability without significantly impacting programmer productivity. The SYCL version of CRK-HACC
+achieves a performance portability of 0.96 with a code divergence of almost 0, demonstrating that SYCL is a viable
+programming model for performance-portable applications.
diff --git a/content/research_papers/2024/2024-01-05-preliminary-report-initial-evaluation-of-stdpar-implementations-on-amd-gpus-for-hpc.md b/content/research_papers/2024/2024-01-05-preliminary-report-initial-evaluation-of-stdpar-implementations-on-amd-gpus-for-hpc.md
@@ -0,0 +1,28 @@
+---
+contributor: max
+date: '2024-01-05T08:08:10.490000'
+title: 'Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC'
+external_url: https://arxiv.org/pdf/2401.02680
+authors:
+  - name: Wei-Chen Lin
+  - name: Simon McIntosh-Smith
+  - name: Tom Deakin
+tags:
+  - sycl
+  - gpu
+  - hip
+---
+
+Until recently, AMD platforms did not support offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work
+highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we
+acknowledged AMD’s past effort such as HCC, which unfortunately is deprecated and does not support newer hardware
+platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled
+multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences and evaluation of
+currently available StdPar implementations for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps with
+ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using all
+available StdPar compilers, to contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust,
+Kokkos, OpenMP, SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation.
+Finally, the StdPar model discussed in this report largely depends on Unified Shared Memory (USM) performance and very
+few AMD GPUs have proper support for this feature. As such, this report demonstrates a proof-of-concept host-side
+userspace pagefault solution for models that use the HIP API. We discuss performance improvements achieved with our
+solution using the same set of benchmarks.
diff --git a/content/research_papers/2024/2024-01-24-lessons-learned-migrating-cuda-to-sycl-a-hep-case-study-with-root-rdataframe.md b/content/research_papers/2024/2024-01-24-lessons-learned-migrating-cuda-to-sycl-a-hep-case-study-with-root-rdataframe.md
@@ -0,0 +1,27 @@
+---
+contributor: max
+date: '2024-01-24T08:08:10.490000'
+title: 'Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame'
+external_url: https://arxiv.org/pdf/2401.13310
+authors:
+  - name: Jolly Chen
+  - name: Monica Dessole
+  - name: Ana Lucia Varbanescu
+tags:
+  - sycl
+  - gpu
+  - cuda
+  - heterogeneous-programming
+---
+
+The world’s largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed
+efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework,
+developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism.
+Given the increasing heterogeneity in computing facilities, it becomes crucial to efficiently support GPGPUs to take
+advantage of the available resources. SYCL allows for a single-source implementation, which enables support for
+different architectures. In this paper, we describe a CUDA implementation and the migration process to SYCL, focusing on
+a core high energy physics operation in RDataFrame – histogramming. We detail the challenges that we faced when
+integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance
+analysis of two SYCL compilers, AdaptiveCpp and DPC++, and the reference CUDA implementation. We highlight the
+performance bottlenecks that we encountered, and the methodology used to detect these. Based on our findings, we provide
+actionable insights for developers of SYCL applications.
diff --git a/content/research_papers/2024/2024-03-09-xfluids-a-sycl-based-unified-cross-architecture-heterogeneous-simulation-solver-for-compressible-reacting-flows.md b/content/research_papers/2024/2024-03-09-xfluids-a-sycl-based-unified-cross-architecture-heterogeneous-simulation-solver-for-compressible-reacting-flows.md
@@ -0,0 +1,32 @@
+---
+contributor: max
+date: '2024-03-09T08:08:10.490000'
+title: 'XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows'
+external_url: https://arxiv.org/abs/2403.05910
+authors:
+  - name: Jinlong Li
+  - name: Shucheng Pan
+tags:
+  - sycl
+  - hpc
+  - cuda
+  - hip
+  - heterogeneous-programming
+---
+
+We present a cross-architecture high-order heterogeneous Navier-Stokes simulation solver, XFluids, for compressible
+reacting multi-component flows on different platforms. The multi-component reacting flows are ubiquitous in many
+scientific and engineering applications, while their numerical simulations are usually time-consuming to capture the
+underlying multiscale features. Although heterogeneous accelerated computing is significantly beneficial for large-scale
+simulations of these flows, effective utilization of various heterogeneous accelerators with different architectures and
+programming models in the market remains a challenge. To address this, we develop XFluids by SYCL, to perform
+acceleration directly targeted to different devices, without translating any source code. A variety of optimization
+techniques have been proposed to increase the computational performance of XFluids, including adaptive range assignment,
+partial eigen-system reconstruction, hotspot device function optimizations, etc. This solver has been open-sourced, and
+tested on multiple GPUs from different mainstream vendors, indicating high portability. Through various benchmark cases,
+the accuracy of XFluids is demonstrated, with approximately no efficiency loss compared to existing GPU programming
+models, such as CUDA and HIP. In addition, the MPI library is used to extend the solver to multi-GPU platforms, with the
+GPU-enabled MPI supported. With this, the weak scaling of XFluids for multi-GPU devices is larger than 95% for 1024
+GPUs. Finally, we simulate both the inert and reactive multi-component shock-bubble interaction problems with
+high-resolution meshes, to investigate the reacting effects on the mixing, vortex stretching, and shape deformation of
+the bubble evolution.
diff --git a/content/research_papers/2024/2024-04-16-enabling-performance-portability-on-the-ligen-drug-discovery-pipeline.md b/content/research_papers/2024/2024-04-16-enabling-performance-portability-on-the-ligen-drug-discovery-pipeline.md
@@ -0,0 +1,38 @@
+---
+contributor: max
+date: '2024-04-16T08:08:10.490000'
+title: 'Enabling performance portability on the LiGen drug discovery pipeline'
+external_url: https://www.sciencedirect.com/science/article/pii/S0167739X24001195
+authors:
+  - name: Luigi Crisci
+  - name: Lorenzo Carpentieri
+  - name: Biagio Cosenza
+  - name: Leon Bogdanović
+  - name: Gianmarco Accordi
+  - name: Davide Gadioli
+  - name: Emanuele Vitali
+  - name: Gianluca Palermo
+  - name: Andrea Rosario Beccari
+tags:
+  - gpu
+  - sycl
+  - hpc
+  - cuda
+  - hip
+---
+
+In recent years, there has been a growing interest in developing high-performance implementations of drug discovery
+processing software. To target modern GPU architectures, such applications are mostly written in proprietary languages
+such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of
+accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery
+pipelines on multiple large-scale computing systems, with the ultimate goal of working on urgent computing scenarios.
+This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, from CUDA to
+the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a
+structured analysis of the performance portability of the SYCL LiGen platform, focusing on different aspects of the
+approach from different perspectives. First, we analyze the performance portability provided by the high-level semantics
+of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects
+such as kernel occupancy and register pressure affect the performance portability of the overall application. The
+experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization
+patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both
+known and ad hoc metrics. The results show that, thanks to the combination of high-level SYCL semantics and some manual
+tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs.
diff --git a/content/videos/2024/2024-07-05-bring-your-code-to-riscv-accelerators-with-sycl.md b/content/videos/2024/2024-07-05-bring-your-code-to-riscv-accelerators-with-sycl.md
@@ -0,0 +1,18 @@
+---
+contributor: max
+date: '2024-07-05T14:16:16.441310'
+title: 'Bring your code to RISC-V accelerators with SYCL'
+external_url: https://www.youtube.com/watch?v=Jw_nEtYi2-k
+type: presentation
+tags:
+  - hpc
+  - risc-v
+---
+
+This talk will show attendees how to overcome proprietary code with RISC-V and SYCL. They will learn how they can
+achieve code portability and adopt RISC-V hardware without losing their existing work, for greater productivity.
+
+The talk will also highlight the ongoing research into pioneering applications for RISC-V, funded by the EU Horizon
+programme. AERO and SYCLOPS are two such projects. AERO seeks to enable the future heterogeneous EU cloud
+infrastructure, while SYCLOPS will bring the RISC-V and SYCL standards together into a single software stack
+for the first time.
diff --git a/static/images/news/2024-07-30-oneapi-construction-kit.webp b/static/images/news/2024-07-30-oneapi-construction-kit.webp