diff --git a/content/news/2024/2024-07-30-oneapi-construction-kit-4.0-brings-risc-v-host-cpu-support.md b/content/news/2024/2024-07-30-oneapi-construction-kit-4.0-brings-risc-v-host-cpu-support.md
new file mode 100644
index 0000000..b74a09f
--- /dev/null
+++ b/content/news/2024/2024-07-30-oneapi-construction-kit-4.0-brings-risc-v-host-cpu-support.md
@@ -0,0 +1,16 @@
+---
+contributor: max
+date: '2024-07-30T14:10:22.153253'
+external_url: https://www.phoronix.com/news/oneAPI-Construction-Kit-4.0
+title: 'oneAPI Construction Kit 4.0 Brings RISC-V Host CPU Support'
+image: ../../../static/images/news/2024-07-30-oneapi-construction-kit.webp
+tags:
+  - oneapi
+  - sycl
+  - hpc
+  - portability
+---
+
+Last year the oneAPI Construction Kit was introduced by Intel-owned Codeplay Software for bringing SYCL to new hardware
+even for hardware outside of Intel's offerings. One of the early targets of this oneAPI Construction Kit support was for
+RISC-V processors and now with today's release of oneAPI Construction Kit 4.0 there is finally RISC-V host CPU support.
diff --git a/content/research_papers/2023/2023-10-01-open-sycl-on-heterogeneous-gpu-systems-a-case-of-study.md b/content/research_papers/2023/2023-10-01-open-sycl-on-heterogeneous-gpu-systems-a-case-of-study.md
new file mode 100644
index 0000000..b7f0c04
--- /dev/null
+++ b/content/research_papers/2023/2023-10-01-open-sycl-on-heterogeneous-gpu-systems-a-case-of-study.md
@@ -0,0 +1,34 @@
+---
+contributor: max
+date: '2023-10-01T08:08:10.490000'
+title: 'Open SYCL on heterogeneous GPU systems: A case of study'
+external_url: https://arxiv.org/ftp/arxiv/papers/2310/2310.06947.pdf
+authors:
+  - name: Rocío Carratalá-Sáez
+  - name: Francisco J. Andújar
+  - name: Yuri Torres
+  - name: Arturo Gonzalez-Escribano
+  - name: Diego R. Llanos
+tags:
+  - gpu
+  - cuda
+  - hpc
+  - hip
+---
+
+Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware
+accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful
+management of the computational resources of this type of hardware to obtain the best possible performance. However,
+there are currently different GPU vendors, architectures and families that can be found in heterogeneous clusters or
+machines. Programming with the vendor provided languages or frameworks, and optimizing for specific devices, may become
+cumbersome and compromise portability to other systems. To overcome this problem, several proposals for high-level
+heterogeneous programming have appeared, trying to reduce the development effort and increase functional and performance
+portability, specifically when using GPU hardware accelerators. This paper evaluates the SYCL programming model, using
+the Open SYCL compiler, from two different perspectives: The performance it offers when dealing with single or multiple
+GPU devices from the same or different vendors, and the development effort required to implement the code. We use as
+case of study the Finite Time Lyapunov Exponent calculation over two real-world scenarios and compare the performance
+and the development effort of its Open SYCL-based version against the equivalent versions that use CUDA or HIP. Based on
+the experimental results, we observe that the use of SYCL does not lead to a remarkable overhead in terms of the GPU
+kernels execution time. In general terms, the Open SYCL development effort for the host code is lower than that observed
+with CUDA or HIP. Moreover, the SYCL version can take advantage of both CUDA and AMD GPU devices simultaneously much
+easier than directly using the vendor-specific programming solutions.
diff --git a/content/research_papers/2023/2023-10-24-a-performance-portable-sycl-implementation-of-crk-hacc-for-exascale.md b/content/research_papers/2023/2023-10-24-a-performance-portable-sycl-implementation-of-crk-hacc-for-exascale.md
new file mode 100644
index 0000000..749c39b
--- /dev/null
+++ b/content/research_papers/2023/2023-10-24-a-performance-portable-sycl-implementation-of-crk-hacc-for-exascale.md
@@ -0,0 +1,31 @@
+---
+contributor: max
+date: '2023-10-24T08:08:10.490000'
+title: 'A Performance-Portable SYCL Implementation of CRK-HACC for Exascale'
+external_url: https://arxiv.org/pdf/2310.16122
+authors:
+  - name: Esteban M. Rangel
+  - name: S. John Pennycook
+  - name: Adrian Pope
+  - name: Nicholas Frontiere
+  - name: Zhiqiang Ma
+  - name: Varsha Madananth
+tags:
+  - sycl
+  - hpc
+  - cuda
+  - hip
+  - heterogeneous-programming
+---
+
+The first generation of exascale systems will include a variety of machine architectures, featuring GPUs from multiple
+vendors. As a result, many developers are interested in adopting portable programming models to avoid maintaining
+multiple versions of their code. It is necessary to document experiences with such programming models to assist
+developers in understanding the advantages and disadvantages of different approaches.
+
+To this end, this paper evaluates the performance portability of a SYCL implementation of a large-scale cosmology
+application (CRK-HACC) running on GPUs from three different vendors: AMD, Intel, and NVIDIA. We detail the process of
+migrating the original code from CUDA to SYCL and show that specializing kernels for specific targets can greatly
+improve performance portability without significantly impacting programmer productivity. The SYCL version of CRK-HACC
+achieves a performance portability of 0.96 with a code divergence of almost 0, demonstrating that SYCL is a viable
+programming model for performance-portable applications.
diff --git a/content/research_papers/2024/2024-01-05-preliminary-report-initial-evaluation-of-stdpar-implementations-on-amd-gpus-for-hpc.md b/content/research_papers/2024/2024-01-05-preliminary-report-initial-evaluation-of-stdpar-implementations-on-amd-gpus-for-hpc.md
new file mode 100644
index 0000000..5955b9d
--- /dev/null
+++ b/content/research_papers/2024/2024-01-05-preliminary-report-initial-evaluation-of-stdpar-implementations-on-amd-gpus-for-hpc.md
@@ -0,0 +1,28 @@
+---
+contributor: max
+date: '2024-01-05T08:08:10.490000'
+title: 'Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC'
+external_url: https://arxiv.org/pdf/2401.02680
+authors:
+  - name: Wei-Chen Lin
+  - name: Simon McIntosh-Smith
+  - name: Tom Deakin
+tags:
+  - sycl
+  - gpu
+  - hip
+---
+
+Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work
+highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we
+acknowledged AMD’s past effort such as HCC, which unfortunately is deprecated and does not support newer hardware
+platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled
+multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences and evaluation of
+currently available StdPar implementations for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps with
+ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using all
+available StdPar compilers, to contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust,
+Kokkos, OpenMP, SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation.
+Finally, the StdPar model discussed in this report largely depends on Unified Shared Memory (USM) performance and very
+few AMD GPUs have proper support for this feature. As such, this report demonstrates a proof-of-concept host-side
+userspace pagefault solution for models that use the HIP API. We discuss performance improvements achieved with our
+solution using the same set of benchmarks.
diff --git a/content/research_papers/2024/2024-01-24-lessons-learned-migrating-cuda-to-sycl-a-hep-case-study-with-root-rdataframe.md b/content/research_papers/2024/2024-01-24-lessons-learned-migrating-cuda-to-sycl-a-hep-case-study-with-root-rdataframe.md
new file mode 100644
index 0000000..c962ce8
--- /dev/null
+++ b/content/research_papers/2024/2024-01-24-lessons-learned-migrating-cuda-to-sycl-a-hep-case-study-with-root-rdataframe.md
@@ -0,0 +1,27 @@
+---
+contributor: max
+date: '2024-01-24T08:08:10.490000'
+title: 'Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame'
+external_url: https://arxiv.org/pdf/2401.13310
+authors:
+  - name: Jolly Chen
+  - name: Monica Dessole
+  - name: Ana Lucia Varbanescu
+tags:
+  - sycl
+  - gpu
+  - cuda
+  - heterogeneous-programming
+---
+
+The world’s largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed
+efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework,
+developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism.
+Given the increasing heterogeneity in computing facilities, it becomes crucial to efficiently support GPGPUs to take
+advantage of the available resources. SYCL allows for a single-source implementation, which enables support for
+different architectures. In this paper, we describe a CUDA implementation and the migration process to SYCL, focusing on
+a core high energy physics operation in RDataFrame – histogramming. We detail the challenges that we faced when
+integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance
+analysis of two SYCL compilers, AdaptiveCpp and DPC++, and the reference CUDA implementation. We highlight the
+performance bottlenecks that we encountered, and the methodology used to detect these. Based on our findings, we provide
+actionable insights for developers of SYCL applications.
diff --git a/content/research_papers/2024/2024-03-09-xfluids-a-sycl-based-unified-cross-architecture-heterogeneous-simulation-solver-for-compressible-reacting-flows.md b/content/research_papers/2024/2024-03-09-xfluids-a-sycl-based-unified-cross-architecture-heterogeneous-simulation-solver-for-compressible-reacting-flows.md
new file mode 100644
index 0000000..3ff3da2
--- /dev/null
+++ b/content/research_papers/2024/2024-03-09-xfluids-a-sycl-based-unified-cross-architecture-heterogeneous-simulation-solver-for-compressible-reacting-flows.md
@@ -0,0 +1,32 @@
+---
+contributor: max
+date: '2024-03-09T08:08:10.490000'
+title: 'XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows'
+external_url: https://arxiv.org/abs/2403.05910
+authors:
+  - name: Jinlong Li
+  - name: Shucheng Pan
+tags:
+  - sycl
+  - hpc
+  - cuda
+  - hip
+  - heterogeneous-programming
+---
+
+We present a cross-architecture high-order heterogeneous Navier-Stokes simulation solver, XFluids, for compressible
+reacting multi-component flows on different platforms. The multi-component reacting flows are ubiquitous in many
+scientific and engineering applications, while their numerical simulations are usually time-consuming to capture the
+underlying multiscale features. Although heterogeneous accelerated computing is significantly beneficial for large-scale
+simulations of these flows, effective utilization of various heterogeneous accelerators with different architectures and
+programming models in the market remains a challenge. To address this, we develop XFluids by SYCL, to perform
+acceleration directly targeted to different devices, without translating any source code. A variety of optimization
+techniques have been proposed to increase the computational performance of XFluids, including adaptive range assignment,
+partial eigen-system reconstruction, hotspot device function optimizations, etc. This solver has been open-sourced, and
+tested on multiple GPUs from different mainstream vendors, indicating high portability. Through various benchmark cases,
+the accuracy of XFluids is demonstrated, with approximately no efficiency loss compared to existing GPU programming
+models, such as CUDA and HIP. In addition, the MPI library is used to extend the solver to multi-GPU platforms, with the
+GPU-enabled MPI supported. With this, the weak scaling of XFluids for multi-GPU devices is larger than 95% for 1024
+GPUs. Finally, we simulate both the inert and reactive multi-component shock-bubble interaction problems with
+high-resolution meshes, to investigate the reacting effects on the mixing, vortex stretching, and shape deformation of
+the bubble evolution.
diff --git a/content/research_papers/2024/2024-04-16-enabling-performance-portability-on-the-ligen-drug-discovery-pipeline.md b/content/research_papers/2024/2024-04-16-enabling-performance-portability-on-the-ligen-drug-discovery-pipeline.md
new file mode 100644
index 0000000..84bf470
--- /dev/null
+++ b/content/research_papers/2024/2024-04-16-enabling-performance-portability-on-the-ligen-drug-discovery-pipeline.md
@@ -0,0 +1,38 @@
+---
+contributor: max
+date: '2024-04-16T08:08:10.490000'
+title: 'Enabling performance portability on the LiGen drug discovery pipeline'
+external_url: https://www.sciencedirect.com/science/article/pii/S0167739X24001195
+authors:
+  - name: Luigi Crisci
+  - name: Lorenzo Carpentieri
+  - name: Biagio Cosenza
+  - name: Leon Bogdanović
+  - name: Gianmarco Accordi
+  - name: Davide Gadioli
+  - name: Emanuele Vitali
+  - name: Gianluca Palermo
+  - name: Andrea Rosario Beccari
+tags:
+  - gpu
+  - sycl
+  - hpc
+  - cuda
+  - hip
+---
+
+In recent years, there has been a growing interest in developing high-performance implementations of drug discovery
+processing software. To target modern GPU architectures, such applications are mostly written in proprietary languages
+such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of
+accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery
+pipelines on multiple large-scale computing systems, with the ultimate goal of working on urgent computing scenarios.
+This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, from CUDA to
+the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a
+structured analysis of the performance portability of the SYCL LiGen platform, focusing on different aspects of the
+approach from different perspectives. First, we analyze the performance portability provided by the high-level semantics
+of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects
+such as kernel occupancy and register pressure affect the performance portability of the overall application. The
+experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization
+patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both
+known and ad hoc metrics. The results show that, thanks to the combination of high-level SYCL semantics and some manual
+tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs.
diff --git a/content/videos/2024/2024-07-05-bring-your-code-to-riscv-accelerators-with-sycl.md b/content/videos/2024/2024-07-05-bring-your-code-to-riscv-accelerators-with-sycl.md
new file mode 100644
index 0000000..9358678
--- /dev/null
+++ b/content/videos/2024/2024-07-05-bring-your-code-to-riscv-accelerators-with-sycl.md
@@ -0,0 +1,18 @@
+---
+contributor: max
+date: '2024-07-05T14:16:16.441310'
+title: 'Bring your code to RISC-V accelerators with SYCL'
+external_url: https://www.youtube.com/watch?v=Jw_nEtYi2-k
+type: presentation
+tags:
+  - hpc
+  - risc-v
+---
+
+This talk will show attendees how to overcome proprietary code with RISC-V and SYCL. They will learn how they can
+achieve code portability and adopt RISC-V hardware without losing their existing work, for greater productivity.
+
+The talk will also highlight the ongoing research into pioneering applications for RISC-V, funded by the EU Horizon
+programme. AERO and SYCLOPS are two such projects. AERO seeks to enable the future heterogeneous EU cloud
+infrastructure, while SYCLOPS will bring the RISC-V and SYCL standards together into a single software stack
+for the first time.
diff --git a/static/images/news/2024-07-30-oneapi-construction-kit.webp b/static/images/news/2024-07-30-oneapi-construction-kit.webp
new file mode 100644
index 0000000..dd3339c
Binary files /dev/null and b/static/images/news/2024-07-30-oneapi-construction-kit.webp differ