New Research Papers & More #16

Merged
merged 16 commits into from
Aug 13, 2024
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
---
contributor: max
date: '2024-07-30T14:10:22.153253'
external_url: https://www.phoronix.com/news/oneAPI-Construction-Kit-4.0
title: 'oneAPI Construction Kit 4.0 Brings RISC-V Host CPU Support'
image: ../../../static/images/news/2024-07-30-oneapi-construction-kit.webp
tags:
- oneapi
- sycl
- hpc
- portability
---

Last year, Intel-owned Codeplay Software introduced the oneAPI Construction Kit to bring SYCL to new hardware,
including hardware outside of Intel's offerings. RISC-V processors were one of the kit's early targets, and today's
release of oneAPI Construction Kit 4.0 finally adds RISC-V host CPU support.
@@ -0,0 +1,34 @@
---
contributor: max
date: '2023-10-01T08:08:10.490000'
title: 'Open SYCL on heterogeneous GPU systems: A case of study'
external_url: https://arxiv.org/ftp/arxiv/papers/2310/2310.06947.pdf
authors:
- name: Rocío Carratalá-Sáez
- name: Francisco J. Andújar
- name: Yuri Torres
- name: Arturo Gonzalez-Escribano
- name: Diego R. Llanos
tags:
- gpu
- cuda
- hpc
- hip
---

Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware
accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful
management of the computational resources of this type of hardware to obtain the best possible performance. However,
there are currently different GPU vendors, architectures and families that can be found in heterogeneous clusters or
machines. Programming with the vendor-provided languages or frameworks, and optimizing for specific devices, may become
cumbersome and compromise portability to other systems. To overcome this problem, several proposals for high-level
heterogeneous programming have appeared, trying to reduce the development effort and increase functional and performance
portability, specifically when using GPU hardware accelerators. This paper evaluates the SYCL programming model, using
the Open SYCL compiler, from two different perspectives: the performance it offers when dealing with single or multiple
GPU devices from the same or different vendors, and the development effort required to implement the code. As a case
study, we use the Finite Time Lyapunov Exponent calculation over two real-world scenarios and compare the performance
and the development effort of its Open SYCL-based version against the equivalent versions that use CUDA or HIP. Based on
the experimental results, we observe that the use of SYCL does not lead to a remarkable overhead in terms of GPU
kernel execution time. In general terms, the Open SYCL development effort for the host code is lower than that observed
with CUDA or HIP. Moreover, the SYCL version can take advantage of both CUDA and AMD GPU devices simultaneously much
more easily than directly using the vendor-specific programming solutions.
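
For readers unfamiliar with the case study, the sketch below shows the textbook per-point Finite Time Lyapunov Exponent
formula in 2D: build the Cauchy-Green tensor from the flow-map gradient, take its largest eigenvalue, and scale the
logarithm of its square root by the integration time. The names `FlowMapGradient2D` and `ftle_2d` are hypothetical, and
this plain C++ version is only an illustration of the kind of work a GPU kernel would do per grid point. It is not the
paper's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical type: gradient of the flow map at one grid point,
// stored as the 2x2 matrix | a b |
//                          | c d |
struct FlowMapGradient2D {
    double a, b, c, d;
};

// FTLE = ln(sqrt(lambda_max(F^T F))) / |T|, with T the integration time.
inline double ftle_2d(const FlowMapGradient2D& F, double integration_time) {
    assert(integration_time != 0.0);

    // Cauchy-Green tensor C = F^T F (symmetric 2x2).
    const double c11 = F.a * F.a + F.c * F.c;
    const double c12 = F.a * F.b + F.c * F.d;
    const double c22 = F.b * F.b + F.d * F.d;

    // Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    const double trace = c11 + c22;
    const double det = c11 * c22 - c12 * c12;
    const double disc = std::max(0.0, trace * trace - 4.0 * det);  // guard against round-off
    const double lambda_max = 0.5 * (trace + std::sqrt(disc));

    return std::log(std::sqrt(lambda_max)) / std::abs(integration_time);
}
```

Each grid point is independent of the others, which is why this calculation maps naturally onto one work-item per point.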
@@ -0,0 +1,31 @@
---
contributor: max
date: '2023-10-24T08:08:10.490000'
title: 'A Performance-Portable SYCL Implementation of CRK-HACC for Exascale'
external_url: https://arxiv.org/pdf/2310.16122
authors:
- name: Esteban M. Rangel
- name: S. John Pennycook
- name: Adrian Pope
- name: Nicholas Frontiere
- name: Zhiqiang Ma
- name: Varsha Madananth
tags:
- sycl
- hpc
- cuda
- hip
- heterogeneous-programming
---

The first generation of exascale systems will include a variety of machine architectures, featuring GPUs from multiple
vendors. As a result, many developers are interested in adopting portable programming models to avoid maintaining
multiple versions of their code. It is necessary to document experiences with such programming models to assist
developers in understanding the advantages and disadvantages of different approaches.

To this end, this paper evaluates the performance portability of a SYCL implementation of a large-scale cosmology
application (CRK-HACC) running on GPUs from three different vendors: AMD, Intel, and NVIDIA. We detail the process of
migrating the original code from CUDA to SYCL and show that specializing kernels for specific targets can greatly
improve performance portability without significantly impacting programmer productivity. The SYCL version of CRK-HACC
achieves a performance portability of 0.96 with a code divergence of almost 0, demonstrating that SYCL is a viable
programming model for performance-portable applications.
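
As a rough guide to reading the 0.96 figure, the sketch below assumes the commonly used definition of performance
portability: the harmonic mean of an application's efficiency on each platform in a set, which drops to zero if any
platform is unsupported. The function name `performance_portability` and the convention of encoding "unsupported" as an
efficiency of zero or less are choices made for this illustration. The abstract does not list per-platform efficiencies,
so none are reproduced here.

```cpp
#include <cassert>
#include <vector>

// Harmonic mean of per-platform efficiencies, each expected in (0, 1].
// Returns 0 when the set is empty or any platform is unsupported
// (encoded here, by assumption, as an efficiency <= 0).
inline double performance_portability(const std::vector<double>& efficiencies) {
    if (efficiencies.empty()) {
        return 0.0;
    }
    double sum_of_inverses = 0.0;
    for (double e : efficiencies) {
        if (e <= 0.0) {
            return 0.0;  // one unsupported platform makes the whole score zero
        }
        sum_of_inverses += 1.0 / e;
    }
    return static_cast<double>(efficiencies.size()) / sum_of_inverses;
}
```

Because the harmonic mean is dominated by the weakest platform, a score close to 1 is only possible when efficiency is
high on every platform in the set.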
@@ -0,0 +1,28 @@
---
contributor: max
date: '2024-01-05T08:08:10.490000'
title: 'Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC'
external_url: https://arxiv.org/pdf/2401.02680
authors:
- name: Wei-Chen Lin
- name: Simon McIntosh-Smith
- name: Tom Deakin
tags:
- sycl
- gpu
- hip
---

Until recently, AMD platforms did not support offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work
highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we
acknowledged AMD’s past efforts such as HCC, which unfortunately is deprecated and does not support newer hardware
platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled
multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences and evaluation of
currently available StdPar implementations for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps with
ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using all
available StdPar compilers, to contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust,
Kokkos, OpenMP, and SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation.
Finally, the StdPar model discussed in this report largely depends on Unified Shared Memory (USM) performance and very
few AMD GPUs have proper support for this feature. As such, this report demonstrates a proof-of-concept host-side
userspace pagefault solution for models that use the HIP API. We discuss performance improvements achieved with our
solution using the same set of benchmarks.
@@ -0,0 +1,27 @@
---
contributor: max
date: '2024-01-24T08:08:10.490000'
title: 'Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame'
external_url: https://arxiv.org/pdf/2401.13310
authors:
- name: Jolly Chen
- name: Monica Dessole
- name: Ana Lucia Varbanescu
tags:
- sycl
- gpu
- cuda
- heterogeneous-programming
---

The world’s largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed
efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework,
developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism.
Given the increasing heterogeneity in computing facilities, it becomes crucial to efficiently support GPGPUs to take
advantage of the available resources. SYCL allows for a single-source implementation, which enables support for
different architectures. In this paper, we describe a CUDA implementation and the migration process to SYCL, focusing on
a core high energy physics operation in RDataFrame – histogramming. We detail the challenges that we faced when
integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance
analysis of two SYCL compilers, AdaptiveCpp and DPC++, and the reference CUDA implementation. We highlight the
performance bottlenecks that we encountered and the methodology used to detect them. Based on our findings, we provide
actionable insights for developers of SYCL applications.
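
To make the histogramming operation concrete, here is a minimal sequential sketch of a fixed-width 1D histogram fill.
The type `Histogram1D` and its members are hypothetical and are not ROOT's or the paper's API. The layout with an
underflow slot at index 0 and an overflow slot at index `nbins + 1` mirrors the convention used by ROOT histograms and
is adopted here only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width histogram over [lo, hi).
// counts[0] is underflow, counts[1..nbins] are regular bins,
// counts[nbins + 1] is overflow.
struct Histogram1D {
    double lo;
    double hi;
    std::vector<double> counts;

    Histogram1D(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), counts(nbins + 2, 0.0) {
        assert(nbins > 0 && lo_ < hi_);
    }

    std::size_t find_bin(double x) const {
        const std::size_t nbins = counts.size() - 2;
        if (!(x >= lo)) return 0;      // below range (this test also catches NaN)
        if (x >= hi) return nbins + 1; // at or above the upper edge
        const double scaled = (x - lo) / (hi - lo) * static_cast<double>(nbins);
        // Clamp so round-off near the upper edge cannot spill into overflow.
        const std::size_t idx = std::min(static_cast<std::size_t>(scaled), nbins - 1);
        return idx + 1;
    }

    void fill(double x, double weight = 1.0) {
        counts[find_bin(x)] += weight;
    }
};
```

In a GPU port, many work-items would update `counts` concurrently, so the increment in `fill` would generally need an
atomic update or per-group partial histograms that are merged afterwards. That is a general observation about parallel
histogramming rather than a description of the paper's design.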
@@ -0,0 +1,32 @@
---
contributor: max
date: '2024-03-09T08:08:10.490000'
title: 'XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows'
external_url: https://arxiv.org/abs/2403.05910
authors:
- name: Jinlong Li
- name: Shucheng Pan
tags:
- sycl
- hpc
- cuda
- hip
- heterogeneous-programming
---

We present a cross-architecture high-order heterogeneous Navier-Stokes simulation solver, XFluids, for compressible
reacting multi-component flows on different platforms. Multi-component reacting flows are ubiquitous in many scientific
and engineering applications, but simulating them numerically is usually time-consuming because the underlying
multiscale features must be captured. Although heterogeneous accelerated computing is significantly beneficial for
large-scale simulations of these flows, effective utilization of various heterogeneous accelerators with different
architectures and programming models in the market remains a challenge. To address this, we develop XFluids with SYCL,
so that acceleration targets different devices directly, without translating any source code. A variety of optimization
techniques have been proposed to increase the computational performance of XFluids, including adaptive range assignment,
partial eigen-system reconstruction, and hotspot device function optimizations. This solver has been open-sourced and
tested on multiple GPUs from different mainstream vendors, indicating high portability. Various benchmark cases
demonstrate the accuracy of XFluids, with almost no efficiency loss compared to existing GPU programming models such as
CUDA and HIP. In addition, the MPI library is used to extend the solver to multi-GPU platforms, with GPU-enabled MPI
supported. With this, the weak scaling efficiency of XFluids exceeds 95% on 1024 GPUs. Finally, we simulate both the
inert and reactive multi-component shock-bubble interaction problems with high-resolution meshes, to investigate the
reacting effects on the mixing, vortex stretching, and shape deformation of the bubble evolution.
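
For context on the scaling figure, the block below gives the usual definition of weak scaling efficiency. It is assumed
here to be the quantity the abstract reports, which the abstract itself does not spell out. $T(N)$ denotes the
wall-clock time on $N$ GPUs when the problem size per GPU is held fixed.

```latex
E_{\text{weak}}(N) = \frac{T(1)}{T(N)},
\qquad
E_{\text{weak}}(1024) > 0.95
\;\Longleftrightarrow\;
T(1024) < \frac{T(1)}{0.95} \approx 1.053\, T(1)
```

Under that reading, running a problem 1024 times larger on 1024 GPUs takes at most about 5% longer than the single-GPU
run.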
@@ -0,0 +1,38 @@
---
contributor: max
date: '2024-04-16T08:08:10.490000'
title: 'Enabling performance portability on the LiGen drug discovery pipeline'
external_url: https://www.sciencedirect.com/science/article/pii/S0167739X24001195
authors:
- name: Luigi Crisci
- name: Lorenzo Carpentieri
- name: Biagio Cosenza
- name: Leon Bogdanović
- name: Gianmarco Accordi
- name: Davide Gadioli
- name: Emanuele Vitali
- name: Gianluca Palermo
- name: Andrea Rosario Beccari
tags:
- gpu
- sycl
- hpc
- cuda
- hip
---

In recent years, there has been a growing interest in developing high-performance implementations of drug discovery
processing software. To target modern GPU architectures, such applications are mostly written in proprietary languages
such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of
accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery
pipelines on multiple large-scale computing systems, with the ultimate goal of working on urgent computing scenarios.
This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, from CUDA to
the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a
structured analysis of the performance portability of the SYCL LiGen platform, focusing on different aspects of the
approach from different perspectives. First, we analyze the performance portability provided by the high-level semantics
of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects
such as kernel occupancy and register pressure affect the performance portability of the overall application. The
experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization
patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both
known and ad hoc metrics. The results show that, thanks to the combination of high-level SYCL semantics and some manual
tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs.
@@ -0,0 +1,18 @@
---
contributor: max
date: '2024-07-05T14:16:16.441310'
title: 'Bring your code to RISC-V accelerators with SYCL'
external_url: https://www.youtube.com/watch?v=Jw_nEtYi2-k
type: presentation
tags:
- hpc
- risc-v
---

This talk will show attendees how to overcome proprietary code with RISC-V and SYCL. They will learn how they can
achieve code portability and adopt RISC-V hardware without losing their existing work, for greater productivity.

The talk will also highlight the ongoing research into pioneering applications for RISC-V, funded by the EU Horizon
programme. AERO and SYCLOPS are two such projects. AERO seeks to enable the future heterogeneous EU cloud
infrastructure, while SYCLOPS will bring the RISC-V and SYCL standards together into a single software stack for the
first time.
Binary file not shown.