Commit 3768f63

Merge pull request #16 from codeplaysoftware/new-research-papers
New Research Papers & More
2 parents 0320a68 + 6a00cc4 commit 3768f63

9 files changed (+224, -0 lines)
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+---
+contributor: max
+date: '2024-07-30T14:10:22.153253'
+external_url: https://www.phoronix.com/news/oneAPI-Construction-Kit-4.0
+title: 'oneAPI Construction Kit 4.0 Brings RISC-V Host CPU Support'
+image: ../../../static/images/news/2024-07-30-oneapi-construction-kit.webp
+tags:
+- oneapi
+- sycl
+- hpc
+- portability
+---
+
+Last year the oneAPI Construction Kit was introduced by Intel-owned Codeplay Software for bringing SYCL to new hardware
+even for hardware outside of Intel's offerings. One of the early targets of this oneAPI Construction Kit support was for
+RISC-V processors and now with today's release of oneAPI Construction Kit 4.0 there is finally RISC-V host CPU support.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+---
+contributor: max
+date: '2023-10-01T08:08:10.490000'
+title: 'Open SYCL on heterogeneous GPU systems: A case of study'
+external_url: https://arxiv.org/ftp/arxiv/papers/2310/2310.06947.pdf
+authors:
+- name: Rocío Carratalá-Sáez
+- name: Francisco J. Andújar
+- name: Yuri Torres
+- name: Arturo Gonzalez-Escribano
+- name: Diego R. Llanos
+tags:
+- gpu
+- cuda
+- hpc
+- hip
+---
+
+Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware
+accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require an efficient and careful
+management of the computational resources of this type of hardware to obtain the best possible performance. However,
+there are currently different GPU vendors, architectures and families that can be found in heterogeneous clusters or
+machines. Programming with the vendor provided languages or frameworks, and optimizing for specific devices, may become
+cumbersome and compromise portability to other systems. To overcome this problem, several proposals for high-level
+heterogeneous programming have appeared, trying to reduce the development effort and increase functional and performance
+portability, specifically when using GPU hardware accelerators. This paper evaluates the SYCL programming model, using
+the Open SYCL compiler, from two different perspectives: The performance it offers when dealing with single or multiple
+GPU devices from the same or different vendors, and the development effort required to implement the code. We use as
+case of study the Finite Time Lyapunov Exponent calculation over two real-world scenarios and compare the performance
+and the development effort of its Open SYCL-based version against the equivalent versions that use CUDA or HIP. Based on
+the experimental results, we observe that the use of SYCL does not lead to a remarkable overhead in terms of the GPU
+kernels execution time. In general terms, the Open SYCL development effort for the host code is lower than that observed
+with CUDA or HIP. Moreover, the SYCL version can take advantage of both CUDA and AMD GPU devices simultaneously much
+easier than directly using the vendor-specific programming solutions.
@@ -0,0 +1,31 @@
+---
+contributor: max
+date: '2023-10-24T08:08:10.490000'
+title: 'A Performance-Portable SYCL Implementation of CRK-HACC for Exascale'
+external_url: https://arxiv.org/pdf/2310.16122
+authors:
+- name: Esteban M. Rangel
+- name: S. John Pennycook
+- name: Adrian Pope
+- name: Nicholas Frontiere
+- name: Zhiqiang Ma
+- name: Varsha Madananth
+tags:
+- sycl
+- hpc
+- cuda
+- hip
+- heterogeneous-programming
+---
+
+The first generation of exascale systems will include a variety of machine architectures, featuring GPUs from multiple
+vendors. As a result, many developers are interested in adopting portable programming models to avoid maintaining
+multiple versions of their code. It is necessary to document experiences with such programming models to assist
+developers in understanding the advantages and disadvantages of different approaches.
+
+To this end, this paper evaluates the performance portability of a SYCL implementation of a large-scale cosmology
+application (CRK-HACC) running on GPUs from three different vendors: AMD, Intel, and NVIDIA. We detail the process of
+migrating the original code from CUDA to SYCL and show that specializing kernels for specific targets can greatly
+improve performance portability without significantly impacting programmer productivity. The SYCL version of CRK-HACC
+achieves a performance portability of 0.96 with a code divergence of almost 0, demonstrating that SYCL is a viable
+programming model for performance-portable applications.
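
Reviewer note: the "performance portability of 0.96" quoted in the abstract above is a single score summarising efficiency across several platforms. The sketch below assumes the commonly used definition of that metric, the harmonic mean of an application's efficiency on each platform in a set, falling to zero if any platform is unsupported. The abstract does not spell out the formula, so treat this as an illustration rather than the paper's exact method. The function name `performance_portability`, the platform names, and the efficiency values are all hypothetical and not taken from the paper.

```python
from statistics import harmonic_mean


def performance_portability(efficiencies):
    """Harmonic mean of per-platform efficiencies, or 0.0 if any platform fails.

    `efficiencies` maps a platform name to an efficiency in (0, 1], or to
    None when the application does not run on that platform (assumed
    convention for this sketch).
    """
    values = list(efficiencies.values())
    # An empty platform set or any unsupported platform gives a score of zero.
    if not values or any(e is None or e <= 0 for e in values):
        return 0.0
    return harmonic_mean(values)


# Hypothetical efficiencies for illustration only, NOT results from the paper.
example = {"amd_gpu": 0.95, "intel_gpu": 0.90, "nvidia_gpu": 1.00}
print(f"performance portability = {performance_portability(example):.3f}")
```

Because the harmonic mean is dominated by its smallest term, one poorly performing platform pulls the score down sharply, which is why a value close to 1 indicates consistently high efficiency on every platform in the set.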
@@ -0,0 +1,28 @@
+---
+contributor: max
+date: '2024-01-05T08:08:10.490000'
+title: 'Preliminary report: Initial evaluation of StdPar implementations on AMD GPUs for HPC'
+external_url: https://arxiv.org/pdf/2401.02680
+authors:
+- name: Wei-Chen Lin
+- name: Simon McIntosh-Smith
+- name: Tom Deakin
+tags:
+- sycl
+- gpu
+- hip
+---
+
+Recently, AMD platforms have not supported offloading C++17 PSTL (StdPar) programs to the GPU. Our previous work
+highlights how StdPar is able to achieve good performance across NVIDIA and Intel GPU platforms. In that work, we
+acknowledged AMD’s past effort such as HCC, which unfortunately is deprecated and does not support newer hardware
+platforms. Recent developments by AMD, Codeplay, and AdaptiveCpp (previously known as hipSYCL or OpenSYCL) have enabled
+multiple paths for StdPar programs to run on AMD GPUs. This informal report discusses our experiences and evaluation of
+currently available StdPar implementations for AMD GPUs. We conduct benchmarks using our suite of HPC mini-apps with
+ports in many heterogeneous programming models, including StdPar. We then compare the performance of StdPar, using all
+available StdPar compilers, to contemporary heterogeneous programming models supported on AMD GPUs: HIP, OpenCL, Thrust,
+Kokkos, OpenMP, SYCL. Where appropriate, we discuss issues encountered and workarounds applied during our evaluation.
+Finally, the StdPar model discussed in this report largely depends on Unified Shared Memory (USM) performance and very
+few AMD GPUs have proper support for this feature. As such, this report demonstrates a proof-of-concept host-side
+userspace pagefault solution for models that use the HIP API. We discuss performance improvements achieved with our
+solution using the same set of benchmarks.
@@ -0,0 +1,27 @@
+---
+contributor: max
+date: '2024-01-24T08:08:10.490000'
+title: 'Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame'
+external_url: https://arxiv.org/pdf/2401.13310
+authors:
+- name: Jolly Chen
+- name: Monica Dessole
+- name: Ana Lucia Varbanescu
+tags:
+- sycl
+- gpu
+- cuda
+- heterogeneous-programming
+---
+
+The world’s largest particle accelerator, located at CERN, produces petabytes of data that need to be analysed
+efficiently, to study the fundamental structures of our universe. ROOT is an open-source C++ data analysis framework,
+developed for this purpose. Its high-level data analysis interface, RDataFrame, currently only supports CPU parallelism.
+Given the increasing heterogeneity in computing facilities, it becomes crucial to efficiently support GPGPUs to take
+advantage of the available resources. SYCL allows for a single-source implementation, which enables support for
+different architectures. In this paper, we describe a CUDA implementation and the migration process to SYCL, focusing on
+a core high energy physics operation in RDataFrame – histogramming. We detail the challenges that we faced when
+integrating SYCL into a large and complex code base. Furthermore, we perform an extensive comparative performance
+analysis of two SYCL compilers, AdaptiveCpp and DPC++, and the reference CUDA implementation. We highlight the
+performance bottlenecks that we encountered, and the methodology used to detect these. Based on our findings, we provide
+actionable insights for developers of SYCL applications.
@@ -0,0 +1,32 @@
+---
+contributor: max
+date: '2024-03-09T08:08:10.490000'
+title: 'XFLUIDS: A SYCL-based unified cross-architecture heterogeneous simulation solver for compressible reacting flows'
+external_url: https://arxiv.org/abs/2403.05910
+authors:
+- name: Jinlong Li
+- name: Shucheng Pan
+tags:
+- sycl
+- hpc
+- cuda
+- hip
+- heterogeneous-programming
+---
+
+We present a cross-architecture high-order heterogeneous Navier-Stokes simulation solver, XFluids, for compressible
+reacting multi-component flows on different platforms. The multi-component reacting flows are ubiquitous in many
+scientific and engineering applications, while their numerical simulations are usually time-consuming to capture the
+underlying multiscale features. Although heterogeneous accelerated computing is significantly beneficial for large-scale
+simulations of these flows, effective utilization of various heterogeneous accelerators with different architectures and
+programming models in the market remains a challenge. To address this, we develop XFluids by SYCL, to perform
+acceleration directly targeted to different devices, without translating any source code. A variety of optimization
+techniques have been proposed to increase the computational performance of XFluids, including adaptive range assignment,
+partial eigen-system reconstruction, hotspot device function optimizations, etc. This solver has been open-sourced, and
+tested on multiple GPUs from different mainstream vendors, indicating high portability. Through various benchmark cases,
+the accuracy of XFluids is demonstrated, with approximately no efficiency loss compared to existing GPU programming
+models, such as CUDA and HIP. In addition, the MPI library is used to extend the solver to multi-GPU platforms, with the
+GPU-enabled MPI supported. With this, the weak scaling of XFluids for multi-GPU devices is larger than 95% for 1024
+GPUs. Finally, we simulate both the inert and reactive multi-component shock-bubble interaction problems with
+high-resolution meshes, to investigate the reacting effects on the mixing, vortex stretching, and shape deformation of
+the bubble evolution.
@@ -0,0 +1,38 @@
+---
+contributor: max
+date: '2024-04-16T08:08:10.490000'
+title: 'Enabling performance portability on the LiGen drug discovery pipeline'
+external_url: https://www.sciencedirect.com/science/article/pii/S0167739X24001195
+authors:
+- name: Luigi Crisci
+- name: Lorenzo Carpentieri
+- name: Biagio Cosenza
+- name: Leon Bogdanović
+- name: Gianmarco Accordi
+- name: Davide Gadioli
+- name: Emanuele Vitali
+- name: Gianluca Palermo
+- name: Andrea Rosario Beccari
+tags:
+- gpu
+- sycl
+- hpc
+- cuda
+- hip
+---
+
+In recent years, there has been a growing interest in developing high-performance implementations of drug discovery
+processing software. To target modern GPU architectures, such applications are mostly written in proprietary languages
+such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of
+accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery
+pipelines on multiple large-scale computing systems, with the ultimate goal of working on urgent computing scenarios.
+This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, from CUDA to
+the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a
+structured analysis of the performance portability of the SYCL LiGen platform, focusing on different aspects of the
+approach from different perspectives. First, we analyze the performance portability provided by the high-level semantics
+of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects
+such as kernel occupancy and register pressure affect the performance portability of the overall application. The
+experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization
+patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both
+known and ad hoc metrics. The results show that, thanks to the combination of high-level SYCL semantics and some manual
+tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs.
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+contributor: max
+date: '2024-07-05T14:16:16.441310'
+title: 'Bring your code to RISC-V accelerators with SYCL'
+external_url: https://www.youtube.com/watch?v=Jw_nEtYi2-k
+type: presentation
+tags:
+- hpc
+- risc-v
+---
+
+This talk will show attendees how to overcome proprietary code with RISC-V and SYCL. They will learn how they can
+achieve code portability and adopt RISC-V hardware without losing their existing work, for greater productivity.
+
+The talk will also highlight the ongoing research into pioneering applications for RISC-V, funded by the EU Horizon
+programme. AERO and SYCLOPS are two such projects. AERO seeks to enable the future heterogeneous EU cloud
+infrastructure, while SYCLOPS will bring the RISC-V and SYCL standards together into a single software stack
+for the first time.
Binary file not shown.
