| 1 | +--- |
| 2 | +contributor: max |
| 3 | +date: '2024-04-16T08:08:10.490000+00:00' |
| 4 | +title: 'Enabling performance portability on the LiGen drug discovery pipeline' |
| 5 | +external_url: https://www.sciencedirect.com/science/article/pii/S0167739X24001195 |
| 6 | +authors: |
| 7 | + - name: Luigi Crisci |
| 8 | + - name: Lorenzo Carpentieri |
| 9 | + - name: Biagio Cosenza |
| 10 | + - name: Leon Bogdanović |
| 11 | + - name: Gianmarco Accordi |
| 12 | + - name: Davide Gadioli |
| 13 | + - name: Emanuele Vitali |
| 14 | + - name: Gianluca Palermo |
| 15 | + - name: Andrea Rosario Beccari |
| 16 | +tags: |
| 17 | + - gpu |
| 18 | + - sycl |
| 19 | + - hpc |
| 20 | + - cuda |
| 21 | + - hip |
| 22 | +--- |
| 23 | + |
| 24 | +In recent years, there has been a growing interest in developing high-performance implementations of drug discovery processing software. To target modern GPU architectures, |
| 25 | +such applications are mostly written in proprietary languages such as CUDA or HIP. However, with the increasing heterogeneity of modern HPC systems and the availability of |
| 26 | +accelerators from multiple hardware vendors, it has become critical to be able to efficiently execute drug discovery pipelines on multiple large-scale computing systems, |
| 27 | +with the ultimate goal of working on urgent computing scenarios. This article presents the challenges of migrating LiGen, an industrial drug discovery software pipeline, |
| 28 | +from CUDA to the SYCL programming model, an industry standard based on C++ that enables heterogeneous computing. We perform a structured analysis of the performance portability |
| 29 | +of the SYCL LiGen platform from two perspectives. First, we analyze the performance portability provided by the high-level semantics |
| 30 | +of SYCL, including the most recent group algorithms and subgroups of SYCL 2020. Second, we analyze how low-level aspects such as kernel occupancy and register pressure affect |
| 31 | +the performance portability of the overall application. The experimental evaluation is performed on two different versions of LiGen, implementing two different parallelization |
| 32 | +patterns, by comparing them with a manually optimized CUDA version, and by evaluating performance portability using both known and ad hoc metrics. The results show that, |
| 33 | +thanks to the combination of high-level SYCL semantics and some manual tuning, LiGen achieves native-comparable performance on NVIDIA, while also running on AMD GPUs. |