---
contributor: max
date: '2023-10-01T08:08:10.490000+00:00'
title: 'Open SYCL on heterogeneous GPU systems: A case of study'
external_url: https://arxiv.org/ftp/arxiv/papers/2310/2310.06947.pdf
authors:
  - name: Rocío Carratalá-Sáez
  - name: Francisco J. Andújar
  - name: Yuri Torres
  - name: Arturo Gonzalez-Escribano
  - name: Diego R. Llanos
tags:
  - gpu
  - cuda
  - hpc
  - hip
---

Computational platforms for high-performance scientific applications are becoming more heterogeneous, including hardware
accelerators such as multiple GPUs. Applications in a wide variety of scientific fields require efficient and careful
management of the computational resources of this type of hardware to obtain the best possible performance. However,
heterogeneous clusters or machines may combine GPUs from different vendors, architectures, and families. Programming with
the vendor-provided languages or frameworks, and optimizing for specific devices, can become cumbersome and compromise
portability to other systems. To overcome this problem, several proposals for high-level heterogeneous programming have
appeared, trying to reduce the development effort and increase functional and performance portability, specifically when
using GPU hardware accelerators. This paper evaluates the SYCL programming model, using the Open SYCL compiler, from two
different perspectives: the performance it offers when dealing with single or multiple GPU devices from the same or
different vendors, and the development effort required to implement the code. We use the Finite Time Lyapunov Exponent
(FTLE) calculation over two real-world scenarios as a case study, and compare the performance and development effort of
its Open SYCL-based version against the equivalent versions that use CUDA or HIP. Based on the experimental results, we
observe that the use of SYCL does not introduce remarkable overhead in the GPU kernels' execution time. In general terms,
the Open SYCL development effort for the host code is lower than that observed with CUDA or HIP. Moreover, the SYCL version
can take advantage of both CUDA and AMD GPU devices simultaneously much more easily than directly using the vendor-specific
programming solutions.
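
The multi-vendor claim is easy to illustrate. The sketch below is not taken from the paper; it is a minimal SYCL 2020
example (compilable with Open SYCL / AdaptiveCpp, assuming its CUDA and ROCm backends are enabled) showing how a single
source file can enumerate every GPU it finds, whatever the vendor, and spread work across them. The kernel and the even
work split are placeholders, not the FTLE computation the authors benchmark.

```cpp
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  // One in-order queue per GPU, whichever backend (CUDA, HIP/ROCm, ...) exposes it.
  std::vector<sycl::queue> queues;
  for (const auto& dev : sycl::device::get_devices(sycl::info::device_type::gpu)) {
    queues.emplace_back(dev, sycl::property::queue::in_order{});
    std::cout << "Found GPU: " << dev.get_info<sycl::info::device::name>() << "\n";
  }
  if (queues.empty()) return 1;

  const size_t n = 1 << 20;
  std::vector<float> host(n, 1.0f);
  const size_t chunk = n / queues.size();  // even split; remainder ignored for brevity

  // Each device handles its own chunk: copy in, run a placeholder kernel, copy out.
  std::vector<float*> dev_mem(queues.size());
  for (size_t i = 0; i < queues.size(); ++i) {
    float* d = sycl::malloc_device<float>(chunk, queues[i]);
    dev_mem[i] = d;
    queues[i].memcpy(d, host.data() + i * chunk, chunk * sizeof(float));
    queues[i].parallel_for(sycl::range<1>(chunk),
                           [=](sycl::id<1> idx) { d[idx] *= 2.0f; });
    queues[i].memcpy(host.data() + i * chunk, d, chunk * sizeof(float));
  }

  // The per-device queues run concurrently; wait for all and release memory.
  for (size_t i = 0; i < queues.size(); ++i) {
    queues[i].wait();
    sycl::free(dev_mem[i], queues[i]);
  }
  return 0;
}
```

Doing the same directly with CUDA and HIP would require two toolchains and two sets of runtime calls; in SYCL the vendor
selection is deferred to the runtime, which is essentially the portability argument the abstract makes.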