Commit 0caa83c

[SYCL][DOC] CUDA and HIP GetStartedGuide updates (#17928)
* Fix links to configure.py and compile.py
  * Linking to the file in tree caused the link in the docs to just download the python script. It makes more sense to link to the github web UI for these as they should be used in a checkout anyway.
* Remove CUDA requiring device selector
  * Since #6203 the device selector should handle this case well.
* Update HIP backend limitations
  * HIP is no longer in beta
  * Windows isn't supported but #17702 made the build work with it so it might work for some users.
  * Global offset has been supported since #5855
  * Add common limitations
* Update HIP for Nvidia section
  * This might work but is not supported
* Update HIP section
  * Recommended HIP version + testing platforms
  * HIP is no longer in beta
* Add note on target aliases for CUDA and HIP
1 parent c6f65b8 commit 0caa83c

sycl/doc/GetStartedGuide.md

Lines changed: 36 additions & 31 deletions
@@ -92,8 +92,8 @@ git clone --config core.autocrlf=false https://github.com/intel/llvm -b sycl
 ## Build DPC++ toolchain
 
 The easiest way to get started is to use the buildbot
-[configure](../../buildbot/configure.py) and
-[compile](../../buildbot/compile.py) scripts.
+[configure](https://github.com/intel/llvm/blob/sycl/buildbot/configure.py) and
+[compile](https://github.com/intel/llvm/blob/sycl/buildbot/compile.py) scripts.
 
 In case you want to configure CMake manually the up-to-date reference for
 variables is in these files. Note that the CMake variables set by default by the [configure.py](../../buildbot/configure.py) script are the ones commonly used by
@@ -237,21 +237,21 @@ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$DPCPP_HOME/llvm/build/lib ./a.out
 
 ### Build DPC++ toolchain with support for HIP AMD
 
-There is beta support for oneAPI DPC++ for HIP on AMD devices. It is not feature
-complete and it still contains known and unknown bugs. Currently it has only
-been tried on Linux, with ROCm 4.2.0, 4.3.0, 4.5.2, 5.3.0, and 5.4.3, using the
-AMD Radeon Pro W6800 (gtx1030), MI50 (gfx906), MI100 (gfx908) and MI250x
-(gfx90a) devices. The backend is tested by a relevant device/toolkit prior to a
-oneAPI plugin release. Go to the plugin release
-[pages](https://developer.codeplay.com/products/oneapi/amd) for further details.
-
 To enable support for HIP devices, follow the instructions for the Linux DPC++
 toolchain, but add the `--hip` flag to `configure.py`.
 
 Enabling this flag requires an installation of ROCm on the system, for
 instruction on how to install this refer to
 [AMD ROCm Installation Guide for Linux](https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html).
 
+ROCm versions above 5.7 are recommended as earlier versions don't have graph
+support. DPC++ aims to support new ROCm versions as they come out, so there may
+be a delay but generally the latest ROCm version should work. The ROCm support
+is mostly tested on AMD Radeon Pro W6800 (gfx1030), and MI250x (gfx90a), however
+other architectures supported by LLVM may work just fine. The full list of ROCm
+versions tested prior to oneAPI releases are listed on the plugin release
+[pages](https://developer.codeplay.com/products/oneapi/amd).
+
 The DPC++ build assumes that ROCm is installed in `/opt/rocm`, if it is
 installed somewhere else, the directory must be provided through the CMake
 variable `UR_HIP_ROCM_DIR` which can be passed through to cmake using the
@@ -280,7 +280,10 @@ by default when configuring for HIP. For more details on building LLD refer to
 
 ### Build DPC++ toolchain with support for HIP NVIDIA
 
-There is experimental support for oneAPI DPC++ for HIP on Nvidia devices.
+HIP applications can be built to target Nvidia GPUs, so in theory it is possible
+to build the DPC++ HIP support for Nvidia, however this is not supported, so it
+may not work.
+
 There is no continuous integration for this and there are no guarantees for
 supported platforms or configurations.
 
@@ -292,13 +295,12 @@ To enable support for HIP NVIDIA devices, follow the instructions for the Linux
 DPC++ toolchain, but add the `--hip` and `--hip-platform NVIDIA` flags to
 `configure.py`.
 
-Enabling this flag requires HIP to be installed, more specifically
-[HIP NVCC](https://rocmdocs.amd.com/en/latest/Installation_Guide/HIP-Installation.html#nvidia-platform),
-as well as the CUDA Runtime API to be installed, see
-[NVIDIA CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
-
-Currently, this has only been tried on Linux, with ROCm 4.2.0 or 4.3.0, with
-CUDA 11, and using a GeForce 1060 device.
+Enabling this flag requires HIP to be installed, specifically for Nvidia, see
+the Nvidia tab on the HIP installation docs
+[here](https://rocm.docs.amd.com/projects/HIP/en/latest/install/install.html),
+as well as the CUDA Runtime API to be installed, see [NVIDIA CUDA Installation
+Guide for
+Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html).
 
 ### Build DPC++ toolchain with support for ARM processors
 
@@ -736,14 +738,6 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
 The results are correct!
 ```
 
-**NOTE**: Currently, when the application has been built with the CUDA target,
-the CUDA backend must be selected at runtime using the `ONEAPI_DEVICE_SELECTOR`
-environment variable.
-
-```bash
-ONEAPI_DEVICE_SELECTOR=cuda:* ./simple-sycl-app-cuda.exe
-```
-
 **NOTE**: oneAPI DPC++/SYCL developers can specify SYCL device for execution
 using device selectors (e.g. `sycl::cpu_selector_v`, `sycl::gpu_selector_v`,
 [Intel FPGA selector(s)](extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc))
@@ -777,6 +771,14 @@ clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
 -Xsycl-target-backend --cuda-gpu-arch=sm_80
 ```
 
+Additionally AMD and Nvidia targets also support aliases for the target to
+simplify passing the specific architectures, for example
+`-fsycl-targets=nvidia_gpu_sm_80` is equivalent to
+`-fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend
+--cuda-gpu-arch=sm_80`, the full list of available aliases is documented in the
+[Users Manual](UsersManual.md#generic-options), for the `-fsycl-targets`
+option.
+
 To build simple-sycl-app ahead of time for GPU, CPU or Accelerator devices,
 specify the target architecture. The examples provided use a supported
 alias for the target, representing a full triple. Additional details can
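
The alias note added in the hunk above says that `-fsycl-targets=nvidia_gpu_sm_80` is equivalent to the long-form triple plus `--cuda-gpu-arch=sm_80`. A minimal sketch of that mapping follows. `expand_nvidia_alias` is a hypothetical helper written only for illustration and is not part of DPC++. The source file name `simple-sycl-app.cpp` is assumed from the surrounding guide. The commands are printed rather than executed, so the sketch runs without a toolchain installed.

```bash
#!/usr/bin/env bash
# Sketch only: expand_nvidia_alias is a hypothetical helper, not a DPC++ tool.
# It spells out the long-form flags that an nvidia_gpu_sm_NN alias stands for,
# following the equivalence stated in the documentation paragraph above.
expand_nvidia_alias() {
  local target_alias="$1"
  case "$target_alias" in
    nvidia_gpu_sm_*)
      # Strip the "nvidia_gpu_" prefix to recover the architecture, e.g. sm_80.
      local arch="${target_alias#nvidia_gpu_}"
      printf '%s\n' "-fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=${arch}"
      ;;
    *)
      printf 'not an nvidia_gpu_sm_* alias: %s\n' "$target_alias" >&2
      return 1
      ;;
  esac
}

# Print (rather than run) the two compile commands the note describes as equivalent.
# The file names here are assumptions taken from the guide's other examples.
echo "clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_80 simple-sycl-app.cpp -o simple-sycl-app-cuda.exe"
echo "clang++ -fsycl $(expand_nvidia_alias nvidia_gpu_sm_80) simple-sycl-app.cpp -o simple-sycl-app-cuda.exe"
```

The helper covers only the Nvidia pattern quoted in the commit. The AMD aliases and the full list are documented under `-fsycl-targets` in the Users Manual, which remains the authoritative reference.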
@@ -945,11 +947,14 @@ int CUDASelector(const sycl::device &Device) {
 
 ### HIP back-end limitations
 
-* Requires a ROCm compatible operating system, for full details of supported
-Operating System for ROCm, please refer to the
-[ROCm Supported Operating Systems](https://github.com/RadeonOpenCompute/ROCm#supported-operating-systems).
-* Support is still in a beta state, but the backend is being actively developed.
-* Global offsets are currently not supported.
+* Requires a ROCm compatible system and GPU, see for
+[Linux](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-skus)
+and for
+[Windows](https://rocm.docs.amd.com/projects/install-on-windows/en/latest/reference/system-requirements.html#supported-skus).
+* Windows for HIP is not supported by DPC++ at the moment so it may not work.
+* `printf` within kernels is not supported.
+* C++ standard library functions using complex types are not supported,
+`sycl::complex` should be used instead.
 
 ## Find More
 