The examples are built and tested on Linux with GCC 7.4, NVCC 10.1 and the experimental support for CUDA in the DPC++ SYCL implementation.

CUDA is a registered trademark of NVIDIA Corporation.
SYCL is a trademark of the Khronos Group Inc.

Prerequisites
-------------

These examples are intended to be used with this [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples). It provides all the examples, libraries and the required environment variables.

[NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) must be installed to run the image.

A useful guide for setting up docker and the NVIDIA Container Toolkit can be found [here](https://www.pugetsystems.com/labs/hpc/Workstation-Setup-for-Docker-with-the-New-NVIDIA-Container-Toolkit-nvidia-docker2-is-deprecated-1568).

Getting Started
-------------

Once docker and the NVIDIA Container Toolkit are installed, we can create a new container and run the examples within it.

```sh
$ sudo docker run --gpus all -it ruyman/dpcpp_cuda_examples
```

Once inside the docker image, navigate to `/home/examples/` to find a local clone of this repo. Make sure to pull the latest changes:

```sh
$ cd /home/examples/SYCL-For-CUDA-Examples
$ git pull
```

Refer to each example and/or exercise for detailed instructions on how to run it.

Vector addition
-------------

This trivial example can be used to compare a simple vector addition in CUDA to an equivalent implementation in SYCL for CUDA. The aim of the example is also to highlight how to build an application with SYCL for CUDA using DPC++ support, for which an example CMakefile is provided. For detailed documentation on how to migrate from CUDA to SYCL, see [SYCL For CUDA Developers](https://developer.codeplay.com/products/computecpp/ce/guides/sycl-for-cuda-developers).

Note that currently the CUDA backend does not support the [USM](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc) extension, so we use `sycl::buffer` and `sycl::accessor` instead.
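
As a rough illustration of the buffer-based setup, a minimal sketch might look like this (names such as `N`, `a`, `b` and `c` are illustrative, not taken from the example source):

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
  constexpr size_t N = 1024;
  std::vector<double> a(N, 1.0), b(N, 2.0), c(N, 0.0);

  // Buffers adopt the host storage; the runtime copies data to the
  // device lazily, when a kernel first accesses it. No USM allocation
  // calls are required.
  sycl::buffer<double, 1> bufA{a.data(), sycl::range<1>(N)};
  sycl::buffer<double, 1> bufB{b.data(), sycl::range<1>(N)};
  sycl::buffer<double, 1> bufC{c.data(), sycl::range<1>(N)};

  // Kernels obtain sycl::accessor objects from these buffers inside a
  // command group (shown later in this README).
  return 0;
}
```
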
Pre-requisites
---------------

These instructions assume that the example [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples/dockerfile) is being used. This image simplifies accessing these examples as the environment is set up correctly. For details on how to get started with the example docker image, refer to the Getting Started section above.

Note the `SYCL_BE` env variable is not required, since we use a custom device selector.
CMake Build script
------------------------

The provided CMake build script uses the native CUDA support to build the CUDA application. It also serves as a check that all CUDA requirements on the system are available (such as an installation of CUDA on the system).

Two flags are required: `-DSYCL_ROOT`, which must point to the place where the DPC++ compiler is installed, and `-DCMAKE_CXX_COMPILER`, which must point to the Clang compiler provided by DPC++.

The CMake target `sycl_vector_addition` will build the SYCL version of the application.

Note the variable `SYCL_FLAGS` is used to store the Clang flags that enable the compilation of a SYCL application (`-fsycl`) but also the flag that specifies which targets are built (`-fsycl-targets`). In this case, we will build the example for both NVPTX and SPIR64. This means the kernel for the vector addition will be compiled for both backends, and the queue selected at runtime will determine which variant is used.

Note the project is built with C++17 support, which enables the usage of [deduction guides](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/deduction_guides/SYCL_INTEL_deduction_guides.asciidoc) to reduce the number of template parameters used.
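
For instance, a buffer declaration can drop its explicit template arguments; this is a small sketch under the assumption that the deduction guides extension is available in the compiler being used:

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
  std::vector<double> data(1024, 1.0);

  // Without deduction guides, element type and dimensionality are spelled out:
  sycl::buffer<double, 1> explicitBuf{data.data(), sycl::range<1>(data.size())};

  // With C++17 class template argument deduction, both are deduced from
  // the constructor arguments:
  sycl::buffer deducedBuf{data.data(), sycl::range<1>(data.size())};
  return 0;
}
```
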

SYCL Vector Addition code
--------------------------

The vector addition example uses a simple approach, implemented with a plain kernel that performs the add. Vectors are stored directly in buffers. Data is initialized on the host using host accessors. This approach avoids creating unnecessary storage on the host, and enables the SYCL runtime to use optimized memory paths.

The SYCL queue created later on uses a custom `CUDASelector` to select a CUDA device, or bail out if it is not there. The CUDA selector uses the `info::device::driver_version` to identify the device exported by the CUDA backend. If the NVIDIA OpenCL implementation is available on the system, it will be reported as another SYCL device. The driver version is the best way to differentiate between the two.
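
A sketch of such a selector is shown below; it illustrates the approach described above and is not necessarily the exact code in the example:

```cpp
#include <CL/sycl.hpp>
#include <string>
namespace sycl = cl::sycl;

class CUDASelector : public sycl::device_selector {
public:
  int operator()(const sycl::device &device) const override {
    const std::string driver =
        device.get_info<sycl::info::device::driver_version>();
    // The CUDA backend reports "CUDA" in its driver version string,
    // which distinguishes it from the NVIDIA OpenCL implementation
    // of the same physical GPU.
    if (device.is_gpu() && driver.find("CUDA") != std::string::npos) {
      return 1;  // positive score: eligible for selection
    }
    return -1;  // negative score: never selected
  }
};
```
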

The command group is created as a lambda expression that takes the `sycl::handler` parameter. Accessors are obtained from buffers using the `get_access` method. Finally, the `parallel_for` with the SYCL kernel is invoked as usual.
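
Put together, the command group might look like the following sketch (the buffer names follow the earlier snippets and are assumptions, not the example's actual identifiers):

```cpp
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;

// Sketch: submit a vector-add command group to a queue.
void vector_add(sycl::queue &q, sycl::buffer<double, 1> &bufA,
                sycl::buffer<double, 1> &bufB, sycl::buffer<double, 1> &bufC,
                size_t N) {
  q.submit([&](sycl::handler &h) {
    // Accessors declare how each buffer is used inside the kernel.
    auto a = bufA.get_access<sycl::access::mode::read>(h);
    auto b = bufB.get_access<sycl::access::mode::read>(h);
    auto c = bufC.get_access<sycl::access::mode::write>(h);

    // One work-item per vector element.
    h.parallel_for<class vec_add>(sycl::range<1>(N),
                                  [=](sycl::id<1> i) { c[i] = a[i] + b[i]; });
  });
}
```
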

The command group is submitted to a queue which will convert all the operations into CUDA commands that will be executed once the host accessor is encountered later on.

The host accessor will trigger a copy of the data back to the host, and then the values are reduced into a single sum element.
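
A minimal sketch of that final step, again assuming the buffer name from the earlier snippets:

```cpp
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;

// Sketch: constructing a host accessor waits for outstanding device work
// on the buffer and copies the data back to the host.
double reduce_on_host(sycl::buffer<double, 1> &bufC, size_t N) {
  auto hostAcc = bufC.get_access<sycl::access::mode::read>();
  double sum = 0.0;
  for (size_t i = 0; i < N; ++i) {
    sum += hostAcc[i];
  }
  return sum;
}
```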