
Commit 14df259

Updated documentation to assume the example docker image is being used (#8)
* Updated documentation to assume docker image is being used
* Updated formatting of shell snippets
1 parent 5d257b2 commit 14df259

File tree

6 files changed: +118 -152 lines changed

README.md

Lines changed: 16 additions & 12 deletions

````diff
@@ -8,28 +8,32 @@ The examples are built and tested in Linux with GCC 7.4, NVCC 10.1 and the
 experimental support for CUDA in the DPC++ SYCL implementation.
 
 CUDA is a registered trademark of NVIDIA Corporation
-SYCL is a trademark of the Khronos Group Inc
+SYCL is a trademark of the Khronos Group Inc.
 
-Docker Image
+Prerequisites
 -------------
 
-There is a docker image available with all the examples and the required
-environment set up, see https://hub.docker.com/r/ruyman/dpcpp_cuda_examples.
+These examples are intended to be used with this [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples).
+It provides all the examples, libraries and the required environment variables.
 
-If you have nvidia-docker, you can simply pull the image and run it to build
-the examples:
+[NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) must be installed to run the image.
 
-```sh
+A useful guide for setting up docker and the NVIDIA Container Toolkit can be found [here](https://www.pugetsystems.com/labs/hpc/Workstation-Setup-for-Docker-with-the-New-NVIDIA-Container-Toolkit-nvidia-docker2-is-deprecated-1568).
+
+Getting Started
+-------------
+
+Once docker and the NVIDIA Container Toolkit are installed, we can create a new container and run the examples within it.
+
+``` sh
 $ sudo docker run --gpus all -it ruyman/dpcpp_cuda_examples
 ```
 
-Once inside the docker image, navigate to /home/examples/ to find a clone
-of this repo. Make sure to pull the latest changes:
+Once inside the docker image, navigate to `/home/examples/` to find a local clone of this repo. Make sure to pull the latest changes:
 
-```sh
+``` sh
 $ cd /home/examples/SYCL-For-CUDA-Examples
 $ git pull
 ```
 
-Refer to each example and/or exercise for detailed instructions on how
-to run it.
+Refer to each example and/or exercise for detailed instructions on how to run it.
````

example-01/README.md

Lines changed: 47 additions & 58 deletions

````diff
@@ -1,53 +1,43 @@
 Example 01: Vector addition
 ===============================
 
-This trivial example can be used to compare a simple vector addition in
-CUDA to an equivalent implementation in SYCL for CUDA.
-The aim of the example is also to highlight how to build an application
-with SYCL for CUDA using DPC++ support, for which an example CMakefile is
-provided.
-For detailed documentation on how to migrate from CUDA to SYCL, see
-[SYCL For CUDA Developers](https://developer.codeplay.com/products/computecpp/ce/guides/sycl-for-cuda-developers).
-
-Note currently the CUDA backend does not support the
-[USM](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc)
-extension, so we use `sycl::buffer` and `sycl::accessors` instead.
+This trivial example can be used to compare a simple vector addition in CUDA to
+an equivalent implementation in SYCL for CUDA. The aim of the example is also
+to highlight how to build an application with SYCL for CUDA using DPC++ support,
+for which an example CMakefile is provided. For detailed documentation on how to
+migrate from CUDA to SYCL, see [SYCL For CUDA Developers](https://developer.codeplay.com/products/computecpp/ce/guides/sycl-for-cuda-developers).
+
+Note currently the CUDA backend does not support the [USM](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc) extension, so we use
+`sycl::buffer` and `sycl::accessors` instead.
 
 Pre-requisites
 ---------------
 
-You would need an installation of DPC++ with CUDA support,
-see [Getting Started Guide](https://github.com/codeplaysoftware/sycl-for-cuda/blob/cuda/sycl/doc/GetStartedWithSYCLCompiler.md)
-for details on how to build it.
-
-The example has been built on CMake 3.13.3 and nvcc 10.1.243.
+These instructions assume that the example [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples/dockerfile) is being used. This image
+simplifies accessing these examples as the environment is set up correctly.
+For details on how to get started with the example docker image, refer to the
+root README file.
 
 Building the example
 ---------------------
 
-```sh
-$ mkdir build && cd build`
-$ cmake ../ -DSYCL_ROOT=/path/to/dpc++/install \
-    -DCMAKE_CXX_COMPILER=/path/to/dpc++/install/bin/clang++
+``` sh
+$ mkdir build && cd build
+$ cmake ../ -DSYCL_ROOT=${SYCL_ROOT_DIR} -DCMAKE_CXX_COMPILER=${SYCL_ROOT_DIR}/bin/clang++
 $ make -j 8
 ```
 
-This should produce two binaries, `vector_addition` and `sycl_vector_addition`.
+This should produce two binaries, `vector_addition` and `sycl_vector_addition` .
 The former is the unmodified CUDA source and the second is the SYCL for CUDA
 version.
 
 Running the example
 --------------------
 
-The path to `libsycl.so` and the PI plugins must be in `LD_LIBRARY_PATH`.
-A simple way of running the app is as follows:
-
+```
+$ ./sycl_vector_addition
+$ ./vector_addition
 ```
-$ LD_LIBRARY_PATH=/path/to/dpc++/install/lib ./sycl_vector_addition
-```
-
-Note the `SYCL_BE` env variable is not required, since we use a custom
-device selector.
 
 CMake Build script
 ------------------------
@@ -56,19 +46,19 @@ The provided CMake build script uses the native CUDA support to build the
 CUDA application. It also serves as a check that all CUDA requirements
 on the system are available (such as an installation of CUDA on the system).
 
-Two flags are required: `-DSYCL_ROOT`, which must point to the place where the
-DPC++ compiler is installed, and `-DCMAKE_CXX_COMPILER`, which must point to
+Two flags are required: `-DSYCL_ROOT` , which must point to the place where the
+DPC++ compiler is installed, and `-DCMAKE_CXX_COMPILER` , which must point to
 the Clang compiler provided by DPC++.
 
 The CMake target `sycl_vector_addition` will build the SYCL version of
 the application.
+
 Note the variable `SYCL_FLAGS` is used to store the Clang flags that enable
-the compilation of a SYCL application (`-fsycl`) but also the flag that specify
-which targets are built (`-fsycl-targets`).
-In this case, we will build the example for both NVPTX and SPIR64.
-This means the kernel for the vector addition will be compiled for both
-backends, and runtime selection to the right queue will decide which variant
-to use.
+the compilation of a SYCL application ( `-fsycl` ) but also the flag that specifies
+which targets are built ( `-fsycl-targets` ). In this case, we will build the example
+for both NVPTX and SPIR64. This means the kernel for the vector addition will be
+compiled for both backends, and runtime selection to the right queue will
+decide which variant to use.
 
 Note the project is built with C++17 support, which enables the usage of
 [deduction guides](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/deduction_guides/SYCL_INTEL_deduction_guides.asciidoc) to reduce the number of template parameters used.
@@ -77,27 +67,26 @@ SYCL Vector Addition code
 --------------------------
 
 The vector addition example uses a simple approach to implement with a plain
-kernel that performs the add. Vectors are stored directly in buffers.
-Data is initialized on the host using host accessors.
-This approach avoids creating unnecessary storage on the host, and facilitates
-the SYCL runtime to use optimized memory paths.
-
-The SYCL queue created later on uses a custom `CUDASelector` to select
-a CUDA device, or bail out if its not there.
-The CUDA selector uses the `info::device::driver_version` to identify the
-device exported by the CUDA backend.
-If the NVIDIA OpenCL implementation is available on the
-system, it will be reported as another SYCL device. The driver
-version is the best way to differentiate between the two.
-
-The command group is created as a lambda expression that takes the
+kernel that performs the add. Vectors are stored directly in buffers. Data is
+initialized on the host using host accessors. This approach avoids creating
+unnecessary storage on the host, and facilitates the SYCL runtime to use
+optimized memory paths.
+
+The SYCL queue created later on uses a custom `CUDASelector` to select a CUDA
+device, or bail out if it's not there. The CUDA selector uses the
+`info::device::driver_version` to identify the device exported by the CUDA
+backend. If the NVIDIA OpenCL implementation is available on the system, it
+will be reported as another SYCL device. The driver version is the best way to
+differentiate between the two.
+
+The command group is created as a lambda expression that takes the
 `sycl::handler` parameter. Accessors are obtained from buffers using the
-`get_access` method.
-Finally the `parallel_for` with the SYCL kernel is invoked as usual.
+`get_access` method. Finally the `parallel_for` with the SYCL kernel is invoked
+as usual.
 
-The command group is submitted to a queue which will convert all the
-operations into CUDA commands that will be executed once the host accessor
-is encountered later on.
+The command group is submitted to a queue which will convert all the operations
+into CUDA commands that will be executed once the host accessor is encountered
+later on.
 
-The host accessor will trigger a copy of the data back to the host, and
-then the values are reduced into a single sum element.
+The host accessor will trigger a copy of the data back to the host, and then
+the values are reduced into a single sum element.
````
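For orientation, the pattern the updated README describes maps to only a few lines of SYCL code. Below is a minimal sketch of that pattern, not the repository's actual source: the kernel name `vec_add` and the vector sizes are illustrative, and the selector simply mirrors the driver-version heuristic described above.

```cpp
#include <CL/sycl.hpp>
#include <string>
#include <vector>

namespace sycl = cl::sycl;

// Scores CUDA devices positively and rejects everything else. Matching on the
// driver version string is the heuristic described above for telling the CUDA
// backend apart from the NVIDIA OpenCL platform.
class CUDASelector : public sycl::device_selector {
public:
  int operator()(const sycl::device &dev) const override {
    const std::string driver =
        dev.get_info<sycl::info::device::driver_version>();
    if (dev.is_gpu() && driver.find("CUDA") != std::string::npos) {
      return 1;
    }
    return -1; // Negative score: never fall back to a non-CUDA device.
  }
};

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f);
  float sum = 0.0f;

  sycl::queue q{CUDASelector()};

  // Buffers rather than USM, since the CUDA backend lacks USM support.
  sycl::buffer<float, 1> bufA(a.data(), sycl::range<1>(N));
  sycl::buffer<float, 1> bufB(b.data(), sycl::range<1>(N));
  sycl::buffer<float, 1> bufC{sycl::range<1>(N)};

  // Command group: a lambda taking the handler; accessors come from
  // get_access, and parallel_for enqueues the kernel.
  q.submit([&](sycl::handler &h) {
    auto accA = bufA.get_access<sycl::access::mode::read>(h);
    auto accB = bufB.get_access<sycl::access::mode::read>(h);
    auto accC = bufC.get_access<sycl::access::mode::write>(h);
    h.parallel_for<class vec_add>(
        sycl::range<1>(N), [=](sycl::id<1> i) { accC[i] = accA[i] + accB[i]; });
  });

  // The host accessor triggers the copy back to the host; the values are
  // then reduced into a single sum element, as the README describes.
  auto hostC = bufC.get_access<sycl::access::mode::read>();
  for (size_t i = 0; i < N; ++i) {
    sum += hostC[i];
  }
  return sum == 3.0f * N ? 0 : 1;
}
```

Note that `bufC` is created without a host pointer, so no host storage exists for the result until the host accessor forces the copy, which is the "avoids creating unnecessary storage on the host" point made above.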

example-02/README.md

Lines changed: 12 additions & 23 deletions

````diff
@@ -5,41 +5,30 @@ The example shows how to interop with CUBLAS from a SYCL for CUDA application.
 The example uses Codeplay's extension *interop_task* to call the **SGEMM**
 routine in CUBLAS. Parameters are extracted using the interop handler conversion.
 
-Requirements
-==============
+Pre-requisites
+---------------
 
-Requires CMake 3.17 to configure (makes use of FindCUDAToolkit for simplicity)
-This example must be compiled and executed with the DPC++ compiler.
+These instructions assume that the example [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples/dockerfile) is being used. This image
+simplifies accessing these examples as the environment is set up correctly.
+For details on how to get started with the example docker image, refer to the
+root README file.
 
 Building the example
 =====================
 
-
-Create a build directory and run the following command:
-
-```
-CXX=/path/to/dpc++/bin/clang++ cmake build/
-```
-
-If NVIDIA CUDA is installed in your system, CMake should be able to generate
-the configuration files.
-
-Then run
-
-```
-make
+``` sh
+$ mkdir build && cd build
+$ cmake ../
+$ make -j 8
 ```
 
-to build the example
-
 Example
 =========
 
 Two source codes are provided. `sgemm.cu` is the original CUDA code calling
-CUBLAS library to perform the matrix multiplication.
-`sycl_sgemm.cpp` is the sycl variant that calls CUBLAS underneath.
+CUBLAS library to perform the matrix multiplication. `sycl_sgemm.cpp` is the
+SYCL variant that calls CUBLAS underneath.
 
 Both implementations perform the multiplication of square matrices A and B,
 where A is a matrix full of ones, and B is an identity matrix.
 The expected output on C is a matrix full of ones.
-
````
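As a rough illustration of the `interop_task` pattern used by `sycl_sgemm.cpp`, here is a hedged sketch. The interop handler calls (`get_queue`, `get_mem`) are assumptions about Codeplay's extension and may not match the shipped headers exactly; the matrix size `N` and buffer names are illustrative.

```cpp
#include <CL/sycl.hpp>
#include <cublas_v2.h>
#include <vector>

namespace sycl = cl::sycl;

int main() {
  constexpr int N = 16;
  std::vector<float> A(N * N, 1.0f); // A: a matrix full of ones
  std::vector<float> B(N * N, 0.0f); // B: an identity matrix
  std::vector<float> C(N * N, 0.0f);
  for (int i = 0; i < N; ++i) B[i * N + i] = 1.0f;

  cublasHandle_t handle;
  cublasCreate(&handle);

  sycl::queue q; // assumed to resolve to the CUDA device inside the image
  {
    sycl::buffer<float, 1> bA(A.data(), sycl::range<1>(N * N));
    sycl::buffer<float, 1> bB(B.data(), sycl::range<1>(N * N));
    sycl::buffer<float, 1> bC(C.data(), sycl::range<1>(N * N));

    q.submit([&](sycl::handler &h) {
      auto accA = bA.get_access<sycl::access::mode::read>(h);
      auto accB = bB.get_access<sycl::access::mode::read>(h);
      auto accC = bC.get_access<sycl::access::mode::write>(h);

      // interop_task runs host code once dependencies are met; the handler
      // converts SYCL objects to native CUDA handles (names assumed here).
      h.interop_task([=](sycl::interop_handler ih) {
        cublasSetStream(handle, ih.get_queue<sycl::backend::cuda>());
        auto *dA = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accA));
        auto *dB = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accB));
        auto *dC = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accC));
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C, column-major as cuBLAS expects.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, N, N, N,
                    &alpha, dA, N, dB, N, &beta, dC, N);
      });
    });
  } // bC's destructor synchronizes and copies the result back into C.

  cublasDestroy(handle);
  return C[0] == 1.0f ? 0 : 1; // C should be a matrix full of ones
}
```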
example-03/Makefile

Lines changed: 3 additions & 5 deletions

````diff
@@ -1,16 +1,14 @@
 
 
-CUDACXX=${SYCL_ROOT}/bin/clang++
+CUDACXX=${SYCL_ROOT_DIR}/bin/clang++
 
-SYCL_INCLUDE=${SYCL_ROOT}/include/sycl/
+SYCL_INCLUDE=${SYCL_ROOT_DIR}/include/sycl/
 
 CUDAFLAGS=--cuda-gpu-arch=sm_30
 
 CXXFLAGS=-std=c++17 ${CUDAFLAGS} -I${SYCL_INCLUDE} -g
 
-CUDA_ROOT=/usr/local/cuda/
-
-LIBS=-L${SYCL_ROOT}/include/lib -lOpenCL -lsycl -L${CUDA_ROOT}/lib64 -lcudart
+LIBS=-L${SYCL_ROOT_DIR}/include/lib -lOpenCL -lsycl -L${CUDA_ROOT_DIR}/lib64 -lcudart
 
 default: vec_add.exe usm_vec_add.exe
 
````

example-03/README.md

Lines changed: 20 additions & 28 deletions

````diff
@@ -13,56 +13,48 @@ unstable at the time of writing.
 Pre-requisites
 ---------------
 
-You would need an installation of DPC++ with CUDA support,
-see [Getting Started Guide](https://github.com/intel/llvm/blob/sycl/sycl/doc/GetStartedGuide.md#build-dpc-toolchain-with-support-for-nvidia-cuda)
-for details on how to build it.
+These instructions assume that the example [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples/dockerfile) is being used. This image
+simplifies accessing these examples as the environment is set up correctly.
+For details on how to get started with the example docker image, refer to the
+root README file.
 
 The example is built using Makefiles, since there is no support yet on
 a release of CMake for changing the CUDA compiler from nvcc.
 
 Building the example
 ---------------------
 
-```sh
-$ SYCL_ROOT=/path/to/dpcpp make
+``` sh
+$ make
 ```
 
-This compiles the SYCL code with the LLVM CUDA support, and generates
-two binaries.
-NVCC is not used, but the CUDA device libraries need to be available on
-/usr/local/cuda/lib64/ for linking to the device code.
+This compiles the SYCL code with the LLVM CUDA support, and generates two
+binaries. NVCC is not used, but the CUDA device libraries need to be available
+on `/usr/local/cuda/lib64/` for linking to the device code.
 
 NVCC compiler does not support some of the advanced C++17 syntax used on the
 SYCL Runtime headers.
 
 Running the example
 --------------------
 
-The path to `libsycl.so` and the PI plugins must be in `LD_LIBRARY_PATH`.
-A simple way of running the example is as follows:
-
-```
-$ LD_LIBRARY_PATH=/path/to/dpcpp/lib:$LD_LIBRARY_PATH ./vec_add.exe
+``` sh
+$ ./vec_add.exe
 ```
 
-
 Calling CUDA kernels from SYCL
 -------------------------------
 
 Using Codeplay's `interop_task` extension, the example calls a CUDA kernel from
-a SYCL application.
-Note the example is compiled with the LLVM CUDA compiler, not with the SYCL
-compiler, since there are no SYCL kernels on it. It is only required to link
-against the SYCL runtime library to ensure the runtime can use the application.
+a SYCL application. Note the example is compiled with the LLVM CUDA compiler,
+not with the SYCL compiler, since there are no SYCL kernels on it. It is only
+required to link against the SYCL runtime library to ensure the runtime can use
+the application.
 
 At the time of writing, it is not possible to have both CUDA and SYCL kernels
-on the same file.
-It is possible to have different files for CUDA and SYCL kernels and call
-them together from a main application at runtime.
-
-The example uses an extension to the SYCL interface to interact with the
-CUDA Runtime API.
-At the time of writing the extension is not public, so only a boolean flag
-is passed to the `sycl::context` creation.
-
+on the same file. It is possible to have different files for CUDA and SYCL
+kernels and call them together from a main application at runtime.
 
+
+The example uses an extension to the SYCL interface to interact with the CUDA
+Runtime API. At the time of writing the extension is not public, so only a
+boolean flag is passed to the `sycl::context` creation.
````
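The sketch below is a rough reconstruction of that pattern: a plain CUDA kernel launched from a SYCL command group via `interop_task`, in a file compiled by the LLVM CUDA compiler. It is written under stated assumptions, not copied from the repository's `vec_add.exe` source: the boolean third argument to the `sycl::context` constructor stands in for the non-public extension mentioned above, and the `get_mem` conversion call is assumed from Codeplay's extension.

```cpp
#include <CL/sycl.hpp>
#include <vector>

namespace sycl = cl::sycl;

// Plain CUDA kernel: this file is built with the LLVM CUDA compiler
// (CUDACXX=clang++ in the Makefile), not with the SYCL compiler.
__global__ void vec_add(float *a, float *b, float *c) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  c[i] = a[i] + b[i];
}

int main() {
  constexpr size_t N = 1024;
  std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);

  sycl::device dev{sycl::gpu_selector{}}; // assumed to be the CUDA device
  // The trailing boolean is the non-public extension flag described above,
  // asking the runtime to interoperate with the CUDA Runtime API context.
  sycl::context ctx{dev, sycl::async_handler{}, true};
  sycl::queue q{ctx, sycl::gpu_selector{}};
  {
    sycl::buffer<float, 1> bA(a.data(), sycl::range<1>(N));
    sycl::buffer<float, 1> bB(b.data(), sycl::range<1>(N));
    sycl::buffer<float, 1> bC(c.data(), sycl::range<1>(N));

    q.submit([&](sycl::handler &h) {
      auto accA = bA.get_access<sycl::access::mode::read>(h);
      auto accB = bB.get_access<sycl::access::mode::read>(h);
      auto accC = bC.get_access<sycl::access::mode::write>(h);
      h.interop_task([=](sycl::interop_handler ih) {
        // get_mem is assumed from the interop extension: it yields the
        // native CUDA device pointer backing each accessor.
        auto *dA = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accA));
        auto *dB = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accB));
        auto *dC = reinterpret_cast<float *>(ih.get_mem<sycl::backend::cuda>(accC));
        vec_add<<<N / 128, 128>>>(dA, dB, dC); // default CUDA stream
      });
    });
  } // buffer destruction synchronizes and copies c back to the host
  return c[0] == 3.0f ? 0 : 1;
}
```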
