The examples are built and tested on Linux with GCC 7.4, NVCC 10.1 and the experimental support for CUDA in the DPC++ SYCL implementation.

CUDA is a registered trademark of NVIDIA Corporation.
SYCL is a trademark of the Khronos Group Inc.

Prerequisites
-------------

These examples are intended to be used with this [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples). It provides all the examples, libraries and the required environment variables.

[NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) must be installed to run the image.

A useful guide for setting up docker and the NVIDIA Container Toolkit can be found [here](https://www.pugetsystems.com/labs/hpc/Workstation-Setup-for-Docker-with-the-New-NVIDIA-Container-Toolkit-nvidia-docker2-is-deprecated-1568).

Getting Started
-------------

Once docker and the NVIDIA Container Toolkit are installed, we can create a new container and run the examples within it.

```sh
$ sudo docker run --gpus all -it ruyman/dpcpp_cuda_examples
```

Once inside the docker image, navigate to `/home/examples/` to find a local clone of this repo. Make sure to pull the latest changes:

```sh
$ cd /home/examples/SYCL-For-CUDA-Examples
$ git pull
```

Refer to each example and/or exercise for detailed instructions on how to run it.

Vector addition
-------------

This trivial example can be used to compare a simple vector addition in CUDA to an equivalent implementation in SYCL for CUDA. The aim of the example is also to highlight how to build an application with SYCL for CUDA using DPC++ support, for which an example CMakefile is provided. For detailed documentation on how to migrate from CUDA to SYCL, see [SYCL For CUDA Developers](https://developer.codeplay.com/products/computecpp/ce/guides/sycl-for-cuda-developers).

Note that currently the CUDA backend does not support the [USM](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/USM.adoc) extension, so we use `sycl::buffer` and `sycl::accessor` instead.
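
As a rough illustration of the buffer-based setup, a minimal sketch might look like this (names such as `N`, `a`, `b` and `c` are illustrative, not taken from the example source):

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
  constexpr size_t N = 1024;
  std::vector<double> a(N, 1.0), b(N, 2.0), c(N, 0.0);

  // Buffers adopt the host storage; the runtime copies data to the
  // device lazily, when a kernel first accesses it. No USM allocation
  // calls are required.
  sycl::buffer<double, 1> bufA{a.data(), sycl::range<1>(N)};
  sycl::buffer<double, 1> bufB{b.data(), sycl::range<1>(N)};
  sycl::buffer<double, 1> bufC{c.data(), sycl::range<1>(N)};

  // Kernels obtain sycl::accessor objects from these buffers inside a
  // command group (shown later in this README).
  return 0;
}
```
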
Pre-requisites
---------------

These instructions assume that the example [docker image](https://hub.docker.com/r/ruyman/dpcpp_cuda_examples/dockerfile) is being used. This image simplifies accessing these examples as the environment is set up correctly. For details on how to get started with the example docker image, refer to the Getting Started section above.

Note the `SYCL_BE` env variable is not required, since we use a custom device selector.
CMake Build script
------------------------

The provided CMake build script uses the native CUDA support to build the CUDA application. It also serves as a check that all CUDA requirements on the system are available (such as an installation of CUDA on the system).

Two flags are required: `-DSYCL_ROOT`, which must point to the place where the DPC++ compiler is installed, and `-DCMAKE_CXX_COMPILER`, which must point to the Clang compiler provided by DPC++.

The CMake target `sycl_vector_addition` will build the SYCL version of the application.

Note the variable `SYCL_FLAGS` is used to store the Clang flags that enable the compilation of a SYCL application (`-fsycl`) but also the flag that specifies which targets are built (`-fsycl-targets`). In this case, we will build the example for both NVPTX and SPIR64. This means the kernel for the vector addition will be compiled for both backends, and the queue selected at runtime will determine which variant is used.

Note the project is built with C++17 support, which enables the usage of [deduction guides](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/deduction_guides/SYCL_INTEL_deduction_guides.asciidoc) to reduce the number of template parameters used.
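
For instance, a buffer declaration can drop its explicit template arguments; this is a small sketch under the assumption that the deduction guides extension is available in the compiler being used:

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
  std::vector<double> data(1024, 1.0);

  // Without deduction guides, element type and dimensionality are spelled out:
  sycl::buffer<double, 1> explicitBuf{data.data(), sycl::range<1>(data.size())};

  // With C++17 class template argument deduction, both are deduced from
  // the constructor arguments:
  sycl::buffer deducedBuf{data.data(), sycl::range<1>(data.size())};
  return 0;
}
```
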

SYCL Vector Addition code
--------------------------

The vector addition example uses a simple approach, implemented with a plain kernel that performs the add. Vectors are stored directly in buffers. Data is initialized on the host using host accessors. This approach avoids creating unnecessary storage on the host, and enables the SYCL runtime to use optimized memory paths.

The SYCL queue created later on uses a custom `CUDASelector` to select a CUDA device, or bail out if it is not there. The CUDA selector uses the `info::device::driver_version` to identify the device exported by the CUDA backend. If the NVIDIA OpenCL implementation is available on the system, it will be reported as another SYCL device. The driver version is the best way to differentiate between the two.
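
A sketch of such a selector is shown below; it illustrates the approach described above and is not necessarily the exact code in the example:

```cpp
#include <CL/sycl.hpp>
#include <string>
namespace sycl = cl::sycl;

class CUDASelector : public sycl::device_selector {
public:
  int operator()(const sycl::device &device) const override {
    const std::string driver =
        device.get_info<sycl::info::device::driver_version>();
    // The CUDA backend reports "CUDA" in its driver version string,
    // which distinguishes it from the NVIDIA OpenCL implementation
    // of the same physical GPU.
    if (device.is_gpu() && driver.find("CUDA") != std::string::npos) {
      return 1;  // positive score: eligible for selection
    }
    return -1;  // negative score: never selected
  }
};
```
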

The command group is created as a lambda expression that takes the `sycl::handler` parameter. Accessors are obtained from buffers using the `get_access` method. Finally, the `parallel_for` with the SYCL kernel is invoked as usual.
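
Put together, the command group might look like the following sketch (the buffer names follow the earlier snippets and are assumptions, not the example's actual identifiers):

```cpp
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;

// Sketch: submit a vector-add command group to a queue.
void vector_add(sycl::queue &q, sycl::buffer<double, 1> &bufA,
                sycl::buffer<double, 1> &bufB, sycl::buffer<double, 1> &bufC,
                size_t N) {
  q.submit([&](sycl::handler &h) {
    // Accessors declare how each buffer is used inside the kernel.
    auto a = bufA.get_access<sycl::access::mode::read>(h);
    auto b = bufB.get_access<sycl::access::mode::read>(h);
    auto c = bufC.get_access<sycl::access::mode::write>(h);

    // One work-item per vector element.
    h.parallel_for<class vec_add>(sycl::range<1>(N),
                                  [=](sycl::id<1> i) { c[i] = a[i] + b[i]; });
  });
}
```
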

The command group is submitted to a queue which will convert all the operations into CUDA commands that will be executed once the host accessor is encountered later on.

The host accessor will trigger a copy of the data back to the host, and then the values are reduced into a single sum element.
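
A minimal sketch of that final step, again assuming the buffer name from the earlier snippets:

```cpp
#include <CL/sycl.hpp>
namespace sycl = cl::sycl;

// Sketch: constructing a host accessor waits for outstanding device work
// on the buffer and copies the data back to the host.
double reduce_on_host(sycl::buffer<double, 1> &bufC, size_t N) {
  auto hostAcc = bufC.get_access<sycl::access::mode::read>();
  double sum = 0.0;
  for (size_t i = 0; i < N; ++i) {
    sum += hostAcc[i];
  }
  return sum;
}
```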