[SYCL][doc] Add mention of cuda gpu arch for enabling cuda-arch specific features (#6978)

hdelan · web-flow · commit 4e5d276f24e9 · 2022-10-06T19:11:03.000+01:00
Some users were mistakenly thinking that native atomics are not
supported in DPC++ for CUDA.

The doc now mentions that you need to provide correct arch flags when
compiling if you wish to use native atomics as well as other features.
diff --git a/sycl/doc/GetStartedGuide.md b/sycl/doc/GetStartedGuide.md
@@ -641,6 +641,17 @@ clang++ -fsycl -fsycl-targets=amdgcn-amd-amdhsa \
   simple-sycl-app.cpp -o simple-sycl-app-amd.exe
 ```
 
+The target architecture may also be specified for the CUDA backend, with 
+`-Xsycl-target-backend --cuda-gpu-arch=<arch>`. Specifying the architecture is 
+necessary if an application aims to use newer hardware features, such as
+native atomic operations or tensor core operations. 
+
+```bash
+clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda \
+  simple-sycl-app.cpp -o simple-sycl-app-cuda.exe \
+  -Xsycl-target-backend --cuda-gpu-arch=sm_80
+```
+
 To build simple-sycl-app ahead of time for GPU, CPU or Accelerator devices,
 specify the target architecture.  The examples provided use a supported
 alias for the target, representing a full triple.  Additional details can