Commit 74e4ae1

[SYCL][Doc] Update SYCL --offload-arch option. (#15531)
Document SYCL offloading using `--offload-arch` to Intel CPUs, Intel GPUs, NVIDIA GPUs, and AMD GPUs.
1 parent 9efd43a commit 74e4ae1

1 file changed: +151 -1 lines changed

sycl/doc/design/OffloadDesign.md

Lines changed: 151 additions & 1 deletion
@@ -283,7 +283,7 @@ list to be passed along.
283283
*Example: spir64_gen enabling options*
284284

285285
> --gpu-tool-arg="-device pvc -options extraopt_pvc"
286-
--gpu-tool-arg="-device skl -options -extraopt_skl"
286+
--gpu-tool-arg="-options -extraopt_skl"
287287

288288
*Example: clang-linker-wrapper options*
289289

@@ -296,6 +296,128 @@ resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing
296296
option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed
297297
set of targets.
298298

299+
#### --offload-arch
300+
301+
For SYCL offloading to Intel GPUs, Intel CPUs, NVIDIA GPUs, and AMD GPUs, specify the device architecture with the ``--offload-arch`` option. For instance, use
302+
``--offload-arch=sm_80`` to target an NVIDIA A100,
303+
``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
304+
``--offload-arch=sm_80,gfx90a`` to target both.
305+
306+
For Intel GPU AOT targets, the values accepted by ``--offload-arch`` are mapped to
307+
valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option.
308+
309+
SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with the ``--offload-new-driver`` option.
310+
311+
```
Example:

$ clang++ -fsycl --offload-arch=bdw --offload-new-driver -c foo.cpp        # SYCL AOT for Intel GPU.
$ clang++ -fsycl --offload-arch=broadwell --offload-new-driver -c foo.cpp  # SYCL AOT for Intel CPU.
```
317+
318+
The following table maps each `--offload-arch` value accepted for SYCL offloading to Intel GPUs to the corresponding `-device` value passed to OCLOC.
319+
320+
| Intel GPU device | ``--offload-arch`` accepted value | OCLOC `-device` value |
|------------------|-------------------------|------------------------|
| Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw |
| Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl |
| Kaby Lake Intel graphics architecture | kbl | kbl |
| Coffee Lake Intel graphics architecture | cfl | cfl |
| Apollo Lake Intel graphics architecture | apl | apl |
| Broxton Intel graphics architecture | bxt | apl |
| Gemini Lake Intel graphics architecture | glk | glk |
| Whiskey Lake Intel graphics architecture | whl | whl |
| Amber Lake Intel graphics architecture | aml | aml |
| Comet Lake Intel graphics architecture | cml | cml |
| Ice Lake Intel graphics architecture | icl, icllp | icllp |
| Elkhart Lake Intel graphics architecture | ehl | ehl |
| Jasper Lake Intel graphics architecture | jsl | jsl |
| Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp |
| Rocket Lake Intel graphics architecture | rkl | rkl |
| Alder Lake S Intel graphics architecture | adl_s | adl_s |
| Raptor Lake Intel graphics architecture | rpl_s | adl_s |
| Alder Lake P Intel graphics architecture | adl_p | adl_p |
| Alder Lake N Intel graphics architecture | adl_n | adl_n |
| DG1 Intel graphics architecture | dg1 | dg1 |
| Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 |
| Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 |
| Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 |
| Ponte Vecchio Intel graphics architecture | pvc | pvc |
| Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg |
| Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s |
| Meteor Lake H Intel graphics architecture | mtl_h | mtl_h |
| Arrow Lake H Intel graphics architecture | arl_h | arl_h |
| Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 |
| Lunar Lake Intel graphics architecture | lnl_m | lnl_m |
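
The mapping in the Intel GPU table can be sketched as a small lookup. This is an illustrative Python sketch only, not the driver's implementation (the driver itself is written in C++); the names `ALIASES`, `IDENTITY`, and `ocloc_device` are hypothetical and exist only for this example. The values are copied from the table.

```python
# Hypothetical sketch: translate an Intel GPU --offload-arch value into the
# OCLOC -device value listed in the table. Not the actual driver code.

# Accepted values that map to a differently named OCLOC device.
ALIASES = {
    "bxt": "apl",
    "icl": "icllp",
    "tgl": "tgllp",
    "rpl_s": "adl_s",
    "dg2_g10": "acm_g10",
    "dg2_g11": "acm_g11",
    "dg2_g12": "acm_g12",
    "mtl_u": "mtl_s",
    "arl_u": "mtl_s",
}

# Accepted values that are passed to OCLOC unchanged.
IDENTITY = {
    "bdw", "skl", "kbl", "cfl", "apl", "glk", "whl", "aml", "cml", "icllp",
    "ehl", "jsl", "tgllp", "rkl", "adl_s", "adl_p", "adl_n", "dg1",
    "acm_g10", "acm_g11", "acm_g12", "pvc", "pvc_vg", "mtl_s", "mtl_h",
    "arl_h", "bmg_g21", "lnl_m",
}


def ocloc_device(offload_arch: str) -> str:
    """Return the OCLOC -device value for an Intel GPU --offload-arch value."""
    if offload_arch in ALIASES:
        return ALIASES[offload_arch]
    if offload_arch in IDENTITY:
        return offload_arch
    raise ValueError(f"not an Intel GPU --offload-arch value: {offload_arch!r}")
```

For example, `ocloc_device("rpl_s")` yields `adl_s`, while a CPU name such as `broadwell` is rejected because it does not appear in the Intel GPU table.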
352+
353+
#### nvptx64-nvidia-cuda support
354+
For SYCL offloading to NVIDIA GPUs using the ``--offload-arch`` option, the following table
355+
lists the accepted values.
356+
357+
| NVIDIA GPU device name | ``--offload-arch`` accepted values for NVIDIA GPUs |
|------------------------|----------------------------------------------------|
| NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 |
| NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 |
| NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 |
| NVIDIA Pascal architecture (compute capability 6.0) | sm_60 |
| NVIDIA Pascal architecture (compute capability 6.1) | sm_61 |
| NVIDIA Pascal architecture (compute capability 6.2) | sm_62 |
| NVIDIA Volta architecture (compute capability 7.0) | sm_70 |
| NVIDIA Volta architecture (compute capability 7.2) | sm_72 |
| NVIDIA Turing architecture (compute capability 7.5) | sm_75 |
| NVIDIA Ampere architecture (compute capability 8.0) | sm_80 |
| NVIDIA Ampere architecture (compute capability 8.6) | sm_86 |
| NVIDIA Jetson/Drive AGX Orin architecture | sm_87 |
| NVIDIA Ada Lovelace architecture | sm_89 |
| NVIDIA Hopper architecture | sm_90 |
| NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a |
374+
375+
#### amdgcn-amd-amdhsa support
376+
377+
For SYCL offloading to AMD GPUs using the ``--offload-arch`` option, the following table
378+
lists the accepted values.
379+
380+
| AMD GPU device name | ``--offload-arch`` accepted values for AMD GPUs |
|------------------------|----------------------------------------------------|
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 |
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 |
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 |
| AMD GCN GFX9 (Vega) architecture | gfx900 |
| AMD GCN GFX9 (Vega) architecture | gfx902 |
| AMD GCN GFX9 (Vega) architecture | gfx904 |
| AMD GCN GFX9 (Vega) architecture | gfx906 |
| AMD GCN GFX9 (Vega) architecture | gfx908 |
| AMD GCN GFX9 (Vega) architecture | gfx909 |
| AMD GCN GFX9 (Vega) architecture | gfx90a |
| AMD GCN GFX9 (Vega) architecture | gfx90c |
| AMD GCN GFX9 (Vega) architecture | gfx940 |
| AMD GCN GFX9 (Vega) architecture | gfx941 |
| AMD GCN GFX9 (Vega) architecture | gfx942 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1031 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1032 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1033 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1034 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1035 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1036 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1100 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1101 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1102 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1103 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1150 |
| AMD GCN GFX11 (RDNA 3) architecture | gfx1151 |
| AMD GCN GFX12 (RDNA 4) architecture | gfx1200 |
| AMD GCN GFX12 (RDNA 4) architecture | gfx1201 |
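
A comma-separated value such as `--offload-arch=sm_80,gfx90a` names devices that belong to different target triples. The following Python sketch illustrates one way to group such a list by the triples used in the section headings above. It is a simplification under stated assumptions: the function name `group_by_triple` is hypothetical, and matching on the `sm_` and `gfx` prefixes is an assumption made for brevity, whereas a real driver would check each value against the explicit lists in the tables.

```python
import re

# Assumed simplification: recognize NVIDIA and AMD values by their shape
# instead of checking them against the full tables.
_NVIDIA = re.compile(r"sm_\d+a?")      # e.g. sm_80, sm_90a
_AMD = re.compile(r"gfx[0-9a-f]+")     # e.g. gfx90a, gfx1100


def group_by_triple(offload_arch_value: str) -> dict:
    """Group a comma-separated --offload-arch value by target triple."""
    groups = {}
    for arch in offload_arch_value.split(","):
        if _NVIDIA.fullmatch(arch):
            triple = "nvptx64-nvidia-cuda"
        elif _AMD.fullmatch(arch):
            triple = "amdgcn-amd-amdhsa"
        else:
            raise ValueError(f"unrecognized architecture: {arch!r}")
        groups.setdefault(triple, []).append(arch)
    return groups
```

With this sketch, `group_by_triple("sm_80,gfx90a")` places `sm_80` under `nvptx64-nvidia-cuda` and `gfx90a` under `amdgcn-amd-amdhsa`. Intel GPU and CPU values are deliberately left out here and would raise `ValueError`.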
420+
299421
#### spir64_fpga support
300422

301423
Compilation behaviors involving AOT for FPGA involve an additional call to
@@ -355,6 +477,34 @@ Additional options passed by the user via the
355477
`-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
356478
option to the wrapper, `--cpu-tool-arg=<arg>`
357479

480+
Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs
481+
will also leverage the `--offload-arch` option.
482+
The valid CPU device names accepted for `--offload-arch` are CPU names from ``clang -march``.
483+
These names are more verbose and do not overlap with the Intel GPU names.
484+
These user-supplied CPU names are mapped to the corresponding ``opencl-aot -march`` value.
485+
486+
The following table maps each `--offload-arch` value accepted for SYCL offloading to Intel CPUs to the corresponding `-march` value passed to opencl-aot.
487+
488+
| Intel CPU device | ``--offload-arch`` accepted value | opencl-aot `-march` value |
|----------------|-------------------------|----------------------------|
| Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 |
| Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 |
| Intel(R) Advanced Vector Extensions | corei7-avx | avx |
| Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 |
| Intel(R) microarchitecture code name Westmere | westmere | wsm |
| Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb |
| Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb |
| Intel(R) microarchitecture code name Broadwell | broadwell | bdw |
| Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl |
| Intel(R) microarchitecture code name Alder Lake | alderlake | adl |
| Intel(R) microarchitecture code name Skylake (client) | skylake | skylake |
| Intel(R) microarchitecture code name Skylake (server) | skx | skx |
| Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk |
| Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl |
| Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx |
| Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr |
| Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr |
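
The CPU table can likewise be expressed as a lookup. This Python sketch is illustrative only and does not reflect the actual driver code; `CPU_MARCH` and `opencl_aot_march` are hypothetical names, and the entries are copied from the table above.

```python
# Hypothetical sketch: translate an Intel CPU --offload-arch value into the
# -march value that the table lists for opencl-aot.
CPU_MARCH = {
    "skylake-avx512": "avx512",
    "core-avx2": "avx2",
    "corei7-avx": "avx",
    "corei7": "sse4.2",
    "westmere": "wsm",
    "sandybridge": "snb",
    "ivybridge": "ivyb",
    "broadwell": "bdw",
    "coffeelake": "cfl",
    "alderlake": "adl",
    "skylake": "skylake",
    "skx": "skx",
    "cascadelake": "clk",
    "icelake-client": "icl",
    "icelake-server": "icx",
    "sapphirerapids": "spr",
    "graniterapids": "gnr",
}


def opencl_aot_march(offload_arch: str) -> str:
    """Return the opencl-aot -march value for an Intel CPU --offload-arch value."""
    try:
        return CPU_MARCH[offload_arch]
    except KeyError:
        raise ValueError(
            f"not an Intel CPU --offload-arch value: {offload_arch!r}"
        ) from None
```

Note that the non-overlap with Intel GPU names applies to the user-facing inputs: `broadwell` selects the CPU and maps to the opencl-aot value `bdw`, whereas passing `bdw` directly to `--offload-arch` selects the Intel GPU.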
507+
358508
### Wrapping of device image
359509

360510
Once the device binary is pulled out of the fat binary, the binary must be