@@ -283,7 +283,7 @@ list to be passed along.

*Example: spir64_gen enabling options*

> --gpu-tool-arg="-device pvc -options extraopt_pvc"
- --gpu-tool-arg="-device skl - options -extraopt_skl"
+ --gpu-tool-arg="-options -extraopt_skl"

*Example: clang-linker-wrapper options*

@@ -296,6 +296,128 @@ resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing
option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed
set of targets.

+ #### --offload-arch
+
+ For SYCL offloading to Intel GPUs, Intel CPUs, NVIDIA GPUs, and AMD GPUs, specify the device architecture with the `--offload-arch` option. For instance,
+ `--offload-arch=sm_80` targets an NVIDIA A100,
+ `--offload-arch=gfx90a` targets an AMD Instinct MI250X, and
+ `--offload-arch=sm_80,gfx90a` targets both.
+
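A comma-separated `--offload-arch` list carries one architecture per entry, and the NVIDIA and AMD values in this document share recognizable prefixes (`sm_` and `gfx`). The sketch below, assuming that prefix rule, splits such a list and reports the device triple each entry would be routed to. The helper names `offload_triple` and `split_offload_arch` are hypothetical and this is not the driver's implementation; values without either prefix are left to the Intel GPU and Intel CPU mapping tables in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the offload triple for one --offload-arch value.
# Assumption: NVIDIA values start with "sm_" and AMD values start with "gfx";
# anything else has to be resolved against the Intel GPU/CPU tables.
offload_triple() {
  case "$1" in
    sm_*) echo "nvptx64-nvidia-cuda" ;;
    gfx*) echo "amdgcn-amd-amdhsa" ;;
    *)    return 1 ;;  # not an NVIDIA or AMD value
  esac
}

# Split a comma-separated list such as "sm_80,gfx90a" and print one line per entry.
split_offload_arch() {
  old_ifs=$IFS
  IFS=,
  for arch in $1; do
    if triple=$(offload_triple "$arch"); then
      echo "$arch -> $triple"
    else
      echo "$arch -> intel (see the Intel GPU and CPU tables)"
    fi
  done
  IFS=$old_ifs
}

split_offload_arch "sm_80,gfx90a"
# sm_80 -> nvptx64-nvidia-cuda
# gfx90a -> amdgcn-amd-amdhsa
```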
+ For Intel GPU AOT compilation, each accepted `--offload-arch` value is mapped to a
+ device name that OCLOC (the Intel GPU AOT compiler) accepts through its `-device` option.
+
+ SYCL offloading with `--offload-arch` for Intel CPUs and Intel GPUs is currently supported only when the `--offload-new-driver` option is also specified.
+
+ *Example: SYCL AOT compilation with `--offload-arch`*
+
+ ```
+ $ clang++ -fsycl --offload-arch=bdw --offload-new-driver -c foo.cpp        # SYCL AOT for Intel GPU.
+ $ clang++ -fsycl --offload-arch=broadwell --offload-new-driver -c foo.cpp  # SYCL AOT for Intel CPU.
+ ```
+
+ The following table maps each accepted `--offload-arch` value for Intel GPUs to the corresponding `-device` value passed to OCLOC.
+
+ | Intel GPU device | `--offload-arch` accepted value(s) | OCLOC `-device` value |
+ | ---------------- | ---------------------------------- | --------------------- |
+ | Broadwell Intel graphics architecture | bdw | bdw |
+ | Skylake Intel graphics architecture | skl | skl |
+ | Kaby Lake Intel graphics architecture | kbl | kbl |
+ | Coffee Lake Intel graphics architecture | cfl | cfl |
+ | Apollo Lake Intel graphics architecture | apl | apl |
+ | Broxton Intel graphics architecture | bxt | apl |
+ | Gemini Lake Intel graphics architecture | glk | glk |
+ | Whiskey Lake Intel graphics architecture | whl | whl |
+ | Amber Lake Intel graphics architecture | aml | aml |
+ | Comet Lake Intel graphics architecture | cml | cml |
+ | Ice Lake Intel graphics architecture | icl, icllp | icllp |
+ | Elkhart Lake Intel graphics architecture | ehl | ehl |
+ | Jasper Lake Intel graphics architecture | jsl | jsl |
+ | Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp |
+ | Rocket Lake Intel graphics architecture | rkl | rkl |
+ | Alder Lake S Intel graphics architecture | adl_s | adl_s |
+ | Raptor Lake Intel graphics architecture | rpl_s | adl_s |
+ | Alder Lake P Intel graphics architecture | adl_p | adl_p |
+ | Alder Lake N Intel graphics architecture | adl_n | adl_n |
+ | DG1 Intel graphics architecture | dg1 | dg1 |
+ | Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 |
+ | Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 |
+ | Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 |
+ | Ponte Vecchio Intel graphics architecture | pvc | pvc |
+ | Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg |
+ | Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s |
+ | Meteor Lake H Intel graphics architecture | mtl_h | mtl_h |
+ | Arrow Lake H Intel graphics architecture | arl_h | arl_h |
+ | Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 |
+ | Lunar Lake Intel graphics architecture | lnl_m | lnl_m |
+
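The Intel GPU table above is a lookup: most values pass through to OCLOC unchanged, while a few aliases (such as `bxt`, `tgl`, `rpl_s`, and the `dg2_*` names) are rewritten. The sketch below encodes that table as a shell `case` statement. The function name `ocloc_device_for` is hypothetical and this is an illustration of the mapping only, not how the driver implements it.

```shell
#!/bin/sh
# Hypothetical helper: translate an Intel GPU --offload-arch value into the
# OCLOC -device value, following the Intel GPU mapping table.
ocloc_device_for() {
  case "$1" in
    # Values that OCLOC accepts unchanged.
    bdw|skl|kbl|cfl|apl|glk|whl|aml|cml|ehl|jsl|rkl) echo "$1" ;;
    adl_s|adl_p|adl_n|dg1|acm_g10|acm_g11|acm_g12) echo "$1" ;;
    pvc|pvc_vg|mtl_h|arl_h|bmg_g21|lnl_m) echo "$1" ;;
    # Values that map to a different OCLOC device name.
    bxt)               echo apl ;;
    icl|icllp)         echo icllp ;;
    tgl|tgllp)         echo tgllp ;;
    rpl_s)             echo adl_s ;;
    dg2_g10)           echo acm_g10 ;;
    dg2_g11)           echo acm_g11 ;;
    dg2_g12)           echo acm_g12 ;;
    mtl_u|mtl_s|arl_u) echo mtl_s ;;
    *)
      echo "unsupported Intel GPU --offload-arch value: $1" >&2
      return 1 ;;
  esac
}

ocloc_device_for tgl   # prints: tgllp
```

Because the lookup rejects anything outside the table, a CPU name such as `broadwell` fails here, which matches the intent that GPU and CPU names stay disjoint.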
+ #### nvptx64-nvidia-cuda support
+
+ For SYCL offloading to NVIDIA GPUs with the `--offload-arch` option, the following table
+ lists the accepted values.
+
357
+ | NVidia GPU device name | `` --offload-arch `` accepted values for NVidia GPUs |
358
+ | ------------------------| ----------------------------------------------------|
359
+ | NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 |
360
+ | NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 |
361
+ | NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 |
362
+ | NVIDIA Pascal architecture (compute capability 6.0) | sm_60 |
363
+ | NVIDIA Pascal architecture (compute capability 6.1) | sm_61 |
364
+ | NVIDIA Pascal architecture (compute capability 6.2) | sm_62 |
365
+ | NVIDIA Volta architecture (compute capability 7.0) | sm_70 |
366
+ | NVIDIA Volta architecture (compute capability 7.2) | sm_72 |
367
+ | NVIDIA Turing architecture (compute capability 7.5) | sm_75 |
368
+ | NVIDIA Ampere architecture (compute capability 8.0) | sm_80 |
369
+ | NVIDIA Ampere architecture (compute capability 8.6) | sm_86 |
370
+ | NVIDIA Jetson/Drive AGX Orin architecture | sm_87 |
371
+ | NVIDIA Ada Lovelace architecture | sm_89 |
372
+ | NVIDIA Hopper architecture | sm_90 |
373
+ | NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a |
374
+
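Each NVIDIA value in the table encodes a compute capability: `sm_XY` corresponds to compute capability X.Y (for example, `sm_86` is compute capability 8.6), and the trailing `a` in `sm_90a` selects architecture-specific features without changing the compute capability. The sketch below decodes that naming scheme. The function name `sm_to_compute_capability` is hypothetical, and the sketch deliberately handles only the two-digit `sm_XY[a]` form that appears in the table.

```shell
#!/bin/sh
# Hypothetical helper: derive the compute capability encoded in an NVIDIA
# --offload-arch value of the form sm_<major><minor>[a], e.g. sm_86 -> 8.6.
sm_to_compute_capability() {
  case "$1" in
    sm_[0-9][0-9]|sm_[0-9][0-9]a) ;;
    *)
      echo "not a two-digit sm_XY value: $1" >&2
      return 1 ;;
  esac
  digits=${1#sm_}     # strip the "sm_" prefix, e.g. "90a"
  digits=${digits%a}  # drop the optional "a" suffix, e.g. "90"
  major=${digits%?}   # everything except the last digit
  minor=${digits#?}   # everything except the first digit
  echo "$major.$minor"
}

sm_to_compute_capability sm_86    # prints: 8.6
sm_to_compute_capability sm_90a   # prints: 9.0
```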
+ #### amdgcn-amd-amdhsa support
+
+ For SYCL offloading to AMD GPUs with the `--offload-arch` option, the following table
+ lists the accepted values.
+
380
+ | AMD GPU device name | `` --offload-arch `` accepted values for AMD GPUs |
381
+ | ------------------------| ----------------------------------------------------|
382
+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 |
383
+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 |
384
+ | AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 |
385
+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 |
386
+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 |
387
+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 |
388
+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 |
389
+ | AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 |
390
+ | AMD GCN GFX9 (Vega) architecture | gfx900 |
391
+ | AMD GCN GFX9 (Vega) architecture | gfx902 |
392
+ | AMD GCN GFX9 (Vega) architecture | gfx904 |
393
+ | AMD GCN GFX9 (Vega) architecture | gfx906 |
394
+ | AMD GCN GFX9 (Vega) architecture | gfx908 |
395
+ | AMD GCN GFX9 (Vega) architecture | gfx909 |
396
+ | AMD GCN GFX9 (Vega) architecture | gfx90a |
397
+ | AMD GCN GFX9 (Vega) architecture | gfx90c |
398
+ | AMD GCN GFX9 (Vega) architecture | gfx940 |
399
+ | AMD GCN GFX9 (Vega) architecture | gfx941 |
400
+ | AMD GCN GFX9 (Vega) architecture | gfx942 |
401
+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 |
402
+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 |
403
+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 |
404
+ | AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 |
405
+ | AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 |
406
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1031 |
407
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1032 |
408
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1033 |
409
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1034 |
410
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1035 |
411
+ | GCN GFX10.3 (RDNA 2) architecture | gfx1036 |
412
+ | GCN GFX11 (RDNA 3) architecture | gfx1100 |
413
+ | GCN GFX11 (RDNA 3) architecture | gfx1101 |
414
+ | GCN GFX11 (RDNA 3) architecture | gfx1102 |
415
+ | GCN GFX11 (RDNA 3) architecture | gfx1103 |
416
+ | GCN GFX11 (RDNA 3) architecture | gfx1150 |
417
+ | GCN GFX11 (RDNA 3) architecture | gfx1151 |
418
+ | GCN GFX12 (RDNA 4) architecture | gfx1200 |
419
+ | GCN GFX12 (RDNA 4) architecture | gfx1201 |
420
+
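The AMD table groups every accepted `gfx` value under a GFX generation. The sketch below turns that grouping into an exact-match lookup, so it both validates a value against the table and reports its generation. The function name `gfx_generation` is hypothetical; it lists only the values in the table and is not the driver's own validation logic.

```shell
#!/bin/sh
# Hypothetical helper: validate an AMD --offload-arch value against the table
# and print its GFX generation; unknown values are rejected.
gfx_generation() {
  case "$1" in
    gfx700|gfx701|gfx702) echo GFX7 ;;
    gfx801|gfx802|gfx803|gfx805|gfx810) echo GFX8 ;;
    gfx900|gfx902|gfx904|gfx906|gfx908|gfx909|gfx90a|gfx90c) echo GFX9 ;;
    gfx940|gfx941|gfx942) echo GFX9 ;;
    gfx1010|gfx1011|gfx1012|gfx1013) echo GFX10.1 ;;
    gfx1030|gfx1031|gfx1032|gfx1033|gfx1034|gfx1035|gfx1036) echo GFX10.3 ;;
    gfx1100|gfx1101|gfx1102|gfx1103|gfx1150|gfx1151) echo GFX11 ;;
    gfx1200|gfx1201) echo GFX12 ;;
    *)
      echo "unrecognized AMD --offload-arch value: $1" >&2
      return 1 ;;
  esac
}

gfx_generation gfx90a   # prints: GFX9
```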
#### spir64_fpga support

Compilation behaviors involving AOT for FPGA involve an additional call to
@@ -355,6 +477,34 @@ Additional options passed by the user via the
`-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
option to the wrapper, `--cpu-tool-arg=<arg>`.

+ As with SYCL offloading to Intel GPUs, SYCL AOT compilation for Intel CPUs
+ also uses the `--offload-arch` option.
+ The accepted Intel CPU values for `--offload-arch` are CPU names recognized by `clang -march`.
+ These names are more verbose than the Intel GPU names and do not overlap with them.
+ Each of these CPU names is mapped to the corresponding `opencl-aot -march` value.
+
+ The following table maps each accepted `--offload-arch` value for Intel CPUs to the corresponding `-march` value passed to opencl-aot.
+
+ | Intel CPU device | `--offload-arch` accepted value | opencl-aot `-march` value |
+ | ---------------- | ------------------------------- | ------------------------- |
+ | Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 |
+ | Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 |
+ | Intel(R) Advanced Vector Extensions | corei7-avx | avx |
+ | Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 |
+ | Intel(R) microarchitecture code name Westmere | westmere | wsm |
+ | Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb |
+ | Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb |
+ | Intel(R) microarchitecture code name Broadwell | broadwell | bdw |
+ | Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl |
+ | Intel(R) microarchitecture code name Alder Lake | alderlake | adl |
+ | Intel(R) microarchitecture code name Skylake (client) | skylake | skylake |
+ | Intel(R) microarchitecture code name Skylake (server) | skx | skx |
+ | Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk |
+ | Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl |
+ | Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx |
+ | Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr |
+ | Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr |
+
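The Intel CPU table is likewise a direct lookup from a `clang -march` CPU name to an `opencl-aot -march` value. The sketch below encodes it as a `case` statement; because the GPU short names are absent, a value such as `bdw` is rejected here even though `broadwell` maps to `bdw`. The function name `opencl_aot_march_for` is hypothetical and illustrates the table only.

```shell
#!/bin/sh
# Hypothetical helper: translate an Intel CPU --offload-arch value (a
# clang -march CPU name) into the opencl-aot -march value from the CPU table.
opencl_aot_march_for() {
  case "$1" in
    skylake-avx512) echo avx512 ;;
    core-avx2)      echo avx2 ;;
    corei7-avx)     echo avx ;;
    corei7)         echo sse4.2 ;;
    westmere)       echo wsm ;;
    sandybridge)    echo snb ;;
    ivybridge)      echo ivyb ;;
    broadwell)      echo bdw ;;
    coffeelake)     echo cfl ;;
    alderlake)      echo adl ;;
    skylake)        echo skylake ;;
    skx)            echo skx ;;
    cascadelake)    echo clk ;;
    icelake-client) echo icl ;;
    icelake-server) echo icx ;;
    sapphirerapids) echo spr ;;
    graniterapids)  echo gnr ;;
    *)
      echo "unsupported Intel CPU --offload-arch value: $1" >&2
      return 1 ;;
  esac
}

opencl_aot_march_for broadwell   # prints: bdw
```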
### Wrapping of device image

Once the device binary is pulled out of the fat binary, the binary must be