Skip to content

Enable dpnp build on AMD GPU #2302

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
May 15, 2025
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
72bc4d4
Enable CMake options to build dpnp on AMD
vlad-perevezentsev Feb 10, 2025
5f11917
Add build_locally args for AMD build
vlad-perevezentsev Feb 10, 2025
c07e0a7
Remove unused lines
vlad-perevezentsev Feb 10, 2025
323bbb4
Remove ROCM_PATH logic
vlad-perevezentsev Feb 11, 2025
e111ce1
Support amd build for indexing extension
vlad-perevezentsev Feb 11, 2025
ccc7b72
Merge master into enable_amd_build
vlad-perevezentsev Feb 11, 2025
efbab02
Merge master into enable_amd_build
vlad-perevezentsev Mar 14, 2025
310cd82
Support amd build for window extension
vlad-perevezentsev Mar 14, 2025
c3adf4e
Set HIP specific flags for MKL
vlad-perevezentsev Mar 17, 2025
574ea90
pdate logic to use --target-hip
vlad-perevezentsev Mar 18, 2025
d6c5925
Merge master into enable_amd_build
vlad-perevezentsev Mar 18, 2025
5bca529
Add docs for dpnp build on AMD
vlad-perevezentsev Mar 18, 2025
b858ae2
Remove unnecessary HIP_TARGETS validation in CMake
vlad-perevezentsev Mar 18, 2025
273113e
A small docs update
vlad-perevezentsev Mar 18, 2025
c4da3ef
Improve validation of --target and --target-hip
vlad-perevezentsev Apr 16, 2025
e6c280e
Clarify --target-hip usage in doc
vlad-perevezentsev Apr 16, 2025
b27a8a1
Update SYCL target selection logic in CMakeLists
vlad-perevezentsev Apr 16, 2025
2238372
Merge master into enable_amd_build
vlad-perevezentsev Apr 16, 2025
5e2cc3d
Avoid false HIP error when building for default target
vlad-perevezentsev Apr 17, 2025
2eaf883
Merge master into enable_amd_build
vlad-perevezentsev May 8, 2025
0877a32
Merge master into enable_amd_build
vlad-perevezentsev May 9, 2025
395871e
Enable onemkl_interfaces when DPNP_SYCL_TARGETS match amd or cuda
vlad-perevezentsev May 9, 2025
54f44cd
Fix logic for multitarget builds
vlad-perevezentsev May 9, 2025
e57ccf8
Use target aliases and clean up HIP configuration
vlad-perevezentsev May 9, 2025
c2ebe72
Merge master into enable_amd_build
vlad-perevezentsev May 9, 2025
7a89e0f
Update CHANGELOG
vlad-perevezentsev May 14, 2025
07639f6
Merge master into enable_amd_build
vlad-perevezentsev May 14, 2025
d2e5792
Disallow HIP/CUDA targets without oneMKL interface flag
vlad-perevezentsev May 15, 2025
a278f13
Merge remote-tracking branch 'origin/master' into enable_amd_build
vlad-perevezentsev May 15, 2025
e2351ee
Update changelog
vlad-perevezentsev May 15, 2025
f41e5d8
Merge master into enable_amd_build
vlad-perevezentsev May 15, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
65 changes: 51 additions & 14 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -71,31 +71,61 @@ option(DPNP_TARGET_CUDA
"Build DPNP to target CUDA devices"
OFF
)
option(DPNP_TARGET_HIP
"Build DPNP to target HIP devices"
OFF
)
option(DPNP_USE_ONEMKL_INTERFACES
"Build DPNP with oneMKL Interfaces"
OFF
)
set(HIP_TARGETS "" CACHE STRING "HIP architecture for target")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is no support for multiple values:

Suggested change
set(HIP_TARGETS "" CACHE STRING "HIP architecture for target")
set(HIP_TARGET "" CACHE STRING "HIP architecture for target")

Copy link
Collaborator

@ndgrigorian ndgrigorian Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, it was clear in docs that only one architecture was supported at a time, but now it isn't as clear and should be tested

Also, there is new information in the extension guide

The compiler driver also offers alias targets for each target+architecture pair to make the command line shorter and easier to understand for humans. Thanks to the aliases, the -Xsycl-target-backend flags no longer need to be specified.

It shows that the command

icpx -fsycl -fsycl-targets=spir64_gen,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
        -Xsycl-target-backend=spir64_gen '-device pvc' \
        -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
        -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
        -o sycl-app sycl-app.cpp

is equivalent to

icpx -fsycl -fsycl-targets=intel_gpu_pvc,amd_gpu_gfx1030,nvidia_gpu_sm_80 \
        -o sycl-app sycl-app.cpp

so maybe both dpctl and dpnp can simplify by removing the need for -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=[X] completely

list of aliases:
https://intel.github.io/llvm/UsersManual.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aliases list seems to claim only one alias is supported at a time. So probably only one architecture at once is possible? That would be my guess

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is no support for multiple values:

I am using HIP_TARGETS instead of HIP_TARGET because oneMath requires HIP_TARGETS to be defined
https://github.com/uxlfoundation/oneMath/blob/4ad4dfb5db834117248ad5f8fbded5cfc1097005/CMakeLists.txt#L73

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it only declaration of cmake variable which doesn't impact oneMath, isn't that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it only declaration of cmake variable which doesn't impact oneMath, isn't that?

According to OneMath documentation HIP_TARGETS must be set for ROCm builds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, it was clear in docs that only one architecture was supported at a time, but now it isn't as clear and should be tested

Also, there is new information in the extension guide

The compiler driver also offers alias targets for each target+architecture pair to make the command line shorter and easier to understand for humans. Thanks to the aliases, the -Xsycl-target-backend flags no longer need to be specified.

It shows that the command

icpx -fsycl -fsycl-targets=spir64_gen,amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
        -Xsycl-target-backend=spir64_gen '-device pvc' \
        -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx1030 \
        -Xsycl-target-backend=nvptx64-nvidia-cuda --offload-arch=sm_80 \
        -o sycl-app sycl-app.cpp

is equivalent to

icpx -fsycl -fsycl-targets=intel_gpu_pvc,amd_gpu_gfx1030,nvidia_gpu_sm_80 \
        -o sycl-app sycl-app.cpp

so maybe both dpctl and dpnp can simplify by removing the need for -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=[X] completely

list of aliases: https://intel.github.io/llvm/UsersManual.html

This is a great suggestion.

  1. The compiler supports more than one target
  2. Using aliases greatly simplifies the logic

In this PR I will implement using aliases for AMD,
In the next PR I will suggest an update for CUDA


set(_dpnp_sycl_targets)
set(_dpnp_amd_targets)
set(_use_onemkl_interfaces OFF)
set(_use_onemkl_interfaces_cuda OFF)
set(_use_onemkl_interfaces_hip OFF)

set(_dpnp_sycl_target_compile_options)
set(_dpnp_sycl_target_link_options)

if ("x${DPNP_SYCL_TARGETS}" STREQUAL "x")
if(DPNP_TARGET_CUDA)
set(_dpnp_sycl_targets "nvptx64-nvidia-cuda,spir64-unknown-unknown")
set(_use_onemkl_interfaces_cuda ON)
else()
if(DEFINED ENV{DPNP_TARGET_CUDA})
set(_dpnp_sycl_targets "nvptx64-nvidia-cuda,spir64-unknown-unknown")
set(_use_onemkl_interfaces_cuda ON)
endif()
endif()
if(DPNP_TARGET_CUDA)
set(_dpnp_sycl_targets "nvptx64-nvidia-cuda,spir64-unknown-unknown")
set(_use_onemkl_interfaces_cuda ON)
endif()
if(DPNP_TARGET_HIP)
if (NOT "x${HIP_TARGETS}" STREQUAL "x")
set(_dpnp_amd_targets ${HIP_TARGETS})
set(_use_onemkl_interfaces_hip ON)
if(_dpnp_sycl_targets)
set(_dpnp_sycl_targets "amdgcn-amd-amdhsa,${_dpnp_sycl_targets}")
else()
set(_dpnp_sycl_targets "amdgcn-amd-amdhsa,spir64-unknown-unknown")
endif()
else()
message(FATAL_ERROR "HIP_TARGETS must be specified when using HIP backend")
endif()
endif()
else()
set(_dpnp_sycl_targets ${DPNP_SYCL_TARGETS})
set(_dpnp_sycl_targets ${DPNP_SYCL_TARGETS})
if (NOT "x${HIP_TARGETS}" STREQUAL "x")
set(_dpnp_amd_targets ${HIP_TARGETS})
set(_use_onemkl_interfaces_hip ON)
endif()
endif()

if(_dpnp_sycl_targets)
if (_dpnp_sycl_targets)
message(STATUS "Compiling for -fsycl-targets=${_dpnp_sycl_targets}")
list(APPEND _dpnp_sycl_target_compile_options -fsycl-targets=${_dpnp_sycl_targets})
list(APPEND _dpnp_sycl_target_link_options -fsycl-targets=${_dpnp_sycl_targets})
if(_dpnp_amd_targets)
list(APPEND _dpnp_sycl_target_compile_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpnp_amd_targets})
list(APPEND _dpnp_sycl_target_link_options -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${_dpnp_amd_targets})
endif()
endif()

set(_use_onemkl_interfaces OFF)
if(DPNP_USE_ONEMKL_INTERFACES)
set(_use_onemkl_interfaces ON)
else()
Expand All @@ -107,13 +137,20 @@ endif()
if(_use_onemkl_interfaces)
set(BUILD_FUNCTIONAL_TESTS False)
set(BUILD_EXAMPLES False)
set(ENABLE_MKLGPU_BACKEND True)
set(ENABLE_MKLCPU_BACKEND True)

if(_use_onemkl_interfaces_cuda)
set(ENABLE_CUBLAS_BACKEND True)
set(ENABLE_CUSOLVER_BACKEND True)
set(ENABLE_CUFFT_BACKEND True)
# set(ENABLE_CURAND_BACKEND True)
set(ENABLE_MKLGPU_BACKEND True)
set(ENABLE_MKLCPU_BACKEND True)
endif()
if(_use_onemkl_interfaces_hip)
set(ENABLE_ROCBLAS_BACKEND True)
set(ENABLE_ROCSOLVER_BACKEND True)
set(ENABLE_ROCFFT_BACKEND True)
# set(ENABLE_ROCRAND_BACKEND True)
endif()

if(DPNP_ONEMKL_INTERFACES_DIR)
Expand Down
4 changes: 2 additions & 2 deletions dpnp/backend/extensions/indexing/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -38,12 +38,12 @@ if(_dpnp_sycl_targets)
target_compile_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_compile_options}
)
target_link_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_link_options}
)
endif()

Expand Down
4 changes: 2 additions & 2 deletions dpnp/backend/extensions/statistics/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -45,12 +45,12 @@ if(_dpnp_sycl_targets)
target_compile_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_compile_options}
)
target_link_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_link_options}
)
endif()

Expand Down
4 changes: 2 additions & 2 deletions dpnp/backend/extensions/ufunc/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -77,12 +77,12 @@ if(_dpnp_sycl_targets)
target_compile_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_compile_options}
)
target_link_options(
${python_module_name}
PRIVATE
-fsycl-targets=${_dpnp_sycl_targets}
${_dpnp_sycl_target_link_options}
)
endif()

Expand Down
18 changes: 18 additions & 0 deletions scripts/build_locally.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ def run(
verbose=False,
cmake_opts="",
target="intel",
arch=None,
onemkl_interfaces=False,
onemkl_interfaces_dir=None,
):
Expand Down Expand Up @@ -104,6 +105,16 @@ def run(
# Always builds using oneMKL interfaces for the cuda target
onemkl_interfaces = True

if target == "hip":
if not arch:
raise ValueError("--arch is required when --target=hip")
cmake_args += [
"-DDPNP_TARGET_HIP=ON",
Copy link
Contributor

@antonwolfy antonwolfy Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what do we need to define two variables? Can it be combined in a single one, like in dpctl: -DDPNP_TARGET_HIP={arch}?

Copy link
Collaborator

@ndgrigorian ndgrigorian Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, --target=cuda is current dpnp approach, but:

  1. dpctl and dpnp should consider supporting targeting specific CUDA architectures
  2. --target=hip means that there is no way to build simultaneously for HIP and CUDA (which is very, very much an edge case, but should be considered)

For these reasons, I think it is most sensible to move away from --target= universal approach to --target-cuda= and --target-hip= or something to that effect

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndgrigorian it is a great suggestion.
I have added support for --target-hip and I am going to add --target-cuda instead of --target in the next PR.
Thanks

f"-DHIP_TARGETS={arch}",
]
# Always builds using oneMKL interfaces for the hip target
onemkl_interfaces = True

if onemkl_interfaces:
cmake_args += [
"-DDPNP_USE_ONEMKL_INTERFACES=ON",
Expand Down Expand Up @@ -177,6 +188,12 @@ def run(
default="intel",
type=str,
)
driver.add_argument(
"--arch",
help="Architecture for HIP target",
dest="arch",
type=str,
)
driver.add_argument(
"--onemkl-interfaces",
help="Build using oneMKL Interfaces",
Expand Down Expand Up @@ -244,6 +261,7 @@ def run(
verbose=args.verbose,
cmake_opts=args.cmake_opts,
target=args.target,
arch=args.arch,
onemkl_interfaces=args.onemkl_interfaces,
onemkl_interfaces_dir=args.onemkl_interfaces_dir,
)
Loading