Hi, I'm trying to build the pybind11 extension from the onemkl_gemv example against a DPCTL build with CUDA support:
https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv
The example builds with the following changes, but some test cases still fail:
--- a/examples/pybind11/onemkl_gemv/CMakeLists.txt
+++ b/examples/pybind11/onemkl_gemv/CMakeLists.txt
@@ -41,6 +41,9 @@ pybind11_add_module(${py_module_name}
${_sources}
)
add_sycl_to_target(TARGET ${py_module_name} SOURCES ${_sources})
+target_compile_options(${py_module_name} PRIVATE -fsycl-targets=nvptx64-nvidia-cuda)
+target_link_options(${py_module_name} PRIVATE -fsycl-targets=nvptx64-nvidia-cuda)
+
target_compile_definitions(${py_module_name} PRIVATE -DMKL_ILP64)
target_include_directories(${py_module_name}
PUBLIC ${MKL_INCLUDE_DIR} sycl_gemm
I also had to pass an additional flag when building sycl_gemv:
-DDpctl_DIR=<DPCTL_DIR>/cmake
Sample reproducer:
SYCL_PI_TRACE=1 python3 -c 'import dpctl; import dpctl.tensor as dpt; import numpy as np; from sycl_gemm import gemv; q = dpctl.SyclQueue(); Mnp, vnp = np.random.randn(5, 3), np.random.randn(3); M = dpt.asarray(Mnp, sycl_queue=q); v = dpt.asarray(vnp, sycl_queue=q); r = dpt.empty((5,), dtype=v.dtype, sycl_queue=q); hev, ev = gemv(q, M, v, r, []); hev.wait(); rnp = dpt.asnumpy(r);'
Running it fails with:
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]: platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]: device: NVIDIA A100 80GB PCIe
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
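One thing that stands out in the trace above is that libpi_cuda.so reports PluginVersion 15.49.1 while the other plugins report 15.47.1. I have not established whether this is related to the failure. The following is only a minimal sketch for pulling plugin versions out of SYCL_PI_TRACE=1 output so they can be compared quickly; `loaded_plugins` and `group_by_version` are hypothetical helper names, not part of dpctl or the SYCL runtime.

```python
import re

# Matches trace lines of the form shown in the log above, for example:
# SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 15.49.1 ]
_PLUGIN_RE = re.compile(
    r"Plugin found and successfully loaded:\s*(\S+)\s*"
    r"\[\s*PluginVersion:\s*([\d.]+)\s*\]"
)

# Plugin lines copied from the trace in this report.
SAMPLE_TRACE = """\
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]
"""


def loaded_plugins(trace_text):
    """Return {plugin file name: version string} for every plugin line found."""
    return {m.group(1): m.group(2) for m in _PLUGIN_RE.finditer(trace_text)}


def group_by_version(plugins):
    """Return {version string: [plugin names]} so differing versions are easy to spot."""
    groups = {}
    for name, version in plugins.items():
        groups.setdefault(version, []).append(name)
    return groups


if __name__ == "__main__":
    for version, names in sorted(group_by_version(loaded_plugins(SAMPLE_TRACE)).items()):
        print(version, ", ".join(names))
```

If more than one version group is printed, the plugins may come from different installations, which could be worth checking before digging further.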
Looking at the source that is invoked, the failure happens when executing the following code (GitHub):
if (v_typenum == api.UAR_DOUBLE_) {
    using T = double;
    sycl::event gemv_ev = oneapi::mkl::blas::row_major::gemv(
        q, oneapi::mkl::transpose::nontrans, n, m, T(1),
        reinterpret_cast<T *>(mat_typeless_ptr), m,
        reinterpret_cast<T *>(v_typeless_ptr), 1, T(0),
        reinterpret_cast<T *>(r_typeless_ptr), 1, depends);
    res_ev = gemv_ev;
}
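For reference, this is what I understand the call above to compute: r = 1 * M @ v + 0 * r for a row-major matrix with leading dimension m. The pure-Python sketch below is only a host-side reference for checking results once the call runs. It assumes n is the row count and m the column count (consistent with lda = m in row-major layout) and unit strides for both vectors. `gemv_row_major` is a hypothetical name, not an oneMKL or dpctl function.

```python
def gemv_row_major(rows, cols, alpha, a, lda, x, beta, y):
    """Reference y <- alpha * A @ x + beta * y (no transpose, unit strides).

    `a` is a flat sequence holding a row-major rows-by-cols matrix whose
    consecutive rows are `lda` elements apart.
    """
    if lda < cols:
        raise ValueError("lda must be at least the number of columns")
    out = []
    for i in range(rows):
        acc = 0.0
        for j in range(cols):
            acc += a[i * lda + j] * x[j]
        if beta == 0:
            # As in BLAS, y is not read when beta is zero, so an
            # uninitialized output buffer (dpt.empty) is acceptable.
            out.append(alpha * acc)
        else:
            out.append(alpha * acc + beta * y[i])
    return out


if __name__ == "__main__":
    # 2x3 example matrix stored row by row.
    a = [1.0, 2.0, 3.0,
         4.0, 5.0, 6.0]
    x = [1.0, 1.0, 1.0]
    y = [float("nan"), float("nan")]  # stands in for an uninitialized result
    print(gemv_row_major(2, 3, 1.0, a, 3, x, 0.0, y))
```

With the 5x3 matrix from the reproducer, the equivalent check would compare `rnp` against `Mnp @ vnp`.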
... and SYCL_PI_TRACE=-1 reported:
---> piextDeviceSelectBinary(
<unknown> : 0x67c2de0
<unknown> : 0x68d3780
<unknown> : 1
<unknown> : 0x7ffcb6131ebc
) ---> pi_result : -42
[out]<unknown> ** : 0x68d3780[ 0x7f37efe416b0 ... ]
python -m dpctl --full-list reports the following:
> python -m dpctl --full-list
Platform 0 ::
Name Intel(R) OpenCL
Version OpenCL 3.0 LINUX
Vendor Intel(R) Corporation
Backend opencl
Num Devices 1
# 0
Name Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Version 2024.18.7.0.11_160000
Filter string opencl:cpu:0
Platform 1 ::
Name NVIDIA CUDA BACKEND
Version CUDA 12.5
Vendor NVIDIA Corporation
Backend ext_oneapi_cuda
Num Devices 1
# 0
Name NVIDIA A100 80GB PCIe
Version CUDA 12.5
Filter string cuda:gpu:0
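In case it helps with triage, here is a minimal sketch that turns a listing like the one above into one record per device, so the backend and filter string can be read off programmatically. The line format is assumed from the output pasted in this report and may differ between dpctl versions. `parse_full_list` is a hypothetical helper, not a dpctl API.

```python
import re

# Listing copied from this report (indentation was lost when pasting).
SAMPLE_LISTING = """\
Platform 0 ::
Name Intel(R) OpenCL
Version OpenCL 3.0 LINUX
Vendor Intel(R) Corporation
Backend opencl
Num Devices 1
# 0
Name Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Version 2024.18.7.0.11_160000
Filter string opencl:cpu:0
Platform 1 ::
Name NVIDIA CUDA BACKEND
Version CUDA 12.5
Vendor NVIDIA Corporation
Backend ext_oneapi_cuda
Num Devices 1
# 0
Name NVIDIA A100 80GB PCIe
Version CUDA 12.5
Filter string cuda:gpu:0
"""

# Longer keys come first so "Filter string" is never mistaken for another field.
_KEYS = ("Filter string", "Num Devices", "Backend", "Version", "Vendor", "Name")


def parse_full_list(text):
    """Return a list of dicts, one per device, in the order they are listed."""
    devices = []
    platform = {}
    device = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.match(r"Platform\s+\d+\s*::", line):
            # A new platform block starts; fields belong to it until "# N".
            platform = {}
            device = None
        elif re.match(r"#\s*\d+$", line):
            # A new device entry under the current platform.
            device = {
                "platform": platform.get("Name"),
                "backend": platform.get("Backend"),
            }
            devices.append(device)
        else:
            for key in _KEYS:
                if line.startswith(key + " "):
                    target = device if device is not None else platform
                    target[key] = line[len(key):].strip()
                    break
    return devices


if __name__ == "__main__":
    for dev in parse_full_list(SAMPLE_LISTING):
        print(dev["backend"], dev["Filter string"], dev["Name"])
```

A filter string obtained this way, such as cuda:gpu:0, can be passed to dpctl.SyclQueue to pin the reproducer to one device instead of relying on the default selector.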