Update the documentation for building Pybind11 SYCL Backend with CUDA #1843

Hi, I'm trying to build the pybind11 extension from the dpctl onemkl_gemv example with the CUDA backend:
https://github.com/IntelPython/dpctl/tree/master/examples/pybind11/onemkl_gemv

As written, the example fails to run all of its test cases. The build works with the following changes, but some tests are still failing:

```diff
--- a/examples/pybind11/onemkl_gemv/CMakeLists.txt
+++ b/examples/pybind11/onemkl_gemv/CMakeLists.txt
@@ -41,6 +41,9 @@ pybind11_add_module(${py_module_name}
     ${_sources}
 )
 add_sycl_to_target(TARGET ${py_module_name} SOURCES ${_sources})
+target_compile_options(${py_module_name} PRIVATE -fsycl-targets=nvptx64-nvidia-cuda)
+target_link_options(${py_module_name} PRIVATE -fsycl-targets=nvptx64-nvidia-cuda)
+
 target_compile_definitions(${py_module_name} PRIVATE -DMKL_ILP64)
 target_include_directories(${py_module_name}
     PUBLIC ${MKL_INCLUDE_DIR} sycl_gemm
```

I also had to pass an additional flag when building `sycl_gemv`:

`-DDpctl_DIR=<DPCTL_DIR>/cmake`
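For anyone scripting the configure step, here is a minimal sketch of how that flag could be assembled. The helper names `dpctl_dir_flag` and `query_dpctl_cmake_dir` are hypothetical, and the second one assumes the installed dpctl supports `python -m dpctl --cmakedir`; the directory it prints may differ from the `<DPCTL_DIR>/cmake` path I used.

```python
import subprocess
import sys


def dpctl_dir_flag(cmake_dir):
    """Format the -DDpctl_DIR flag for a given dpctl CMake directory."""
    return f"-DDpctl_DIR={cmake_dir}"


def query_dpctl_cmake_dir():
    """Ask the installed dpctl for its CMake directory.

    Assumption: `python -m dpctl --cmakedir` is available in this dpctl
    version. Raises CalledProcessError if dpctl is missing or the option
    is not recognized.
    """
    result = subprocess.run(
        [sys.executable, "-m", "dpctl", "--cmakedir"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

With dpctl installed, `dpctl_dir_flag(query_dpctl_cmake_dir())` would give the string to append to the `cmake` command line.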


Sample reproducer:

```bash
SYCL_PI_TRACE=1 python3 -c 'import dpctl; import dpctl.tensor as dpt; import numpy as np; from sycl_gemm import gemv; q = dpctl.SyclQueue(); Mnp, vnp = np.random.randn(5, 3), np.random.randn(3); M = dpt.asarray(Mnp, sycl_queue=q); v = dpt.asarray(vnp, sycl_queue=q); r = dpt.empty((5,), dtype=v.dtype, sycl_queue=q); hev, ev = gemv(q, M, v, r, []); hev.wait(); rnp = dpt.asnumpy(r);'
```
```
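The same steps as a script may be easier to read and to rerun against another device. This is only a sketch: the function name `run_gemv_reproducer`, the optional `filter_string` argument, and the final comparison against NumPy are additions that are not part of the one-liner. It assumes `dpctl`, `numpy`, and the built `sycl_gemm` module are importable.

```python
def run_gemv_reproducer(filter_string=None):
    """Run the gemv reproducer; return True if the result matches NumPy."""
    # Imports are local so the function can be defined without dpctl installed.
    import numpy as np
    import dpctl
    import dpctl.tensor as dpt
    from sycl_gemm import gemv

    # Default queue, or an explicit device such as "cuda:gpu:0".
    q = dpctl.SyclQueue(filter_string) if filter_string else dpctl.SyclQueue()

    Mnp, vnp = np.random.randn(5, 3), np.random.randn(3)
    M = dpt.asarray(Mnp, sycl_queue=q)
    v = dpt.asarray(vnp, sycl_queue=q)
    r = dpt.empty((5,), dtype=v.dtype, sycl_queue=q)

    # Same call as in the one-liner; this is where the failure surfaces.
    hev, ev = gemv(q, M, v, r, [])
    hev.wait()

    rnp = dpt.asnumpy(r)
    # Added check (not in the one-liner): compare with the host result.
    return bool(np.allclose(rnp, Mnp @ vnp))
```

Calling `run_gemv_reproducer("opencl:cpu:0")` and `run_gemv_reproducer("cuda:gpu:0")` in turn would show whether the failure is specific to the CUDA device.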

Running this fails with:

```
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_opencl.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_level_zero.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_cuda.so [ PluginVersion: 15.49.1 ]
SYCL_PI_TRACE[basic]: Plugin found and successfully loaded: libpi_unified_runtime.so [ PluginVersion: 15.47.1 ]
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100 80GB PCIe
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Requested device_type: info::device_type::automatic
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100 80GB PCIe
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
```
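Since the trace is long and repetitive, a small helper can pull out the two useful facts, which device was selected and which native error came back. This is a sketch written against the log format shown above; the function name `summarize_pi_trace` and the regular expressions are my own and may need adjusting for other runtime versions.

```python
import re


def summarize_pi_trace(text):
    """Return (selected device names, native error codes) found in a trace."""
    # Matches lines such as "SYCL_PI_TRACE[all]:   device: NVIDIA A100 80GB PCIe".
    devices = re.findall(r"SYCL_PI_TRACE\[all\]:\s+device:\s*(.+)", text)
    # Matches fragments such as "-42 (PI_ERROR_INVALID_BINARY)".
    errors = re.findall(r"(-?\d+) \((PI_ERROR_[A-Z_]+)\)", text)
    # Deduplicate, since the runtime repeats both kinds of line.
    return sorted(set(devices)), sorted(set(errors))


sample = """\
SYCL_PI_TRACE[all]: Selected device: -> final score = 1500
SYCL_PI_TRACE[all]:   platform: NVIDIA CUDA BACKEND
SYCL_PI_TRACE[all]:   device: NVIDIA A100 80GB PCIe
RuntimeError: Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)
"""
devices, errors = summarize_pi_trace(sample)
print(devices, errors)
```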
Tracing back to the source being invoked, the failure happens when executing the following code ([GitHub](https://github.com/IntelPython/dpctl/blob/master/examples/pybind11/onemkl_gemv/sycl_gemm/_onemkl.cpp#L102-L111)):

```c++
    if (v_typenum == api.UAR_DOUBLE_) {
        using T = double;
        sycl::event gemv_ev = oneapi::mkl::blas::row_major::gemv(
            q, oneapi::mkl::transpose::nontrans, n, m, T(1),
            reinterpret_cast<T *>(mat_typeless_ptr), m,
            reinterpret_cast<T *>(v_typeless_ptr), 1, T(0),
            reinterpret_cast<T *>(r_typeless_ptr), 1, depends);
        res_ev = gemv_ev;
    }
```

With `SYCL_PI_TRACE=-1`, the trace reports:

```
    ---> piextDeviceSelectBinary(
            <unknown> : 0x67c2de0
            <unknown> : 0x68d3780
            <unknown> : 1
            <unknown> : 0x7ffcb6131ebc
    ) --->  pi_result : -42
            [out]<unknown> ** : 0x68d3780[ 0x7f37efe416b0 ... ]
```

`python -m dpctl --full-list` reports the following:

```
> python -m dpctl --full-list
Platform  0 ::
    Name        Intel(R) OpenCL
    Version     OpenCL 3.0 LINUX
    Vendor      Intel(R) Corporation
    Backend     opencl
    Num Devices 1
      # 0
        Name                Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
        Version             2024.18.7.0.11_160000
        Filter string       opencl:cpu:0
Platform  1 ::
    Name        NVIDIA CUDA BACKEND
    Version     CUDA 12.5
    Vendor      NVIDIA Corporation
    Backend     ext_oneapi_cuda
    Num Devices 1
      # 0
        Name                NVIDIA A100 80GB PCIe
        Version             CUDA 12.5
        Filter string       cuda:gpu:0
```

