OpenCL issues - RTX 4070 x64 #3077


Open
peardox opened this issue Apr 25, 2025 · 4 comments


peardox commented Apr 25, 2025

With whisper-bench I'm getting...

ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 4070 Laptop GPU (OpenCL 3.0 CUDA)'
Unsupported GPU: NVIDIA GeForce RTX 4070 Laptop GPU

Compiled with MSVC Win11 ...

-DGGML_CUDA=1 -DGGML_OPENCL=1 -DGGML_OPENCL_USE_ADRENO_KERNELS=0

Is OpenCL only for phones or something? I've run some OpenCL tests in the past, so I know OpenCL works; it's just ggml that has problems.

OK, CUDA is going to be faster - I just want to run comparisons. I've also got Vulkan support compiled in and that's only slightly slower than CUDA

The Adreno flag sort of hints at a possible issue, I guess, but I can't find anything saying it WON'T work.

Possibly vcpkg's OpenCL is an issue? [its OpenVINO was an issue]

ggerganov (Member) commented:

More info: ggml-org/llama.cpp#10693

peardox (Author) commented Apr 25, 2025

Thanks - what about Kompute? Any limitations there?

ggerganov (Member) commented Apr 25, 2025

The kompute backend has not been maintained for a very long time, so it's not relevant atm.

peardox (Author) commented Apr 25, 2025

It would be highly beneficial if there were a public table of backend support status by platform.

I mean, ggml_backend_load_all in ggml-backend-reg.cpp tries this...

ggml_backend_load_best("blas", silent, dir_path);
ggml_backend_load_best("cann", silent, dir_path);
ggml_backend_load_best("cuda", silent, dir_path);
ggml_backend_load_best("hip", silent, dir_path);
ggml_backend_load_best("kompute", silent, dir_path);
ggml_backend_load_best("metal", silent, dir_path);
ggml_backend_load_best("rpc", silent, dir_path);
ggml_backend_load_best("sycl", silent, dir_path);
ggml_backend_load_best("vulkan", silent, dir_path);
ggml_backend_load_best("opencl", silent, dir_path);
ggml_backend_load_best("musa", silent, dir_path);
ggml_backend_load_best("cpu", silent, dir_path);

Plus an optional ggml_backend_load(backend_path);

Here's what I currently know:

I've got blas, cuda, vulkan and rpc ATM on Windows (+ todo on Linux)
OpenCL is only for Mobile by the look of it
metal is Apple (tried Apple Silicon, optionally CoreML - x64 is an experiment-in-waiting)
Kompute = out of date
cpu = fine of course (but not GGML_CPU_ALL_VARIANTS on Apple Silicon, probably ok on Apple x64)

Not touched cann (Huawei Mobile?), hip, sycl or musa (looks old) yet - are any of these good (and on which platforms)?

Raspberry Pi = cpu only (in tests)
Jetson Nano (2019 version) = too old to do CUDA - bad nvcc (CUDA 10.2 - reports CUDA17?)
Jetson Nano Orin (2025) = Can't buy one til August - should be fine

Planning AMD + CUDA tests on cheaper AWS Linux instances (once Windows/Mac via Pascal all good)
