GGML_ASSERT(ggml_is_contiguous(src0)) failed: does ggml not yet support MoE with the OpenCL backend? #12909

han98115 · 2025-04-12T04:15:15Z


I built llama.cpp with the OpenCL backend and ran llama-cli on a Samsung S25 (SoC: Snapdragon 8 Elite, GPU: Adreno 830).

Most GGUF models run fine, but any MoE model fails to run on the Adreno GPU.

Non-MoE case: (https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF)
pa1q:/data/local/tmp/bin # ./llama-cli -m ../tinyllama-1.1b-chat-v1.0.Q4_0.gguf -ngl 1 # (OK)

MoE case: (https://huggingface.co/tensorblock/TinyLLama-4x1.1B-MoE-GGUF)
pa1q:/data/local/tmp/bin # ./llama-cli -m ../TinyLLama-4x1.1B-MoE-Q4_0.gguf -ngl 0 # (OK)
pa1q:/data/local/tmp/bin # ./llama-cli -m ../TinyLLama-4x1.1B-MoE-Q4_0.gguf -ngl 1 # (Doesn’t work)

With -ngl 1, execution hangs with the following message: GGML_ASSERT(ggml_is_contiguous(src0)) failed

Has anyone else encountered this issue?
I'm wondering if it's because ggml doesn't support MoE with the OpenCL backend yet.

Build command:
cmake .. -G Ninja \
  -DCMAKE_TOOLCHAIN_FILE=$HOME/android-sdk/ndk/26.3.11579264/build/cmake/android.toolchain.cmake \
  -DANDROID_ABI=arm64-v8a \
  -DANDROID_PLATFORM=android-28 \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_OPENCL=ON \
  -DLLAMA_CURL=OFF
ninja
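
For readers unfamiliar with the check named in the assertion: as far as I understand it, ggml_is_contiguous tests whether a tensor's elements sit in memory in their natural order with no gaps. Below is a simplified Python model of that idea, not ggml's actual code. TensorLayout, is_contiguous and permute are hypothetical names made up for this sketch, and it ignores block-quantized types such as Q4_0 and any special handling of size-1 dimensions. It only illustrates what kind of tensor would trip such a check; it does not establish why the MoE path hits it on OpenCL.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TensorLayout:
    """Hypothetical stand-in for a tensor's shape/stride metadata (not ggml's struct)."""
    ne: Tuple[int, ...]  # number of elements per dimension, innermost first
    nb: Tuple[int, ...]  # stride in bytes per dimension, innermost first
    elem_size: int       # bytes per element (assumes a non-quantized type)


def is_contiguous(t: TensorLayout) -> bool:
    """Return True if each stride equals the packed size of all inner dimensions."""
    expected = t.elem_size
    for n, stride in zip(t.ne, t.nb):
        if stride != expected:
            return False
        expected *= n
    return True


def permute(t: TensorLayout, order: Tuple[int, ...]) -> TensorLayout:
    """Reorder dimensions without moving data, as a view-style permute would."""
    return TensorLayout(
        ne=tuple(t.ne[i] for i in order),
        nb=tuple(t.nb[i] for i in order),
        elem_size=t.elem_size,
    )


# A 4x3 float32 tensor stored densely: strides are 4 bytes and 16 bytes.
dense = TensorLayout(ne=(4, 3), nb=(4, 16), elem_size=4)

# Swapping the two dimensions keeps the data in place but swaps the strides.
transposed = permute(dense, (1, 0))

print(is_contiguous(dense))
print(is_contiguous(transposed))
```

In this model the densely stored tensor passes the check, while the permuted view does not, because its innermost stride is no longer one element. A kernel that asserts contiguity on its input would reject the second kind of tensor unless it is copied into a packed layout first.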

Replies: 0 comments

Select a reply

Uh oh!

GGML_ASSERT(ggml_is_contiguous(src0) failed — Is ggml not yet supporting MoE with the OpenCL backend? #12909

Uh oh!

Uh oh!

han98115 Apr 12, 2025

Replies: 0 comments

han98115
Apr 12, 2025