Illegal Instruction using Q4_0_4_4 on Rockchip rk3399 SOC #9853
-
Introduction
I'm experimenting with llama.cpp on Linux on the PinePhone Pro, a device that uses the Rockchip RK3399 SoC. Q4 quants offer the best performance so far, but I want to try the NEON-optimized version. However, when I run llama.cpp on the device with a Q4_0_4_4 model, the program crashes with an Illegal Instruction error just after loading the model (presumably when inference starts).
Troubleshooting
Conclusion
What am I missing to be able to run a Q4_0_4_4 model on this SoC/system? Is it even possible?
-
Does it work with this patch?

```diff
diff --git a/ggml/src/ggml-aarch64.c b/ggml/src/ggml-aarch64.c
index b27f4114..7c6817b3 100644
--- a/ggml/src/ggml-aarch64.c
+++ b/ggml/src/ggml-aarch64.c
@@ -617,7 +617,7 @@ void ggml_gemv_q4_0_4x4_q8_0(int n, float * restrict s, size_t bs, const void *
 UNUSED(ncols_interleaved);
 UNUSED(blocklen);
-#if ! ((defined(_MSC_VER)) && ! defined(__clang__)) && defined(__aarch64__) && defined(__ARM_NEON)
+#if 0
 if (ggml_cpu_has_neon()) {
 const void * b_ptr = vx;
 const void * a_ptr = vy;
```

This disables the actual
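As a sketch of a narrower alternative to `#if 0` (assuming, as the later reply says, that the crash comes from dot-product instructions), a kernel can be guarded at compile time with the ACLE macro `__ARM_FEATURE_DOTPROD`, which compilers define only when targeting a CPU with the dot-product extension. The helper `dot4_accumulate` below is hypothetical and is not llama.cpp's code; its scalar branch computes, per 32-bit lane, the same sum of four int8 products that a single `vdotq_s32` (SDOT) call does.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: for each lane i in 0..3,
 *   acc[i] += a[4*i+0]*b[4*i+0] + ... + a[4*i+3]*b[4*i+3]
 * which is what one SDOT on 16-byte vectors computes per 32-bit lane. */
static void dot4_accumulate(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_DOTPROD)
    /* Compiled only when targeting the dot-product extension,
       e.g. with -march=armv8.2-a+dotprod. */
    int32x4_t r = vld1q_s32(acc);
    r = vdotq_s32(r, vld1q_s8(a), vld1q_s8(b));
    vst1q_s32(acc, r);
#else
    /* Portable fallback in plain C: no SDOT, so it is safe on ARMv8.0-A cores. */
    for (int i = 0; i < 4; i++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++) {
            sum += (int32_t)a[4 * i + k] * (int32_t)b[4 * i + k];
        }
        acc[i] += sum;
    }
#endif
}
```

A compile-time guard only reflects the build flags: a binary compiled with dot-product support enabled will still hit an illegal instruction on an ARMv8.0-A core, so builds shared between devices would also need a runtime check.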
-
Note: I was (and still am) running my tests in interactive mode by launching:
After applying the patch, the program no longer crashes after loading the model, and it displays the interactive prompt ('> '). I suppose the code you had me patch is only called during model initialization, and the program now crashes because it reaches another code path with NEON instructions?
This instruction (used in ggml-aarch64.c) is not available on all NEON-capable ARM CPUs: the RK3399 is an older ARMv8-A design without sdot (dot product) support.
So, unless I'm mistaken, you can't use the accelerated Q4_0_4_4 model on it.
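To check whether a given Linux/aarch64 device exposes the dot-product extension before trying the Q4_0_4_4 path, one option is to look for the `asimddp` flag (the kernel's name for HWCAP_ASIMDDP) on the `Features` lines of /proc/cpuinfo. The helpers below are hypothetical and not part of llama.cpp; they require the flag on every listed core, since on a big.LITTLE SoC such as the RK3399 threads can be scheduled on either cluster.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `flag` appears as a whole, whitespace-separated token in
 * `features`, e.g. the text after "Features\t:" in /proc/cpuinfo. */
static bool has_feature_token(const char *features, const char *flag) {
    size_t flag_len = strlen(flag);
    const char *p = features;
    while (*p != '\0') {
        /* Skip separators before the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n') p++;
        const char *start = p;
        /* Advance to the end of the token. */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n') p++;
        if ((size_t)(p - start) == flag_len && strncmp(start, flag, flag_len) == 0) {
            return true;
        }
    }
    return false;
}

/* Return true only if /proc/cpuinfo has at least one "Features" line and
 * every such line lists `flag`. Returns false if the file cannot be opened. */
static bool cpuinfo_has_feature(const char *flag) {
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (f == NULL) return false;
    char line[1024];
    bool seen = false, all = true;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Features", 8) == 0) {
            const char *colon = strchr(line, ':');
            if (colon == NULL) continue;
            seen = true;
            if (!has_feature_token(colon + 1, flag)) all = false;
        }
    }
    fclose(f);
    return seen && all;
}
```

Matching whole tokens matters here: a plain substring search for `asimd` would also match `asimddp` or `asimdhp`. For a quick manual check, `grep asimddp /proc/cpuinfo` on the device gives the same answer.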