whisper: validate get_rows support for cpu extra buffer #3323

chaxu01 · 2025-07-14T11:37:29Z

This patch enables KleidiAI to accelerate the Q4_0 matmul operation when it shares weight data with the get_rows operator.
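A rough illustration of the idea, assuming that a weight can only live in the accelerated (CPU extra) buffer when every operator that reads it is supported there. The names `pick_buffer`, `"extra"`, and `"default"` are hypothetical and are not part of the ggml or whisper.cpp API:

```python
# Hypothetical sketch, not the actual ggml/whisper.cpp code.
# Assumption: a weight tensor is placed in the accelerated "extra" buffer
# only if every operator that consumes it is supported by that buffer.

def pick_buffer(ops_using_weight, extra_buffer_ops):
    """Return "extra" if all consuming ops are supported, else "default"."""
    if all(op in extra_buffer_ops for op in ops_using_weight):
        return "extra"
    return "default"

# A weight shared by the matmul and the get_rows lookup.
shared_weight_ops = {"MUL_MAT", "GET_ROWS"}

# Without get_rows support the shared weight falls back to the default buffer.
before = pick_buffer(shared_weight_ops, {"MUL_MAT"})

# With get_rows support the shared weight can use the accelerated buffer.
after = pick_buffer(shared_weight_ops, {"MUL_MAT", "GET_ROWS"})

print(before, after)
```

Under this assumption, adding get_rows support is what lets the shared Q4_0 weight stay on the accelerated path for the matmul.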

ggerganov · 2025-07-14T12:14:02Z

Could you share some sample numbers of the performance before and after this change?

chaxu01 · 2025-07-14T12:34:00Z

Here's a benchmark comparison (GET_ROWS vs. baseline) on a Pixel 8:

Metric	GET_ROWS	Baseline	% Difference (relative to GET_ROWS)
Load time (ms)	82.16	61.14	25.58%
Encode time (ms)	856.86	841.23	1.82%
Decode time (ms)	401.13	423.74	-5.64%
Batch decode time (ms)	303.76	370.42	-21.94%
Prompt time (ms)	2838.28	3711.67	-30.77%
Total time (ms)	4401.69	5350.61	-21.56%
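For reference, the percentage column appears to be computed as (GET_ROWS - Baseline) / GET_ROWS, so negative values mean the GET_ROWS build is faster. A small sketch that reproduces it from the timings above (the helper name `pct_diff` is made up for illustration):

```python
# Timings in milliseconds, copied from the table: (GET_ROWS, Baseline).
timings_ms = {
    "Load time": (82.16, 61.14),
    "Encode time": (856.86, 841.23),
    "Decode time": (401.13, 423.74),
    "Batch decode time": (303.76, 370.42),
    "Prompt time": (2838.28, 3711.67),
    "Total time": (4401.69, 5350.61),
}

def pct_diff(get_rows, baseline):
    """Difference as a percentage of the GET_ROWS value (the table's convention)."""
    return (get_rows - baseline) / get_rows * 100.0

def pct_diff_vs_baseline(get_rows, baseline):
    """Same difference, expressed relative to the baseline instead."""
    return (get_rows - baseline) / baseline * 100.0

for metric, (get_rows, baseline) in timings_ms.items():
    print(f"{metric}: {pct_diff(get_rows, baseline):.2f}% "
          f"(vs. baseline: {pct_diff_vs_baseline(get_rows, baseline):.2f}%)")
```

Measured against the baseline rather than the GET_ROWS value, the total time reduction works out to roughly 17.7%.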

chaxu01 · 2025-07-14T12:45:22Z

This PR has a dependency on llama.cpp PR #14676, which introduces KleidiAI support for the get_rows operator.

whisper: validate get_rows support for cpu extra buffer

1f0ff47

ggerganov approved these changes Jul 14, 2025

ggerganov merged commit 032697b into ggml-org:master Jul 14, 2025
53 checks passed