
Commit 66ef1ce

authored
metal : utilize max shared memory for mul_mat_id (#7935)
1 parent e65bbf6 commit 66ef1ce


1 file changed (+2, -1 lines changed)


ggml-metal.m

Lines changed: 2 additions & 1 deletion
```diff
@@ -1862,9 +1862,10 @@ static enum ggml_status ggml_metal_graph_compute(
 // ne21 = n_rows
 const int dst_rows = ne20*ne21;
 const int dst_rows_min = n_as;
+const int dst_rows_max = (ctx->device.maxThreadgroupMemoryLength - 32 - 8192)/4;
 
 // max size of the rowids array in the kernel shared buffer
-GGML_ASSERT(dst_rows <= 2048);
+GGML_ASSERT(dst_rows <= dst_rows_max);
 
 // for now the matrix-matrix multiplication kernel only works on A14+/M1+ SoCs
 // AMD GPU and older A-chips will reuse matrix-vector multiplication kernel
```
