Commit 4adc66f

[Bugfix] Allocate less memory in non-batched CUTLASS MoE (#21121)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
1 parent 55ad648 commit 4adc66f

1 file changed: +2 -2 lines changed


vllm/model_executor/layers/fused_moe/cutlass_moe.py

Lines changed: 2 additions & 2 deletions
@@ -283,8 +283,8 @@ def workspace_shapes(
                           (N // 2))
             output = (self.max_experts_per_worker, padded_M, K)
         else:
-            workspace1 = (M * topk, max(2 * N, K))
-            workspace2 = (M * topk, N)
+            workspace1 = (M * topk, max(N, K))
+            workspace2 = (M * topk, N // 2)
             output = (M * topk, K)
         return (workspace1, workspace2, output,
                 self.out_dtype if self.out_dtype is not None else a.dtype)
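
To get a feel for how much the non-batched branch saves, here is a rough sketch that compares element counts for the old and new workspace shapes. It is not part of the commit. The helper names (`old_shapes`, `new_shapes`, `total_elements`) and the sizes chosen for `M`, `topk`, `N`, and `K` are hypothetical; only the shape expressions are copied from the diff. It counts elements rather than bytes, since the buffer dtype is not shown here.

```python
from math import prod


def old_shapes(M, topk, N, K):
    # Shape expressions from the removed ("-") lines of the diff.
    workspace1 = (M * topk, max(2 * N, K))
    workspace2 = (M * topk, N)
    return workspace1, workspace2


def new_shapes(M, topk, N, K):
    # Shape expressions from the added ("+") lines of the diff.
    workspace1 = (M * topk, max(N, K))
    workspace2 = (M * topk, N // 2)
    return workspace1, workspace2


def total_elements(shapes):
    # Sum of element counts over all workspace buffers.
    return sum(prod(shape) for shape in shapes)


# Hypothetical problem sizes, chosen only for illustration.
M, topk, N, K = 16, 2, 4096, 2048

old_total = total_elements(old_shapes(M, topk, N, K))
new_total = total_elements(new_shapes(M, topk, N, K))

print(f"old workspaces: {old_total} elements")
print(f"new workspaces: {new_total} elements")
print(f"saved: {old_total - new_total} elements")
```

With these assumed sizes, where `N >= K`, both workspaces halve. When `K` dominates `max(N, K)`, `workspace1` stays the same size and only `workspace2` shrinks, so the saving depends on the model dimensions.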
