Commit 3fb97a7

[Optim] Enable cuBLAS GeMM for bfloat16 (#3220)
This PR enables cuBLAS GEMM dispatch for bfloat16 by adding "q0bf16" to the quantization names that _cublas_gemm accepts.
1 parent fd8e84a commit 3fb97a7

1 file changed, +1 -1 lines changed

python/mlc_llm/interface/compiler_flags.py

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ def _cublas_gemm(target, quantization) -> bool:
     if not target.kind.name in ["cuda", "rocm"]:
         return False
     if not (
-        quantization.name in ["q0f16", "q0f32"]
+        quantization.name in ["q0f16", "q0bf16", "q0f32"]
         or "e4m3" in quantization.name
         or "e5m2" in quantization.name
     ):
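
To illustrate the effect of the change, here is a minimal standalone sketch of the eligibility check after this commit. It is not the code in compiler_flags.py: the function name cublas_gemm_eligible and its plain-string parameters are hypothetical stand-ins for the target and quantization objects, and the two return statements after the condition are assumptions, since the hunk ends before the body of that branch.

```python
def cublas_gemm_eligible(target_kind: str, quantization_name: str) -> bool:
    """Hypothetical stand-in that mirrors the checks visible in the hunk."""
    # cuBLAS dispatch is only considered for CUDA and ROCm targets.
    if target_kind not in ["cuda", "rocm"]:
        return False
    # Accepted: the unquantized float formats (now including bfloat16)
    # and any quantization whose name mentions an FP8 format.
    if not (
        quantization_name in ["q0f16", "q0bf16", "q0f32"]
        or "e4m3" in quantization_name
        or "e5m2" in quantization_name
    ):
        return False  # assumption: the branch not shown in the hunk rejects
    return True  # assumption: any later checks in the real function are omitted


if __name__ == "__main__":
    # "q4f16_1" is used here only as an example of a name not on the list.
    for name in ["q0f16", "q0bf16", "q0f32", "q4f16_1"]:
        print(name, cublas_gemm_eligible("cuda", name))
```

Under these assumptions, "q0bf16" on a CUDA or ROCm target now passes the same check as "q0f16" and "q0f32", while names outside the list that mention neither "e4m3" nor "e5m2" are still rejected.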
