Commit 2763eba

cthi authored and facebook-github-bot committed
Fix OSS performance on FP8 AMD kernels OS (#4462)
Summary:
X-link: facebookresearch/FBGEMM#1521

This adds some missing hipcc compiler flags. Without these flags, certain kernel instances suffer from major performance issues.

Pull Request resolved: #4462

Reviewed By: jwfromm

Differential Revision: D78010696

Pulled By: cthi

fbshipit-source-id: e5d35944f236d90d5c0aa9cb2587a7d6d45540b6
1 parent ba14df1 commit 2763eba

1 file changed: +12 -1 lines changed

fbgemm_gpu/experimental/gen_ai/CMakeLists.txt

Lines changed: 12 additions & 1 deletion
@@ -149,7 +149,18 @@ gpu_cpp_library(
     ${experimental_gen_ai_cpp_source_files_hip}
   TORCH_LIBS
     # Used when building as part of PyTorch
-    ${FBGEMM_GENAI_TORCH_LIBS})
+    ${FBGEMM_GENAI_TORCH_LIBS}
+  HIPCC_FLAGS
+    # Below flags are required for strong CK performance
+    # on certain kernel instances
+    -mllvm
+    # Reduce register spillage on certain kernels
+    -amdgpu-coerce-illegal-types=1
+    -mllvm
+    -enable-post-misched=0
+    -mllvm
+    -greedy-reverse-local-assignment=1
+    -fhip-new-launch-api)
 
 
 
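For readers applying the same flags outside FBGEMM's `gpu_cpp_library` helper, here is a minimal sketch in plain CMake. It is an illustration, not part of this commit. The target name `my_hip_kernels` is hypothetical, and the sketch assumes the target's sources are compiled with hipcc (or clang targeting AMD GPUs). The flag values are copied from the diff above. The `SHELL:` prefix (CMake 3.12+) is used because `target_compile_options` de-duplicates repeated options, which would otherwise collapse the three `-mllvm` tokens into one.

```cmake
# Hypothetical target; assumed to be compiled with hipcc.
target_compile_options(my_hip_kernels PRIVATE
  # SHELL: keeps each "-mllvm <option>" pair together and prevents
  # CMake from de-duplicating the repeated -mllvm tokens.
  "SHELL:-mllvm -amdgpu-coerce-illegal-types=1"
  "SHELL:-mllvm -enable-post-misched=0"
  "SHELL:-mllvm -greedy-reverse-local-assignment=1"
  -fhip-new-launch-api)
```

How `gpu_cpp_library` forwards `HIPCC_FLAGS` internally is not shown in this commit, so the sketch should not be read as equivalent to the helper's behavior.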