From ff8eca005a4a523c4a17fc49ce72d9843db2bcd1 Mon Sep 17 00:00:00 2001
From: Haoqiang Guo
Date: Mon, 23 Jun 2025 14:08:40 -0700
Subject: [PATCH] reorder_batched_ad_indices_kernel on IFR/CFR shape results

Summary:
To reproduce the results, copy the shapes from P1843971961 (for IFR) or
P1845133642 (for CFR) into shape.csv.

The baseline takes 6+ hours to process the 62k+ cases, so I randomly
selected 1000 cases to test the results.

v=1: baseline (or v=0, but I recommend using v=1 to eliminate additional overhead)
v=2: optimized kernel

Test results:
https://docs.google.com/spreadsheets/d/19bDXYYQngP5IQ567OiWYHikq-Ga521ElfrkGJqKIw1o/edit?usp=sharing

Differential Revision: D77066925
---
 fbgemm_gpu/src/sparse_ops/common.cuh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fbgemm_gpu/src/sparse_ops/common.cuh b/fbgemm_gpu/src/sparse_ops/common.cuh
index c1ce3a8709..a755ddc022 100644
--- a/fbgemm_gpu/src/sparse_ops/common.cuh
+++ b/fbgemm_gpu/src/sparse_ops/common.cuh
@@ -20,6 +20,7 @@
 #include
 #include
 #include
+#include
 
 // clang-format off
 #include "fbgemm_gpu/utils/cub_namespace_prefix.cuh"