Commit 40ae36d

Reapply #9842: Save some size in dtype_util when dtype selective build is not in use
We duplicate a lot of functions depending on the operator name so that dtype selective build will work. We can instead detect whether dtype selective build is in use and, if not, stop duplicating.

Test Plan: compared results of `bash test/build_optimized_size_test.sh` before/after this rev.

Before:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff   153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT   __DATA  __OBJC  others      dec         hex
81920    81920   0       4295049216  4295213056  10003c000  cmake-out/test/size_test
1474560  81920   0       4295655424  4297211904  100224000  cmake-out/test/size_test_all_ops
4489216  98304   0       4296359936  4300947456  1005b4000  cmake-out/test/size_test_all_optimized_ops
```

After:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff   153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT   __DATA  __OBJC  others      dec         hex
81920    81920   0       4295049216  4295213056  10003c000  cmake-out/test/size_test
1310720  81920   0       4295458816  4296851456  1001cc000  cmake-out/test/size_test_all_ops
4358144  98304   0       4296212480  4300668928  100570000  cmake-out/test/size_test_all_optimized_ops
```

(This was reverted because the diff it was stacked on was a size regression. Reversing the order this time around, and reverting the part of the change that was actually regressing size.)

ghstack-source-id: 04195891a58c9b82b8ac2f9f437a285e213f456e
ghstack-comment-id: 2831329046
Pull Request resolved: #10490
1 parent 8704e0c commit 40ae36d

1 file changed: +28 −1 lines


kernels/portable/cpu/util/dtype_util.h (+28 −1)
```diff
@@ -228,7 +228,7 @@ enum class SupportedTensorDtypes {
 namespace internal {
 
 template <typename CTYPE_COMPUTE, const char* op_name>
-load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
+load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn_impl(
     const Tensor& t,
     SupportedTensorDtypes dtypes) {
   switch (dtypes) {
@@ -251,6 +251,10 @@ load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
   return nullptr;
 }
 
+// NOTE: applying the #ifdef EXECUTORCH_SELECTIVE_BUILD_DTYPE
+// technique used for get_load_to_compute_fn in this path was a size
+// regression rather than an improvement. Haven't fully investigated
+// why; just be aware when trying to improve size further.
 template <typename CTYPE_COMPUTE, const char* op_name>
 store_compute_to_tensor_fn<CTYPE_COMPUTE> get_store_compute_to_tensor_fn(
     const Tensor& t,
@@ -285,6 +289,29 @@ store_compute_to_tensor_fn<CTYPE_COMPUTE> get_store_compute_to_tensor_fn(
   return nullptr;
 }
 
+#ifndef EXECUTORCH_SELECTIVE_BUILD_DTYPE
+inline constexpr const char kGenericElementwiseOpName[] =
+    "generic_elementwise_op";
+#endif // EXECUTORCH_SELECTIVE_BUILD_DTYPE
+
+template <typename CTYPE_COMPUTE, const char* op_name>
+load_to_compute_fn<CTYPE_COMPUTE> get_load_to_compute_fn(
+    const Tensor& t,
+    SupportedTensorDtypes dtypes) {
+  // NOTE: Selective build relies on the operator name being passed
+  // here. When it's *not* active, using the same operator name
+  // everywhere saves on size because we don't require a new template
+  // instantiation for every operator.
+  return get_load_to_compute_fn_impl<
+      CTYPE_COMPUTE,
+#ifdef EXECUTORCH_SELECTIVE_BUILD_DTYPE
+      op_name
+#else // EXECUTORCH_SELECTIVE_BUILD_DTYPE
+      kGenericElementwiseOpName
+#endif // EXECUTORCH_SELECTIVE_BUILD_DTYPE
+      >(t, dtypes);
+}
+
 bool check_tensor_dtype(
     const Tensor t,
     SupportedTensorDtypes dtypes,
```
