Commit b0430f3

lib/crc: simplify the kconfig options for CRC implementations
Make the following simplifications to the kconfig options for choosing
CRC implementations for CRC32 and CRC_T10DIF:

1. Make the option to disable the arch-optimized code be visible only
   when CONFIG_EXPERT=y.
2. Make a single option control the inclusion of the arch-optimized code
   for all enabled CRC variants.
3. Make CRC32_SARWATE (a.k.a. slice-by-1 or byte-by-byte) be the only
   generic CRC32 implementation.

The result is there is now just one option, CRC_OPTIMIZATIONS, which is
default y and can be disabled only when CONFIG_EXPERT=y.

Rationale:

1. Enabling the arch-optimized code is nearly always the right choice.
   However, people trying to build the tiniest kernel possible would
   find some use in disabling it. Anything we add to CRC32 is de facto
   unconditional, given that CRC32 gets selected by something in nearly
   all kernels. And unfortunately enabling the arch CRC code does not
   eliminate the need to build the generic CRC code into the kernel too,
   due to CPU feature dependencies. The size of the arch CRC code will
   also increase slightly over time as more CRC variants get added and
   more implementations targeting different instruction set extensions
   get added. Thus, it seems worthwhile to still provide an option to
   disable it, but it should be considered an expert-level tweak.

2. Considering the use case described in (1), there doesn't seem to be
   sufficient value in making the arch-optimized CRC code be
   independently configurable for different CRC variants. Note also
   that multiple variants were already grouped together, e.g.
   CONFIG_CRC32 actually enables three different variants of CRC32.

3. The bit-by-bit implementation is uselessly slow, whereas slice-by-n
   for n=4 and n=8 use tables that are inconveniently large: 4096 bytes
   and 8192 bytes respectively, compared to 1024 bytes for n=1.

   Higher n gives higher instruction-level parallelism, so higher n
   easily wins on traditional microbenchmarks on most CPUs. However,
   the larger tables, which are accessed randomly, can be harmful in
   real-world situations where the dcache may be cold or useful data
   may need be evicted from the dcache. Meanwhile, today most
   architectures have much faster CRC32 implementations using dedicated
   CRC32 instructions or carryless multiplication instructions anyway,
   which make the generic code obsolete in most cases especially on
   long messages.

   Another reason for going with n=1 is that this is already what is
   used by all the other CRC variants in the kernel. CRC32 was unique
   in having support for larger tables. But as per the above this can
   be considered an outdated optimization.

The standardization on slice-by-1 a.k.a. CRC32_SARWATE makes much of the
code in lib/crc32.c unused. A later patch will clean that up.

Link: https://lore.kernel.org/r/20250123212904.118683-2-ebiggers@kernel.org
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
1 parent d0d106a commit b0430f3
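
To make rationale point 3 concrete, here is a minimal user-space sketch of a slice-by-1 (Sarwate) CRC32, together with the table-size arithmetic quoted in the commit message. It is not the code in lib/crc32.c: the names `crc32_sarwate_demo` and `crc32_slice_table_bytes` are hypothetical, and the sketch assumes the common reflected CRC-32 parameters (polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial (assumption: the common IEEE 802.3 CRC-32). */
#define DEMO_CRC32_POLY 0xEDB88320u

/* Slice-by-1 needs a single 256-entry table: 256 * 4 = 1024 bytes. */
static uint32_t demo_crc32_table[256];
static int demo_crc32_table_ready;

/* Fill the table by running the bit-at-a-time algorithm on each byte value. */
static void demo_crc32_build_table(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t crc = i;

		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? DEMO_CRC32_POLY : 0);
		demo_crc32_table[i] = crc;
	}
	demo_crc32_table_ready = 1;
}

/* Sarwate's algorithm: one table lookup per input byte. */
uint32_t crc32_sarwate_demo(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	if (!demo_crc32_table_ready)
		demo_crc32_build_table();

	while (len--)
		crc = (crc >> 8) ^ demo_crc32_table[(crc ^ *p++) & 0xFF];

	return crc ^ 0xFFFFFFFFu;
}

/* Table footprint of slice-by-n: n tables of 256 32-bit entries each. */
size_t crc32_slice_table_bytes(unsigned int n)
{
	return (size_t)n * 256 * sizeof(uint32_t);
}
```

Calling `crc32_slice_table_bytes()` with 1, 4, and 8 reproduces the 1024, 4096, and 8192 byte figures from the commit message. The lookup index in the inner loop depends on the input data, which is why a larger table set touches more cache lines unpredictably.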

1 file changed: +14 -102 lines changed

lib/Kconfig

Lines changed: 14 additions & 102 deletions
@@ -164,34 +164,9 @@ config CRC_T10DIF
 config ARCH_HAS_CRC_T10DIF
 	bool
 
-choice
-	prompt "CRC-T10DIF implementation"
-	depends on CRC_T10DIF
-	default CRC_T10DIF_IMPL_ARCH if ARCH_HAS_CRC_T10DIF
-	default CRC_T10DIF_IMPL_GENERIC if !ARCH_HAS_CRC_T10DIF
-	help
-	  This option allows you to override the default choice of CRC-T10DIF
-	  implementation.
-
-config CRC_T10DIF_IMPL_ARCH
-	bool "Architecture-optimized" if ARCH_HAS_CRC_T10DIF
-	help
-	  Use the optimized implementation of CRC-T10DIF for the selected
-	  architecture. It is recommended to keep this enabled, as it can
-	  greatly improve CRC-T10DIF performance.
-
-config CRC_T10DIF_IMPL_GENERIC
-	bool "Generic implementation"
-	help
-	  Use the generic table-based implementation of CRC-T10DIF. Selecting
-	  this will reduce code size slightly but can greatly reduce CRC-T10DIF
-	  performance.
-
-endchoice
-
 config CRC_T10DIF_ARCH
 	tristate
-	default CRC_T10DIF if CRC_T10DIF_IMPL_ARCH
+	default CRC_T10DIF if ARCH_HAS_CRC_T10DIF && CRC_OPTIMIZATIONS
 
 config CRC64_ROCKSOFT
 	tristate "CRC calculation for the Rocksoft model CRC64"
@@ -214,6 +189,7 @@ config CRC32
 	tristate "CRC32/CRC32c functions"
 	default y
 	select BITREVERSE
+	select CRC32_SARWATE
 	help
 	  This option is provided for the case where no in-kernel-tree
 	  modules require CRC32/CRC32c functions, but a module built outside
@@ -223,87 +199,12 @@ config CRC32
 config ARCH_HAS_CRC32
 	bool
 
-choice
-	prompt "CRC32 implementation"
-	depends on CRC32
-	default CRC32_IMPL_ARCH_PLUS_SLICEBY8 if ARCH_HAS_CRC32
-	default CRC32_IMPL_SLICEBY8 if !ARCH_HAS_CRC32
-	help
-	  This option allows you to override the default choice of CRC32
-	  implementation. Choose the default unless you know that you need one
-	  of the others.
-
-config CRC32_IMPL_ARCH_PLUS_SLICEBY8
-	bool "Arch-optimized, with fallback to slice-by-8" if ARCH_HAS_CRC32
-	help
-	  Use architecture-optimized implementation of CRC32. Fall back to
-	  slice-by-8 in cases where the arch-optimized implementation cannot be
-	  used, e.g. if the CPU lacks support for the needed instructions.
-
-	  This is the default when an arch-optimized implementation exists.
-
-config CRC32_IMPL_ARCH_PLUS_SLICEBY1
-	bool "Arch-optimized, with fallback to slice-by-1" if ARCH_HAS_CRC32
-	help
-	  Use architecture-optimized implementation of CRC32, but fall back to
-	  slice-by-1 instead of slice-by-8 in order to reduce the binary size.
-
-config CRC32_IMPL_SLICEBY8
-	bool "Slice by 8 bytes"
-	help
-	  Calculate checksum 8 bytes at a time with a clever slicing algorithm.
-	  This is much slower than the architecture-optimized implementation of
-	  CRC32 (if the selected arch has one), but it is portable and is the
-	  fastest implementation when no arch-optimized implementation is
-	  available. It uses an 8KiB lookup table. Most modern processors have
-	  enough cache to hold this table without thrashing the cache.
-
-config CRC32_IMPL_SLICEBY4
-	bool "Slice by 4 bytes"
-	help
-	  Calculate checksum 4 bytes at a time with a clever slicing algorithm.
-	  This is a bit slower than slice by 8, but has a smaller 4KiB lookup
-	  table.
-
-	  Only choose this option if you know what you are doing.
-
-config CRC32_IMPL_SLICEBY1
-	bool "Slice by 1 byte (Sarwate's algorithm)"
-	help
-	  Calculate checksum a byte at a time using Sarwate's algorithm. This
-	  is not particularly fast, but has a small 1KiB lookup table.
-
-	  Only choose this option if you know what you are doing.
-
-config CRC32_IMPL_BIT
-	bool "Classic Algorithm (one bit at a time)"
-	help
-	  Calculate checksum one bit at a time. This is VERY slow, but has
-	  no lookup table. This is provided as a debugging option.
-
-	  Only choose this option if you are debugging crc32.
-
-endchoice
-
 config CRC32_ARCH
 	tristate
-	default CRC32 if CRC32_IMPL_ARCH_PLUS_SLICEBY8 || CRC32_IMPL_ARCH_PLUS_SLICEBY1
-
-config CRC32_SLICEBY8
-	bool
-	default y if CRC32_IMPL_SLICEBY8 || CRC32_IMPL_ARCH_PLUS_SLICEBY8
-
-config CRC32_SLICEBY4
-	bool
-	default y if CRC32_IMPL_SLICEBY4
+	default CRC32 if ARCH_HAS_CRC32 && CRC_OPTIMIZATIONS
 
 config CRC32_SARWATE
 	bool
-	default y if CRC32_IMPL_SLICEBY1 || CRC32_IMPL_ARCH_PLUS_SLICEBY1
-
-config CRC32_BIT
-	bool
-	default y if CRC32_IMPL_BIT
 
 config CRC64
 	tristate "CRC64 functions"
@@ -343,6 +244,17 @@ config CRC8
 	  when they need to do cyclic redundancy check according CRC8
 	  algorithm. Module will be called crc8.
 
+config CRC_OPTIMIZATIONS
+	bool "Enable optimized CRC implementations" if EXPERT
+	default y
+	help
+	  Disabling this option reduces code size slightly by disabling the
+	  architecture-optimized implementations of any CRC variants that are
+	  enabled. CRC checksumming performance may get much slower.
+
+	  Keep this enabled unless you're really trying to minimize the size of
+	  the kernel.
+
 config XXHASH
 	tristate
 
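
The commit message notes that enabling the arch CRC code does not remove the need for the generic code, because the optimized path depends on CPU features that may be absent at run time. The following user-space sketch illustrates that shape under stated assumptions: `DEMO_CRC32_ARCH` stands in for the effect of CONFIG_CRC32_ARCH, and `demo_cpu_has_crc32()`, `demo_crc32_arch()`, `demo_crc32_generic()`, and `demo_crc32()` are hypothetical names, not kernel APIs. The generic path here is a simple bit-at-a-time loop chosen for brevity, not the kernel's table-based code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build-time switch standing in for CONFIG_CRC32_ARCH. */
#ifndef DEMO_CRC32_ARCH
#define DEMO_CRC32_ARCH 1
#endif

/* Generic fallback: always built, works on every CPU. */
static uint32_t demo_crc32_generic(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0);
	}
	return crc;
}

#if DEMO_CRC32_ARCH
/* Hypothetical run-time CPU feature probe; this demo reports "absent". */
static bool demo_cpu_has_crc32(void)
{
	return false;
}

/* Placeholder for an instruction-based implementation (assumption). */
static uint32_t demo_crc32_arch(uint32_t crc, const uint8_t *p, size_t len)
{
	return demo_crc32_generic(crc, p, len);
}
#endif

/* Entry point: prefer the arch path, fall back to generic otherwise. */
uint32_t demo_crc32(uint32_t crc, const void *data, size_t len)
{
#if DEMO_CRC32_ARCH
	if (demo_cpu_has_crc32())
		return demo_crc32_arch(crc, data, len);
#endif
	return demo_crc32_generic(crc, data, len);
}
```

Building with `-DDEMO_CRC32_ARCH=0` mirrors the effect of disabling CRC_OPTIMIZATIONS: the arch path is compiled out and only the generic code remains. With it enabled, both paths are present, which is the size cost the help text for CRC_OPTIMIZATIONS refers to.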