Commit ef4df57

perf: increase min buckets on very small types
Consider `HashSet<u8>` on x86_64 with SSE with various bucket counts and how many bytes the allocation ends up being:

| buckets | capacity | allocated bytes |
| ------- | -------- | --------------- |
| 4       | 3        | 36              |
| 8       | 7        | 40              |
| 16      | 14       | 48              |
| 32      | 28       | 80              |

In general, doubling the number of buckets should roughly double the number of bytes used. However, for small bucket counts with these small TableLayouts (4 -> 8, 8 -> 16), that doesn't happen. This is an edge case caused by the padding before the control bytes and by the extra Group::WIDTH control bytes. Taking the buckets from 4 to 16 (4x) only takes the allocated bytes from 36 to 48 (~1.3x).

This platform isn't the only one with edges. Here's aarch64 on an M1 for the same `HashSet<u8>`:

| buckets | capacity | allocated bytes |
| ------- | -------- | --------------- |
| 4       | 3        | 20              |
| 8       | 7        | 24              |
| 16      | 14       | 40              |

Notice 4 -> 8 buckets leading to only 4 more bytes (20 -> 24) instead of roughly doubling.

Generalized, `buckets * table_layout.size` needs to be at least as big as `table_layout.ctrl_align`. For the cases I listed above, we'd get these new minimum bucket counts:

- x86_64 with SSE: 16
- aarch64: 8

This is a niche optimization. However, it also removes a possible undefined-behavior edge case in resize operations. In addition, it may be a useful property for utilizing over-sized allocations (see #523).
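The byte counts in the tables above can be reproduced with a small standalone sketch. It assumes the allocation is the bucket data rounded up to `ctrl_align`, followed by one control byte per bucket plus `Group::WIDTH` extra control bytes. The `TableLayout` struct and `allocated_bytes` function here are illustrative stand-ins, not hashbrown's internal code, and the aarch64 values of 8 for both `ctrl_align` and the group width are assumptions chosen because they match the table.

```rust
/// Illustrative stand-in for the crate's internal `TableLayout`; only the
/// two fields named in the commit message are modeled.
#[derive(Clone, Copy)]
struct TableLayout {
    size: usize,
    ctrl_align: usize,
}

/// Bytes allocated for `buckets` buckets, assuming: data area rounded up to
/// `ctrl_align`, then `buckets + group_width` control bytes.
fn allocated_bytes(layout: TableLayout, buckets: usize, group_width: usize) -> usize {
    let data = layout.size * buckets;
    // Round the data area up to the control-byte alignment (a power of two).
    let ctrl_offset = (data + layout.ctrl_align - 1) & !(layout.ctrl_align - 1);
    ctrl_offset + buckets + group_width
}

fn main() {
    // `HashSet<u8>` on x86_64 with SSE: size 1, ctrl_align 16, group width 16.
    let sse = TableLayout { size: 1, ctrl_align: 16 };
    for buckets in [4, 8, 16, 32] {
        println!("x86_64:  {:>2} buckets -> {} bytes", buckets, allocated_bytes(sse, buckets, 16));
    }

    // Assumed aarch64 values: size 1, ctrl_align 8, group width 8.
    let neon = TableLayout { size: 1, ctrl_align: 8 };
    for buckets in [4, 8, 16] {
        println!("aarch64: {:>2} buckets -> {} bytes", buckets, allocated_bytes(neon, buckets, 8));
    }
}
```

With 4 buckets of one byte each, the data area is padded from 4 up to 16 bytes before the control bytes start, which is where the disproportionate cost for tiny tables comes from.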
1 parent 4c824c5 commit ef4df57

src/raw/mod.rs

Lines changed: 50 additions & 15 deletions
@@ -192,14 +192,35 @@ impl ProbeSeq {
 // Workaround for emscripten bug emscripten-core/emscripten-fastcomp#258
 #[cfg_attr(target_os = "emscripten", inline(never))]
 #[cfg_attr(not(target_os = "emscripten"), inline)]
-fn capacity_to_buckets(cap: usize) -> Option<usize> {
+fn capacity_to_buckets(cap: usize, table_layout: TableLayout) -> Option<usize> {
     debug_assert_ne!(cap, 0);
 
-    // For small tables we require at least 1 empty bucket so that lookups are
-    // guaranteed to terminate if an element doesn't exist in the table.
+    // Consider a small layout like TableLayout { size: 1, ctrl_align: 16 } on
+    // a platform with Group::WIDTH of 16 (like x86_64 with SSE2). For small
+    // bucket sizes, this ends up wasting quite a few bytes just to pad to the
+    // relatively larger ctrl_align:
+    //
+    // | capacity | buckets | bytes allocated | bytes per item |
+    // | -------- | ------- | --------------- | -------------- |
+    // |        3 |       4 |              36 | (Yikes!)  12.0 |
+    // |        7 |       8 |              40 | (Poor)     5.7 |
+    // |       14 |      16 |              48 |            3.4 |
+    // |       28 |      32 |              80 |            3.3 |
+    //
+    // In general, buckets * table_layout.size >= table_layout.ctrl_align must
+    // be true to avoid these edges.
+    let min_buckets = table_layout.ctrl_align / table_layout.size.max(1);
+
+    // This `min_buckets * 7 / 8` is the reverse of the `cap * 8 / 7` below.
+    let min_cap = match min_buckets.checked_mul(7) {
+        Some(c) => cap.max(c / 8),
+        None => cap,
+    };
+
+    let cap = cap.max(min_cap);
     if cap < 8 {
         // We don't bother with a table size of 2 buckets since that can only
-        // hold a single element. Instead we skip directly to a 4 bucket table
+        // hold a single element. Instead, skip directly to a 4 bucket table
         // which can hold 3 elements.
         return Some(if cap < 4 { 4 } else { 8 });
     }
@@ -1126,7 +1147,7 @@ impl<T, A: Allocator> RawTable<T, A> {
         // elements. If the calculation overflows then the requested bucket
         // count must be larger than what we have right and nothing needs to be
         // done.
-        let min_buckets = match capacity_to_buckets(min_size) {
+        let min_buckets = match capacity_to_buckets(min_size, Self::TABLE_LAYOUT) {
             Some(buckets) => buckets,
             None => return,
         };
@@ -1257,14 +1278,8 @@ impl<T, A: Allocator> RawTable<T, A> {
     /// * If `self.table.items != 0`, calling of this function with `capacity`
     ///   equal to 0 (`capacity == 0`) results in [`undefined behavior`].
     ///
-    /// * If `capacity_to_buckets(capacity) < Group::WIDTH` and
-    ///   `self.table.items > capacity_to_buckets(capacity)`
-    ///   calling this function results in [`undefined behavior`].
-    ///
-    /// * If `capacity_to_buckets(capacity) >= Group::WIDTH` and
-    ///   `self.table.items > capacity_to_buckets(capacity)`
-    ///   calling this function are never return (will go into an
-    ///   infinite loop).
+    /// * If `self.table.items > capacity_to_buckets(capacity, Self::TABLE_LAYOUT)`
+    ///   calling this function are never return (will loop infinitely).
     ///
     /// See [`RawTableInner::find_insert_slot`] for more information.
     ///
@@ -1782,8 +1797,8 @@ impl RawTableInner {
         // SAFETY: We checked that we could successfully allocate the new table, and then
         // initialized all control bytes with the constant `EMPTY` byte.
         unsafe {
-            let buckets =
-                capacity_to_buckets(capacity).ok_or_else(|| fallibility.capacity_overflow())?;
+            let buckets = capacity_to_buckets(capacity, table_layout)
+                .ok_or_else(|| fallibility.capacity_overflow())?;
 
             let result = Self::new_uninitialized(alloc, table_layout, buckets, fallibility)?;
             // SAFETY: We checked that the table is allocated and therefore the table already has
@@ -4571,6 +4586,26 @@ impl<T, A: Allocator> RawExtractIf<'_, T, A> {
 mod test_map {
     use super::*;
 
+    #[test]
+    fn test_minimum_capacity_for_small_types() {
+        #[track_caller]
+        fn test_t<T>() {
+            let raw_table: RawTable<T> = RawTable::with_capacity(1);
+            let actual_buckets = raw_table.buckets();
+            let min_buckets = Group::WIDTH / core::mem::size_of::<T>();
+            assert!(
+                actual_buckets >= min_buckets,
+                "expected at least {min_buckets} buckets, got {actual_buckets} buckets"
+            );
+        }
+
+        test_t::<u8>();
+
+        // This is only "small" for some platforms, like x86_64 with SSE2, but
+        // there's no harm in running it on other platforms.
+        test_t::<u16>();
+    }
+
     fn rehash_in_place<T>(table: &mut RawTable<T>, hasher: impl Fn(&T) -> u64) {
         unsafe {
             table.table.rehash_in_place(
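
The minimum-bucket logic added to `capacity_to_buckets` can be exercised outside the crate with a standalone sketch. The head of the function mirrors the lines added in this commit. The tail (`cap * 8 / 7` followed by rounding up to a power of two) is not shown in the diff and is an assumption inferred from the comment that refers to "the `cap * 8 / 7` below". `TableLayout` here is a simplified stand-in with only the two fields the diff uses.

```rust
/// Simplified stand-in for the crate's internal `TableLayout`.
#[derive(Clone, Copy)]
struct TableLayout {
    size: usize,
    ctrl_align: usize,
}

fn capacity_to_buckets(cap: usize, table_layout: TableLayout) -> Option<usize> {
    debug_assert_ne!(cap, 0);

    // buckets * size >= ctrl_align must hold; `max(1)` guards zero-sized types.
    let min_buckets = table_layout.ctrl_align / table_layout.size.max(1);

    // Convert the bucket floor to a capacity floor at the 7/8 load factor.
    let min_cap = match min_buckets.checked_mul(7) {
        Some(c) => cap.max(c / 8),
        None => cap,
    };

    let cap = cap.max(min_cap);
    if cap < 8 {
        // Small tables: 4 buckets hold 3 elements, 8 buckets hold 7.
        return Some(if cap < 4 { 4 } else { 8 });
    }

    // ASSUMPTION: this tail is not part of the diff shown above.
    let adjusted_cap = cap.checked_mul(8)? / 7;
    Some(adjusted_cap.next_power_of_two())
}

fn main() {
    let u8_sse2 = TableLayout { size: 1, ctrl_align: 16 };
    let u16_sse2 = TableLayout { size: 2, ctrl_align: 16 };
    let u8_align8 = TableLayout { size: 1, ctrl_align: 8 };

    println!("u8,  ctrl_align 16: {:?}", capacity_to_buckets(1, u8_sse2));
    println!("u16, ctrl_align 16: {:?}", capacity_to_buckets(1, u16_sse2));
    println!("u8,  ctrl_align 8:  {:?}", capacity_to_buckets(1, u8_align8));
}
```

Under these assumptions a requested capacity of 1 yields 16 buckets for `u8` with `ctrl_align` 16 and 8 buckets with `ctrl_align` 8, matching the minimums listed in the commit message.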