Skip to content

HashMap reports zero capacity without freeing memory #79178

Open
rust-lang/hashbrown
#436
@Diggsey

Description

@Diggsey

It's generally expected that when std collections return a capacity of zero, that they are not backed by any allocation. I believe this was also true for the HashMap prior to hashbrown, although I could be wrong.

In any case, whilst HashMap::capacity() is documented as a lower bound I still find it extremely surprising that it reports a capacity of zero without freeing its memory.

You can reproduce the issue with the following program (using mockalloc to detect the leaked memory):

use std::alloc::System;
use std::collections::hash_map::{DefaultHasher, HashMap};
use std::hash::BuildHasherDefault;

use mockalloc::Mockalloc;

#[global_allocator]
static ALLOCATOR: Mockalloc<System> = Mockalloc(System);

const NUMBERS: &[usize] = &[
    25, 38, 41, 42, 89, 115, 184, 237, 273, 286, 300, 326, 377, 413, 482, 536, 602, 650, 702, 746,
    750, 807, 810, 836, 960, 979, 982, 1007,
];

fn main() {
    let mut guard =
        HashMap::<String, (), _>::with_hasher(BuildHasherDefault::<DefaultHasher>::default());
    mockalloc::assert_allocs(|| {
        for &n in NUMBERS {
            let k = n.to_string();
            guard.insert(k, ());
        }
        for &n in NUMBERS {
            let k = n.to_string();
            guard.remove(&k);
            if guard.len() * 3 < guard.capacity() {
                guard.shrink_to_fit();
            }
        }
        eprintln!("Capacity: {}", guard.capacity());
        //guard.shrink_to_fit();
    });
}

Given these lines:

if guard.len() * 3 < guard.capacity() {
    guard.shrink_to_fit();
}

I expected all the memory to be reclaimed by the end of the loop. However, this is not the case: the memory is only reclaimed if shrink_to_fit() is called an additional time.

IMO, the fix should be two-fold:

  1. When the last element of a HashMap is removed, it should automatically clear out any tombstones so that the capacity() method is accurate again.

  2. A new method should be introduced to obtain the upper bound on the capacity. At the moment code which attempts to shrink under-utilized hashmaps is broken, and it's not possible to fix this code without a new method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-collectionsArea: `std::collections`C-bugCategory: This is a bug.T-libsRelevant to the library team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions