Description
`FragmentedMapper::get_or_optionally_allocate_slab_table(addr, allocate)` finds the slab table for an address, or optionally allocates a slab table for it if not found. It is essentially a hash map. Its algorithm is:

1. Compute the hash value of the base address derived from `addr`, and use the hash value as the current `index`.
2. Look up the `slab_map` at the `index`.
3. If it hits, return the `slab_table` entry.
4. Otherwise, acquire the `lock`.
5. If the entry at `index` is free, either return `None` or allocate a slab table there and return it.
6. Otherwise, the entry must be occupied by the slab table of another address.
7. Increment `index` by one.
8. Release the `lock`.
9. Go to step 2.
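The steps above can be sketched as follows. This is a toy model with hypothetical names and a small 8-entry table, not the real MMTk types; its point is to show that every probe that misses (steps 4 to 8) acquires and releases the mutex once.

```rust
use std::sync::Mutex;

const TABLE_SIZE: usize = 8; // toy size for illustration

/// Toy model of the slab map: each entry holds (base_address, slab_id).
/// Names are hypothetical; the real structure stores `Slab` tables.
struct SlabMap {
    entries: Vec<Option<(usize, usize)>>,
    lock: Mutex<()>, // models `FragmentedMapper::lock`
}

impl SlabMap {
    fn new() -> Self {
        SlabMap { entries: vec![None; TABLE_SIZE], lock: Mutex::new(()) }
    }

    fn hash(base: usize) -> usize {
        base % TABLE_SIZE // stand-in for the real hash function
    }

    /// Steps 1-9 from the list above.
    fn get_or_allocate(&mut self, base: usize, allocate: bool) -> Option<usize> {
        let mut index = Self::hash(base); // step 1
        loop {
            // Steps 2-3: lock-free read; return on a hit.
            if let Some((b, slab)) = self.entries[index] {
                if b == base {
                    return Some(slab);
                }
            }
            // Step 4: acquire the lock on any miss.
            let guard = self.lock.lock().unwrap();
            match self.entries[index] {
                // Step 5: the entry is free.
                None if allocate => {
                    let slab_id = base; // pretend we allocated a slab here
                    self.entries[index] = Some((base, slab_id));
                    return Some(slab_id);
                }
                None => return None,
                // Another thread may have filled it since the unlocked read.
                Some((b, slab)) if b == base => return Some(slab),
                // Steps 6-7: occupied by another address; probe the next entry.
                _ => index = (index + 1) % TABLE_SIZE,
            }
            drop(guard); // step 8: release the lock, then loop back to step 2
        }
    }
}
```

When two bases hash to the same index, the second lookup pays one lock round-trip per occupied entry it skips, which is the churn described below.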
It works normally if step 2 hits immediately. But if the index happens to be used by another address, it goes through steps 4 to 9 and repeats from step 2, linearly scanning the hash table until it finds a matching or vacant entry. During this process, it repeatedly acquires and releases the mutex `FragmentedMapper::lock`, causing a significant slowdown.
The effect is amplified by frequent calls to `is_mmtk_object`, which checks whether an arbitrary address points to a valid MMTk object. In the mmtk-ruby binding, I added an aggressive assertion at every write barrier to check that both the source and the target objects are valid MMTk objects. It normally costs very little compared to the rest of the VM. But remember that MMTk must check whether an address is mapped before accessing the VO bits, and `FragmentedMapper::get_or_optionally_allocate_slab_table` is called when checking whether a given metadata address is mapped. If the metadata address happens to have a conflict in `FragmentedMapperInner::slab_map`, it reaches the pathological case described above. When running optcarrot, I observed an almost 50% drop in FPS with certain GC plans. The problem goes away when I disable the assertion.
Solution
We may argue that we should remove such assertions when measuring performance. But I still believe the current implementation of `FragmentedMapper::get_or_optionally_allocate_slab_table` should be improved to eliminate the pathological case.
First of all, why does `FragmentedMapper` have to be a hash table? Each `Slab` governs 2048 chunks, and `FragmentedMapper::slab_map` has 2048 entries. That is 4M chunks, or 16TB, which is already a 44-bit address space, only slightly smaller than the typical 48-bit (256TB) address space of common architectures such as x86_64. If we increase both `MMAP_NUM_CHUNKS` (chunks per `Slab`) and `SLAB_TABLE_SIZE` (total `Slab` entries in `slab_map`) to 8192, it will cover a 48-bit address space. Even if we have to support a 56-bit address space one day, we can still adjust both of them to 131072, or introduce a three-level table.
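The arithmetic can be checked with a few lines. The 4 MB chunk size (`LOG_BYTES_IN_CHUNK = 22`) is an assumption matching MMTk's usual chunk size:

```rust
const LOG_BYTES_IN_CHUNK: u32 = 22; // 4 MB chunks (assumed)

/// log2 of the address space covered by `chunks_per_slab * slabs` chunks.
/// Exact when both arguments are powers of two.
fn covered_address_bits(chunks_per_slab: u64, slabs: u64) -> u32 {
    let bytes = (chunks_per_slab * slabs) << LOG_BYTES_IN_CHUNK;
    63 - bytes.leading_zeros()
}
```

With 2048 x 2048 this gives 44 bits, with 8192 x 8192 it gives 48 bits, and with 131072 x 131072 it gives 56 bits.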
Secondly, even if we keep `FragmentedMapper` as a hash table, the table must be able to resize and rehash. This is necessary if the heap plus metadata can grow beyond 16TB.
Thirdly, I think it should be possible to make the hash table lock-free, at least for the case where we don't need to insert new entries. Checking VO bits should not cause more chunks to be mapped, so for that purpose `get_or_optionally_allocate_slab_table` does not need to insert new slabs. It then just needs to visit entries starting from the hash code without holding the lock. Currently, it holds the lock to see whether another thread has allocated the slab concurrently. But if the query is for the VO bits, it can simply return `false`, because the "arbitrary address" existed before we called this function, and another thread cannot concurrently make that address a "valid MMTk object" by allocating anything.
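A lock-free read-only lookup along these lines might look like the sketch below. The layout is hypothetical: the table stores base addresses in atomics, with 0 as the "vacant" sentinel (assuming 0 is never a valid base).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const TABLE_SIZE: usize = 8; // toy size
const VACANT: usize = 0;     // sentinel; assumes 0 is never a valid base

/// Probe from the hash without taking any lock. A vacant entry means the
/// address is not mapped, which is a final answer for a VO-bit query:
/// no concurrent insertion can make a pre-existing address become valid.
fn lookup(table: &[AtomicUsize; TABLE_SIZE], base: usize) -> Option<usize> {
    let mut index = base % TABLE_SIZE; // stand-in for the real hash
    loop {
        match table[index].load(Ordering::Acquire) {
            VACANT => return None,
            b if b == base => return Some(index),
            _ => index = (index + 1) % TABLE_SIZE, // collision: keep probing
        }
    }
}
```

Insertions (by the path that does map new chunks) would still need a lock or a CAS, but the hot read-only path never blocks.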
Fourthly, can we just use a third-party lock-free hash table implementation instead of writing one ourselves?

The third solution probably needs the least invasive changes, but we should also ask why we didn't take the first approach (making it a two-level lazy array instead of a hash map) in the first place, because it should be both easier and faster.
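For comparison, the first approach needs no probing at all: the slab index becomes a pure function of the address. This sketch assumes 4 MB chunks and 8192-chunk slabs covering a 48-bit space; the constants are illustrative, not the current MMTk values.

```rust
const LOG_BYTES_IN_CHUNK: u32 = 22;  // 4 MB chunks (assumed)
const LOG_CHUNKS_IN_SLAB: u32 = 13;  // 8192 chunks per Slab (assumed)
const LOG_SLABS: u32 = 48 - LOG_BYTES_IN_CHUNK - LOG_CHUNKS_IN_SLAB;

/// Direct indexing: O(1), no hashing, no probing, no lock on the read path.
/// The slab entries themselves can still be allocated lazily on first use.
fn slab_index(addr: usize) -> usize {
    (addr >> (LOG_BYTES_IN_CHUNK + LOG_CHUNKS_IN_SLAB)) & ((1 << LOG_SLABS) - 1)
}
```

Every address maps to exactly one slot, so the pathological linear scan cannot occur by construction.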