Possible race condition on VmaAllocator between defragmentation and regular allocations #313
Comments
Thank you for reporting this bug and for the good description. It sounds very serious. We will investigate it. Any help in debugging it is welcome. Blocking any usage of the allocator between Begin and EndDefragmentation may be a good workaround for the bug, but it shouldn't be needed: defragmentation was developed with the assumption that using the allocator while defragmentation is in progress is allowed. Is your code 32- or 64-bit?
We triggered this issue on x86-64 with an AMD GPU.
Forgive me if I'm wrong, but I've been looking at the defrag code recently and the map transferring code seems really sketchy to me.
From a logical standpoint, it seems that a memory block being moved needs to be unwritable from the moment the copy begins until the end of the pass. I'm not saying that VMA needs to do this; it probably makes more sense for us users to take care of that. More generally, the VMA documentation says that allocations are not considered thread-safe, and the defrag manipulates them. Am I wrong here?
Once we have ported the VulkanSample app to Linux (#466), we could compile it with clang using the …
We've been using VMA extensively for our memory management on projects with heavy memory load. However, upon integrating the defragmentation utility, we started getting frequent memory corruption on internal VMA data. We narrowed down the source of this corruption to concurrent accesses on our `VmaAllocator` and (apparently) succeeded in preventing it by blocking allocations for the duration of the defragmentation. More context follows, with a couple questions at the end.

We use a single `VmaAllocator` from two separate threads. Most of our allocations are done from the rendering thread, which is also the one that runs the defragmentation. We also allocate from the main thread, although less frequently.

Our defragmentation integration looks something like this (it has the same structure as what can be found in the online documentation):
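In rough outline it is the following (a simplified sketch only: the helper name `DefragmentHeaps`, the flag choice, and the per-move handling are placeholders; the real code recreates the affected buffers/images and submits the GPU copies inside each pass):

```cpp
#include "vk_mem_alloc.h" // VMA 3.x defragmentation API

// Runs on the rendering thread. Per-move resource recreation and the GPU copy
// submission are omitted; only the structure of the loop is shown.
void DefragmentHeaps(VmaAllocator mainAllocator)
{
    VmaDefragmentationInfo defragInfo = {};
    defragInfo.flags = VMA_DEFRAGMENTATION_FLAG_ALGORITHM_FAST_BIT; // flag choice is illustrative

    VmaDefragmentationContext defragCtx = VK_NULL_HANDLE;
    if (vmaBeginDefragmentation(mainAllocator, &defragInfo, &defragCtx) != VK_SUCCESS)
        return;

    for (;;)
    {
        VmaDefragmentationPassMoveInfo pass = {};
        VkResult res = vmaBeginDefragmentationPass(mainAllocator, defragCtx, &pass);
        if (res == VK_SUCCESS)
            break; // nothing left to move
        // res == VK_INCOMPLETE: pass.pMoves[0..pass.moveCount) describe the requested moves

        for (uint32_t i = 0; i < pass.moveCount; ++i)
        {
            // Recreate the resource bound to pass.pMoves[i].srcAllocation at
            // pass.pMoves[i].dstTmpAllocation and record a GPU copy, or set
            // pass.pMoves[i].operation = VMA_DEFRAGMENTATION_MOVE_OPERATION_IGNORE to skip it.
        }

        // Submit the recorded copies and wait for them to complete before ending the pass.

        res = vmaEndDefragmentationPass(mainAllocator, defragCtx, &pass);
        if (res == VK_SUCCESS)
            break; // defragmentation finished
        // res == VK_INCOMPLETE: another pass is needed
    }

    VmaDefragmentationStats stats = {};
    vmaEndDefragmentation(mainAllocator, defragCtx, &stats);
}
```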
Running this bit of code on heavy workloads from our rendering thread, while our main thread is free to use `mainAllocator` for other allocations, we reliably trigger memory corruptions, specifically located on the first 4 bytes of a random `VmaAllocation` from our working `VmaDefragmentationPassMoveInfo::pMoves` array.
Regardless of this particular symptom, we managed to prevent such corruption by implementing a mutex covering the region between `vmaBeginDefragmentation()` and `vmaEndDefragmentation()`, as well as other VMA calls on `mainAllocator`, specifically those that could be made concurrently with defragmentation.
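Roughly, the workaround looks like this (a sketch; `gVmaMutex` and `DefragmentHeapsLocked` are placeholder names, and the same mutex is taken around the VMA calls made from the main thread):

```cpp
#include <mutex>
#include "vk_mem_alloc.h"

// Placeholder name; our other VMA calls on mainAllocator
// (vmaCreateBuffer, vmaDestroyBuffer, ...) take the same lock on the main thread.
std::mutex gVmaMutex;

void DefragmentHeapsLocked(VmaAllocator mainAllocator)
{
    std::lock_guard<std::mutex> lock(gVmaMutex); // held from Begin to End

    VmaDefragmentationInfo defragInfo = {};
    VmaDefragmentationContext defragCtx = VK_NULL_HANDLE;
    if (vmaBeginDefragmentation(mainAllocator, &defragInfo, &defragCtx) != VK_SUCCESS)
        return;

    // ... per-pass loop as in the snippet above ...

    vmaEndDefragmentation(mainAllocator, defragCtx, nullptr);
}
```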
As a side note, we first prevented memory corruption by enabling `VMA_DEBUG_GLOBAL_MUTEX`, changing the `VmaMutex::m_Mutex` type to `std::recursive_mutex`, and locking it from our side right after `vmaBeginDefragmentation()` and releasing it right before `vmaEndDefragmentation()`.
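For reference, the flag is enabled in the translation unit that compiles the VMA implementation, roughly like this (the `std::recursive_mutex` change itself was done by editing `vk_mem_alloc.h` directly):

```cpp
// In the single .cpp that builds the VMA implementation:
#define VMA_DEBUG_GLOBAL_MUTEX 1   // serialize every VMA entry point on one global mutex
#define VMA_IMPLEMENTATION
#include "vk_mem_alloc.h"
```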
Does it make sense to treat the region from `vmaBeginDefragmentation()` to `vmaEndDefragmentation()` as a critical section with regard to the `VmaAllocator` being used? If so, is it something that could be enforced from within the library?