Possible race condition on VmaAllocator between defragmentation and regular allocations #313
Comments
Thank you for reporting this bug and for the good description. It sounds very serious. We will investigate it. Any help in debugging it is welcome. Blocking any usage of the allocator between Begin and EndDefragmentation may be a good workaround for the bug, but it shouldn't be needed: defragmentation was developed with the assumption that using the allocator while defragmentation is in progress is allowed. Is your code 32- or 64-bit?
We triggered this issue on x86-64 with an AMD GPU.
Forgive me if I'm wrong, but I've been looking at the defrag code recently and the map transferring code seems really sketchy to me.
From a logical standpoint, it seems that a memory block being moved needs to be unwritable from the moment the copy begins until the end of the pass. I'm not saying that VMA needs to do this; it probably makes more sense for us users to take care of that. More generally, the VMA documentation says that allocations are not considered thread-safe, and the defrag manipulates them. Am I wrong here?
Once we have ported the VulkanSample app to Linux (#466), we could compile it with clang using the …
We've been using VMA extensively for our memory management on projects with heavy memory load. However, upon integrating the defragmentation utility, we started getting frequent memory corruption on internal VMA data. We narrowed down the source of this corruption to concurrent accesses on our `VmaAllocator` and (apparently) succeeded in preventing it by blocking allocations for the duration of the defragmentation. More context follows, with a couple questions at the end.

We use a single `VmaAllocator` from two separate threads. Most of our allocations are done from the rendering thread, which is also the one that runs the defragmentation. We also allocate from the main thread, although less frequently.

Our defragmentation integration looks something like this (it has the same structure as what can be found in the online documentation):
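In rough outline it is the following (a simplified sketch only: the helper name `DefragmentHeaps`, the flag choice, and the per-move handling are placeholders; the real code recreates the affected buffers/images and submits the GPU copies inside each pass):

```cpp
#include "vk_mem_alloc.h" // VMA 3.x defragmentation API

// Runs on the rendering thread. Per-move resource recreation and the GPU copy
// submission are omitted; only the structure of the loop is shown.
void DefragmentHeaps(VmaAllocator mainAllocator)
{
    VmaDefragmentationInfo defragInfo = {};
    defragInfo.flags = VMA_DEFRAGMENTATION_FLAG_ALGORITHM_FAST_BIT; // flag choice is illustrative

    VmaDefragmentationContext defragCtx = VK_NULL_HANDLE;
    if (vmaBeginDefragmentation(mainAllocator, &defragInfo, &defragCtx) != VK_SUCCESS)
        return;

    for (;;)
    {
        VmaDefragmentationPassMoveInfo pass = {};
        VkResult res = vmaBeginDefragmentationPass(mainAllocator, defragCtx, &pass);
        if (res == VK_SUCCESS)
            break; // nothing left to move
        // res == VK_INCOMPLETE: pass.pMoves[0..pass.moveCount) describe the requested moves

        for (uint32_t i = 0; i < pass.moveCount; ++i)
        {
            // Recreate the resource bound to pass.pMoves[i].srcAllocation at
            // pass.pMoves[i].dstTmpAllocation and record a GPU copy, or set
            // pass.pMoves[i].operation = VMA_DEFRAGMENTATION_MOVE_OPERATION_IGNORE to skip it.
        }

        // Submit the recorded copies and wait for them to complete before ending the pass.

        res = vmaEndDefragmentationPass(mainAllocator, defragCtx, &pass);
        if (res == VK_SUCCESS)
            break; // defragmentation finished
        // res == VK_INCOMPLETE: another pass is needed
    }

    VmaDefragmentationStats stats = {};
    vmaEndDefragmentation(mainAllocator, defragCtx, &stats);
}
```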
Running this bit of code on heavy workloads from our rendering thread, while our main thread is free to use `mainAllocator` for other allocations, we reliably trigger memory corruptions, specifically located on the first 4 bytes of a random `VmaAllocation` from our working `VmaDefragmentationPassMoveInfo::pMoves` array.
Regardless of this particular symptom, we managed to prevent such corruption by implementing a mutex covering the region between `vmaBeginDefragmentation()` and `vmaEndDefragmentation()`, as well as other VMA calls on `mainAllocator`, specifically those that could be made concurrently with defragmentation.
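Roughly, the workaround looks like this (a sketch; `gVmaMutex` and `DefragmentHeapsLocked` are placeholder names, and the same mutex is taken around the VMA calls made from the main thread):

```cpp
#include <mutex>
#include "vk_mem_alloc.h"

// Placeholder name; our other VMA calls on mainAllocator
// (vmaCreateBuffer, vmaDestroyBuffer, ...) take the same lock on the main thread.
std::mutex gVmaMutex;

void DefragmentHeapsLocked(VmaAllocator mainAllocator)
{
    std::lock_guard<std::mutex> lock(gVmaMutex); // held from Begin to End

    VmaDefragmentationInfo defragInfo = {};
    VmaDefragmentationContext defragCtx = VK_NULL_HANDLE;
    if (vmaBeginDefragmentation(mainAllocator, &defragInfo, &defragCtx) != VK_SUCCESS)
        return;

    // ... per-pass loop as in the snippet above ...

    vmaEndDefragmentation(mainAllocator, defragCtx, nullptr);
}
```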
As a side note, we first prevented memory corruption by enabling `VMA_DEBUG_GLOBAL_MUTEX`, changing the `VmaMutex::m_Mutex` type to `std::recursive_mutex`, and locking it from our side right after `vmaBeginDefragmentation()` and releasing it right before `vmaEndDefragmentation()`.
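For reference, the flag is enabled in the translation unit that compiles the VMA implementation, roughly like this (the `std::recursive_mutex` change itself was done by editing `vk_mem_alloc.h` directly):

```cpp
// In the single .cpp that builds the VMA implementation:
#define VMA_DEBUG_GLOBAL_MUTEX 1   // serialize every VMA entry point on one global mutex
#define VMA_IMPLEMENTATION
#include "vk_mem_alloc.h"
```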
Does it make sense to treat the region from `vmaBeginDefragmentation()` to `vmaEndDefragmentation()` as a critical section with regard to the `VmaAllocator` being used? If so, is it something that could be enforced from within the library?