Skip to content

Conversation

@show50726
Copy link
Contributor

@show50726 show50726 commented Oct 16, 2025

I'm uncertain about under what situation would the FenceStatus be Error, need some suggestions about how to handle Error status in checkFenceAndUnlockSlots.

Other notes:

  • Only partial functions have been completed, beginFrame and finishBeginFrame is still WIP All of them have been done
  • Right now there's no unit test for UboManager, will try to figure out if there's a easier way to test the logic and add the tests later.

@show50726 show50726 added the internal Issue/PR does not affect clients label Oct 16, 2025
@show50726 show50726 changed the title [dependes on #9324] buffer update opt: Initialize UboManager class and implement partial methods buffer update opt: Initialize UboManager class and implement partial methods Oct 17, 2025
@show50726 show50726 changed the title buffer update opt: Initialize UboManager class and implement partial methods buffer update opt: Initialize UboManager class Oct 17, 2025

// Traverse all MIs and see which of them need slot allocation.
bool needReallocation =
updateMaterialInstanceAllocations(materialInstances, /*forceAllocateAll*/ false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of the boolean parameter you could use a two state enum class{} with names are more clear about what action will be taken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe @pixelflinger can elaborate. But did you mean the "parameter" (i.e. forceAllocateAll) or the return value (i.e. allocationResult).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed both of the parameter and the return value to be enum

if (!needReallocation) {
// No need to grow the buffer, so we can just map the buffer for writing and return.
mMmbHandle = driver.mapBuffer(mUbHandle, 0, mUboSize,
MapBufferAccessFlags::WRITE_BIT | MapBufferAccessFlags::INVALIDATE_RANGE_BIT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the INVALIDATE_RANGE_BIT will make the whole mapped range invalid, meaning you'll have to update/write the whole range. Is it your intention here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood the usage of this flag. I think we don't need this flag here, thanks for the catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little bit worried about the performance here, not sure if I remember correctly, but I read some articles about using it without this flag would lead to significant performance issue (The driver will assume you want to read the data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have another idea is that we cached the entire buffer content on the CPU side, so whenever we want to update the slots, we update the cached one. Then we can use INVALIDATE_RANGE_BIT when we're mapping. After all the updates are done and before unmapping, we copy the whole cached buffer into the mapped location. wdyt?


// Traverse all MIs and see which of them need slot allocation.
bool needReallocation =
updateMaterialInstanceAllocations(materialInstances, /*forceAllocateAll*/ false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So maybe @pixelflinger can elaborate. But did you mean the "parameter" (i.e. forceAllocateAll) or the return value (i.e. allocationResult).

materialInstances.forEach([this, &driver](const FMaterialInstance* mi) {
const AllocationId allocationId = mi->getAllocationId();
assert_invariant(BufferAllocator::isValid(allocationId));
updateSlot(driver, allocationId, mi->getUniformBuffer().toBufferDescriptor(driver));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to update the allocationId of each material instance here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation id of each instance is updated in updateMaterialInstanceAllocations. At this point what we need to do is copy the data to the new location in the new UBO.

allocationIds.insert(id);
});

mFenceAllocationList.emplace_back( driver.createFence(), std::move(allocationIds) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so if a reallocation happens between two frames. Will the ids of the first frame still be meaningful in the context of the allocator? Because line 168 will try to release it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's a great point. Then I think we should destroy the existing fences during reallocation. Thanks!

@show50726 show50726 force-pushed the dev/uboManager-no-test branch from 85e27e7 to 218e85c Compare October 28, 2025 12:39
Copy link
Contributor

@poweifeng poweifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with minor comments

// Migrate all MI data to the new allocated slots.
for (const auto& materialInstance : materialInstances) {
materialInstance.second.forEach([this, &driver](const FMaterialInstance* mi) {
if (!mi->isUsingUboBatching()) return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency, here and elsewhere, bracket with next line for these early returns.

            if (!mi->isUsingUboBatching()) {
                return;
            }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +286 to +287
// For MIs whose parameters have been updated, aside from the slot it is being
// occupied by the GPU, we need to preserve an additional slot for it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the scenario:

  • There are N material instances
  • All instances are updated every frame
  • The GPU has not finished the previous frame
  • The number of free slots is now insufficient for the N new updates

If we reallocate and only add 1 * uboSize for each of the N instances, we will be in the exact same situation on the very next frame. The buffer will be full again, and we'll be forced to reallocate every single frame.
To prevent this, when a material instance fails to find a slot due to Ubo update, I allocate twice its needed space.

This is also why I don't use an early break in the loop (what you ask in the other comment). I need to iterate through all material instances to get a total count of how many failed by update, so I can calculate the correct new buffer size (accommodating 2x space for all of them).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is possible that for some devices the latency is fairly high and 2 * uboSize is not enough. It is worth trying to use (number of pending fences + 1) in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if you just did something simple like always allocate max(oldSize * 1.5, newEstimatedRequiredSize). Thinking about frame delays and needed space here can get really complicated fast imo.
But my comment is more a of personal preference. This section is basically the heuristics part of the work. So I'm open to anything that's more memory efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about frame delays and needed space here can get really complicated fast imo.

Yeah I agree...so I would prefer we still have *2 for those updating material instances as a buffer, and see how it works on the real apps and adjust it as needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal Issue/PR does not affect clients

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants