webgpu: Implement Tqueries #8669

jc3265 · 2025-04-25T20:45:23Z

BUGS=[407961733]

This PR Implements Timer Queries. It does it by taking a timestamp before it beings a frame and after it ends processing

filament/backend/src/webgpu/WebGPUDriver.cpp

filament/backend/src/webgpu/WebGPUDriver.h

filament/backend/src/webgpu/WebGPUDriver.cpp

filament/backend/src/webgpu/WebGPUHandles.h

AndyHovingh · 2025-04-28T14:28:55Z

filament/backend/src/webgpu/WebGPUHandles.h

+        std::atomic<uint64_t> elapsed{ 0 };// only valid if available is true
+    };
+
+    std::shared_ptr<Status> status;


Why does status need a std::shared_ptr? From what I can tell, only this class owns and uses this? Thus, if a smart pointer were needed I would think to reach for std::unique_ptr, but I don't see the need for a pointer at all. Why can't Status just be a member variable of this class, or even just its members directory (available and elapsed)? In any of these cases, should the underlying status be private, since only this class needs it?

filament/backend/src/webgpu/WebGPUHandles.h

AndyHovingh · 2025-04-28T19:49:06Z

filament/backend/src/webgpu/WGPUTimerQuery.cpp

+    if (!status->available.load()) {
+        return false;
+    }
+    if (outElapsedTime) {


First, this should not ever be true right? If the status is available, the time should be. Perhaps an assert_invariant(...) makes more sense? Second, even if this were true we would return that the results are available, but we never set anything in outElapsedTime which could have undefined behavior, such as junk in that integer.

AndyHovingh · 2025-04-28T20:21:26Z

filament/backend/src/webgpu/WebGPUDriver.cpp

+
+void WebGPUDriver::endTimerQuery(Handle<HwTimerQuery> tqh) {
+    auto* tq = handleCast<WGPUTimerQuery>(tqh);
+    tq->endTimeElapsedQuery();


This is certainly one way to do it, and maybe we should review this with some of the other backend engineers, to see if this gets to the spirit of what is needed by these queries. But, what I originally had in mind was something more like:

... // somewhere in the driver initiation... maybe the constructor: ... mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::AllowSpontaneous, [this](auto const& status) { if (status == wgpu::QueueWorkDoneStatus::Success) { mCurrentTimerQuery.end(); } else { ... log something ... } }); ... } void WebGPUDriver::commit(Handle<HwSwapChain> sch) { mCommandEncoder = nullptr; mCurrentTimerQuery.begin(); mQueue.Submit(1, &mCommandBuffer); ...

beginTimerQuery(...) and endTimerQuery(...) maybe could be used to set (and unset) mCurrentTimerQuery.

You also have to account for finish, where we may need to add additional time if we submit multiple command buffers in a frame (we can discuss offline if you need clarity on that). In that case, you might want some kind of state for intermediate queues to account for in OnSubmittedWorkDone and maybe another argument to the timer query's "end" call.

AndyHovingh · 2025-04-28T20:29:58Z

filament/backend/src/webgpu/WGPUTimerQuery.cpp

+#include "WebGPUHandles.h"
+
+#include <chrono>
+


you need to include (include-what-you-use) <memory> for std::weak_ptr and <cstdint> for uint64_t.

AndyHovingh

We should at least time the command buffer queue submission time (I believe close to what Metal is doing) instead of the approach as-is at this point. We need to prioritize the approach of timing all the render/compute passes, accumulating those, and reporting that, which is a bit more complicated.

jc3265 added the internal Issue/PR does not affect clients label Apr 25, 2025

jc3265 requested review from bridgewaterrobbie, AndyHovingh, jneljneljnel and sidreesshah April 25, 2025 20:48

jc3265 force-pushed the jcaldas/wgpuTQueries branch from 2a92144 to 7cfcd2d Compare April 25, 2025 20:51

Implement Tqueries

658390f

jc3265 force-pushed the jcaldas/wgpuTQueries branch from 7cfcd2d to 658390f Compare April 25, 2025 20:53