Skip to content

webgpu: Implement Tqueries #8669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

webgpu: Implement Tqueries #8669

wants to merge 4 commits into from

Conversation

jc3265
Copy link
Contributor

@jc3265 jc3265 commented Apr 25, 2025

BUGS=[407961733]

This PR Implements Timer Queries. It does it by taking a timestamp before it beings a frame and after it ends processing

@jc3265 jc3265 added the internal Issue/PR does not affect clients label Apr 25, 2025
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 2a92144 to 7cfcd2d Compare April 25, 2025 20:51
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 7cfcd2d to 658390f Compare April 25, 2025 20:53
std::atomic<uint64_t> elapsed{ 0 };// only valid if available is true
};

std::shared_ptr<Status> status;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does status need a std::shared_ptr? From what I can tell, only this class owns and uses this? Thus, if a smart pointer were needed I would think to reach for std::unique_ptr, but I don't see the need for a pointer at all. Why can't Status just be a member variable of this class, or even just its members directory (available and elapsed)? In any of these cases, should the underlying status be private, since only this class needs it?

if (!status->available.load()) {
return false;
}
if (outElapsedTime) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this should not ever be true right? If the status is available, the time should be. Perhaps an assert_invariant(...) makes more sense? Second, even if this were true we would return that the results are available, but we never set anything in outElapsedTime which could have undefined behavior, such as junk in that integer.


void WebGPUDriver::endTimerQuery(Handle<HwTimerQuery> tqh) {
auto* tq = handleCast<WGPUTimerQuery>(tqh);
tq->endTimeElapsedQuery();
Copy link
Contributor

@AndyHovingh AndyHovingh Apr 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly one way to do it, and maybe we should review this with some of the other backend engineers, to see if this gets to the spirit of what is needed by these queries. But, what I originally had in mind was something more like:

...
// somewhere in the driver initiation... maybe the constructor:
    ...
    mQueue.OnSubmittedWorkDone(wgpu::CallbackMode::AllowSpontaneous, [this](auto const& status) {
        if (status == wgpu::QueueWorkDoneStatus::Success) {
          mCurrentTimerQuery.end();
        } else {
          ... log something ...
        }
      });
      ...
}

void WebGPUDriver::commit(Handle<HwSwapChain> sch) {
    mCommandEncoder = nullptr;
    mCurrentTimerQuery.begin();
    mQueue.Submit(1, &mCommandBuffer);
   ...

beginTimerQuery(...) and endTimerQuery(...) maybe could be used to set (and unset) mCurrentTimerQuery.

You also have to account for finish, where we may need to add additional time if we submit multiple command buffers in a frame (we can discuss offline if you need clarity on that). In that case, you might want some kind of state for intermediate queues to account for in OnSubmittedWorkDone and maybe another argument to the timer query's "end" call.

#include "WebGPUHandles.h"

#include <chrono>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to include (include-what-you-use) <memory> for std::weak_ptr and <cstdint> for uint64_t.

Copy link
Contributor

@AndyHovingh AndyHovingh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should at least time the command buffer queue submission time (I believe close to what Metal is doing) instead of the approach as-is at this point. We need to prioritize the approach of timing all the render/compute passes, accumulating those, and reporting that, which is a bit more complicated.

@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from b376137 to 097d165 Compare April 30, 2025 13:36
@jneljneljnel jneljneljnel added the webgpu Issues/features for WebGPU backend label May 1, 2025
@AndyHovingh AndyHovingh changed the title wgpu: Implement Tqueries webgpu: Implement Tqueries May 7, 2025
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 993c0be to 4fa7bb9 Compare May 14, 2025 21:46
@jc3265 jc3265 marked this pull request as draft May 14, 2025 22:27
@jc3265 jc3265 force-pushed the jcaldas/wgpuTQueries branch from 4fa7bb9 to adee213 Compare May 15, 2025 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal Issue/PR does not affect clients webgpu Issues/features for WebGPU backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants