Timeline layer: Add acceleration structure command tracking #90

Merged
merged 2 commits into from
Feb 28, 2025
5 changes: 4 additions & 1 deletion generator/generate_vulkan_layer.py
@@ -76,7 +76,10 @@ def get_layer_api_name(vendor: str, layer: str) -> str:

     for char in name:
         is_uc = char.isupper()
-        is_last_uc = False if not part else part[-1].isupper()
+        if part is None:
+            is_last_uc = False
+        else:
+            is_last_uc = part[-1].isupper()

         if (is_uc and not is_last_uc) or not part:
             if part:
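One thing worth checking in this hunk: the one-line form tests `not part`, which is true for both `None` and an empty sequence, while the expanded form tests `part is None` only. The sketch below is a standalone illustration, not the generator's code. It assumes `part` could start out as an empty string, which this hunk does not show; the helper names are hypothetical.

```python
def last_is_upper_truthy(part):
    """One-line form: treats None and empty values alike."""
    return False if not part else part[-1].isupper()


def last_is_upper_none_check(part):
    """Expanded form: only None is special-cased."""
    if part is None:
        return False
    return part[-1].isupper()


# Both forms agree for None and for non-empty strings.
assert last_is_upper_truthy(None) is False
assert last_is_upper_none_check(None) is False
assert last_is_upper_truthy("aB") is True
assert last_is_upper_none_check("aB") is True

# They diverge for an empty string: indexing "" raises IndexError.
assert last_is_upper_truthy("") is False
try:
    last_is_upper_none_check("")
    diverged = False
except IndexError:
    diverged = True
```

If `part` is only ever `None` or non-empty at this point, the two forms behave identically and the change is purely stylistic.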
2 changes: 1 addition & 1 deletion generator/vk_codegen/source_CMakeLists.txt
@@ -71,4 +71,4 @@ if (CMAKE_BUILD_TYPE STREQUAL "Release")
COMMENT "Stripped lib${VK_LAYER}.so to ${VK_LAYER_STRIP}")
endif()

-add_clang_tools()
+add_clang_tools()
20 changes: 10 additions & 10 deletions layer_gpu_support/source/layer_device_functions.hpp
@@ -262,32 +262,32 @@ VKAPI_ATTR void VKAPI_CALL layer_vkCmdCopyImageToBuffer<user_tag>(VkCommandBuffe
 /* See Vulkan API for documentation. */
 template<>
 VKAPI_ATTR void VKAPI_CALL
-layer_vkCmdCopyAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
-const VkCopyAccelerationStructureInfoKHR* pInfo);
+layer_vkCmdCopyImageToBuffer2<user_tag>(VkCommandBuffer commandBuffer,
+const VkCopyImageToBufferInfo2* pCopyImageToBufferInfo);

 /* See Vulkan API for documentation. */
 template<>
 VKAPI_ATTR void VKAPI_CALL
-layer_vkCmdCopyAccelerationStructureToMemoryKHR<user_tag>(VkCommandBuffer commandBuffer,
-const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);
+layer_vkCmdCopyImageToBuffer2KHR<user_tag>(VkCommandBuffer commandBuffer,
+const VkCopyImageToBufferInfo2* pCopyImageToBufferInfo);

 /* See Vulkan API for documentation. */
 template<>
 VKAPI_ATTR void VKAPI_CALL
-layer_vkCmdCopyMemoryToAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
-const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);
+layer_vkCmdCopyAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
+const VkCopyAccelerationStructureInfoKHR* pInfo);

 /* See Vulkan API for documentation. */
 template<>
 VKAPI_ATTR void VKAPI_CALL
-layer_vkCmdCopyImageToBuffer2<user_tag>(VkCommandBuffer commandBuffer,
-const VkCopyImageToBufferInfo2* pCopyImageToBufferInfo);
+layer_vkCmdCopyAccelerationStructureToMemoryKHR<user_tag>(VkCommandBuffer commandBuffer,
+const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

 /* See Vulkan API for documentation. */
 template<>
 VKAPI_ATTR void VKAPI_CALL
-layer_vkCmdCopyImageToBuffer2KHR<user_tag>(VkCommandBuffer commandBuffer,
-const VkCopyImageToBufferInfo2* pCopyImageToBufferInfo);
+layer_vkCmdCopyMemoryToAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
+const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

// Functions for queues

19 changes: 5 additions & 14 deletions layer_gpu_timeline/README_LAYER.md
@@ -189,20 +189,11 @@ render pass. When the layer detects suspended render pass in a multi-submit
command buffer, it will still capture and report the workload, but with an
unknown draw call count.

-## Command stream modelling
-
-Most properties we track are a property of the command buffer recording in
-isolation. However, the user debug label stack is a property of the queue and
-persists across submits. We can therefore only determine the debug label
-associated with a workload in the command stream at submit time, and must
-resolve it per workload inside the command buffer.
-
-To support this we implement a software command stream that contains simple
-bytecode actions that represent the sequence of debug label and workload
-commands inside each command buffer. This "command stream" can be played to
-update the queue state at submit time, triggering metadata submission
-for each workload that can snapshot the current state of the user debug label
-stack at that point in the command stream.
+## Developer documentation
+
+This page covers using the layer as a tool for application development. For
+documentation about developing the layer itself, please refer to the
+[developer documentation](./docs/developer-docs.md).

- - -

46 changes: 46 additions & 0 deletions layer_gpu_timeline/docs/developer-docs.md
@@ -0,0 +1,46 @@
# Layer: GPU Timeline - Developer Documentation

This layer is used with Arm GPU tooling that can show how workloads are
scheduled onto the GPU hardware. The layer provides additional semantic
annotation, extending the scheduling data from the Android Perfetto render
stages telemetry with useful API-aware context.

## Command stream modelling

Most properties we track are a property of the command buffer recording in
isolation. However, the user debug label stack is a property of the queue and
persists across submits. We can therefore only determine the debug label
associated with a workload in the command stream at submit time, and must
resolve it per workload inside the command buffer.

To support this we implement a software command stream that contains simple
bytecode actions that represent the sequence of debug label and workload
commands inside each command buffer. This "command stream" is replayed at
submit time to update the queue state. Each workload action triggers a
metadata submission that snapshots the user debug label stack as it stands
at that point in the command stream.
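The replay described above can be modelled in a few lines. This is a minimal Python sketch for illustration only, not the layer's C++ implementation: the opcode names, `QueueState`, and `replay` are hypothetical, and it assumes each workload carries a numeric tag ID.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Op(Enum):
    """Hypothetical bytecode opcodes; the layer's real encoding may differ."""
    LABEL_PUSH = auto()
    LABEL_POP = auto()
    WORKLOAD = auto()


@dataclass
class QueueState:
    """Per-queue state that persists across submits."""
    label_stack: list = field(default_factory=list)


def replay(queue, stream):
    """Play one command buffer's stream against the queue state.

    Returns one (tag_id, label snapshot) pair per workload.
    """
    metadata = []
    for op, arg in stream:
        if op is Op.LABEL_PUSH:
            queue.label_stack.append(arg)
        elif op is Op.LABEL_POP:
            if queue.label_stack:
                queue.label_stack.pop()
        elif op is Op.WORKLOAD:
            # Snapshot the stack as it is at this point in the stream
            metadata.append((arg, tuple(queue.label_stack)))
    return metadata


queue = QueueState()
# The first submit leaves "frame" open, so the second submit still sees it
first = replay(queue, [(Op.LABEL_PUSH, "frame"), (Op.WORKLOAD, 1)])
second = replay(queue, [(Op.LABEL_PUSH, "shadows"), (Op.WORKLOAD, 2),
                        (Op.LABEL_POP, None), (Op.WORKLOAD, 3)])
```

Workload 2 is reported under `("frame", "shadows")` even though its command buffer never pushed `"frame"`, which is why the labels cannot be resolved at record time.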

## Updating protobuf

The protocol between the layer and the host tools uses Google Protocol
Buffers to implement the message encoding.

The layer implementation uses Protopuf, a light-weight implementation which
can be trivially integrated into the layer. Protopuf messages are defined
directly in the C++ code (see `timeline_protobuf_encoder.cpp`) and do not use
the `timeline.proto` definitions.

The host implementation uses the Google `protoc` compiler to generate native
bindings from the `timeline.proto` definition. When updating the protocol
buffers you must ensure that the C++ and `proto` definitions match.
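The reason the two definitions must match is that the Protocol Buffers wire format identifies a field only by its number and wire type; field names are never transmitted. The sketch below hand-encodes a single varint field in plain Python to show this. The field numbers and value are made up for illustration and are not taken from `timeline.proto`.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # More bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uint_field(field_number, value):
    """Encode one varint field: tag (number << 3 | wire type 0), then value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def decode_uint_field(data):
    """Decode a single varint field, returning (field_number, value)."""
    numbers = []
    current, shift = 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(current)
            current, shift = 0, 0
    tag, value = numbers
    return tag >> 3, value


# Hypothetical: the encoder writes a value as field 1 ...
wire = encode_uint_field(1, 300)
# ... and the decoder recovers only the number 1, never a field name.
field_number, value = decode_uint_field(wire)
```

A host schema that declares the same value under a different field number would not find it under the expected field, so a mismatch between the C++ and `proto` definitions tends to fail quietly rather than with an obvious error.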

To regenerate the Python bindings, run the following command from the
`layer_gpu_timeline` directory:

```sh
protoc ./timeline.proto --python_out=../lglpy/timeline/protos/layer_driver/
```

- - -

_Copyright © 2024-2025, Arm Limited and contributors._
40 changes: 39 additions & 1 deletion layer_gpu_timeline/source/layer_device_functions.hpp
@@ -1,7 +1,7 @@
/*
* SPDX-License-Identifier: MIT
* ----------------------------------------------------------------------------
- * Copyright (c) 2024 Arm Limited
+ * Copyright (c) 2024-2025 Arm Limited
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to
@@ -298,6 +298,26 @@ VKAPI_ATTR void VKAPI_CALL
uint32_t height,
uint32_t depth);

// Commands for acceleration structure builds

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL layer_vkCmdBuildAccelerationStructuresIndirectKHR<user_tag>(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkDeviceAddress* pIndirectDeviceAddresses,
const uint32_t* pIndirectStrides,
const uint32_t* const* ppMaxPrimitiveCounts);

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL layer_vkCmdBuildAccelerationStructuresKHR<user_tag>(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos);

// Commands for transfers

/* See Vulkan API for documentation. */
@@ -406,6 +426,24 @@ VKAPI_ATTR void VKAPI_CALL
layer_vkCmdCopyImageToBuffer2KHR<user_tag>(VkCommandBuffer commandBuffer,
const VkCopyImageToBufferInfo2* pCopyImageToBufferInfo);

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL
layer_vkCmdCopyAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
const VkCopyAccelerationStructureInfoKHR* pInfo);

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL
layer_vkCmdCopyAccelerationStructureToMemoryKHR<user_tag>(VkCommandBuffer commandBuffer,
const VkCopyAccelerationStructureToMemoryInfoKHR* pInfo);

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL
layer_vkCmdCopyMemoryToAccelerationStructureKHR<user_tag>(VkCommandBuffer commandBuffer,
const VkCopyMemoryToAccelerationStructureInfoKHR* pInfo);

// Functions for debug

/* See Vulkan API for documentation. */
79 changes: 79 additions & 0 deletions layer_gpu_timeline/source/layer_device_functions_trace_rays.cpp
@@ -33,6 +33,26 @@

extern std::mutex g_vulkanLock;

/**
* @brief Register an acceleration structure build with the tracker.
*
* @param layer The layer context for the device.
* @param commandBuffer The command buffer we are recording.
* @param buildType The build type.
* @param primitiveCount The number of primitives in the build.
*
* @return The assigned tagID for the workload.
*/
static uint64_t registerAccelerationStructureBuild(Device* layer,
VkCommandBuffer commandBuffer,
Tracker::LCSAccelerationStructureBuild::Type buildType,
int64_t primitiveCount)
{
auto& tracker = layer->getStateTracker();
auto& cb = tracker.getCommandBuffer(commandBuffer);
return cb.accelerationStructureBuild(buildType, primitiveCount);
}

/**
* @brief Register a trace rays dispatch with the tracker.
*
@@ -55,6 +75,65 @@ static uint64_t registerTraceRays(Device* layer,
return cb.traceRays(itemsX, itemsY, itemsZ);
}

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL layer_vkCmdBuildAccelerationStructuresIndirectKHR<user_tag>(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkDeviceAddress* pIndirectDeviceAddresses,
const uint32_t* pIndirectStrides,
const uint32_t* const* ppMaxPrimitiveCounts)
{
LAYER_TRACE(__func__);

// Hold the lock to access layer-wide global store
std::unique_lock<std::mutex> lock {g_vulkanLock};
auto* layer = Device::retrieve(commandBuffer);

uint64_t tagID = registerAccelerationStructureBuild(layer,
commandBuffer,
Tracker::LCSAccelerationStructureBuild::Type::unknown,
-1);

// Release the lock to call into the driver
lock.unlock();
emitStartTag(layer, commandBuffer, tagID);
layer->driver.vkCmdBuildAccelerationStructuresIndirectKHR(commandBuffer,
infoCount,
pInfos,
pIndirectDeviceAddresses,
pIndirectStrides,
ppMaxPrimitiveCounts);
layer->driver.vkCmdEndDebugUtilsLabelEXT(commandBuffer);
}

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL layer_vkCmdBuildAccelerationStructuresKHR<user_tag>(
VkCommandBuffer commandBuffer,
uint32_t infoCount,
const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos)
{
LAYER_TRACE(__func__);

// Hold the lock to access layer-wide global store
std::unique_lock<std::mutex> lock {g_vulkanLock};
auto* layer = Device::retrieve(commandBuffer);

uint64_t tagID = registerAccelerationStructureBuild(layer,
commandBuffer,
Tracker::LCSAccelerationStructureBuild::Type::unknown,
-1);

// Release the lock to call into the driver
lock.unlock();
emitStartTag(layer, commandBuffer, tagID);
layer->driver.vkCmdBuildAccelerationStructuresKHR(commandBuffer, infoCount, pInfos, ppBuildRangeInfos);
layer->driver.vkCmdEndDebugUtilsLabelEXT(commandBuffer);
}

/* See Vulkan API for documentation. */
template<>
VKAPI_ATTR void VKAPI_CALL layer_vkCmdTraceRaysIndirect2KHR<user_tag>(VkCommandBuffer commandBuffer,