Commit 66df9bb

GPA 3.12 updates
1 parent cfbfe2a commit 66df9bb

298 files changed: +1087986 -343380 lines changed

README.md

Lines changed: 10 additions & 6 deletions
@@ -31,10 +31,16 @@ Prebuilt binaries can be downloaded from the Releases page: https://github.com/G
 * Provides access to some raw hardware counters. See [Raw Hardware Counters](#raw-hardware-counters) for more information.
 
 ## What's New
-### Version 3.11.1 (07/27/22)
-* Updated OpenGL support for the Adrenalin 22.7.1 driver.
-* Added L2CacheHit counter for OpenGL on Radeon RX 5000 Series hardware.
-* Improved GPA integration into GLTriangle sample application.
+### Version 3.12 (12/14/22)
+* Add support for AMD Radeon™ RX 7900 XTX and AMD Radeon™ RX 7900 XT GPUs.
+* Add support for compiling with Visual Studio 2022.
+* Reduced binary sizes by an average of 75%.
+* Bug Fixes:
+* AMD Radeon RX 6800, DX12: HiZ and PreZ counters are now reporting correct values (requires Adrenalin 22.7.1 or newer driver).
+* AMD Radeon RX 6800: CSThreadgroups is now reporting the correct values (requires Adrenalin 22.7.1 or newer driver).
+* AMD Radeon RX 6000 Series: PostTessellation counters now only show results in pipelines using tessellation.
+* AMD Radeon RX 5000 Series: PreTessellation counters now only show results in pipelines using tessellation.
+* Sample apps: Fix implementation of passes in D3D11Triangle, and improve general error handling.
 
 ## System Requirements
 * An AMD Radeon GPU or APU based on Graphics IP version 8 and newer.
@@ -95,8 +101,6 @@ It was discovered that the improvements introduced in Vega, RDNA, and RDNA2 arch
 ## Known Issues
 ### Counter Validity on Specific Hardware
 There are some counters that are returning unexpected results on specific hardware with certain APIs.
-* AMD Radeon RX 6800, DX12: HiZ and PreZ counters may consistently report 33% higher than expected.
-* AMD Radeon RX 6800, DX11: CSThreadGroups may consistently report 33% higher than expected.
 * AMD Radeon RX 6700M, DX11: CSLDSBankConflict and CSLDSBankConflictCycles may consistently report as much as 30x higher than expected.
 * AMD Radeon RX 480, DX12: CulledPrims and PSPixelsOut may inconsistently report higher than expected.
 

ReleaseNotes.md

Lines changed: 11 additions & 0 deletions
@@ -1,6 +1,17 @@
 # GPU Performance API Release Notes
 ---
 
+## Version 3.12 (12/14/22)
+* Add support for AMD Radeon™ RX 7900 XTX and AMD Radeon™ RX 7900 XT GPUs.
+* Add support for compiling with Visual Studio 2022.
+* Reduced binary sizes by an average of 75%.
+* Bug Fixes:
+* AMD Radeon RX 6800, DX12: HiZ and PreZ counters are now reporting correct values (requires Adrenalin 22.7.1 or newer driver).
+* AMD Radeon RX 6800: CSThreadgroups is now reporting the correct values (requires Adrenalin 22.7.1 or newer driver).
+* AMD Radeon RX 6000 Series: PostTessellation counters now only show results in pipelines using tessellation.
+* AMD Radeon RX 5000 Series: PreTessellation counters now only show results in pipelines using tessellation.
+* Sample apps: Fix implementation of passes in D3D11Triangle, and improve general error handling.
+
 ## Version 3.11.1 (07/27/22)
 * Updated OpenGL support for the Adrenalin 22.7.1 driver.
 * Added L2CacheHit counter for OpenGL on Radeon RX 5000 Series hardware.

build/cmake_modules/build_flags.cmake

Lines changed: 1 addition & 3 deletions
@@ -1,12 +1,10 @@
-## Copyright (c) 2018-2019 Advanced Micro Devices, Inc. All rights reserved.
+## Copyright (c) 2018-2022 Advanced Micro Devices, Inc. All rights reserved.
 cmake_minimum_required(VERSION 3.5.1)
 
 ## GPA has only Debug and Release
 set(CMAKE_CONFIGURATION_TYPES Debug Release)
 set(DEPTH "./")
 
-include(${GPA_CMAKE_MODULES_DIR}/defs.cmake)
-
 if(NOT DEFINED usingscript)
 set(usingscript OFF CACHE BOOL "Turn on to indicate CMake is called using script" FORCE)
 endif()

build/cmake_modules/common.cmake

Lines changed: 6 additions & 2 deletions
@@ -1,12 +1,16 @@
-## Copyright (c) 2018-2019 Advanced Micro Devices, Inc. All rights reserved.
+## Copyright (c) 2018-2022 Advanced Micro Devices, Inc. All rights reserved.
 cmake_minimum_required(VERSION 3.5.1)
 
-include(${GPA_CMAKE_MODULES_DIR}/defs.cmake)
 include (${GPA_CMAKE_MODULES_DIR}/utils.cmake)
 
 # Include global cmake common file
 include(${CMAKE_COMMON_SRC_GLOBAL_CMAKE_MODULE})
 
+# Check for required variables from other cmake files.
+if (${GPA_OUTPUT_DIR} STREQUAL "")
+message(FATAL_ERROR "No output directory is defined, make sure defs.cmake is included before common.cmake")
+endif()
+
 # Global compiler options
 add_compile_options(${COMMON_COMPILATION_FLAGS})
 

build/cmake_modules/defs.cmake

Lines changed: 2 additions & 2 deletions
@@ -3,8 +3,8 @@ cmake_minimum_required(VERSION 3.5.1)
 
 ## Define the GPA version
 set(GPA_MAJOR_VERSION 3)
-set(GPA_MINOR_VERSION 11)
-set(GPA_UPDATE_VERSION 1)
+set(GPA_MINOR_VERSION 12)
+set(GPA_UPDATE_VERSION 0)
 
 if(NOT DEFINED GPA_BUILD_NUMBER)
 set(GPA_BUILD_NUMBER 0)

build/cmake_modules/gpa_version.cmake

Lines changed: 7 additions & 1 deletion
@@ -27,7 +27,13 @@ set(GPA_VERSION_HEADER_FILE_CONTENT "//=========================================
 #define GPA_BUILD_NUMBER ${GPA_BUILD_NUMBER}
 
 #define GPA_STR_VALUE(s) #s ///< Macro to stringify a value.
-#define GPA_VERSION_STRING(s) GPA_STR_VALUE(s) ///< Macro to stringify a version value.
+#ifdef WIN32
+#define GPA_STR_EMPTY() ///< Macro to stringify an empty value.
+#define GPA_GET_MACRO(_0, _1, MACRO, ...) MACRO ///< Macro to select between GPA_STR_VALUE and GPA_STR_EMPTY.
+#define GPA_VERSION_STRING(s) GPA_GET_MACRO(_0, ##__VA_ARGS__, GPA_STR_VALUE, GPA_STR_EMPTY)(__VA_ARGS__) ///< Macro to stringify a version value.
+#else
+#define GPA_VERSION_STRING(s) GPA_STR_VALUE(s) ///< Macro to stringify a version value
+#endif // !WIN32
 
 #define GPA_MAJOR_VERSION_STR GPA_VERSION_STRING(GPA_MAJOR_VERSION) ///< Macro for major version string.
 #define GPA_MINOR_VERSION_STR GPA_VERSION_STRING(GPA_MINOR_VERSION) ///< Macro for minor version string.
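
The non-Windows branch of this hunk keeps the two-level `GPA_STR_VALUE` / `GPA_VERSION_STRING` pair. A minimal C sketch of why the extra level matters is given here: the `#` operator does not expand its operand, so the wrapper is what lets the version number be expanded before it is stringified. The values 3 and 12 are copied from the `defs.cmake` hunk in this commit; the `example_*` function names are hypothetical and are not part of GPA.

```c
/* Stand-ins for the values that the generated version header would carry. */
#define GPA_MAJOR_VERSION 3
#define GPA_MINOR_VERSION 12

/* Same shape as the non-Windows macros in the hunk above. */
#define GPA_STR_VALUE(s) #s
#define GPA_VERSION_STRING(s) GPA_STR_VALUE(s)

/* Hypothetical helper: stringifies directly, so the macro name itself is
 * turned into text and its value is never substituted. */
static const char* example_direct(void)
{
    return GPA_STR_VALUE(GPA_MINOR_VERSION);
}

/* Hypothetical helper: the argument is expanded to 12 while passing through
 * the wrapper, and only then handed to the # operator. */
static const char* example_indirect(void)
{
    return GPA_VERSION_STRING(GPA_MINOR_VERSION);
}

/* Hypothetical helper: adjacent string literals are concatenated by the
 * compiler, which gives a dotted version string. */
static const char* example_version(void)
{
    return GPA_VERSION_STRING(GPA_MAJOR_VERSION) "." GPA_VERSION_STRING(GPA_MINOR_VERSION);
}
```

With these definitions, `example_direct()` returns the text `GPA_MINOR_VERSION`, while `example_indirect()` returns `12` and `example_version()` returns `3.12`.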

build/cmake_modules/targets.cmake

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@ cmake_minimum_required(VERSION 3.5.1)
 set(CMAKE_CONFIGURATION_TYPES Debug Release)
 set(DEPTH "./")
 
-include(${GPA_CMAKE_MODULES_DIR}/defs.cmake)
 include(${GPA_CMAKE_MODULES_DIR}/build_flags.cmake)
 
 include(CTest)

build/cmake_modules/utils.cmake

Lines changed: 6 additions & 1 deletion
@@ -49,11 +49,16 @@ macro(SET_EXECUTABLE_NAME EXECUTABLE_NAME)
 endif()
 endmacro()
 
-## Macro to define additional compile defintion to GPA user
+## Macro to define additional compile definition to GPA user
 macro(ADD_GPA_USER_COMPILE_DEFINITIONS)
 set_property(TARGET ${GPA_PROJECT_NAME} PROPERTY COMPILE_DEFINITIONS $<$<CONFIG:DEBUG>:USE_DEBUG_GPA> ${ADDITIONAL_INTERNAL_DEFINITION})
 endmacro()
 
+## Macro to define additional compile definitions to a named GPA Target project.
+macro(ADD_GPA_COMPILE_DEFINITIONS TARGET_NAME)
+set_property(TARGET ${ARGV0} PROPERTY COMPILE_DEFINITIONS $<$<CONFIG:DEBUG>:USE_DEBUG_GPA> ${ADDITIONAL_INTERNAL_DEFINITION})
+endmacro()
+
 if(CMAKE_GENERATOR MATCHES "Visual Studio")
 set(EXCLUDE_FROM_BUILD EXCLUDE_FROM_DEFAULT_BUILD)
 else()
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+.. Copyright(c) 2018-2022 Advanced Micro Devices, Inc.All rights reserved.
+.. Compute Performance Counters for RDNA3
+
+.. *** Note, this is an auto-generated file. Do not edit. Execute PublicCounterCompiler to rebuild.
+
+RDNA3 Counters
+++++++++++++++
+
+General Group
+%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Usage", "Brief Description"
+    :widths: 15, 10, 75
+
+    "Wavefronts", "Items", "Total wavefronts."
+    "VALUInsts", "Items", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
+    "SALUInsts", "Items", "The average number of scalar ALU instructions executed per work-item (affected by flow control)."
+    "VFetchInsts", "Items", "The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory."
+    "SFetchInsts", "Items", "The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control)."
+    "VWriteInsts", "Items", "The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory."
+    "GDSInsts", "Items", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
+    "VALUUtilization", "Percentage", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
+    "VALUBusy", "Percentage", "The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
+    "SALUBusy", "Percentage", "The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
+
+LocalMemory Group
+%%%%%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Usage", "Brief Description"
+    :widths: 15, 10, 75
+
+    "LDSInsts", "Items", "The average number of LDS read or LDS write instructions executed per work item (affected by flow control)."
+    "LDSBankConflict", "Percentage", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
+
+GlobalMemory Group
+%%%%%%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Usage", "Brief Description"
+    :widths: 15, 10, 75
+
+    "FetchSize", "Kilobytes", "The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account."
+    "WriteSize", "Kilobytes", "The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account."
+    "L0CacheHit", "Percentage", "The percentage of fetch, write, atomic, and other instructions that hit the data in L0 cache. Value range: 0% (no hit) to 100% (optimal)."
+    "L1CacheHit", "Percentage", "The percentage of fetch, write, atomic, and other instructions that hit the data in L1 cache. Writes and atomics always miss this cache. Value range: 0% (no hit) to 100% (optimal)."
+    "L2CacheHit", "Percentage", "The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal)."
+    "MemUnitBusy", "Percentage", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
+    "MemUnitStalled", "Percentage", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
+    "WriteUnitStalled", "Percentage", "The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad)."
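
Several of the counters documented above are percentages bounded to the 0% to 100% range (for example the cache-hit and busy counters). As a rough illustration only, the C sketch below shows one way such a bounded percentage could be computed from two raw values. The function name `example_percentage` and its `part` / `whole` inputs are hypothetical; this diff does not show how GPA actually derives these counters, so this should not be read as the library's formula.

```c
/* Hypothetical helper: returns part/whole as a percentage clamped to
 * [0, 100]. An empty denominator is reported as 0% rather than dividing
 * by zero. This is an assumed formula, not GPA's counter derivation. */
static double example_percentage(double part, double whole)
{
    if (whole <= 0.0)
    {
        return 0.0;
    }

    double pct = 100.0 * part / whole;

    if (pct < 0.0)
    {
        pct = 0.0;
    }
    else if (pct > 100.0)
    {
        pct = 100.0;
    }

    return pct;
}
```

For instance, `example_percentage(75.0, 100.0)` evaluates to 75.0, which under this assumed formula would correspond to 75 hits out of 100 cache requests.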

docs/sphinx/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -61,9 +61,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = u'3.11'
+version = u'3.12'
 # The full version, including alpha/beta/rc tags.
-release = u'3.11'
+release = u'3.12'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
