Commit 02c8257

GPA 3.7 updates
1 parent 028e068 commit 02c8257

376 files changed: +713632 -375749 lines changed

CMakeLists.txt

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@ set(DEPTH "./")
 message(STATUS "Generating project files for GPA....")
 set(GPA_CMAKE_MODULES_DIR ${CMAKE_CURRENT_SOURCE_DIR}/build/cmake_modules)
 
+if(NOT DEFINED build)
+    set(GPA_BUILD_NUMBER 0)
+else()
+    set(GPA_BUILD_NUMBER ${build})
+endif()
+
 include(${GPA_CMAKE_MODULES_DIR}/gpa_version.cmake)
 include(${GPA_CMAKE_MODULES_DIR}/defs.cmake)
 include(${GPA_CMAKE_MODULES_DIR}/build_flags.cmake)

NOTICES.txt

Lines changed: 99 additions & 232 deletions
Large diffs are not rendered by default.

README.md

Lines changed: 19 additions & 9 deletions
@@ -33,16 +33,23 @@ Prebuilt binaries can be downloaded from the Releases page: https://github.com/G
 * Provides access to some raw hardware counters. See [Raw Hardware Counters](#raw-hardware-counters) for more information.
 
 ## What's New
-* Version 3.6 (05/15/20)
-  * Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
-  * Add two new GFX10 GlobalMemory Counters for graphics using DX12 and Vulkan: LocalVidMemBytes and PcieBytes.
-  * Add VS2019 project support to CMake.
-  * Restructure of GPA source layout to adhere to google style.
+* Version 3.7 (11/24/20)
+  * Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
+  * New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
+    * RayTriTests, and RayBoxTests: These counters collect the number of ray intersections for triangles and boxes, respectively.
+    * TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
+    * RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
+  * New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs.
+    * Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount
+    * Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount
+  * Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
+  * Remove OpenCL™ support from Linux.
+  * Remove downloading the Vulkan® SDK by the build script.
 
 ## System Requirements
 * An AMD Radeon GPU or APU based on Graphics IP version 8 and newer.
-* Windows: Radeon Software Adrenaline 2020 Edition 19.12.2 or later (Driver Packaging Version 19.50 or later).
-* Linux: Radeon Software for Linux Revision 19.50 or later.
+* Windows: Radeon Software Adrenaline 2020 Edition 20.11.2 or later (Driver Packaging Version 20.45 or later).
+* Linux: Radeon Software for Linux Revision 20.45 or later.
 * Radeon GPUs or APUs based on Graphics IP version 6 and 7 are no longer supported by GPUPerfAPI. Please use an older version ([3.3](https://github.com/GPUOpen-Tools/GPA/releases/tag/v3.3)) with older hardware.
 * Windows 7, 8.1, and 10.
 * Ubuntu (16.04 and later) and CentOS/RHEL (7 and later) distributions.
@@ -88,14 +95,17 @@ hardware counters in a default build, by simply specifying a new flag when calli
 build of GPUPerfAPI that also exposes the raw hardware counters, but that is a deprecated build and it is likely to be removed in a future release.
 
 ## Known Issues
+* On Ubuntu 20.04 LTS, Vulkan ICD may not be set to use AMD Vulkan ICD. In this case, it needs to be explicitly set to use AMD Vulkan ICD before using the GPA.
+It can be done by setting the "VK_ICD_FILENAMES" environment variable to "/etc/vulkan/icd.d/amd_icd64.json"
+* VSVerticesIn, HSPatches, and DSVerticesIn counters aren't availbale on Radeon RX 6000 Series GPU using OpenGL.
+* FetchSize counter will show an error when enabled on Radeon RX 6000 Series GPU using OpenGL. This is expected to be fixed in a future driver release.
 * Adjusting the GPU clock mode on Linux is accomplished by writing to <br><br>/sys/class/drm/card\<N\>/device/power_dpm_force_performance_level<br><br> where \<N\> is
 the index of the card in question. By default this file is only modifiable by root, so the application being profiled would have to be run as root in order for it to
 modify the clock mode. It is possible to modify the permissions for the file instead so that it can be written by unprivileged users. The following command will
 achieve this. Note, however, that changing the permissions on a system file like this could circumvent security. Also, on multi-GPU systems, you may have to replace
 "card0" with the appropriate card number. Permissions on this file may be reset when rebooting the system:
   * sudo chmod ugo+w /sys/class/drm/card0/device/power_dpm_force_performance_level
-* The following performance counter values may not be accurate for DirectX 11 applications running on a Radeon 5700 Series GPU. This is expected to be addressed in a future
-driver release:
+* The following performance counter values may not be accurate for DirectX 11 applications running on a Radeon 5700, and 6000 Series GPU. This is expected to be fixed in a future driver release.
   * VALUInstCount, SALUInstCount, VALUBusy, SALUBusy for all shader stages: These values should be representative of performance, but may not be 100% accurate.
   * Most of the ComputeShader counters (all except the MemUnit and WriteUnit counters): These values should be representative of performance, but may not be 100% accurate.
 * The following performance counter values may not be accurate for OpenGL applications running on a Radeon 5700 Series GPU. This is expected to be addressed in a future
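
The Ubuntu 20.04 workaround added to Known Issues can also be applied per process instead of system-wide. The following is a minimal sketch, not part of the commit: the helper name `amd_vulkan_env` is hypothetical, and the only value taken from the README is the ICD path.

```python
import os

# ICD manifest path quoted in the README's Known Issues entry.
AMD_ICD_JSON = "/etc/vulkan/icd.d/amd_icd64.json"


def amd_vulkan_env(base_env=None):
    """Return a copy of the environment with VK_ICD_FILENAMES set to the AMD ICD.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["VK_ICD_FILENAMES"] = AMD_ICD_JSON
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the application to be profiled, which leaves the parent shell's environment untouched.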

ReleaseNotes.md

Lines changed: 13 additions & 0 deletions
@@ -1,6 +1,19 @@
 # GPU Performance API Release Notes
 ---
 
+## Version 3.7 (11/24/20)
+* Add support for additional GPUs and APUs, including AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
+* New RT counters for DXR workloads on AMD RDNA™ 2 Radeon™ RX 6000 series GPUs.
+  * RayTriTests, and RayBoxTests: These counters collect the number of ray intersections for triangles and boxes, respectively.
+  * TotalRayTests: This counter collects the aggregated number of ray-box and ray-triangle intersection tests.
+  * RayTestsPerWave: This counter collects ray intersection test count at a more granular level – per wave.
+* New Scalar and Instruction cache counters on AMD RDNA™ Radeon™ RX 5000 series GPUs.
+  * Scalar cache: ScalarCacheHit, ScalarCacheRequestCount, ScalarCacheHitCount, ScalarCacheMissCount
+  * Instruction cache: InstCacheHit, InstCacheRequestCount, InstCacheHitCount, InstCacheMissCount
+* Update the Vulkan® sample to remove the static link and use the system-specific Vulkan® loader.
+* Remove OpenCL™ support from Linux.
+* Remove downloading the Vulkan® SDK by the build script.
+
 ## Version 3.6 (05/15/20)
 * Add support for additional GPUs and APUs, including AMD Ryzen™ 4000 Series APUs.
 * Add two new GFX10 GlobalMemory Counters for graphics using DX12 and Vulkan: LocalVidMemBytes and PcieBytes.
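
The release notes describe how the new 3.7 counters relate to one another, and that relationship can be illustrated with plain arithmetic. This sketch rests on two assumptions that the commit does not confirm: that TotalRayTests is the simple sum of RayTriTests and RayBoxTests, and that the cache hit counters are a percentage of hits over requests. Both function names are hypothetical and do not belong to the GPA API.

```python
def total_ray_tests(ray_tri_tests, ray_box_tests):
    """Assumed aggregation: ray-triangle tests plus ray-box tests."""
    return ray_tri_tests + ray_box_tests


def cache_hit_percent(hit_count, request_count):
    """Assumed derivation of a *CacheHit value from *HitCount and *RequestCount.

    Returns 0.0 when no requests were issued, to avoid dividing by zero.
    """
    if request_count == 0:
        return 0.0
    return 100.0 * hit_count / request_count
```

The same helper would apply to either the scalar cache or the instruction cache counters, since both groups expose a hit count and a request count.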

build/cmake_modules/common.cmake

Lines changed: 3 additions & 1 deletion
@@ -58,7 +58,9 @@ if(NOT WIN32)
 set(CMAKE_CXX_COMPILER g++)
 endif()
 
+if(NOT APPLE)
 set(GPA_COMMON_LINK_ARCHIVE_FLAG -Wl,--whole-archive)
 set(GPA_COMMON_LINK_NO_ARCHIVE_FLAG -Wl,--no-whole-archive)
+endif()
 add_compile_options(-Wno-unknown-pragmas -Wno-strict-aliasing -Wno-non-virtual-dtor -Wno-unused-value -msse -fvisibility=hidden)
-endif()
+endif()

build/cmake_modules/defs.cmake

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ cmake_minimum_required(VERSION 3.5.1)
 
 ## Define the GPA version
 set(GPA_MAJOR_VERSION 3)
-set(GPA_MINOR_VERSION 6)
+set(GPA_MINOR_VERSION 7)
 set(GPA_UPDATE_VERSION 0)
 
 if(NOT DEFINED GPA_BUILD_NUMBER)

build/cmake_modules/targets.cmake

Lines changed: 29 additions & 11 deletions
@@ -9,15 +9,17 @@ include(${GPA_CMAKE_MODULES_DIR}/defs.cmake)
 include(${GPA_CMAKE_MODULES_DIR}/build_flags.cmake)
 
 if(UNIX)
-    if(CMAKE_SIZEOF_VOID_P EQUAL 4)
-        set(skipopencl ON)
-    endif()
+    set(skipopencl ON)
 endif()
 
 add_subdirectory(${GPA_SRC_COMMON} ${CMAKE_BINARY_DIR}/${GPA_SRC_COMMON_REL_PATH})
 add_subdirectory(${GPA_SRC_COUNTER_GENERATOR} ${CMAKE_BINARY_DIR}/${GPA_SRC_COUNTER_GENERATOR_REL_PATH})
 add_subdirectory(${GPA_SRC_COUNTERS} ${CMAKE_BINARY_DIR}/${GPA_SRC_COUNTERS_REL_PATH})
 
+if(APPLE)
+    return()
+endif()
+
 if(NOT ${skipopencl})
     add_subdirectory(${GPA_SRC_CL} ${CMAKE_BINARY_DIR}/${GPA_SRC_CL_REL_PATH})
 else()
@@ -29,14 +31,22 @@ if(WIN32)
 
     if(NOT ${skipdx11})
         add_subdirectory(${GPA_SRC_DX11} ${CMAKE_BINARY_DIR}/${GPA_SRC_DX11_REL_PATH})
-        add_subdirectory(${GPA_SRC_D3D11_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_D3D11_TRIANGLE_REL_PATH})
+        if(NOT ${skipsamples})
+            add_subdirectory(${GPA_SRC_D3D11_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_D3D11_TRIANGLE_REL_PATH})
+        else()
+            message(STATUS "Skipping DX11 sample from the build")
+        endif()
     else()
         message(STATUS "Skipping DX11 from the build")
     endif()
 
     if(NOT ${skipdx12})
         add_subdirectory(${GPA_SRC_DX12} ${CMAKE_BINARY_DIR}/${GPA_SRC_DX12_REL_PATH})
-        add_subdirectory(${GPA_SRC_D3D12_COLOR_CUBE} ${CMAKE_BINARY_DIR}/${GPA_SRC_D3D12_COLOR_CUBE_REL_PATH})
+        if(NOT ${skipsamples})
+            add_subdirectory(${GPA_SRC_D3D12_COLOR_CUBE} ${CMAKE_BINARY_DIR}/${GPA_SRC_D3D12_COLOR_CUBE_REL_PATH})
+        else()
+            message(STATUS "Skipping DX12 sample from the build")
+        endif()
     else()
         message(STATUS "Skipping DX12 from the build")
     endif()
@@ -47,19 +57,27 @@ endif()
 
 if(NOT ${skipopengl})
     add_subdirectory(${GPA_SRC_GL} ${CMAKE_BINARY_DIR}/${GPA_SRC_GL_REL_PATH})
-    if(WIN32)
-        add_subdirectory(${GPA_SRC_GL_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_GL_TRIANGLE_REL_PATH})
-    elseif(CMAKE_SIZEOF_VOID_P EQUAL 8)
-        add_subdirectory(${GPA_SRC_GL_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_GL_TRIANGLE_REL_PATH})
+    if(NOT ${skipsamples})
+        if(WIN32)
+            add_subdirectory(${GPA_SRC_GL_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_GL_TRIANGLE_REL_PATH})
+        elseif(CMAKE_SIZEOF_VOID_P EQUAL 8)
+            add_subdirectory(${GPA_SRC_GL_TRIANGLE} ${CMAKE_BINARY_DIR}/${GPA_SRC_GL_TRIANGLE_REL_PATH})
+        endif()
+    else()
+        message(STATUS "Skipping OpenGL sample from the build")
     endif()
 else()
     message(STATUS "Skipping OpenGL from the build")
 endif()
 
 if(NOT ${skipvulkan})
     add_subdirectory(${GPA_SRC_VK} ${CMAKE_BINARY_DIR}/${GPA_SRC_VK_REL_PATH})
-    if(CMAKE_SIZEOF_VOID_P EQUAL 8)
-        add_subdirectory(${GPA_SRC_VK_COLOR_CUBE} ${CMAKE_BINARY_DIR}/${GPA_SRC_VK_COLOR_CUBE_REL_PATH})
+    if(NOT ${skipsamples})
+        if(CMAKE_SIZEOF_VOID_P EQUAL 8)
+            add_subdirectory(${GPA_SRC_VK_COLOR_CUBE} ${CMAKE_BINARY_DIR}/${GPA_SRC_VK_COLOR_CUBE_REL_PATH})
+        endif()
+    else()
+        message(STATUS "Skipping Vulkan sample from the build")
     endif()
 else()
     message(STATUS "Skipping Vulkan from the build")
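
The nesting introduced by the new `skipsamples` guard is easier to follow as a small decision function. The sketch below models only the OpenGL and Vulkan branches of the hunk above; the function name and its boolean parameters are hypothetical stand-ins for the CMake variables, and where `skipsamples` is defined is not shown in this diff.

```python
def sample_targets(is_windows, is_64bit, skip_samples=False,
                   skip_opengl=False, skip_vulkan=False):
    """Return the sample subdirectories the OpenGL and Vulkan branches would add."""
    samples = []
    if skip_samples:
        # Mirrors the else() branches that only print a "Skipping ... sample" message.
        return samples
    # OpenGL sample: always on Windows, otherwise only for 64-bit builds.
    if not skip_opengl and (is_windows or is_64bit):
        samples.append("GL_TRIANGLE")
    # Vulkan sample: 64-bit builds only.
    if not skip_vulkan and is_64bit:
        samples.append("VK_COLOR_CUBE")
    return samples
```

For instance, a 32-bit Windows configuration yields only the OpenGL sample, while setting `skip_samples=True` yields no sample targets even though the API libraries themselves are still built.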

build/pre_build.py

Lines changed: 7 additions & 9 deletions
@@ -1,4 +1,4 @@
-## Copyright (c) 2018-2019 Advanced Micro Devices, Inc. All rights reserved.
+## Copyright (c) 2018-2020 Advanced Micro Devices, Inc. All rights reserved.
 #! /usr/bin/python
 # Utility Python Script to generate GPA projects on Windows and Linux
 
@@ -17,7 +17,7 @@ def pre_build(build_args):
     if build_args.nofetch == False:
         fetch_dependencies_script = os.path.join(gpa_root, "scripts", "fetch_dependencies.py")
         fetch_dependencies_script = os.path.normpath(fetch_dependencies_script)
-        fetch_dependencies_args = ["python", fetch_dependencies_script]
+        fetch_dependencies_args = [sys.executable, fetch_dependencies_script]
         fetch_dependencies_process = subprocess.Popen(fetch_dependencies_args)
         fetch_dependencies_process.wait()
         sys.stdout.flush()
@@ -38,6 +38,11 @@ def pre_build(build_args):
 
     cmake_additional_args = PreBuildCMakeCommon.parse_cmake_arguments(build_args)
 
+    if build_args.build is not None:
+        cmake_additional_args.append("-Dbuild=" + build_args.build)
+    else:
+        cmake_additional_args.append("-Dbuild=0")
+
     if build_args.android == True:
         PreBuildCMakeCommon.cmake_generator_platforms.remove('x86')
         android_ndk=os.environ["ANDROID_NDK"]
@@ -75,10 +80,3 @@ def pre_build(build_args):
 script_parser = PreBuildCMakeCommon.define_cmake_arguments()
 build_args = script_parser.parse_args()
 pre_build(build_args)
-
-
-
-
-
-
-
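
The two behavioural changes in `pre_build.py` can be isolated into small functions for illustration. This is a sketch under stated assumptions, not code from the commit: both function names are hypothetical, and the build number is assumed to arrive as a string or `None`, as the string concatenation in the hunk suggests.

```python
import sys


def build_number_arg(build=None):
    """Mirror the new hunk: forward the build number to CMake, defaulting to 0."""
    return "-Dbuild=" + (build if build is not None else "0")


def fetch_dependencies_command(script_path):
    """Run the helper script with the interpreter that is executing this script.

    Using sys.executable instead of a bare "python" avoids picking up a
    different interpreter from PATH.
    """
    return [sys.executable, script_path]
```

The resulting `-Dbuild=<n>` definition is what the `if(NOT DEFINED build)` block added to `CMakeLists.txt` in this commit reads to set `GPA_BUILD_NUMBER`.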

docs/sphinx/source/compute_counter_tables_gfx10.rst

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 .. Copyright(c) 2018-2020 Advanced Micro Devices, Inc.All rights reserved.
-.. Compute Performance Counters for Navi
+.. Compute Performance Counters for RDNA
 
 .. *** Note, this is an auto-generated file. Do not edit. Execute PublicCounterCompiler to rebuild.
 
-Navi Counters
+RDNA Counters
 +++++++++++++
 
 General Group
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+.. Copyright(c) 2018-2020 Advanced Micro Devices, Inc.All rights reserved.
+.. Compute Performance Counters for RDNA2
+
+.. *** Note, this is an auto-generated file. Do not edit. Execute PublicCounterCompiler to rebuild.
+
+RDNA2 Counters
+++++++++++++++
+
+General Group
+%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Brief Description"
+    :widths: 15, 80
+
+    "Wavefronts", "Total wavefronts."
+    "VALUInsts", "The average number of vector ALU instructions executed per work-item (affected by flow control)."
+    "SALUInsts", "The average number of scalar ALU instructions executed per work-item (affected by flow control)."
+    "VFetchInsts", "The average number of vector fetch instructions from the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that fetch from video memory."
+    "SFetchInsts", "The average number of scalar fetch instructions from the video memory executed per work-item (affected by flow control)."
+    "VWriteInsts", "The average number of vector write instructions to the video memory executed per work-item (affected by flow control). Excludes FLAT instructions that write to video memory."
+    "GDSInsts", "The average number of GDS read or GDS write instructions executed per work item (affected by flow control)."
+    "VALUUtilization", "The percentage of active vector ALU threads in a wave. A lower number can mean either more thread divergence in a wave or that the work-group size is not a multiple of the wave size. Value range: 0% (bad), 100% (ideal - no thread divergence)."
+    "VALUBusy", "The percentage of GPUTime vector ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
+    "SALUBusy", "The percentage of GPUTime scalar ALU instructions are processed. Value range: 0% (bad) to 100% (optimal)."
+
+LocalMemory Group
+%%%%%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Brief Description"
+    :widths: 15, 80
+
+    "LDSInsts", "The average number of LDS read or LDS write instructions executed per work item (affected by flow control)."
+    "LDSBankConflict", "The percentage of GPUTime LDS is stalled by bank conflicts. Value range: 0% (optimal) to 100% (bad)."
+
+GlobalMemory Group
+%%%%%%%%%%%%%%%%%%
+
+.. csv-table::
+    :header: "Counter Name", "Brief Description"
+    :widths: 15, 80
+
+    "FetchSize", "The total kilobytes fetched from the video memory. This is measured with all extra fetches and any cache or memory effects taken into account."
+    "WriteSize", "The total kilobytes written to the video memory. This is measured with all extra fetches and any cache or memory effects taken into account."
+    "L0CacheHit", "The percentage of fetch, write, atomic, and other instructions that hit the data in L0 cache. Value range: 0% (no hit) to 100% (optimal)."
+    "L1CacheHit", "The percentage of fetch, write, atomic, and other instructions that hit the data in L1 cache. Writes and atomics always miss this cache. Value range: 0% (no hit) to 100% (optimal)."
+    "L2CacheHit", "The percentage of fetch, write, atomic, and other instructions that hit the data in L2 cache. Value range: 0% (no hit) to 100% (optimal)."
+    "MemUnitBusy", "The percentage of GPUTime the memory unit is active. The result includes the stall time (MemUnitStalled). This is measured with all extra fetches and writes and any cache or memory effects taken into account. Value range: 0% to 100% (fetch-bound)."
+    "MemUnitStalled", "The percentage of GPUTime the memory unit is stalled. Try reducing the number or size of fetches and writes if possible. Value range: 0% (optimal) to 100% (bad)."
+    "WriteUnitStalled", "The percentage of GPUTime the Write unit is stalled. Value range: 0% to 100% (bad)."
