Commit bddd95f

GPA 3.10 updates
1 parent 30cd978 commit bddd95f

193 files changed

+51946
-26641
lines changed


_clang-format renamed to .clang-format

Lines changed: 2 additions & 1 deletion
@@ -43,4 +43,5 @@ SortUsingDeclarations: false
4343
BinPackArguments: false
4444
BinPackParameters: false
4545
ExperimentalAutoDetectBinPacking: false
46-
AllowAllParametersOfDeclarationOnNextLine: false
46+
AllowAllParametersOfDeclarationOnNextLine: false
47+
Standard: Cpp11

.clang-tidy

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
Checks: bugprone-*,clang-analyzer-*,clang-diagnostic-*,google-*,misc-*,modernize-*,-modernize-use-trailing-return-type,-modernize-concat-nested-namespaces,performance-*,readability-*,-modernize-use-auto
2+
WarningsAsErrors: bugprone-*,clang-analyzer-*,clang-diagnostic-*,google-*,misc-*,modernize-*,performance-*,readability-*
3+
FormatStyle: file
4+
CheckOptions:
5+
- key: readability-identifier-naming.ClassCase
6+
value: CamelCase
7+
- key: readability-identifier-naming.ClassConstantCase
8+
value: CamelCase
9+
- key: readability-identifier-naming.ClassConstantPrefix
10+
value: k
11+
- key: readability-identifier-naming.EnumCase
12+
value: CamelCase
13+
- key: readability-identifier-naming.EnumConstantCase
14+
value: CamelCase
15+
- key: readability-identifier-naming.EnumConstantPrefix
16+
value: k
17+
- key: readability-identifier-naming.FunctionCase
18+
value: CamelCase
19+
- key: readability-identifier-naming.GlobalConstantCase
20+
value: CamelCase
21+
- key: readability-identifier-naming.GlobalConstantPrefix
22+
value: k
23+
- key: readability-identifier-naming.GlobalConstantPointerCase
24+
value: CamelCase
25+
- key: readability-identifier-naming.GlobalConstantPointerPrefix
26+
value: k
27+
- key: readability-identifier-naming.MethodCase
28+
value: CamelCase
29+
- key: readability-identifier-naming.NamespaceCase
30+
value: lower_case
31+
- key: readability-identifier-naming.ParameterCase
32+
value: lower_case
33+
- key: readability-identifier-naming.PrivateMemberCase
34+
value: lower_case
35+
- key: readability-identifier-naming.PrivateMemberSuffix
36+
value: _
37+
- key: readability-identifier-naming.PublicMemberCase
38+
value: lower_case
39+
- key: readability-identifier-naming.StaticConstantCase
40+
value: CamelCase
41+
- key: readability-identifier-naming.StaticConstantPrefix
42+
value: k
43+
- key: readability-identifier-naming.TemplateParameterCase
44+
value: CamelCase
45+
- key: readability-identifier-naming.TypeAliasCase
46+
value: CamelCase
47+
- key: readability-identifier-naming.TypedefCase
48+
value: CamelCase
49+
- key: readability-identifier-naming.UnionCase
50+
value: CamelCase
51+
- key: readability-identifier-naming.VariableCase
52+
value: lower_case
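
As a rough illustration of what the identifier-naming options above ask for, here is a small C++ sketch. Every name in it (`sample_app`, `CounterList`, `kMaxCounterCount`, and so on) is made up for this example and does not come from the GPUPerfAPI sources. It shows only the naming rules, not the other enabled checks.

```cpp
#include <cstdint>

namespace sample_app {  // NamespaceCase: lower_case

// GlobalConstantCase: CamelCase with the "k" prefix.
constexpr std::uint32_t kMaxCounterCount = 64;

// EnumCase: CamelCase; EnumConstantCase: CamelCase with the "k" prefix.
enum class ShaderStage { kVertex, kPixel, kCompute };

// ClassCase: CamelCase (hypothetical class, for illustration only).
class CounterList {
public:
    // MethodCase: CamelCase; ParameterCase: lower_case.
    bool AddCounter(std::uint32_t counter_index) {
        if (counter_count_ >= kMaxCounterCount) {
            return false;
        }
        last_index_ = counter_index;
        ++counter_count_;
        return true;
    }

    std::uint32_t GetCounterCount() const { return counter_count_; }

private:
    // PrivateMemberCase: lower_case with the "_" suffix.
    std::uint32_t counter_count_ = 0;
    std::uint32_t last_index_ = 0;
};

}  // namespace sample_app
```

Because the same check names appear under `WarningsAsErrors`, an identifier that breaks one of these rules (for example a private member without the trailing underscore) would be reported as an error rather than a warning when clang-tidy runs with this configuration.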

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
Copyright (c) 2016-2021 Advanced Micro Devices, Inc. All rights reserved.
1+
Copyright (c) 2016-2022 Advanced Micro Devices, Inc. All rights reserved.
22

33
Permission is hereby granted, free of charge, to any person obtaining a copy
44
of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 74 additions & 15 deletions
@@ -31,9 +31,64 @@ Prebuilt binaries can be downloaded from the Releases page: https://github.com/G
3131
* Provides access to some raw hardware counters. See [Raw Hardware Counters](#raw-hardware-counters) for more information.
3232

3333
## What's New
34-
* Version 3.9 (07/27/21)
34+
* Version 3.10 (01/25/22)
3535
* Add support for additional GPUs and APUs.
36-
* Improvements made to the sample applications.
36+
* Redefined derived counters on GCN (Vega), RDNA, and RDNA2 hardware.
37+
* New pipeline-based counters to better match hardware behavior.
38+
* GCN (Polaris) hardware:
39+
* Added: CSThreadGroupSize.
40+
* Fixed: CSThreads, CSFlatVMemInsts, HiZTilesAccepted, HiZTilesAcceptedCount, PreZQuadsCulled, PreZQuadsCulledCount, PreZQuadsSurvivingCount.
41+
* GCN (Radeon Vega Series) hardware:
42+
* Removed: VSBusy, VSBusyCycles, VSTime, HSBusy, HSBusyCycles, HSTime, DSBusy, DSBusyCycles, DSTime.
43+
* Added: VsGsBusy, VsGsBusyCycles, VsGsTime, PreTessellationBusy, PreTessellationBusyCycles, PreTessellationTime, PostTessellationBusy, PostTessellationBusyCycles, PostTessellationTime.
44+
* Removed: VertexShader group (VSVerticesIn, VSVALUInstCount, VSSALUInstCount, VSVALUBusy, VSVALUBusyCycles, VSSALUBusy, VSSALUBusyCycles).
45+
* Added: VertexGeometry group (VsGsVALUInstCount, VsGsSALUInstCount, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsSALUBusy, VsGsSALUBusyCycles).
46+
* Represents combined data from vertex and geometry shaders in a VS-PS or VS-GS-PS pipeline.
47+
* Removed: HullShader group (HSPatches, HSVALUInstCount, HSSALUInstCount, HSVALUBusy, HSVALUBusyCycles, HSSALUBusy, HSSALUBusyCycles).
48+
* Added: PreTessellation group (PreTessVALUInstCount, PreTessSALUInstCount, PreTessVALUBusy, PreTessVALUBusyCycles, PreTessSALUBusy, PreTessSALUBusyCycles).
49+
* Represents combined data from vertex and hull shaders in a VS-HS-DS-PS or VS-HS-DS-GS-PS pipeline.
50+
* Removed: DomainShader group (DSVerticesIn, DSVALUInstCount, DSSALUInstCount, DSVALUBusy, DSVALUBusyCycles, DSSALUBusy, DSSALUBusyCycles).
51+
* Removed: GeometryShader group (GSPrimsIn, GSVerticesOut, GSVALUInstCount, GSSALUInstCount, GSVALUBusy, GSVALUBusyCycles, GSSALUBusy, GSSALUBusyCycles).
52+
* Added: PostTessellation group (PostTessVALUInstCount, PostTessSALUInstCount, PostTessVALUBusy, PostTessVALUBusyCycles, PostTessSALUBusy, PostTessSALUBusyCycles).
53+
* Represents combined data from domain and geometry shaders in a VS-HS-DS-PS or VS-HS-DS-GS-PS pipeline.
54+
* Added: CSThreadGroupSize.
55+
* Fixed: PSBusy, PSBusyCycles, PSTime, CSBusy, CSBusyCycles, CSTime, CSThreads, CSFlatVMemInsts, HiZTilesAccepted, HiZTilesAcceptedCount, HiZTilesRejectedCount, HiZQuadsCulled, HiZQuadsCulledCount, HiZQuadsAcceptedCount, PreZQuadsCulled, PreZQuadsCulledCount, PreZQuadsSurvivingCount.
56+
* RDNA (Radeon RX 5000 Series) hardware:
57+
* Removed: VSBusy, VSBusyCycles, VSTime, HSBusy, HSBusyCycles, HSTime, DSBusy, DSBusyCycles, DSTime.
58+
* Added: VsGsBusy, VsGsBusyCycles, VsGsTime, PreTessellationBusy, PreTessellationBusyCycles, PreTessellationTime, PostTessellationBusy, PostTessellationBusyCycles, PostTessellationTime.
59+
* Removed: VertexShader group (VSVerticesIn, VSVALUInstCount, VSSALUInstCount, VSVALUBusy, VSVALUBusyCycles, VSSALUBusy, VSSALUBusyCycles).
60+
* Added: VertexGeometry group (VsGsVALUInstCount, VsGsSALUInstCount, VsGsVALUBusy, VsGsVALUBusyCycles, VsGsSALUBusy, VsGsSALUBusyCycles).
61+
* Represents combined data from vertex and geometry shaders in a VS-PS or VS-GS-PS pipeline.
62+
* Removed: HullShader group (HSPatches, HSVALUInstCount, HSSALUInstCount, HSVALUBusy, HSVALUBusyCycles, HSSALUBusy, HSSALUBusyCycles).
63+
* Added: PreTessellation group (PreTessVALUInstCount, PreTessSALUInstCount, PreTessVALUBusy, PreTessVALUBusyCycles, PreTessSALUBusy, PreTessSALUBusyCycles).
64+
* Represents combined data from vertex and hull shaders in a VS-HS-DS-PS or VS-HS-DS-GS-PS pipeline.
65+
* Removed: DomainShader group (DSVerticesIn, DSVALUInstCount, DSSALUInstCount, DSVALUBusy, DSVALUBusyCycles, DSSALUBusy, DSSALUBusyCycles).
66+
* Removed: GeometryShader group (GSPrimsIn, GSVerticesOut, GSVALUInstCount, GSSALUInstCount, GSVALUBusy, GSVALUBusyCycles, GSSALUBusy, GSSALUBusyCycles).
67+
* Added: PostTessellation group (PostTessVALUInstCount, PostTessSALUInstCount, PostTessVALUBusy, PostTessVALUBusyCycles, PostTessSALUBusy, PostTessSALUBusyCycles).
68+
* Represents combined data from domain and geometry shaders in a VS-HS-DS-PS or VS-HS-DS-GS-PS pipeline.
69+
* Removed: PrimitivesIn.
70+
* Added: CSThreadGroupSize.
71+
* Fixed: PSBusy, PSBusyCycles, PSTime, CSBusy, CSBusyCycles, CSTime, CSThreads, HiZTilesAccepted, HiZTilesAcceptedCount, HiZTilesRejectedCount, PreZQuadsCulled, PreZQuadsCulledCount, PreZQuadsSurvivingCount.
72+
* RDNA2 (Radeon RX 6000 Series) hardware:
73+
* Removed: VSBusy, VSBusyCycles, VSTime, HSBusy, HSBusyCycles, HSTime, DSBusy, DSBusyCycles, DSTime.
74+
* Removed: VertexShader group, HullShader group, DomainShader group, GeometryShader group.
75+
* Removed: PrimitivesIn, PSVALUInstCount, PSSALUInstCount, PSVALUBusy, PSVALUBusyCycles, PSSALUBusy, PSSALUBusyCycles.
76+
* Removed: CSVALUInsts, CSVALUUtilization, CSSALUInsts, CSVFetchInsts, CSSFetchInsts, CSVWriteInsts, CSVALUBusy, CSVALUBusyCycles, CSSALUBusy, CSSALUBusyCycles.
77+
* Added: CSThreadGroupSize.
78+
* Fixed: CSThreads, HiZTilesAccepted, HiZTilesAcceptedCount, HiZTilesRejectedCount, PreZQuadsCulled, PreZQuadsCulledCount, PreZQuadsSurvivingCount.
79+
* Integrated clang-tidy and clang-format into cmake build options.
80+
* New entrypoint added: GpaGetDeviceGeneration. Binary backwards compatibility is maintained.
81+
* OpenGL on Linux: Fixed hardware detection on MESA drivers.
82+
* OpenGL: Fixed hardware detection accuracy.
83+
* DX11:
84+
* Fixed Adrenalin driver version detection.
85+
* Fixed setting the number of shader arrays based on client hardware.
86+
* Improvements made to the sample applications:
87+
* Extensive counter validation in DX12.
88+
* Sample apps can now confirm successful validation tests.
89+
* Sample apps now support passing in a counter file to specify which counters to enable.
90+
* Consolidated parameter parsing logic in sample apps.
91+
* In Vulkan and DX12 samples, the return code now indicates the number of errors that were reported.
3792

3893
## System Requirements
3994
* An AMD Radeon GPU or APU based on Graphics IP version 8 and newer.
@@ -80,7 +135,24 @@ The documentation is hosted publicly at: http://gpuperfapi.readthedocs.io/en/lat
80135
This release exposes both "Derived" counters and "Raw Hardware" counters. Derived counters are counters that are computed using a set of raw hardware counters.
81136
This version allows you to access the raw hardware counters by simply specifying a flag when calling GpaOpenContext.
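
To make the "derived from raw" relationship concrete, here is a minimal C++ sketch of how a percentage-style derived counter could be computed from two raw values. The struct, the field names, and the formula are assumptions chosen for illustration. They are not GPUPerfAPI's actual counter definitions or API.

```cpp
#include <cstdint>

// Hypothetical raw hardware counter values collected for one sample.
// The field names are illustrative and are not real GPA counter names.
struct RawSample {
    std::uint64_t busy_cycles;
    std::uint64_t total_cycles;
};

// Derives a percentage from two raw values, guarding against a zero denominator.
inline double DeriveBusyPercent(const RawSample& raw) {
    if (raw.total_cycles == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(raw.busy_cycles) /
           static_cast<double>(raw.total_cycles);
}
```

With these assumed inputs, a sample of 50 busy cycles out of 200 total cycles yields 25 percent.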
82137

138+
## New Pipeline-Based Counters
139+
GPUPerfAPI v3.9 did not properly account for the improvements introduced in the Vega, RDNA, and RDNA2 architectures, which led to many of the known issues called out in that release. In certain cases, the driver and hardware can optimize by combining two shader stages, which prevents GPUPerfAPI from identifying which instructions were executed for which shader type. As a result, GPUPerfAPI can no longer expose instruction counters for each API-level shader, specifically Vertex Shaders, Hull Shaders, Domain Shaders, and Geometry Shaders. Pixel Shaders and Compute Shaders remain unchanged. These instruction counters are now exposed based on the type of shader pipeline being used. In pipelines that do not use tessellation, the instruction counts for both the Vertex and Geometry Shaders (if used) are combined in the VertexGeometry group (i.e., counters with the "VsGs" prefix). In pipelines that use tessellation, the instruction counts for both the Vertex and Hull Shaders are combined in the PreTessellation group (i.e., counters with the "PreTessellation" or "PreTess" prefix), and the instruction counts for the Domain and Geometry Shaders (if used) are combined in the PostTessellation group (i.e., counters with the "PostTessellation" or "PostTess" prefix). The table below maps the API-level shaders (across the top) to the prefixes to look for in the GPUPerfAPI counters.
140+
141+
| Pipeline | Vertex | Hull | Domain | Geometry | Pixel | Compute |
142+
|----------------|:-------:|:-------:|:--------:|:--------:|:-----:|:-------:|
143+
| VS-PS | VsGs | | | | PS | |
144+
| VS-GS-PS | VsGs | | | VsGs | PS | |
145+
| VS-HS-DS-PS | PreTess | PreTess | PostTess | PostTess | PS | |
146+
| VS-HS-DS-GS-PS | PreTess | PreTess | PostTess | PostTess | PS | |
147+
| CS | | | | | | CS |
148+
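
The mapping in the table above can also be expressed as a small lookup. The following C++ sketch is illustrative only: the `Pipeline` and `Stage` enums and the `CounterPrefixFor` helper are hypothetical names and are not part of the GPUPerfAPI interface. It returns the short prefixes exactly as the table lists them. The busy and time counters use the longer "PreTessellation" and "PostTessellation" spellings instead.

```cpp
#include <string>

// Hypothetical enums mirroring the rows and columns of the table.
enum class Pipeline { kVsPs, kVsGsPs, kVsHsDsPs, kVsHsDsGsPs, kCs };
enum class Stage { kVertex, kHull, kDomain, kGeometry, kPixel, kCompute };

// Returns the counter prefix the table lists for a stage in a pipeline,
// or an empty string where the table cell is blank.
inline std::string CounterPrefixFor(Pipeline pipeline, Stage stage) {
    const bool is_compute = pipeline == Pipeline::kCs;
    const bool is_tessellated =
        pipeline == Pipeline::kVsHsDsPs || pipeline == Pipeline::kVsHsDsGsPs;

    switch (stage) {
    case Stage::kCompute:
        return is_compute ? "CS" : "";
    case Stage::kPixel:
        return is_compute ? "" : "PS";
    case Stage::kVertex:
        if (is_compute) {
            return "";
        }
        return is_tessellated ? "PreTess" : "VsGs";
    case Stage::kHull:
        return is_tessellated ? "PreTess" : "";
    case Stage::kDomain:
        return is_tessellated ? "PostTess" : "";
    case Stage::kGeometry:
        // Mirrors the table, which lists PostTess under Geometry for both
        // tessellation pipelines and VsGs only for VS-GS-PS.
        if (is_tessellated) {
            return "PostTess";
        }
        return pipeline == Pipeline::kVsGsPs ? "VsGs" : "";
    }
    return "";
}
```

For example, `CounterPrefixFor(Pipeline::kVsGsPs, Stage::kGeometry)` gives `"VsGs"`, while the same stage in a `kVsHsDsGsPs` pipeline gives `"PostTess"`.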
83149
## Known Issues
150+
### Counter Validation Errors in D3D12ColorCube Sample App
151+
Due to the extensive counter validation now being done in the D3D12ColorCube sample application, and some expected variation in nondeterministic counters across a wide range of systems, the sample app may report errors on some systems. Likewise, some counters are marked as known issues and we are investigating the underlying causes of the inconsistent results.
152+
153+
Additionally, the following deterministic performance counter values may not be accurate for the D3D12ColorCube sample application:
154+
* CulledPrims, PSPixelsOut on Radeon RX 480 hardware.
155+
84156
### Ubuntu 20.04 LTS Vulkan ICD Issue
85157
On Ubuntu 20.04 LTS, Vulkan ICD may not be set to use AMD Vulkan ICD. In this case, it needs to be explicitly set to use AMD Vulkan ICD before using the GPA. It can be done by setting the ```VK_ICD_FILENAMES``` environment variable to ```/etc/vulkan/icd.d/amd_icd64.json```.
86158

@@ -96,23 +168,10 @@ By default this file is only modifiable by root, so the application being profil
96168
* You may have to reboot the system for the change to take effect.
97169
* Setting the GPU clock mode is not working correctly for <b>Radeon 5700 Series GPUs</b>, potentially leading to some inconsistencies in counter values from one run to the next.
98170

99-
### DirectX11 Performance Counter Accuracy For Select Counters and GPUs
100-
The following performance counter values may not be accurate for DirectX 11 applications running on a Radeon 5700, and 6000 Series GPUs:
101-
* VALUInstCount, SALUInstCount, VALUBusy, SALUBusy for all shader stages: These values should be representative of performance, but may not be 100% accurate.
102-
* Most of the ComputeShader counters (all except the MemUnit and WriteUnit counters): These values should be representative of performance, but may not be 100% accurate.
103-
104171
### OpenCL Performance Counter Accuracy For Radeon 6000 Series GPUs
105172
The following performance counter values may not be accurate for OpenCL applications running on Radeon 6000 Series GPUs:
106173
* Wavefronts, VALUInsts, SALUInsts, SALUBusy, VALUUtilization: These values should be representative of performance, but may not be 100% accurate.
107174

108-
### OpenGL Performance Counter Accuracy For Radeon 5700 Series GPUs
109-
The following performance counter values may not be accurate for OpenGL applications running on a Radeon 5700 Series GPUs:
110-
* Most of the ComputeShader counters (all except the MemUnit and WriteUnit counters): These values should be representative of performance, but may not be 100% accurate.
111-
112-
### Variability in Deterministic Counters For Select GPUs
113-
Performance counters which should be deterministic are showing variability on Radeon 5700 and 6000 Series GPUs. The values should be useful for performance analysis, but may not be 100% correct.
114-
* e.g. VSVerticesIn, PrimitivesIn, PSPixelsOut, PreZSamplesPassing
115-
116175
### Profiling Bundles
117176
Profiling bundles in DirectX12 and Vulkan is not working properly. It is recommended to remove those GPA Samples from your application, or move the calls out of the bundle for profiling.
118177
