Commit e7103f8

Merge tag 'amd-drm-next-6.13-2024-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.13-2024-10-25:

amdgpu:
- SDMA queue reset support
- SMU 13.0.6 updates
- Add debugfs interface to help limit jpeg queue scheduling for testing
- JPEG 4.0.3 updates
- Initial runtime repartitioning support
- GFX9 fixes
- Misc code cleanups
- Rework IP structures to better handle multiple instances of an IP
- DML updates
- DSC fixes
- HDR fixes
- Brightness control updates
- Runtime pm cleanup
- DMCUB fixes
- DCN 3.5 updates
- Struct drm_edid cleanup
- Fetch EDID from _DDC if available
- Ring noop optimizations
- MES logging fixes
- 3DLUT fixes
- DCN 4.x fixes
- SMU 13.x fixes
- Fixes for set_soft_freq_range()
- ACPI fixes
- SMU 14.x updates
- PSR-SU fixes
- fdinfo cleanup
- DCN documentation updates

amdkfd:
- Misc code cleanups
- Increase event FIFO size
- Copy wave state fixes for SDMA

radeon:
- Fix possible overflow in packet3 check
- Late init connector fix
- Always set GEM function pointer

Documentation:
- Update drm-memory documentation

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
2 parents c9ff14d + dac64cb commit e7103f8

File tree

334 files changed

+9322
-8887
lines changed

Documentation/gpu/amdgpu/display/dc-arch-overview.svg

Lines changed: 731 additions & 0 deletions

Documentation/gpu/amdgpu/display/dc-components.svg

Lines changed: 732 additions & 0 deletions

Documentation/gpu/amdgpu/display/dc-debug.rst

Lines changed: 187 additions & 0 deletions
@@ -2,6 +2,181 @@
22
Display Core Debug tools
33
========================
44

5+
In this section, you will find helpful information on debugging the amdgpu
6+
driver from the display perspective. This page introduces debug mechanisms and
7+
procedures to help you identify if some issues are related to display code.
8+
9+
Narrow down display issues
10+
==========================
11+
12+
Since the display is the driver's visual component, it is common to see users
13+
reporting a problem as a display issue when another component actually causes it. This
14+
section equips users to determine if a specific issue was caused by the display
15+
component or another part of the driver.
16+
17+
DC dmesg important messages
18+
---------------------------
19+
20+
The dmesg log is the first source of information to be checked, and amdgpu
21+
takes advantage of this feature by logging some valuable information. When
22+
looking for the issues associated with amdgpu, remember that each component of
23+
the driver (e.g., smu, PSP, dm, etc.) is loaded one by one, and this
24+
information can be found in the dmesg log. In this sense, look for the part of
25+
the log that looks like the below log snippet::
26+
27+
[ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
28+
[ 4.254718] [drm] register mmio base: 0xFCB00000
29+
[ 4.254918] [drm] register mmio size: 1048576
30+
[ 4.260095] [drm] add ip block number 0 <soc21_common>
31+
[ 4.260318] [drm] add ip block number 1 <gmc_v11_0>
32+
[ 4.260510] [drm] add ip block number 2 <ih_v6_0>
33+
[ 4.260696] [drm] add ip block number 3 <psp>
34+
[ 4.260878] [drm] add ip block number 4 <smu>
35+
[ 4.261057] [drm] add ip block number 5 <dm>
36+
[ 4.261231] [drm] add ip block number 6 <gfx_v11_0>
37+
[ 4.261402] [drm] add ip block number 7 <sdma_v6_0>
38+
[ 4.261568] [drm] add ip block number 8 <vcn_v4_0>
39+
[ 4.261729] [drm] add ip block number 9 <jpeg_v4_0>
40+
[ 4.261887] [drm] add ip block number 10 <mes_v11_0>
41+
42+
From the above example, you can see the line that reports that `<dm>`,
43+
(**Display Manager**), was loaded, which means that display can be part of the
44+
issue. If you do not see that line, something else might have failed before
45+
amdgpu loads the display component, indicating that we don't have a
46+
display issue.
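As a rough illustration of the check described above, the sketch below scans dmesg text for the ``add ip block number`` lines and reports whether ``dm`` was registered. The helper name ``parse_ip_blocks`` and the regular expression are assumptions made for this example; they are not part of amdgpu, and the exact log format may differ between kernel versions.

```python
import re

# Matches lines such as "[ 4.261057] [drm] add ip block number 5 <dm>".
# The pattern is an assumption derived from the log snippet above.
IP_BLOCK_RE = re.compile(r"add ip block number (\d+) <([^>]+)>")


def parse_ip_blocks(dmesg_text):
    """Return a dict mapping each IP block name to its block number."""
    blocks = {}
    for line in dmesg_text.splitlines():
        match = IP_BLOCK_RE.search(line)
        if match:
            blocks[match.group(2)] = int(match.group(1))
    return blocks


# Shortened copy of the example log from this page.
SAMPLE = """\
[ 4.260878] [drm] add ip block number 4 <smu>
[ 4.261057] [drm] add ip block number 5 <dm>
[ 4.261231] [drm] add ip block number 6 <gfx_v11_0>
"""

blocks = parse_ip_blocks(SAMPLE)
if "dm" in blocks:
    print(f"dm was loaded as IP block {blocks['dm']}")
else:
    print("dm was not loaded; the failure likely happened earlier")
```

On a real system, the text passed to the helper would be the captured output of ``dmesg`` rather than the hard-coded sample.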
47+
48+
After you have identified that the DM was loaded correctly, you can check for the
49+
display version of the hardware in use, which can be retrieved from the dmesg
50+
log with the command::
51+
52+
dmesg | grep -i 'display core'
53+
54+
This command shows a message that looks like this::
55+
56+
[ 4.655828] [drm] Display Core v3.2.285 initialized on DCN 3.2
57+
58+
This message has two key pieces of information:
59+
60+
* **The DC version (e.g., v3.2.285)**: Display developers release a new DC version
61+
every week, and this information can be advantageous in a situation where a
62+
user/developer must find a good point versus a bad point based on a tested
63+
version of the display code. Remember from page :ref:`Display Core <amdgpu-display-core>`,
64+
that every week the new patches for display are heavily tested with IGT and
65+
manual tests.
66+
* **The DCN version (e.g., DCN 3.2)**: The DCN block is associated with the
67+
hardware generation, and the DCN version conveys the hardware generation that
68+
the driver is currently running. This information helps to narrow down the
69+
code debug area since each DCN version has its files in the DC folder per DCN
70+
component (from the example, the developer might want to focus on
71+
files/folders/functions/structs with the dcn32 label, since those might be executed).
72+
However, keep in mind that DC reuses code across different DCN versions; for
73+
example, it is expected to have some callbacks set in one DCN that are the same
74+
as those from another DCN. In summary, use the DCN version just as a guide.
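To make the two pieces of information easier to extract from a bug report, here is a minimal sketch that parses the message shown above. The function name ``parse_display_core`` and the regular expression are assumptions for this example, based only on the sample line; the folder label it derives (for instance ``dcn32``) is a naming heuristic and, as noted above, should be used only as a guide.

```python
import re

# Assumed pattern, based on the example message:
# "[drm] Display Core v3.2.285 initialized on DCN 3.2"
DISPLAY_CORE_RE = re.compile(
    r"Display Core v(\d+(?:\.\d+)*) initialized on (\w+) ([\d.]+)"
)


def parse_display_core(line):
    """Return (dc_version, hardware, label), or None if the line does not match."""
    match = DISPLAY_CORE_RE.search(line)
    if match is None:
        return None
    dc_version, block, hw_version = match.groups()
    # Heuristic label used to search the DC folder, e.g. "DCN 3.2" -> "dcn32".
    label = block.lower() + hw_version.replace(".", "")
    return dc_version, f"{block} {hw_version}", label


sample = "[ 4.655828] [drm] Display Core v3.2.285 initialized on DCN 3.2"
print(parse_display_core(sample))
```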
75+
76+
From the dmesg log, it is also possible to get the ATOM BIOS code by using::
77+
78+
dmesg | grep -i 'ATOM BIOS'
79+
80+
This generates output that looks like this::
81+
82+
[ 4.274534] amdgpu: ATOM BIOS: 113-D7020100-102
83+
84+
This type of information is useful to include in bug reports.
85+
86+
Avoid loading display core
87+
--------------------------
88+
89+
Sometimes, it might be hard to figure out which part of the driver is causing
90+
the issue; if you suspect that the display is not part of the problem and your
91+
bug scenario is simple (e.g., some desktop configuration) you can try to remove
92+
the display component from the equation. First, you need to identify the `dm` ID
93+
from the dmesg log; for example, search for the following log::
94+
95+
[ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
96+
[..]
97+
[ 4.260095] [drm] add ip block number 0 <soc21_common>
98+
[ 4.260318] [drm] add ip block number 1 <gmc_v11_0>
99+
[..]
100+
[ 4.261057] [drm] add ip block number 5 <dm>
101+
102+
Notice from the above example that the `dm` ID is 5 for this specific hardware.
103+
Next, you need to run the following binary operation to identify the IP block
104+
mask::
105+
106+
0xffffffff & ~(1 << [DM ID])
107+
108+
From our example the IP mask is::
109+
110+
0xffffffff & ~(1 << 5) = 0xffffffdf
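The same bit operation can be sketched in a few lines of Python. The function name ``ip_block_mask`` is hypothetical and exists only for this example; the 32-bit width follows the ``0xffffffff`` constant used above.

```python
def ip_block_mask(block_id, width=32):
    """Return a mask with every bit set except the bit for block_id."""
    if not 0 <= block_id < width:
        raise ValueError(f"block_id must be in [0, {width}), got {block_id}")
    all_blocks = (1 << width) - 1          # 0xffffffff for width=32
    return all_blocks & ~(1 << block_id)   # clear only the chosen block's bit


# For the example above, where dm is IP block number 5:
print(f"amdgpu.ip_block_mask={ip_block_mask(5):#010x}")
# prints: amdgpu.ip_block_mask=0xffffffdf
```

Since the `dm` ID depends on the hardware, the value passed to the helper should always come from the dmesg log of the machine being debugged.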
111+
112+
Finally, to disable DC, you just need to set the below parameter in your
113+
bootloader::
114+
115+
amdgpu.ip_block_mask=0xffffffdf
116+
117+
If you can boot your system with the DC disabled and still see the issue, it
118+
means you can rule DC out of the equation. However, if the bug disappears, you
119+
still need to consider the DC part of the problem and keep narrowing down the
120+
issue. In some scenarios, disabling DC is impossible since it might be
121+
necessary to use the display component to reproduce the issue (e.g., play a
122+
game).
123+
124+
**Note: This will probably lead to the absence of a display output.**
125+
126+
Display flickering
127+
------------------
128+
129+
Display flickering might have multiple causes; one is the lack of proper power
130+
to the GPU or problems in the DPM switches. A good first generic verification
131+
is to force the GPU to its highest performance level::
132+
133+
bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"
134+
135+
The above command sets the GPU/APU to use the maximum power allowed which
136+
disables DPM switches. If forcing DPM levels high does not fix the issue, it
137+
is less likely that the issue is related to power management. If the issue
138+
disappears, there is a good chance that other components might be involved, and
139+
the display should not be ignored since this could be a DPM issue. From the
140+
display side, if the power increase fixes the issue, it is worth debugging the
141+
clock configuration and the pipe split policy used in the specific
142+
configuration.
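When this test is repeated often, it can help to record the previous level so it can be restored afterwards. The sketch below is one possible way to do that; the function name ``force_performance_level`` is hypothetical, and the list of accepted levels is an assumption taken from the amdgpu power management documentation and may not be exhaustive for every ASIC.

```python
from pathlib import Path

# Levels assumed to be accepted by power_dpm_force_performance_level.
# This list is an assumption and may be incomplete for some hardware.
VALID_LEVELS = {
    "auto", "low", "high", "manual",
    "profile_standard", "profile_min_sclk",
    "profile_min_mclk", "profile_peak",
}


def force_performance_level(level, sysfs_file):
    """Write level to sysfs_file and return the level that was set before."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown performance level: {level!r}")
    path = Path(sysfs_file)
    previous = path.read_text().strip()
    path.write_text(level + "\n")
    return previous
```

A typical use, run as root, would call the helper with ``"high"`` and ``/sys/class/drm/card0/device/power_dpm_force_performance_level``, reproduce the flickering scenario, and then call it again with the returned value to restore the original setting. The ``card0`` index is the one used in the command above and may differ on systems with more than one GPU.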
143+
144+
Display artifacts
145+
-----------------
146+
147+
Users may see some screen artifacts that can be categorized into two different
148+
types: localized artifacts and general artifacts. The localized artifacts
149+
happen in some specific areas, such as around the UI window corners; if you see
150+
this type of issue, there is a considerable chance that you have a userspace
151+
problem, likely Mesa or similar. The general artifacts usually happen on the
152+
entire screen. They might be caused by a misconfiguration at the driver level
153+
of the display parameters, but the userspace might also cause this issue. One
154+
way to identify the source of the problem is to take a screenshot or make a
155+
desktop video capture when the problem happens; after checking the
156+
screenshot/video recording, if you don't see any of the artifacts, it means
157+
that the issue is likely on the driver side. If you can still see the
158+
problem in the data collected, it is an issue that probably happened during
159+
rendering, and the display code just got the framebuffer already corrupted.
160+
161+
Disabling/Enabling specific features
162+
====================================
163+
164+
DC has a struct named `dc_debug_options`, which is statically initialized by
165+
all DCE/DCN components based on the specific hardware characteristics. This
166+
structure usually facilitates the bring-up phase since developers can start
167+
with many disabled features and enable them individually. This is also an
168+
important debug feature since users can change it when debugging specific
169+
issues.
170+
171+
For example, dGPU users sometimes see a problem where a horizontal strip of
172+
flickering happens in some specific part of the screen. This could be an
173+
indication of Sub-Viewport issues; after users have identified the target DCN,
174+
they can set the `force_disable_subvp` field to true in the statically
175+
initialized version of `dc_debug_options` to see if the issue gets fixed. Along
176+
the same lines, users/developers can also try to turn off `fams2_config` and
177+
`enable_single_display_2to1_odm_policy`. In summary, the `dc_debug_options` is
178+
a useful tool for narrowing down the problem.
179+
5180
DC Visual Confirmation
6181
======================
7182

@@ -76,6 +251,18 @@ change in real-time by using something like::
76251
When reporting a bug related to DC, consider attaching this log before and
77252
after you reproduce the bug.
78253

254+
Collect Firmware information
255+
============================
256+
257+
When reporting issues, it is important to have the firmware information since
258+
it can be helpful for debugging purposes. To get all the firmware information,
259+
use the command::
260+
261+
cat /sys/kernel/debug/dri/0/amdgpu_firmware_info
262+
263+
From the display perspective, pay attention to the firmware of the DMCU and
264+
DMCUB.
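Because the file lists firmware for many components, a small filter can pull out only the display entries. The sketch below keeps the lines whose first word is ``DMCU`` or ``DMCUB``. The function name ``display_firmware_lines`` is hypothetical, and the sample text is made up for the example: it only mimics the general shape of the file and does not show real version numbers.

```python
def display_firmware_lines(firmware_info_text):
    """Return the lines that describe the DMCU or DMCUB firmware."""
    wanted = {"DMCU", "DMCUB"}
    selected = []
    for line in firmware_info_text.splitlines():
        words = line.split()
        if words and words[0] in wanted:
            selected.append(line)
    return selected


# Made-up sample; real files contain more entries and real versions.
SAMPLE = """\
SMC feature version: 0, firmware version: 0x00000000
DMCU feature version: 0, firmware version: 0x00000000
DMCUB feature version: 0, firmware version: 0x00000000
"""

for line in display_firmware_lines(SAMPLE):
    print(line)
```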
265+
79266
DMUB Firmware Debug
80267
===================
81268

Documentation/gpu/amdgpu/display/dcn-blocks.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
.. _dcn_blocks:
2+
13
==========
24
DCN Blocks
35
==========

Documentation/gpu/amdgpu/display/dcn-overview.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
.. _dcn_overview:
2+
13
=======================
24
Display Core Next (DCN)
35
=======================

Documentation/gpu/amdgpu/display/index.rst

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ table of content:
9090
display-manager.rst
9191
dcn-overview.rst
9292
dcn-blocks.rst
93+
programming-model-dcn.rst
9394
mpo-overview.rst
9495
dc-debug.rst
9596
display-contributing.rst
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
1+
====================
2+
DC Programming Model
3+
====================
4+
5+
In the :ref:`Display Core Next (DCN) <dcn_overview>` and :ref:`DCN Block
6+
<dcn_blocks>` pages, you learned about the hardware components and how they
7+
interact with each other. On this page, the focus is shifted to the display
8+
code architecture. Hence, it is reasonable to remind the reader that the code
9+
in DC is shared with other OSes; for this reason, DC provides a set of
10+
abstractions and operations to connect different APIs with the hardware
11+
configuration. See DC as a service available for a Display Manager (amdgpu_dm)
12+
to access and configure DCN/DCE hardware (DCE is also part of DC, but for
13+
simplicity's sake, this documentation only examines DCN).
14+
15+
.. note::
16+
For this page, we will use the term GPU to refer to both dGPUs and APUs.
17+
18+
Overview
19+
========
20+
21+
From the display hardware perspective, it is plausible to expect that if a
22+
problem is well-defined, it will probably be implemented at the hardware level.
23+
On the other hand, when there are multiple ways of achieving something without
24+
a very well-defined scope, the solution is usually implemented as a policy at
25+
the DC level. In other words, some policies are defined in the DC core
26+
(resource management, power optimization, image quality, etc.), and the others
27+
implemented in hardware are enabled via DC configuration.
28+
29+
In terms of hardware management, DCN has multiple instances of the same block
30+
(e.g., HUBP, DPP, MPC, etc), and during the driver execution, it might be
31+
necessary to use some of these instances. The core has policies in place for
32+
handling those instances. Regarding resource management, the DC objective is
33+
quite simple: minimize the hardware shuffle when the driver performs some
34+
actions. When the state changes from A to B, the transition is considered
35+
easier to maneuver if the hardware resource is still used for the same set of
36+
driver objects. Usually, adding and removing a resource to a `pipe_ctx` (more
37+
details below) is not a problem; however, moving a resource from one `pipe_ctx`
38+
to another should be avoided.
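The "minimize the hardware shuffle" objective can be pictured with a toy model. The sketch below is not DC code: it represents a state as a plain dictionary that maps a hardware resource name to the index of the `pipe_ctx` using it, and counts how many resources change owner between two states. All names in it are hypothetical.

```python
def count_moved_resources(state_a, state_b):
    """Count resources owned by a different pipe_ctx in state_b than in state_a.

    Each state maps a resource name to the index of the pipe_ctx using it.
    A resource that is missing from a state is unused in that state, so
    adding or removing a resource is not counted as a move.
    """
    moved = 0
    for resource, pipe_a in state_a.items():
        pipe_b = state_b.get(resource)
        if pipe_b is not None and pipe_b != pipe_a:
            moved += 1
    return moved


state_a = {"hubp0": 0, "dpp0": 0, "hubp1": 1}
# hubp1 is released and hubp2 is added: no resource changes owner.
state_b_easy = {"hubp0": 0, "dpp0": 0, "hubp2": 1}
# hubp0 moves from pipe_ctx 0 to pipe_ctx 1: the case DC tries to avoid.
state_b_hard = {"hubp0": 1, "dpp0": 0}

print(count_moved_resources(state_a, state_b_easy))  # prints 0
print(count_moved_resources(state_a, state_b_hard))  # prints 1
```

In this toy model, a transition with a count of zero corresponds to the "easier to maneuver" case described above.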
39+
40+
Another area of influence for DC is power optimization, which has a myriad of
41+
arrangement possibilities. In some way, just displaying an image via DCN should
42+
be relatively straightforward; however, showing it with the best power
43+
footprint is more desirable, but it has many associated challenges.
44+
Unfortunately, there is no straightforward analytic way to determine if a
45+
configuration is the best one for the context due to the enormous variety of
46+
variables related to this problem (e.g., many different DCN/DCE hardware
47+
versions, different display configurations, etc.). For this reason, DC
48+
implements a dedicated library that tries a configuration and verifies whether it
49+
is possible to support it or not. This type of policy is extremely complex to
50+
create and maintain, and the amdgpu driver relies on the Display Mode Library (DML) to
51+
generate the best decisions.
52+
53+
In summary, DC must deal with the complexity of handling multiple scenarios and
54+
determine policies to manage them. All of the above information is conveyed to
55+
give the reader some idea about the complexity of driving a display from the
56+
driver's perspective. This page hopes to allow the reader to better navigate
57+
the amdgpu display code.
58+
59+
Display Driver Architecture Overview
60+
====================================
61+
62+
The diagram below provides an overview of the display driver architecture;
63+
notice it illustrates the software layers adopted by DC:
64+
65+
.. kernel-figure:: dc-components.svg
66+
67+
The first layer of the diagram is the high-level DC API represented by the
68+
`dc.h` file; below it are two big blocks represented by Core and Link. Next is
69+
the hardware configuration block; the main file describing it is
70+
the `hw_sequencer.h`, and the implementation of the callbacks can be found in
71+
the hardware sequencer folder. Almost at the end, you can see the block level
72+
API (`dc/inc/hw`), which represents each DCN low-level block, such as HUBP,
73+
DPP, MPC, OPTC, etc. Notice on the left side of the diagram that we have a
74+
different set of layers representing the interaction with the DMUB
75+
microcontroller.
76+
77+
Basic Objects
78+
-------------
79+
80+
The below diagram outlines the basic display objects. In particular, pay
81+
attention to the names in the boxes since they represent a data structure in
82+
the driver:
83+
84+
.. kernel-figure:: dc-arch-overview.svg
85+
86+
Let's start with the central block in the image, `dc`. The `dc` struct is
87+
initialized per GPU; for example, one GPU has one `dc` instance, two GPUs have
88+
two `dc` instances, and so forth. In other words, we have one `dc` per `amdgpu`
89+
instance. In some ways, this object behaves like the `Singleton` pattern.
90+
91+
After the `dc` block in the diagram, you can see the `dc_link` component, which
92+
is a low-level abstraction for the connector. One interesting aspect of the
93+
image is that connectors are not part of the DCN block; they are defined by the
94+
platform/board and not by the SoC. The `dc_link` struct is the high-level data
95+
container with information such as connected sinks, connection status, signal
96+
types, etc. After `dc_link`, there is the `dc_sink`, which is the object that
97+
represents the connected display.
98+
99+
.. note::
100+
For historical reasons, we used the name `dc_link`, which gives the
101+
wrong impression that this abstraction only deals with physical connections
102+
that the developer can easily manipulate. However, this also covers
103+
connections like eDP or cases where the output is connected to other devices.
104+
105+
There are two structs that are not represented in the diagram since they were
106+
elaborated in the DCN overview page (check the DCN block diagram :ref:`Display
107+
Core Next (DCN) <dcn_overview>`); still, they are worth revisiting in this
overview: `dc_stream` and `dc_state`. The `dc_stream` stores many
109+
properties associated with the data transmission, but most importantly, it
110+
represents the data flow from the connector to the display. Next, we have
111+
`dc_state`, which represents the logic state within the hardware at the moment;
112+
`dc_state` is composed of `dc_stream` and `dc_plane`. The `dc_stream` is the DC
113+
version of `drm_crtc` and represents the post-blending pipeline.
114+
115+
Speaking of the `dc_plane` data structure (first part of the diagram), you can
116+
think about it as an abstraction similar to `drm_plane` that represents the
117+
pre-blending portion of the pipeline. This image was probably processed by GFX
118+
and is ready to be composited under a `dc_stream`. Normally, the driver may
119+
have one or more `dc_plane` connected to the same `dc_stream`, which defines a
120+
composition at the DC level.
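To keep the relationships between these objects in mind, the following sketch models them with Python dataclasses. It is a loose mental model written for this page, not a translation of the kernel structs: the class names only echo `dc_sink`, `dc_link`, `dc_plane`, `dc_stream`, and `dc_state`, and every field is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sink:
    """Stands in for dc_sink: the connected display."""
    name: str


@dataclass
class Link:
    """Stands in for dc_link: the connector abstraction."""
    index: int
    sink: Optional[Sink] = None  # None while nothing is connected


@dataclass
class Plane:
    """Stands in for dc_plane: a pre-blending image."""
    name: str


@dataclass
class Stream:
    """Stands in for dc_stream: the post-blending pipeline toward a display."""
    link: Link
    planes: List[Plane] = field(default_factory=list)


@dataclass
class State:
    """Stands in for dc_state: the set of streams (and their planes) in use."""
    streams: List[Stream] = field(default_factory=list)


# One display on connector 0, composed from two planes.
link = Link(index=0, sink=Sink("monitor"))
stream = Stream(link=link, planes=[Plane("primary"), Plane("cursor")])
state = State(streams=[stream])
print(len(state.streams), len(state.streams[0].planes))
```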
121+
122+
Basic Operations
123+
----------------
124+
125+
Now that we have covered the basic objects, it is time to examine some of the
126+
basic hardware/software operations. Let's start with the `dc_create()`
127+
function, which directly works with the `dc` data struct; this function behaves
128+
like a constructor responsible for the basic software initialization and
129+
preparing for enabling other parts of the API. It is important to highlight
130+
that this operation does not touch any hardware configuration; it is only a
131+
software initialization.
132+
133+
Next, we have the `dc_hardware_init()`, which also relies on the `dc` data
134+
struct. Its main function is to put the hardware in a valid state. It is worth
135+
highlighting that the hardware might initialize in an unknown state, and it is
136+
a requirement to put it in a valid state; this function has multiple callbacks
137+
for the hardware-specific initialization, whereas `dc_hardware_init` does the
138+
hardware initialization and is the first point where we touch hardware.
139+
140+
The `dc_get_link_at_index` is an operation that depends on the `dc_link` data
141+
structure. This function retrieves and enumerates all the `dc_links` available
142+
on the device; this is required since this information is not part of the SoC
143+
definition but depends on the board configuration. As soon as the `dc_link` is
144+
initialized, it is useful to figure out if any of them are already connected to
145+
the display by using the `dc_link_detect()` function. After the driver figures
146+
out if any display is connected to the device, the challenging phase starts:
147+
configuring the monitor to show something. Nonetheless, dealing with the ideal
148+
configuration is not a DC task since this is the Display Manager (`amdgpu_dm`)
149+
responsibility, which in turn deals with the atomic
150+
commits. The only interface DC provides to the configuration phase is the
151+
function `dc_validate_with_context` that receives the configuration information
152+
and, based on that, validates whether the hardware can support it or not. It is
153+
important to add that even if the display supports some specific configuration,
154+
it does not mean the DCN hardware can support it.
155+
156+
After the DM and DC agree upon the configuration, the stream configuration
157+
phase starts. This phase activates one or more `dc_stream` objects, and in
158+
the best-case scenario, you might be able to turn the display on with a black
159+
screen (it does not show anything yet since it does not have any plane
160+
associated with it). The final step would be to call the
161+
`dc_update_planes_and_stream`, which will add or remove planes.
162+
