Commit a037699

sefricke authored and Jonathan Corbet committed
docs: Add debugging section to process
This idea was formed after noticing that new developers experience certain
difficulty to navigate within the multitude of different debugging options in
the Kernel and while there often is good documentation for the tools, the
developer has to know first that they exist and where to find them.

Add a general debugging section to the Kernel documentation, as an easily
locatable entry point to other documentation and as a general guideline for
the topic.

Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241028-media_docs_improve_v3-v3-1-edf5c5b3746f@collabora.com
1 parent d8c949c commit a037699

4 files changed, +573 -3 lines changed

Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
.. SPDX-License-Identifier: GPL-2.0

========================================
Debugging advice for driver development
========================================

This document serves as a general starting point and lookup for debugging
device drivers.
While this guide focuses on debugging that requires re-compiling the
module/kernel, the :doc:`userspace debugging guide
</process/debugging/userspace_debugging_guide>` walks you through dynamic
debug, ftrace and other tools useful for debugging issues and behavior.
For general debugging advice, see the :doc:`general advice document
</process/debugging/index>`.

.. contents::
    :depth: 3

The following sections show you the available tools.

printk() & friends
------------------

These are derivatives of printf() with varying destinations and support for
being dynamically turned on or off, or lack thereof.

Simple printk()
~~~~~~~~~~~~~~~

The classic. It can be used to great effect for quick and dirty development
of new modules or to extract arbitrary necessary data for troubleshooting.

Prerequisite: ``CONFIG_PRINTK`` (usually enabled by default)

**Pros**:

- No need to learn anything, simple to use
- Easy to modify exactly to your needs (formatting of the data (See:
  :doc:`/core-api/printk-formats`), visibility in the log)
- Can cause delays in the execution of the code (beneficial to confirm whether
  timing is a factor)

**Cons**:

- Requires rebuilding the kernel/module
- Can cause delays in the execution of the code (which can cause issues to be
  not reproducible)

For the full documentation see :doc:`/core-api/printk-basics`.

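As a rough illustration of the quick and dirty use described above, a minimal
module could log a value like this. The module name ``printk_demo`` and the
variable ``value`` are made up for this sketch and do not come from any
existing driver.

```c
#include <linux/module.h>
#include <linux/printk.h>

/* Hypothetical module, used only to illustrate the log calls. */
static int __init printk_demo_init(void)
{
	int value = 42;	/* placeholder for the data you want to inspect */

	/* Explicit log level; KERN_DEBUG is often filtered from the console. */
	printk(KERN_DEBUG "printk_demo: value=%d\n", value);

	/* pr_info() is shorthand for printk(KERN_INFO ...). */
	pr_info("printk_demo: init reached\n");

	return 0;
}

static void __exit printk_demo_exit(void)
{
	pr_info("printk_demo: exit reached\n");
}

module_init(printk_demo_init);
module_exit(printk_demo_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("printk() usage sketch");
```

The messages end up in the kernel log, which can be read with ``dmesg``.
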
Trace_printk
~~~~~~~~~~~~

Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>``

It is a tiny bit less comfortable to use than printk(), because you will have
to read the messages from the trace file (See: :ref:`read_ftrace_log`)
instead of from the kernel log. It is very useful, however, when printk() adds
unwanted delays into the code execution, causing issues to be flaky or hidden.

If the processing of this still causes timing issues then you can try
trace_puts().

For the full documentation see trace_printk()

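A minimal sketch of both calls, assuming a hypothetical hot-path helper
``demo_handle_frame()`` that is too timing-sensitive for printk():

```c
#include <linux/ftrace.h>	/* header named in the prerequisite above */

/* Hypothetical hot-path helper; the name and arguments are illustrative. */
static void demo_handle_frame(unsigned int frame_id, unsigned int len)
{
	/* Written to the ftrace ring buffer, not to the kernel log. */
	trace_printk("frame %u: len=%u\n", frame_id, len);

	/* Constant string only, no format processing, so even less overhead. */
	trace_puts("frame handled\n");
}
```

Both calls are meant for temporary debugging and should be removed before the
code is submitted.
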
dev_dbg
~~~~~~~

A print statement that contains additional information about the device used
within the context and that can be targeted by
:ref:`process/debugging/userspace_debugging_guide:dynamic debug`.

**When is it appropriate to leave a debug print in the code?**

Permanent debug statements have to be useful for a developer to troubleshoot
driver misbehavior. Judging that is a bit more of an art than a science, but
some guidelines are in the :ref:`Coding style guidelines
<process/coding-style:13) printing kernel messages>`. In almost all cases the
debug statements shouldn't be upstreamed, as a working driver is supposed to be
silent.

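For illustration, a dev_dbg() call in a probe function might look like the
sketch below. The driver and the function name ``demo_probe()`` are
hypothetical.

```c
#include <linux/device.h>
#include <linux/platform_device.h>

/* Hypothetical platform driver probe; names are illustrative. */
static int demo_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;

	/*
	 * The message is prefixed with the driver and device name. It is
	 * compiled out unless DEBUG is defined, or it can be switched on at
	 * runtime when CONFIG_DYNAMIC_DEBUG is enabled.
	 */
	dev_dbg(dev, "probe started, %u resources\n", pdev->num_resources);

	return 0;
}
```
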
Custom printk
~~~~~~~~~~~~~

Example::

  #define core_dbg(fmt, arg...) do { \
          if (core_debug) \
                  printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
  } while (0)

**When should you do this?**

It is better to just use pr_debug(), which can later be turned on/off with
dynamic debug. Additionally, a lot of drivers activate these prints via a
variable like ``core_debug`` set by a module parameter. However, module
parameters `are not recommended anymore
<https://lore.kernel.org/all/2024032757-surcharge-grime-d3dd@gregkh>`_.

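The pr_debug() alternative could be sketched as follows. The helper
``core_report_state()`` is invented for this example; the ``"core: "`` prefix
is taken from the macro above.

```c
/* Must be defined before any include that pulls in <linux/printk.h>. */
#define pr_fmt(fmt) "core: " fmt

#include <linux/printk.h>

/* Hypothetical helper; the name and argument are illustrative. */
static void core_report_state(int state)
{
	/*
	 * No private core_debug flag is needed: with CONFIG_DYNAMIC_DEBUG the
	 * message can be switched on and off at runtime.
	 */
	pr_debug("state changed to %d\n", state);
}
```
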
Ftrace
------

Creating a custom Ftrace tracepoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A tracepoint adds a hook into your code that will be called and logged when the
tracepoint is enabled. This can be used, for example, to trace hitting a
conditional branch or to dump the internal state at specific points of the code
flow during a debugging session.

Here is a basic description of :ref:`how to implement new tracepoints
<trace/tracepoints:usage>`.

For the full event tracing documentation see :doc:`/trace/events`.

For the full Ftrace documentation see :doc:`/trace/ftrace`.

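A tracepoint definition could look roughly like the header sketched below.
The trace system ``my_driver``, the event ``my_driver_irq``, its ``status``
field and the file name ``my_driver_trace.h`` are all assumptions made for
this example.

```c
/* my_driver_trace.h: hypothetical trace header for an imaginary driver */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM my_driver

#if !defined(_MY_DRIVER_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _MY_DRIVER_TRACE_H

#include <linux/tracepoint.h>

/* Logs the interrupt status word each time the event is hit. */
TRACE_EVENT(my_driver_irq,
	TP_PROTO(u32 status),
	TP_ARGS(status),
	TP_STRUCT__entry(
		__field(u32, status)
	),
	TP_fast_assign(
		__entry->status = status;
	),
	TP_printk("status=0x%08x", __entry->status)
);

#endif /* _MY_DRIVER_TRACE_H */

/* Tell define_trace.h where to find this header again. */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE my_driver_trace
#include <trace/define_trace.h>
```

Exactly one C file of the driver would define ``CREATE_TRACE_POINTS`` before
including this header, and the code then calls
``trace_my_driver_irq(status)`` at the point of interest.
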
DebugFS
-------

Prerequisite: ``CONFIG_DEBUG_FS`` & ``#include <linux/debugfs.h>``

DebugFS differs from the other approaches of debugging, as it doesn't write
messages to the kernel log nor add traces to the code. Instead it lets the
developer expose a set of files.
These files can present the values of variables or register/memory dumps, or
they can be made writable to modify values/settings in the driver.

Possible use-cases among others:

- Store register values
- Keep track of variables
- Store errors
- Store settings
- Toggle a setting like debug on/off
- Error injection

This is especially useful when the size of a data dump would be hard to digest
as part of the general kernel log (for example when dumping raw bitstream data)
or when you are not interested in all the values all the time but want to be
able to inspect them.

The general idea is:

- Create a directory during probe (``struct dentry *parent =
  debugfs_create_dir("my_driver", NULL);``)
- Create a file (``debugfs_create_u32("my_value", 0444, parent, &my_variable);``)

  - In this example the file is found in
    ``/sys/kernel/debug/my_driver/my_value`` (with read permissions for
    user/group/all)
  - Any read of the file will return the current contents of the variable
    ``my_variable``

- Clean up the directory when removing the device
  (``debugfs_remove_recursive(parent);``)

For the full documentation see :doc:`/filesystems/debugfs`.

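Put together, the steps of the general idea could be sketched like this. The
function names ``my_driver_probe()`` and ``my_driver_remove()`` are
hypothetical, and the mode is written in octal (``0444``) so that it really
means read-only for user/group/all.

```c
#include <linux/debugfs.h>
#include <linux/platform_device.h>

/* Hypothetical driver state; the names mirror the steps listed above. */
static struct dentry *parent;
static u32 my_variable;

static int my_driver_probe(struct platform_device *pdev)
{
	/* A NULL parent places the directory at the debugfs root. */
	parent = debugfs_create_dir("my_driver", NULL);

	/* Octal 0444: read-only for user/group/all; reads return my_variable. */
	debugfs_create_u32("my_value", 0444, parent, &my_variable);

	return 0;
}

static void my_driver_remove(struct platform_device *pdev)
{
	/* Removes the directory and every file created below it. */
	debugfs_remove_recursive(parent);
}
```

The return value of debugfs_create_dir() is deliberately not checked here, as
a driver is expected to keep working when debugfs is unavailable.
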
KASAN, UBSAN, lockdep and other error checkers
----------------------------------------------

KASAN (Kernel Address Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_KASAN``

KASAN is a dynamic memory error detector that helps to find use-after-free and
out-of-bounds bugs. It uses compile-time instrumentation to check every memory
access.

For the full documentation see :doc:`/dev-tools/kasan`.

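To give an idea of the kind of bug KASAN reports, here is a deliberately
broken sketch. The function ``demo_use_after_free()`` is made up for this
example and must never be used in real code.

```c
#include <linux/slab.h>

/* Deliberately buggy, hypothetical function to show what KASAN catches. */
static void demo_use_after_free(void)
{
	char *buf = kmalloc(16, GFP_KERNEL);

	if (!buf)
		return;

	kfree(buf);

	/* Write to freed memory: KASAN reports this as a use-after-free. */
	buf[0] = 'x';
}
```
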
UBSAN (Undefined Behavior Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_UBSAN``

UBSAN relies on compiler instrumentation and runtime checks to detect undefined
behavior. It is designed to find a variety of issues, including signed integer
overflow, array index out of bounds, and more.

For the full documentation see :doc:`/dev-tools/ubsan`.

lockdep (Lock Dependency Validator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PROVE_LOCKING`` (selects ``CONFIG_LOCKDEP``;
``CONFIG_DEBUG_LOCKDEP`` adds self-checks of the validator itself)

lockdep is a runtime lock dependency validator that detects potential deadlocks
and other locking-related issues in the kernel.
It tracks lock acquisitions and releases, building a dependency graph that is
analyzed for potential deadlocks.
lockdep is especially useful for validating the correctness of lock ordering in
the kernel.

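The lock ordering problem that lockdep looks for can be sketched with two
mutexes taken in opposite order. The locks and both functions are
hypothetical.

```c
#include <linux/mutex.h>

static DEFINE_MUTEX(lock_a);
static DEFINE_MUTEX(lock_b);

/* First code path: records the dependency lock_a -> lock_b. */
static void demo_path_one(void)
{
	mutex_lock(&lock_a);
	mutex_lock(&lock_b);
	mutex_unlock(&lock_b);
	mutex_unlock(&lock_a);
}

/* Second code path: takes the locks in the inverse order. */
static void demo_path_two(void)
{
	mutex_lock(&lock_b);
	mutex_lock(&lock_a);	/* lock_b -> lock_a closes the cycle */
	mutex_unlock(&lock_a);
	mutex_unlock(&lock_b);
}
```

Once both paths have run, lockdep can warn about the circular dependency even
if the two paths never actually raced and no deadlock occurred.
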
PSI (Pressure stall information tracking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PSI``

PSI is a measurement tool to identify excessive overcommits on hardware
resources that can cause performance disruptions or even OOM kills.

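The PSI values are exported through ``/proc/pressure/cpu``,
``/proc/pressure/memory`` and ``/proc/pressure/io``. As a sketch, a small
userspace helper could parse one of these files as shown below. The struct
and function names are invented for this example.

```c
#include <stdio.h>

/* One line of a PSI file, e.g. "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". */
struct psi_line {
	char kind[8];			/* "some" or "full" */
	double avg10, avg60, avg300;	/* stall percentage over 10s/60s/300s */
	unsigned long long total_us;	/* accumulated stall time in microseconds */
};

/* Parse a single line; returns 0 on success, -1 if the format does not match. */
int psi_parse_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total_us);

	return n == 5 ? 0 : -1;
}

/* Print every parsable line of a PSI file; returns -1 if it cannot be opened. */
int psi_dump(const char *path)
{
	char buf[128];
	struct psi_line l;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(buf, sizeof(buf), f))
		if (psi_parse_line(buf, &l) == 0)
			printf("%s: avg10=%.2f%% total=%llu us\n",
			       l.kind, l.avg10, l.total_us);

	fclose(f);
	return 0;
}
```

Calling ``psi_dump("/proc/pressure/memory")`` prints the current memory
pressure, provided the running kernel has PSI enabled.
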
device coredump
---------------

Prerequisite: ``#include <linux/devcoredump.h>``

Provides the infrastructure for a driver to provide arbitrary data to userland.
It is most often used in conjunction with udev or a similar userland
application that listens for kernel uevents, which indicate that the dump is
ready. Udev has rules to copy that file somewhere for long-term storage and
analysis, as by default, the data for the dump is automatically cleaned up
after 5 minutes.
That data is analyzed with driver-specific tools or GDB.

You can find an example implementation at:
`drivers/media/platform/qcom/venus/core.c
<https://elixir.bootlin.com/linux/v6.11.6/source/drivers/media/platform/qcom/venus/core.c#L30>`__

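On the driver side, handing a dump to the infrastructure could be sketched as
follows. The helper ``demo_report_crash()`` and its arguments are
hypothetical; it is not taken from the venus driver.

```c
#include <linux/devcoredump.h>
#include <linux/device.h>
#include <linux/string.h>
#include <linux/vmalloc.h>

/* Hypothetical helper: hand a snapshot of driver state to userland. */
static void demo_report_crash(struct device *dev, const void *state, size_t len)
{
	void *dump = vmalloc(len);

	if (!dump)
		return;

	memcpy(dump, state, len);

	/*
	 * devcoredump takes ownership of the vmalloc()ed buffer and frees it
	 * when the dump is no longer used, so it must not be freed here.
	 */
	dev_coredumpv(dev, dump, len, GFP_KERNEL);
}
```
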
**Copyright** ©2024 : Collabora
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
.. SPDX-License-Identifier: GPL-2.0

============================================
Debugging advice for Linux Kernel developers
============================================

.. toctree::
   :maxdepth: 1

   driver_development_debugging_guide
   userspace_debugging_guide

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`

General debugging advice
========================

Depending on the issue, a different set of tools is available to track down the
problem or even to realize whether there is one in the first place.

As a first step you have to figure out what kind of issue you want to debug.
Depending on the answer, your methodology and choice of tools may vary.

Do I need to debug with limited access?
---------------------------------------

Do you have limited access to the machine or are you unable to stop the running
execution?

In this case your debugging capability depends on the built-in debugging
support of the provided distribution kernel.
The :doc:`/process/debugging/userspace_debugging_guide` provides a brief
overview of a range of possible debugging tools in that situation. You can
check the capability of your kernel, in most cases, by looking into the config
file within the ``/boot`` directory.

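Checking the config file can also be scripted. The userspace sketch below
scans a kernel config file for an enabled option. The function names are
invented for this example, and the exact file name under ``/boot`` (commonly
``config-`` followed by the kernel release) depends on the distribution.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if "line" enables "option" as built-in (=y) or module (=m),
 * 0 otherwise. Lines like "# CONFIG_FOO is not set" do not match.
 */
int config_line_enables(const char *line, const char *option)
{
	size_t n = strlen(option);

	if (strncmp(line, option, n) != 0 || line[n] != '=')
		return 0;

	return line[n + 1] == 'y' || line[n + 1] == 'm';
}

/* Scan a kernel config file; returns 1 if enabled, 0 if not, -1 on error. */
int config_file_enables(const char *path, const char *option)
{
	char buf[512];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (!found && fgets(buf, sizeof(buf), f))
		found = config_line_enables(buf, option);

	fclose(f);
	return found;
}
```

For example, ``config_file_enables(path, "CONFIG_DYNAMIC_DEBUG")`` tells you
whether dynamic debug is available in the kernel described by ``path``.
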
Do I have root access to the system?
------------------------------------

Are you easily able to replace the module in question or to install a new
kernel?

In that case your range of available tools is a lot bigger. You can find the
tools in the :doc:`/process/debugging/driver_development_debugging_guide`.

Is timing a factor?
-------------------

It is important to understand if the problem you want to debug manifests itself
consistently (i.e. given a set of inputs you always get the same, incorrect
output), or inconsistently. If it manifests itself inconsistently, some timing
factor might be at play. If inserting delays into the code does change the
behavior, then quite likely timing is a factor.

When timing does alter the outcome of the code execution, using a simple
printk() for debugging purposes may not work. A similar alternative is
trace_printk(), which logs the debug messages to the trace file instead of the
kernel log.

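One way to test the delay hypothesis is to stretch a suspect window
temporarily, as in the sketch below. The function ``demo_suspect_path()`` is
hypothetical and the delay values are arbitrary. usleep_range() may only be
called from a context that is allowed to sleep.

```c
#include <linux/delay.h>

/* Hypothetical function standing in for the code path under suspicion. */
static void demo_suspect_path(void)
{
	/* First half of the operation would run here. */

	/* Temporary, debugging only: sleep between 1 and 2 milliseconds. */
	usleep_range(1000, 2000);

	/* Second half of the operation would run here. */
}
```

If the failure rate changes with the delay in place, timing is likely
involved and low-overhead tools such as trace_printk() are the better choice.
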
**Copyright** ©2024 : Collabora
