Commit d2a6fd4

Merge tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes updates from Masami Hiramatsu:

 - fprobe: Pass return address to the fprobe entry/exit callbacks so that
   the callbacks don't need to analyze pt_regs/stack to find the function
   return address.

 - kprobe events: cleanup usage of TPARG_FL_FENTRY and TPARG_FL_RETURN
   flags so that those are not set at once.

 - fprobe events:
    - Add a new fprobe events for tracing arbitrary function entry and
      exit as a trace event.
    - Add a new tracepoint events for tracing raw tracepoint as a trace
      event. This allows user to trace non user-exposed tracepoints.
    - Move eprobe's event parser code into probe event common file.
    - Introduce BTF (BPF type format) support to kernel probe (kprobe,
      fprobe and tracepoint probe) events so that user can specify
      traced function arguments by name. This also applies the type of
      argument when fetching the argument.
    - Introduce '$arg*' wildcard support if BTF is available. This
      expands the '$arg*' meta argument to all function argument
      automatically.
    - Check the return value types by BTF. If the function returns
      'void', '$retval' is rejected.
    - Add some selftest script for fprobe events, tracepoint events and
      BTF support.
    - Update documentation about the fprobe events.
    - Some fixes for above features, document and selftests.

 - selftests for ftrace (in addition to the new fprobe events):
    - Add a test case for multiple consecutive probes in a function
      which checks if ftrace based kprobe, optimized kprobe and normal
      kprobe can be defined in the same target function.
    - Add a test case for optimized probe, which checks whether kprobe
      can be optimized or not.

* tag 'probes-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing/probes: Fix tracepoint event with $arg* to fetch correct argument
  Documentation: Fix typo of reference file name
  tracing/probes: Fix to return NULL and keep using current argc
  selftests/ftrace: Add new test case which checks for optimized probes
  selftests/ftrace: Add new test case which adds multiple consecutive probes in a function
  Documentation: tracing/probes: Add fprobe event tracing document
  selftests/ftrace: Add BTF arguments test cases
  selftests/ftrace: Add tracepoint probe test case
  tracing/probes: Add BTF retval type support
  tracing/probes: Add $arg* meta argument for all function args
  tracing/probes: Support function parameters if BTF is available
  tracing/probes: Move event parameter fetching code to common parser
  tracing/probes: Add tracepoint support on fprobe_events
  selftests/ftrace: Add fprobe related testcases
  tracing/probes: Add fprobe events for tracing function entry and exit.
  tracing/probes: Avoid setting TPARG_FL_FENTRY and TPARG_FL_RETURN
  fprobe: Pass return address to the handlers
2 parents cccf0c2 + 5343179 commit d2a6fd4

31 files changed

+2476
-164
lines changed

Documentation/trace/fprobetrace.rst

Lines changed: 188 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,188 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
==========================
4+
Fprobe-based Event Tracing
5+
==========================
6+
7+
.. Author: Masami Hiramatsu <mhiramat@kernel.org>
8+
9+
Overview
10+
--------
11+
12+
Fprobe event is similar to the kprobe event, but limited to probe on
13+
the function entry and exit only. It is good enough for many use cases
14+
which only trace some specific functions.
15+
16+
This document also covers tracepoint probe events (tprobe) since this
17+
also works only on the tracepoint entry. Users can trace a part of
18+
tracepoint argument, or the tracepoint without trace-event, which is
19+
not exposed on tracefs.
20+
21+
As with other dynamic events, fprobe events and tracepoint probe
22+
events are defined via `dynamic_events` interface file on tracefs.
23+
24+
Synopsis of fprobe-events
25+
-------------------------
26+
::
27+
28+
f[:[GRP1/][EVENT1]] SYM [FETCHARGS] : Probe on function entry
29+
f[MAXACTIVE][:[GRP1/][EVENT1]] SYM%return [FETCHARGS] : Probe on function exit
30+
t[:[GRP2/][EVENT2]] TRACEPOINT [FETCHARGS] : Probe on tracepoint
31+
32+
GRP1 : Group name for fprobe. If omitted, use "fprobes" for it.
33+
GRP2 : Group name for tprobe. If omitted, use "tracepoints" for it.
34+
EVENT1 : Event name for fprobe. If omitted, the event name is
35+
"SYM__entry" or "SYM__exit".
36+
EVENT2 : Event name for tprobe. If omitted, the event name is
37+
the same as "TRACEPOINT", but if the "TRACEPOINT" starts
38+
with a digit character, "_TRACEPOINT" is used.
39+
MAXACTIVE : Maximum number of instances of the specified function that
40+
can be probed simultaneously, or 0 for the default value
41+
as defined in Documentation/trace/fprobe.rst
42+
43+
FETCHARGS : Arguments. Each probe can have up to 128 args.
44+
ARG : Fetch "ARG" function argument using BTF (only for function
45+
entry or tracepoint.) (\*1)
46+
@ADDR : Fetch memory at ADDR (ADDR should be in kernel)
47+
@SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
48+
$stackN : Fetch Nth entry of stack (N >= 0)
49+
$stack : Fetch stack address.
50+
$argN : Fetch the Nth function argument. (N >= 1) (\*2)
51+
$retval : Fetch return value.(\*3)
52+
$comm : Fetch current task comm.
53+
+|-[u]OFFS(FETCHARG) : Fetch memory at FETCHARG +|- OFFS address.(\*4)(\*5)
54+
\IMM : Store an immediate value to the argument.
55+
NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
56+
FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
57+
(u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
58+
(x8/x16/x32/x64), "char", "string", "ustring", "symbol", "symstr"
59+
and bitfield are supported.
60+
61+
(\*1) This is available only when BTF is enabled.
62+
(\*2) only for the probe on function entry (offs == 0).
63+
(\*3) only for return probe.
64+
(\*4) this is useful for fetching a field of data structures.
65+
(\*5) "u" means user-space dereference.
66+
67+
For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.
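
The default naming rules in the synopsis above can be summarized in a short shell sketch. The helper name ``default_event`` is hypothetical and exists only for this illustration; it mirrors the documented defaults for GRP1/GRP2 and EVENT1/EVENT2 and does not talk to the kernel.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel or tracefs tool): print the GROUP/EVENT
# name that the synopsis says is used when both are omitted.
#   default_event KIND TARGET
#     KIND   : "f" for an fprobe, "t" for a tprobe
#     TARGET : SYM, SYM%return, or TRACEPOINT
default_event() {
    kind=$1 target=$2
    case "$kind" in
    f)
        case "$target" in
        *%return)
            # Exit probe: strip the "%return" suffix, then append "__exit".
            sym=${target%\%return}
            echo "fprobes/${sym}__exit" ;;
        *)
            # Entry probe: append "__entry".
            echo "fprobes/${target}__entry" ;;
        esac ;;
    t)
        case "$target" in
        # A tracepoint name starting with a digit gets a "_" prefix.
        [0-9]*) echo "tracepoints/_${target}" ;;
        *)      echo "tracepoints/${target}" ;;
        esac ;;
    *)
        echo "unknown probe kind: $kind" >&2
        return 1 ;;
    esac
}

default_event f vfs_read
default_event f 'vfs_read%return'
default_event t sched_switch
```

The three calls print ``fprobes/vfs_read__entry``, ``fprobes/vfs_read__exit`` and ``tracepoints/sched_switch``, which match the event names used in the usage examples of this document.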
68+
69+
BTF arguments
70+
-------------
71+
BTF (BPF Type Format) argument allows user to trace function and tracepoint
72+
parameters by name instead of ``$argN``. This feature is available if the
73+
kernel is configured with CONFIG_BPF_SYSCALL and CONFIG_DEBUG_INFO_BTF.
74+
If the user specifies only the BTF argument, the event's argument name is also
75+
automatically set by the given name. ::
76+
77+
# echo 'f:myprobe vfs_read count pos' >> dynamic_events
78+
# cat dynamic_events
79+
f:fprobes/myprobe vfs_read count=count pos=pos
80+
81+
It also chooses the fetch type from BTF information. For example, in the above
82+
example, the ``count`` is unsigned long, and the ``pos`` is a pointer. Thus, both
83+
are converted to 64bit unsigned long, but only ``pos`` has "%Lx" print-format as
84+
below ::
85+
86+
# cat events/fprobes/myprobe/format
87+
name: myprobe
88+
ID: 1313
89+
format:
90+
field:unsigned short common_type; offset:0; size:2; signed:0;
91+
field:unsigned char common_flags; offset:2; size:1; signed:0;
92+
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
93+
field:int common_pid; offset:4; size:4; signed:1;
94+
95+
field:unsigned long __probe_ip; offset:8; size:8; signed:0;
96+
field:u64 count; offset:16; size:8; signed:0;
97+
field:u64 pos; offset:24; size:8; signed:0;
98+
99+
print fmt: "(%lx) count=%Lu pos=0x%Lx", REC->__probe_ip, REC->count, REC->pos
100+
101+
If the user is unsure of the argument names, ``$arg*`` will be helpful. The ``$arg*``
102+
is expanded to all function arguments of the function or the tracepoint. ::
103+
104+
# echo 'f:myprobe vfs_read $arg*' >> dynamic_events
105+
# cat dynamic_events
106+
f:fprobes/myprobe vfs_read file=file buf=buf count=count pos=pos
107+
108+
BTF also affects the ``$retval``. If user doesn't set any type, the retval type is
109+
automatically picked from the BTF. If the function returns ``void``, ``$retval``
110+
is rejected.
111+
112+
Usage examples
113+
--------------
114+
Here is an example to add fprobe events on ``vfs_read()`` function entry
115+
and exit, with BTF arguments.
116+
::
117+
118+
# echo 'f vfs_read $arg*' >> dynamic_events
119+
# echo 'f vfs_read%return $retval' >> dynamic_events
120+
# cat dynamic_events
121+
f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
122+
f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
123+
# echo 1 > events/fprobes/enable
124+
# head -n 20 trace | tail
125+
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
126+
# | | | ||||| | |
127+
sh-70 [000] ...1. 335.883195: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
128+
sh-70 [000] ..... 335.883208: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
129+
sh-70 [000] ...1. 335.883220: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
130+
sh-70 [000] ..... 335.883224: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
131+
sh-70 [000] ...1. 335.883232: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c687a count=1 pos=0xffffc900005aff08
132+
sh-70 [000] ..... 335.883237: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
133+
sh-70 [000] ...1. 336.050329: vfs_read__entry: (vfs_read+0x4/0x340) file=0xffff888005cf9a80 buf=0x7ffef36c6879 count=1 pos=0xffffc900005aff08
134+
sh-70 [000] ..... 336.050343: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=1
135+
136+
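You can see that the function arguments and return values are recorded with the types picked from BTF: pointers are shown in hexadecimal, while ``count`` and the return value are shown as decimal integers.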
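
The interactive session above could be wrapped in a small script along the following lines. This is a sketch under assumptions: tracefs is taken to be mounted at ``/sys/kernel/tracing``, the helper names ``probe_add``, ``probe_del`` and ``demo`` are hypothetical, and removal relies on the ``-:GROUP/EVENT`` form that ``dynamic_events`` is assumed to accept as it does for other dynamic events. The live part runs only when the script is called with the argument ``run``, since it needs root and a kernel with fprobe events.

```shell
#!/bin/sh
# Assumed tracefs mount point; override TRACEFS if it is mounted elsewhere.
TRACEFS=${TRACEFS:-/sys/kernel/tracing}

# Append one definition line. ">>" keeps the dynamic events that already
# exist, whereas opening the file with ">" truncates it first.
probe_add() { printf '%s\n' "$1" >> "$TRACEFS/dynamic_events"; }

# Request removal of one event by appending "-:GROUP/EVENT" (assumed syntax).
probe_del() { printf '%s\n' "-:$1" >> "$TRACEFS/dynamic_events"; }

demo() {
    probe_add 'f vfs_read $arg*' &&
    probe_add 'f vfs_read%return $retval' || return 1
    echo 1 > "$TRACEFS/events/fprobes/enable"
    sleep 1
    # Disable the events before removing them.
    echo 0 > "$TRACEFS/events/fprobes/enable"
    head -n 20 "$TRACEFS/trace"
    probe_del fprobes/vfs_read__entry
    probe_del fprobes/vfs_read__exit
}

# Only touch the tracing files when explicitly asked to.
if [ "${1:-}" = run ]; then
    demo
fi
```

Keeping the probe definitions in single quotes matters here: ``$arg*`` and ``$retval`` must reach the kernel literally rather than being expanded by the shell.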
137+
138+
Also, here is an example of tracepoint events on ``sched_switch`` tracepoint.
139+
To compare the results, this also enables the ``sched_switch`` trace event.
140+
::
141+
142+
# echo 't sched_switch $arg*' >> dynamic_events
143+
# echo 1 > events/sched/sched_switch/enable
144+
# echo 1 > events/tracepoints/sched_switch/enable
145+
# echo > trace
146+
# head -n 20 trace | tail
147+
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
148+
# | | | ||||| | |
149+
sh-70 [000] d..2. 3912.083993: sched_switch: prev_comm=sh prev_pid=70 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
150+
sh-70 [000] d..3. 3912.083995: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff88800664e100 next=0xffffffff828229c0 prev_state=1
151+
<idle>-0 [000] d..2. 3912.084183: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
152+
<idle>-0 [000] d..3. 3912.084184: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
153+
rcu_preempt-16 [000] d..2. 3912.084196: sched_switch: prev_comm=rcu_preempt prev_pid=16 prev_prio=120 prev_state=I ==> next_comm=swapper/0 next_pid=0 next_prio=120
154+
rcu_preempt-16 [000] d..3. 3912.084196: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffff888004208000 next=0xffffffff828229c0 prev_state=1026
155+
<idle>-0 [000] d..2. 3912.085191: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcu_preempt next_pid=16 next_prio=120
156+
<idle>-0 [000] d..3. 3912.085191: sched_switch: (__probestub_sched_switch+0x4/0x10) preempt=0 prev=0xffffffff828229c0 next=0xffff888004208000 prev_state=0
157+
158+
As you can see, the ``sched_switch`` trace-event shows *cooked* parameters, on
159+
the other hand, the ``sched_switch`` tracepoint probe event shows *raw*
160+
parameters. This means you can access any field values in the task
161+
structure pointed to by the ``prev`` and ``next`` arguments.
162+
163+
For example, usually ``task_struct::start_time`` is not traced, but with this
164+
traceprobe event, you can trace it as below.
165+
::
166+
167+
# echo 't sched_switch comm=+1896(next):string start_time=+1728(next):u64' > dynamic_events
168+
# head -n 20 trace | tail
169+
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
170+
# | | | ||||| | |
171+
sh-70 [000] d..3. 5606.686577: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
172+
rcu_preempt-16 [000] d..3. 5606.686602: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="sh" usage=1 start_time=1596095526
173+
sh-70 [000] d..3. 5606.686637: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
174+
<idle>-0 [000] d..3. 5606.687190: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="rcu_preempt" usage=1 start_time=245000000
175+
rcu_preempt-16 [000] d..3. 5606.687202: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
176+
<idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
177+
kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
178+
<idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
179+
180+
Currently, to find the offset of a specific field in the data structure,
181+
you need to build kernel with debuginfo and run `perf probe` command with
182+
`-D` option. e.g.
183+
::
184+
185+
# perf probe -D "__probestub_sched_switch next->comm:string next->start_time"
186+
p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64
187+
188+
And replace the ``%cx`` with the ``next``.
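
The manual substitution described above can be sketched as a small filter. The helper name ``perf_to_tprobe`` is hypothetical, and the sample line is the ``perf probe -D`` output quoted above; real output depends on the kernel build and architecture, so the register name is passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: read one "perf probe -D" definition line on stdin and
# print a tprobe definition for the dynamic_events file.
#   perf_to_tprobe TRACEPOINT REGISTER ARGNAME
perf_to_tprobe() {
    tp=$1 reg=$2 arg=$3
    # Drop the first two fields ("p:GROUP/EVENT" and "SYMBOL+OFFSET") so that
    # only the fetchargs remain, rewrite every "(%REGISTER)" dereference base
    # to "(ARGNAME)", then prepend "t TRACEPOINT".
    cut -d' ' -f3- |
        sed -e "s/(%${reg})/(${arg})/g" -e "s/^/t ${tp} /"
}

# Sample input copied from the perf probe output shown above.
sample='p:probe/__probestub_sched_switch __probestub_sched_switch+0 comm=+1896(%cx):string start_time=+1728(%cx):u64'

printf '%s\n' "$sample" | perf_to_tprobe sched_switch cx next
```

For the sample line this prints ``t sched_switch comm=+1896(next):string start_time=+1728(next):u64``, the definition used in the ``task_struct::start_time`` example. The offsets are specific to the kernel build they were generated from and should be regenerated for any other kernel.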

Documentation/trace/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ Linux Tracing Technologies
1313
kprobes
1414
kprobetrace
1515
uprobetracer
16+
fprobetrace
1617
tracepoints
1718
events
1819
events-kmem

Documentation/trace/kprobetrace.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,8 @@ Synopsis of kprobe_events
6666
(\*3) this is useful for fetching a field of data structures.
6767
(\*4) "u" means user-space dereference. See :ref:`user_mem_access`.
6868

69+
.. _kprobetrace_types:
70+
6971
Types
7072
-----
7173
Several types are supported for fetchargs. Kprobe tracer will access memory

include/linux/fprobe.h

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -35,9 +35,11 @@ struct fprobe {
3535
int nr_maxactive;
3636

3737
int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
38-
struct pt_regs *regs, void *entry_data);
38+
unsigned long ret_ip, struct pt_regs *regs,
39+
void *entry_data);
3940
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
40-
struct pt_regs *regs, void *entry_data);
41+
unsigned long ret_ip, struct pt_regs *regs,
42+
void *entry_data);
4143
};
4244

4345
/* This fprobe is soft-disabled. */
@@ -64,6 +66,7 @@ int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter
6466
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
6567
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
6668
int unregister_fprobe(struct fprobe *fp);
69+
bool fprobe_is_registered(struct fprobe *fp);
6770
#else
6871
static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
6972
{
@@ -81,6 +84,10 @@ static inline int unregister_fprobe(struct fprobe *fp)
8184
{
8285
return -EOPNOTSUPP;
8386
}
87+
static inline bool fprobe_is_registered(struct fprobe *fp)
88+
{
89+
return false;
90+
}
8491
#endif
8592

8693
/**

include/linux/rethook.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414

1515
struct rethook_node;
1616

17-
typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
17+
typedef void (*rethook_handler_t) (struct rethook_node *, void *, unsigned long, struct pt_regs *);
1818

1919
/**
2020
* struct rethook - The rethook management data structure.

include/linux/trace_events.h

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -318,6 +318,7 @@ enum {
318318
TRACE_EVENT_FL_KPROBE_BIT,
319319
TRACE_EVENT_FL_UPROBE_BIT,
320320
TRACE_EVENT_FL_EPROBE_BIT,
321+
TRACE_EVENT_FL_FPROBE_BIT,
321322
TRACE_EVENT_FL_CUSTOM_BIT,
322323
};
323324

@@ -332,6 +333,7 @@ enum {
332333
* KPROBE - Event is a kprobe
333334
* UPROBE - Event is a uprobe
334335
* EPROBE - Event is an event probe
336+
* FPROBE - Event is a function probe
335337
* CUSTOM - Event is a custom event (to be attached to an existing tracepoint)
336338
* This is set when the custom event has not been attached
337339
* to a tracepoint yet, then it is cleared when it is.
@@ -346,6 +348,7 @@ enum {
346348
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
347349
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
348350
TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT),
351+
TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT),
349352
TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT),
350353
};
351354

include/linux/tracepoint-defs.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,7 @@ struct tracepoint {
3535
struct static_call_key *static_call_key;
3636
void *static_call_tramp;
3737
void *iterator;
38+
void *probestub;
3839
int (*regfunc)(void);
3940
void (*unregfunc)(void);
4041
struct tracepoint_func __rcu *funcs;

include/linux/tracepoint.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -303,13 +303,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
303303
__section("__tracepoints_strings") = #_name; \
304304
extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name); \
305305
int __traceiter_##_name(void *__data, proto); \
306+
void __probestub_##_name(void *__data, proto); \
306307
struct tracepoint __tracepoint_##_name __used \
307308
__section("__tracepoints") = { \
308309
.name = __tpstrtab_##_name, \
309310
.key = STATIC_KEY_INIT_FALSE, \
310311
.static_call_key = &STATIC_CALL_KEY(tp_func_##_name), \
311312
.static_call_tramp = STATIC_CALL_TRAMP_ADDR(tp_func_##_name), \
312313
.iterator = &__traceiter_##_name, \
314+
.probestub = &__probestub_##_name, \
313315
.regfunc = _reg, \
314316
.unregfunc = _unreg, \
315317
.funcs = NULL }; \
@@ -330,6 +332,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
330332
} \
331333
return 0; \
332334
} \
335+
void __probestub_##_name(void *__data, proto) \
336+
{ \
337+
} \
333338
DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
334339

335340
#define DEFINE_TRACE(name, proto, args) \

kernel/kprobes.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2127,6 +2127,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
21272127
NOKPROBE_SYMBOL(pre_handler_kretprobe);
21282128

21292129
static void kretprobe_rethook_handler(struct rethook_node *rh, void *data,
2130+
unsigned long ret_addr,
21302131
struct pt_regs *regs)
21312132
{
21322133
struct kretprobe *rp = (struct kretprobe *)data;

kernel/trace/Kconfig

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -665,6 +665,32 @@ config BLK_DEV_IO_TRACE
665665

666666
If unsure, say N.
667667

668+
config FPROBE_EVENTS
669+
depends on FPROBE
670+
depends on HAVE_REGS_AND_STACK_ACCESS_API
671+
bool "Enable fprobe-based dynamic events"
672+
select TRACING
673+
select PROBE_EVENTS
674+
select DYNAMIC_EVENTS
675+
default y
676+
help
677+
This allows user to add tracing events on the function entry and
678+
exit via ftrace interface. The syntax is same as the kprobe events
679+
and the kprobe events on function entry and exit will be
680+
transparently converted to this fprobe events.
681+
682+
config PROBE_EVENTS_BTF_ARGS
683+
depends on HAVE_FUNCTION_ARG_ACCESS_API
684+
depends on FPROBE_EVENTS || KPROBE_EVENTS
685+
depends on DEBUG_INFO_BTF && BPF_SYSCALL
686+
bool "Support BTF function arguments for probe events"
687+
default y
688+
help
689+
The user can specify the arguments of the probe event using the names
690+
of the arguments of the probed function, when the probe location is a
691+
kernel function entry or a tracepoint.
692+
This is available only if BTF (BPF Type Format) support is enabled.
693+
668694
config KPROBE_EVENTS
669695
depends on KPROBES
670696
depends on HAVE_REGS_AND_STACK_ACCESS_API
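
As a rough illustration of the ``depends on`` lines of ``PROBE_EVENTS_BTF_ARGS`` above, the following sketch checks a kernel config file for the same combination of options. The helper names are hypothetical, and the default path ``/boot/config-$(uname -r)`` is an assumption about where a distribution installs the config; pass another path as the first argument if needed.

```shell
#!/bin/sh
# has_opt FILE NAME: succeed if CONFIG_NAME=y appears in FILE.
has_opt() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# btf_args_selectable FILE: mirror the "depends on" lines of
# PROBE_EVENTS_BTF_ARGS from the Kconfig hunk above.
btf_args_selectable() {
    has_opt "$1" HAVE_FUNCTION_ARG_ACCESS_API &&
    { has_opt "$1" FPROBE_EVENTS || has_opt "$1" KPROBE_EVENTS; } &&
    has_opt "$1" DEBUG_INFO_BTF &&
    has_opt "$1" BPF_SYSCALL
}

# Assumed location of the running kernel's config file.
cfg=${1:-/boot/config-$(uname -r)}
if [ -r "$cfg" ]; then
    if btf_args_selectable "$cfg"; then
        echo "dependencies of PROBE_EVENTS_BTF_ARGS are satisfied"
    else
        echo "dependencies of PROBE_EVENTS_BTF_ARGS are not satisfied"
    fi
else
    echo "no readable kernel config at $cfg" >&2
fi
```

This only inspects the dependencies; whether ``PROBE_EVENTS_BTF_ARGS`` itself is enabled still depends on the chosen configuration, although the hunk marks it ``default y``.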
