
Commit 1bbeaf8

Merge tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Namhyung Kim:
 "perf stat:

   - Support new 'cluster' aggregation mode for shared resources
     depending on the hardware configuration:

        $ sudo perf stat -a --per-cluster -e cycles,instructions sleep 1

         Performance counter stats for 'system wide':

        S0-D0-CLS0    2     85,051,822      cycles
        S0-D0-CLS0    2     73,909,908      instructions   #  0.87  insn per cycle
        S0-D0-CLS2    2     93,365,918      cycles
        S0-D0-CLS2    2     83,006,158      instructions   #  0.89  insn per cycle
        S0-D0-CLS4    2    104,157,523      cycles
        S0-D0-CLS4    2     53,234,396      instructions   #  0.51  insn per cycle
        S0-D0-CLS6    2     65,891,079      cycles
        S0-D0-CLS6    2     41,478,273      instructions   #  0.63  insn per cycle

               1.002407989 seconds time elapsed

   - Various fixes and cleanups for event metrics including NaN handling

  perf script:

   - Use libcapstone if available to disassemble the instructions. This
     enables 'perf script -F disasm' and 'perf script --insn-trace=disasm'
     (for Intel-PT):

        $ perf script -F event,ip,disasm
               cycles:P:  ffffffffa988d428     wrmsr
               cycles:P:  ffffffffa9839d25     movq %rax, %r14
               cycles:P:  ffffffffa9cdcaf0     endbr64
               cycles:P:  ffffffffa988d428     wrmsr
               cycles:P:  ffffffffa988d428     wrmsr
               cycles:P:  ffffffffaa401f86     iretq
               cycles:P:  ffffffffa99c4de5     movq 0x30(%rcx), %r8
               cycles:P:  ffffffffa988d428     wrmsr
               cycles:P:  ffffffffaa401f86     iretq
               cycles:P:  ffffffffa9907983     movl 0x68(%rbx), %eax
               cycles:P:  ffffffffa988d428     wrmsr

   - Expose sample ID / stream ID to python scripts

  perf test:

   - Add more perf test cases from Redhat internal test suites. This
     time it adds the base infra and a few perf probe tests. More to
     come. :)

   - Add 'perf test -p' for parallel execution and fix some issues found
     by the parallel test

   - Support symbol test to print symbols in given (active) module:

        $ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/fs/ext4/ext4.ko
        --- start ---
        Testing /lib/modules/6.5.13-1rodete2-amd64/kernel/fs/ext4/ext4.ko
        Overlapping symbols:
          7a990-7a9a0 l __pfx_ext4_exit_fs
          7a990-7a9a0 g __pfx_cleanup_module
        Overlapping symbols:
          7a9a0-7aa1c l ext4_exit_fs
          7a9a0-7aa1c g cleanup_module
        ...

  JSON metric updates:

   - A new round of Intel metric updates

   - Support Power11 PVR (compatible to Power10)

   - Fix cache latency events on Zen 4 to set SliceId properly

  Internal:

   - Fix reference counting for 'map' data structure, tireless work from
     Ian!

   - More memory optimization for struct thread and annotate histogram.
     Now, 'perf report' (TUI) and 'perf annotate' should be much
     lighter-weight in terms of memory footprint

   - Support cross-arch perf register access. Clean up the build
     configuration so that it can detect arch-register support at
     runtime. This can allow to parse register data in sample which was
     recorded in a different arch

  Others:

   - Sync task state in 'perf sched' to kernel using trace event fields.
     The task states have been changed so tools cannot assume a fixed
     encoding

   - Clean up 'perf mem' to generalize the arch-specific events

   - Add support for local and global variables to data type profiling.
     This would increase the success rate of type resolution with DWARF

   - Add short option -H for --hierarchy in 'perf report' and 'perf top'"

* tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (154 commits)
  perf annotate: Add comments in the data structures
  perf annotate: Remove sym_hist.addr[] array
  perf annotate: Calculate instruction overhead using hashmap
  perf annotate: Add a hashmap for symbol histogram
  perf threads: Reduce table size from 256 to 8
  perf threads: Switch from rbtree to hashmap
  perf threads: Move threads to its own files
  perf machine: Move machine's threads into its own abstraction
  perf machine: Move fprintf to for_each loop and a callback
  perf trace: Ignore thread hashing in summary
  perf report: Sort child tasks by tid
  perf vendor events amd: Fix Zen 4 cache latency events
  perf version: Display availability of OpenCSD support
  perf vendor events intel: Add umasks/occ_sel to PCU events.
  perf map: Fix map reference count issues
  libperf evlist: Avoid out-of-bounds access
  perf lock contention: Account contending locks too
  perf metrics: Fix segv for metrics with no events
  perf metrics: Fix metric matching
  perf pmu: Fix a potential memory leak in perf_pmu__lookup()
  ...

2 parents 63bd30f + 0f66dfe commit 1bbeaf8

282 files changed

+21268
-4813
lines changed

tools/build/Makefile.feature

Lines changed: 2 additions & 0 deletions
@@ -87,6 +87,7 @@ FEATURE_TESTS_EXTRA := \
          gtk2-infobar \
          hello \
          libbabeltrace \
+         libcapstone \
          libbfd-liberty \
          libbfd-liberty-z \
          libopencsd \
@@ -134,6 +135,7 @@ FEATURE_DISPLAY ?= \
          libcrypto \
          libunwind \
          libdw-dwarf-unwind \
+         libcapstone \
          zlib \
          lzma \
          get_cpuid \

tools/build/feature/Makefile

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,7 @@ FILES= \
          test-timerfd.bin \
          test-libdw-dwarf-unwind.bin \
          test-libbabeltrace.bin \
+         test-libcapstone.bin \
          test-compile-32.bin \
          test-compile-x32.bin \
          test-zlib.bin \
@@ -286,6 +287,9 @@ $(OUTPUT)test-libdw-dwarf-unwind.bin:
 $(OUTPUT)test-libbabeltrace.bin:
 	$(BUILD) # -lbabeltrace provided by $(FEATURE_CHECK_LDFLAGS-libbabeltrace)

+$(OUTPUT)test-libcapstone.bin:
+	$(BUILD) # -lcapstone provided by $(FEATURE_CHECK_LDFLAGS-libcapstone)
+
 $(OUTPUT)test-compile-32.bin:
 	$(CC) -m32 -o $@ test-compile.c

tools/build/feature/test-all.c

Lines changed: 4 additions & 0 deletions
@@ -134,6 +134,10 @@
 #undef main
 #endif

+#define main main_test_libcapstone
+# include "test-libcapstone.c"
+#undef main
+
 #define main main_test_lzma
 # include "test-lzma.c"
 #undef main

tools/build/feature/test-libcapstone.c

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <capstone/capstone.h>
+
+int main(void)
+{
+	csh handle;
+
+	cs_open(CS_ARCH_X86, CS_MODE_64, &handle);
+	return 0;
+}

tools/lib/perf/evlist.c

Lines changed: 12 additions & 6 deletions
@@ -248,10 +248,10 @@ u64 perf_evlist__read_format(struct perf_evlist *evlist)

 static void perf_evlist__id_hash(struct perf_evlist *evlist,
 				 struct perf_evsel *evsel,
-				 int cpu, int thread, u64 id)
+				 int cpu_map_idx, int thread, u64 id)
 {
 	int hash;
-	struct perf_sample_id *sid = SID(evsel, cpu, thread);
+	struct perf_sample_id *sid = SID(evsel, cpu_map_idx, thread);

 	sid->id = id;
 	sid->evsel = evsel;
@@ -269,21 +269,27 @@ void perf_evlist__reset_id_hash(struct perf_evlist *evlist)

 void perf_evlist__id_add(struct perf_evlist *evlist,
 			 struct perf_evsel *evsel,
-			 int cpu, int thread, u64 id)
+			 int cpu_map_idx, int thread, u64 id)
 {
-	perf_evlist__id_hash(evlist, evsel, cpu, thread, id);
+	if (!SID(evsel, cpu_map_idx, thread))
+		return;
+
+	perf_evlist__id_hash(evlist, evsel, cpu_map_idx, thread, id);
 	evsel->id[evsel->ids++] = id;
 }

 int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 			   struct perf_evsel *evsel,
-			   int cpu, int thread, int fd)
+			   int cpu_map_idx, int thread, int fd)
 {
 	u64 read_data[4] = { 0, };
 	int id_idx = 1; /* The first entry is the counter value */
 	u64 id;
 	int ret;

+	if (!SID(evsel, cpu_map_idx, thread))
+		return -1;
+
 	ret = ioctl(fd, PERF_EVENT_IOC_ID, &id);
 	if (!ret)
 		goto add;
@@ -312,7 +318,7 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 	id = read_data[id_idx];

  add:
-	perf_evlist__id_add(evlist, evsel, cpu, thread, id);
+	perf_evlist__id_add(evlist, evsel, cpu_map_idx, thread, id);
 	return 0;
 }

tools/lib/perf/include/internal/evlist.h

Lines changed: 2 additions & 2 deletions
@@ -126,11 +126,11 @@ u64 perf_evlist__read_format(struct perf_evlist *evlist);

 void perf_evlist__id_add(struct perf_evlist *evlist,
 			 struct perf_evsel *evsel,
-			 int cpu, int thread, u64 id);
+			 int cpu_map_idx, int thread, u64 id);

 int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 			   struct perf_evsel *evsel,
-			   int cpu, int thread, int fd);
+			   int cpu_map_idx, int thread, int fd);

 void perf_evlist__reset_id_hash(struct perf_evlist *evlist);

tools/lib/subcmd/run-command.c

Lines changed: 2 additions & 0 deletions
@@ -122,6 +122,8 @@ int start_command(struct child_process *cmd)
 		}
 		if (cmd->preexec_cb)
 			cmd->preexec_cb();
+		if (cmd->no_exec_cmd)
+			exit(cmd->no_exec_cmd(cmd));
 		if (cmd->exec_cmd) {
 			execv_cmd(cmd->argv);
 		} else {

tools/lib/subcmd/run-command.h

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ struct child_process {
 	unsigned exec_cmd:1; /* if this is to be external sub-command */
 	unsigned stdout_to_stderr:1;
 	void (*preexec_cb)(void);
+	/* If set, call function in child rather than doing an exec. */
+	int (*no_exec_cmd)(struct child_process *process);
 };

 int start_command(struct child_process *);

tools/perf/Documentation/perf-intel-pt.txt

Lines changed: 9 additions & 5 deletions
@@ -115,9 +115,13 @@ toggle respectively.

 perf script also supports higher level ways to dump instruction traces:

+	perf script --insn-trace=disasm
+
+or to use the xed disassembler, which requires installing the xed tool
+(see XED below):
+
 	perf script --insn-trace --xed

-Dump all instructions. This requires installing the xed tool (see XED below)
 Dumping all instructions in a long trace can be fairly slow. It is usually better
 to start with higher level decoding, like

@@ -130,12 +134,12 @@ or
 and then select a time range of interest. The time range can then be examined
 in detail with

-	perf script --time starttime,stoptime --insn-trace --xed
+	perf script --time starttime,stoptime --insn-trace=disasm

 While examining the trace it's also useful to filter on specific CPUs using
 the -C option

-	perf script --time starttime,stoptime --insn-trace --xed -C 1
+	perf script --time starttime,stoptime --insn-trace=disasm -C 1

 Dump all instructions in time range on CPU 1.

@@ -1306,7 +1310,7 @@ Without timestamps, --per-thread must be specified to distinguish threads.

 perf script can be used to provide an instruction trace

- $ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
+ $ perf script --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21
  CPU 0/KVM 1440 ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9
  CPU 0/KVM 1440 ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10
  CPU 0/KVM 1440 ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11
@@ -1407,7 +1411,7 @@ There were none.

 'perf script' can be used to provide an instruction trace showing timestamps

- $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
+ $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace=disasm -F+ipc | grep -C10 vmresume | head -21
  CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133cdd __vmx_vcpu_run+0x3d ([kernel.kallsyms]) movq 0x48(%rax), %r9
  CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce1 __vmx_vcpu_run+0x41 ([kernel.kallsyms]) movq 0x50(%rax), %r10
  CPU 1/KVM 17006 [001] 11500.262865593: ffffffff82133ce5 __vmx_vcpu_run+0x45 ([kernel.kallsyms]) movq 0x58(%rax), %r11

tools/perf/Documentation/perf-report.txt

Lines changed: 28 additions & 1 deletion
@@ -531,8 +531,35 @@ include::itrace.txt[]
 --raw-trace::
 	When displaying traceevent output, do not use print fmt or plugins.

+-H::
 --hierarchy::
-	Enable hierarchical output.
+	Enable hierarchical output.  In the hierarchy mode, each sort key groups
+	samples based on the criteria and then sub-divide it using the lower
+	level sort key.
+
+	For example:
+	In normal output:
+
+	  perf report -s dso,sym
+	  # Overhead  Shared Object      Symbol
+	      50.00%  [kernel.kallsyms]  [k] kfunc1
+	      20.00%  perf               [.] foo
+	      15.00%  [kernel.kallsyms]  [k] kfunc2
+	      10.00%  perf               [.] bar
+	       5.00%  libc.so            [.] libcall
+
+	In hierarchy output:
+
+	  perf report -s dso,sym --hierarchy
+	  #   Overhead  Shared Object / Symbol
+	      65.00%    [kernel.kallsyms]
+	        50.00%    [k] kfunc1
+	        15.00%    [k] kfunc2
+	      30.00%    perf
+	        20.00%    [.] foo
+	        10.00%    [.] bar
+	       5.00%    libc.so
+	         5.00%    [.] libcall

 --inline::
 	If a callgraph address belongs to an inlined function, the inline stack
