Commit 7ab8941
Merge tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
 "Build:

   - Compile BPF programs by default if clang (>= 12.0.1) is available,
     to enable more features like kernel lock contention, off-cpu
     profiling, kwork, sample filtering and so on. This can be disabled
     by passing BUILD_BPF_SKEL=0 to make.

   - Produce better error messages for bison on debug builds (make
     DEBUG=1) by defining the YYDEBUG symbol internally.

  perf record:

   - Track sideband events (like FORK/MMAP) from all CPUs even if perf
     record targets a subset of CPUs only (using the -C option).
     Otherwise it may lose information that happened on a CPU outside
     the target list.

   - Fix checking of the raw sched_switch tracepoint argument using
     system BTF. This affects off-cpu profiling, which attaches a BPF
     program to the raw tracepoint.

  perf lock contention:

   - Add the --lock-cgroup option to see contention by cgroup. This
     should be used with BPF only (the -b option).

       $ sudo perf lock con -ab --lock-cgroup -- sleep 1
        contended   total wait     max wait     avg wait   cgroup

              835     14.06 ms     41.19 us     16.83 us   /system.slice/led.service
               25    122.38 us     13.77 us      4.89 us   /
               44     23.73 us      3.87 us       539 ns   /user.slice/user-657345.slice/session-c4.scope
                1       491 ns       491 ns       491 ns   /system.slice/connectd.service

   - Add the -G/--cgroup-filter option to see contention only for the
     given cgroups. This can be useful when you have identified a
     cgroup in the above command and want to investigate it further.
     It also works with other output options like -t/--threads and
     -l/--lock-addr.

       $ sudo perf lock con -ab -G /user.slice/user-657345.slice/session-c4.scope -- sleep 1
        contended   total wait     max wait     avg wait         type   caller

                8     77.11 us     17.98 us      9.64 us     spinlock   futex_wake+0xc8
                2     24.56 us     14.66 us     12.28 us     spinlock   tick_do_update_jiffies64+0x25
                1      4.97 us      4.97 us      4.97 us     spinlock   futex_q_lock+0x2a

   - Use a per-cpu array for better spinlock tracking. This improves
     performance of the BPF program and avoids nested contention on a
     lock in the BPF hash map.

   - Update the callstack check for PowerPC. To find a representative
     caller of a lock, it needs to look up the call stacks. It ends the
     lookup when it sees 0 in the call stack buffer. However, PowerPC
     call stacks can have 0 values at the beginning, so skip them when
     valid call stack entries are expected to follow.

  perf kwork:

   - Support the 'sched' class (for the -k option) so that it can see
     task scheduling events (using the sched_switch tracepoint) as well
     as irq and workqueue items.

   - Add a 'perf kwork top' subcommand to show more accurate CPU
     utilization with the sched class above. It works both with
     recorded data (using the 'perf kwork record' command) and with BPF
     (using the -b option). Unlike the 'perf top' command, it does not
     support interactive mode (yet).

       $ sudo perf kwork top -b -k sched
       Starting trace, Hit <Ctrl+C> to stop and report
       ^C
       Total  : 160702.425 ms, 8 cpus
       %Cpu(s):  36.00% id,   0.00% hi,   0.00% si
       %Cpu0   [||||||||||||||||||              61.66%]
       %Cpu1   [||||||||||||||||||              61.27%]
       %Cpu2   [|||||||||||||||||||             66.40%]
       %Cpu3   [||||||||||||||||||              61.28%]
       %Cpu4   [||||||||||||||||||              61.82%]
       %Cpu5   [|||||||||||||||||||||||         77.41%]
       %Cpu6   [||||||||||||||||||              61.73%]
       %Cpu7   [||||||||||||||||||              63.25%]

             PID     SPID    %CPU           RUNTIME  COMMMAND
         -------------------------------------------------------------
               0        0   38.72       8089.463 ms  [swapper/1]
               0        0   38.71       8084.547 ms  [swapper/3]
               0        0   38.33       8007.532 ms  [swapper/0]
               0        0   38.26       7992.985 ms  [swapper/6]
               0        0   38.17       7971.865 ms  [swapper/4]
               0        0   36.74       7447.765 ms  [swapper/7]
               0        0   33.59       6486.942 ms  [swapper/2]
               0        0   22.58       3771.268 ms  [swapper/5]
            9545     9351    2.48        447.136 ms  sched-messaging
            9574     9351    2.09        418.583 ms  sched-messaging
            9724     9351    2.05        372.407 ms  sched-messaging
            9531     9351    2.01        368.804 ms  sched-messaging
            9512     9351    2.00        362.250 ms  sched-messaging
            9514     9351    1.95        357.767 ms  sched-messaging
            9538     9351    1.86        384.476 ms  sched-messaging
            9712     9351    1.84        386.490 ms  sched-messaging
            9723     9351    1.83        380.021 ms  sched-messaging
            9722     9351    1.82        382.738 ms  sched-messaging
            9517     9351    1.81        354.794 ms  sched-messaging
            9559     9351    1.79        344.305 ms  sched-messaging
            9725     9351    1.77        365.315 ms  sched-messaging
           <SNIP>

   - Add hard/soft-irq statistics to 'perf kwork top'. This shows the
     total CPU utilization with IRQ stats like below:

       $ sudo perf kwork top -b -k sched,irq,softirq
       Starting trace, Hit <Ctrl+C> to stop and report
       ^C
       Total  : 12554.889 ms, 8 cpus
       %Cpu(s):  96.23% id,   0.10% hi,   0.19% si    <---- here
       %Cpu0   [|                                4.60%]
       %Cpu1   [|                                4.59%]
       %Cpu2   [                                 2.73%]
       %Cpu3   [|                                3.81%]
       <SNIP>

  perf bench:

   - Add the -G/--cgroups option to 'perf bench sched pipe'. The pipe
     bench is good for measuring context switch overhead. With this
     option, it puts the reader and writer tasks in separate cgroups to
     force context switches between two different cgroups. It also
     needs the tasks' CPU affinity set to a single CPU to accurately
     measure the impact of cgroup context switches.

       $ sudo perf stat -e context-switches,cgroup-switches -- \
       > taskset -c 0 perf bench sched pipe -l 100000
       # Running 'sched/pipe' benchmark:
       # Executed 100000 pipe operations between two processes

            Total time: 0.307 [sec]

              3.078180 usecs/op
                324867 ops/sec

        Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000':

                  200,026      context-switches
                       63      cgroup-switches

              0.321637922 seconds time elapsed

     You can see the small number of cgroup-switches because both the
     writer and reader tasks are in the same cgroup.

       $ sudo mkdir /sys/fs/cgroup/{AAA,BBB}

       $ sudo perf stat -e context-switches,cgroup-switches -- \
       > taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB
       # Running 'sched/pipe' benchmark:
       # Executed 100000 pipe operations between two processes

            Total time: 0.351 [sec]

              3.512990 usecs/op
                284657 ops/sec

        Performance counter stats for 'taskset -c 0 perf bench sched pipe -l 100000 -G AAA,BBB':

                  200,020      context-switches
                  200,019      cgroup-switches

              0.365034567 seconds time elapsed

     Now context-switches and cgroup-switches are almost the same, and
     you can see the pipe operations took a little longer.

   - Kill child processes when 'perf bench sched messaging' exits
     abnormally. Otherwise it'd leave the children doing unnecessary
     work.

  perf test:

   - Fix various shellcheck issues in the tests written in shell
     script.

   - Skip tests when their preconditions are not satisfied:
      - the object code reading test, for non-text section addresses
      - the CoreSight test, if the cs_etm// event is not available
      - the lock contention test, if there are not enough CPUs

  Event parsing:

   - Make PMU alias name loading lazy to reduce the startup time in the
     event parsing code for perf record, stat and others in the general
     case.

   - Lazily compute the PMU default config. In the same sense, delay
     PMU initialization until it's really needed, to reduce the startup
     cost.

   - Fix event term values that are raw events. The event specification
     can have several terms including the event name. But sometimes
     that clashes with the raw event encoding, which starts with 'r'
     and has hex digits. For example, an event named 'read' should be
     processed as a normal event, but it was mistreated as a raw
     encoding and caused a failure.

       $ perf stat -e 'uncore_imc_free_running/event=read/' -a sleep 1
       event syntax error: '..nning/event=read/'
                                         \___ parser error
       Run 'perf list' for a list of valid events

        Usage: perf stat [<options>] [<command>]

           -e, --event <event>  event selector. use 'perf list' to list available events

  Event metrics:

   - Add a "Compat" regex to match events with multiple identifiers.

   - Usual updates for Intel, Power10, Arm telemetry/CMN and AmpereOne.

  Misc:

   - Assorted memory leak fixes and footprint reduction.

   - Add "bpf_skeletons" to 'perf version --build-options' so that
     users can easily check whether their perf tools have BPF support.

   - Fix an unaligned access in the Intel-PT packet decoder, found by
     the undefined-behavior sanitizer.

   - Avoid frequency mode for the dummy event. Surprisingly it'd impact
     kernel timer tick handler performance by force-iterating all PMU
     events.

   - Update bash shell completion for events and metrics"

* tag 'perf-tools-for-v6.7-1-2023-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (187 commits)
  perf vendor events intel: Update tsx_cycles_per_elision metrics
  perf vendor events intel: Update bonnell version number to v5
  perf vendor events intel: Update westmereex events to v4
  perf vendor events intel: Update meteorlake events to v1.06
  perf vendor events intel: Update knightslanding events to v16
  perf vendor events intel: Add typo fix for ivybridge FP
  perf vendor events intel: Update a spelling in haswell/haswellx
  perf vendor events intel: Update emeraldrapids to v1.01
  perf vendor events intel: Update alderlake/alderlake events to v1.23
  perf build: Disable BPF skeletons if clang version is < 12.0.1
  perf callchain: Fix spelling mistake "statisitcs" -> "statistics"
  perf report: Fix spelling mistake "heirachy" -> "hierarchy"
  perf python: Fix binding linkage due to rename and move of evsel__increase_rlimit()
  perf tests: test_arm_coresight: Simplify source iteration
  perf vendor events intel: Add tigerlake two metrics
  perf vendor events intel: Add broadwellde two metrics
  perf vendor events intel: Fix broadwellde tma_info_system_dram_bw_use metric
  perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit
  perf callchain: Minor layout changes to callchain_list
  perf callchain: Make brtype_stat in callchain_list optional
  ...
2 parents: 31e5f93 + fed3a1b

251 files changed (+31705, -1821 lines)

scripts/clang-tools/gen_compile_commands.py

Lines changed: 4 additions & 4 deletions

@@ -19,7 +19,7 @@
 _DEFAULT_LOG_LEVEL = 'WARNING'
 
 _FILENAME_PATTERN = r'^\..*\.cmd$'
-_LINE_PATTERN = r'^savedcmd_[^ ]*\.o := (.* )([^ ]*\.[cS]) *(;|$)'
+_LINE_PATTERN = r'^(saved)?cmd_[^ ]*\.o := (?P<command_prefix>.* )(?P<file_path>[^ ]*\.[cS]) *(;|$)'
 _VALID_LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
 # The tools/ directory adopts a different build system, and produces .cmd
 # files in a different format. Do not support it.
@@ -213,15 +213,15 @@ def main():
             result = line_matcher.match(f.readline())
             if result:
                 try:
-                    entry = process_line(directory, result.group(1),
-                                         result.group(2))
+                    entry = process_line(directory, result.group('command_prefix'),
+                                         result.group('file_path'))
                     compile_commands.append(entry)
                 except ValueError as err:
                     logging.info('Could not add line from %s: %s',
                                  cmdfile, err)
 
     with open(output, 'wt') as f:
-        json.dump(compile_commands, f, indent=2, sort_keys=True)
+        json.dump(sorted(compile_commands, key=lambda x: x["file"]), f, indent=2, sort_keys=True)
 
 
 if __name__ == '__main__':
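The new pattern is easier to follow with named groups in play. Below is a minimal sketch of how it matches; the sample .cmd line is invented for illustration:

    import re

    # New pattern from the diff: it now accepts both 'cmd_' and 'savedcmd_'
    # prefixes and names its capture groups instead of numbering them.
    _LINE_PATTERN = r'^(saved)?cmd_[^ ]*\.o := (?P<command_prefix>.* )(?P<file_path>[^ ]*\.[cS]) *(;|$)'

    line = 'savedcmd_kernel/fork.o := gcc -Wall -c -o kernel/fork.o kernel/fork.c'
    m = re.match(_LINE_PATTERN, line)
    print(repr(m.group('command_prefix')))  # 'gcc -Wall -c -o kernel/fork.o '
    print(repr(m.group('file_path')))       # 'kernel/fork.c'

Named groups keep the later process_line() call self-documenting, and sorting the entries by "file" makes the generated compile_commands.json stable across runs.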

scripts/clang-tools/run-clang-tools.py

Lines changed: 25 additions & 7 deletions

@@ -33,6 +33,11 @@ def parse_arguments():
     path_help = "Path to the compilation database to parse"
     parser.add_argument("path", type=str, help=path_help)
 
+    checks_help = "Checks to pass to the analysis"
+    parser.add_argument("-checks", type=str, default=None, help=checks_help)
+    header_filter_help = "Pass the -header-filter value to the tool"
+    parser.add_argument("-header-filter", type=str, default=None, help=header_filter_help)
+
     return parser.parse_args()
 
 
@@ -45,14 +50,27 @@ def init(l, a):
 
 def run_analysis(entry):
     # Disable all checks, then re-enable the ones we want
-    checks = []
-    checks.append("-checks=-*")
-    if args.type == "clang-tidy":
-        checks.append("linuxkernel-*")
+    global args
+    checks = None
+    if args.checks:
+        checks = args.checks.split(',')
     else:
-        checks.append("clang-analyzer-*")
-        checks.append("-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling")
-    p = subprocess.run(["clang-tidy", "-p", args.path, ",".join(checks), entry["file"]],
+        checks = ["-*"]
+        if args.type == "clang-tidy":
+            checks.append("linuxkernel-*")
+        else:
+            checks.append("clang-analyzer-*")
+            checks.append("-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling")
+    file = entry["file"]
+    if not file.endswith(".c") and not file.endswith(".cpp"):
+        with lock:
+            print(f"Skipping non-C file: '{file}'", file=sys.stderr)
+        return
+    pargs = ["clang-tidy", "-p", args.path, "-checks=" + ",".join(checks)]
+    if args.header_filter:
+        pargs.append("-header-filter=" + args.header_filter)
+    pargs.append(file)
+    p = subprocess.run(pargs,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT,
                        cwd=entry["directory"])
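The net effect of the new options is easiest to see as a plain-Python sketch of the argument flow; the values below (the database path, header filter and source file) are hypothetical stand-ins, not from the commit:

    # Hypothetical stand-ins for the parsed arguments.
    user_checks = None            # -checks not given, so fall back to defaults
    header_filter = "drivers/.*"  # hypothetical -header-filter value
    tool_type = "clang-tidy"

    if user_checks:
        checks = user_checks.split(',')
    else:
        checks = ["-*"]           # disable everything, then re-enable a subset
        if tool_type == "clang-tidy":
            checks.append("linuxkernel-*")
        else:
            checks.append("clang-analyzer-*")
            # (the real script also disables one insecureAPI checker here)

    argv = ["clang-tidy", "-p", "/path/to/db", "-checks=" + ",".join(checks)]
    if header_filter:
        argv.append("-header-filter=" + header_filter)
    argv.append("kernel/fork.c")
    print(argv)

Non-C inputs listed in the compilation database are now skipped up front rather than handed to clang-tidy.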

tools/build/Makefile.build

Lines changed: 9 additions & 1 deletion

@@ -20,7 +20,15 @@ else
   Q=@
 endif
 
-ifneq ($(findstring s,$(filter-out --%,$(MAKEFLAGS))),)
+# If the user is running make -s (silent mode), suppress echoing of commands
+# make-4.0 (and later) keep single letter options in the 1st word of MAKEFLAGS.
+ifeq ($(filter 3.%,$(MAKE_VERSION)),)
+short-opts := $(firstword -$(MAKEFLAGS))
+else
+short-opts := $(filter-out --%,$(MAKEFLAGS))
+endif
+
+ifneq ($(findstring s,$(short-opts)),)
   quiet=silent_
 endif
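The version split exists because make 4.0 moved the single-letter options into the first word of MAKEFLAGS, while 3.x scattered them among the other words. A rough Python emulation of the detection logic (the sample MAKEFLAGS values are illustrative, not captured from real runs):

    def is_silent(make_version: str, makeflags: str) -> bool:
        """Mimic the Makefile: look for 's' among the short options."""
        if not make_version.startswith('3.'):
            # make >= 4.0: short options live in the first word; the leading
            # '-' mirrors $(firstword -$(MAKEFLAGS)) and keeps '' safe.
            short_opts = ('-' + makeflags).split()[0]
        else:
            # make 3.x: drop long options (--foo) and scan what remains.
            short_opts = ' '.join(w for w in makeflags.split()
                                  if not w.startswith('--'))
        return 's' in short_opts

    print(is_silent('4.3', 'rs --no-print-directory'))   # True
    print(is_silent('4.3', 'r --no-print-directory'))    # False
    print(is_silent('3.82', '--no-print-directory -s'))  # True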

tools/include/asm-generic/unaligned.h

Lines changed: 139 additions & 6 deletions

@@ -1,10 +1,13 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_UNALIGNED_H
+#define __ASM_GENERIC_UNALIGNED_H
+
 /*
- * Copied from the kernel sources to tools/perf/:
+ * This is the most generic implementation of unaligned accesses
+ * and should work almost anywhere.
  */
-
-#ifndef __TOOLS_LINUX_ASM_GENERIC_UNALIGNED_H
-#define __TOOLS_LINUX_ASM_GENERIC_UNALIGNED_H
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpacked"
 
 #define __get_unaligned_t(type, ptr) ({ \
 	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
@@ -19,5 +22,135 @@
 #define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr))
 #define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))
 
-#endif /* __TOOLS_LINUX_ASM_GENERIC_UNALIGNED_H */
+static inline u16 get_unaligned_le16(const void *p)
+{
+	return le16_to_cpu(__get_unaligned_t(__le16, p));
+}
+
+static inline u32 get_unaligned_le32(const void *p)
+{
+	return le32_to_cpu(__get_unaligned_t(__le32, p));
+}
+
+static inline u64 get_unaligned_le64(const void *p)
+{
+	return le64_to_cpu(__get_unaligned_t(__le64, p));
+}
+
+static inline void put_unaligned_le16(u16 val, void *p)
+{
+	__put_unaligned_t(__le16, cpu_to_le16(val), p);
+}
+
+static inline void put_unaligned_le32(u32 val, void *p)
+{
+	__put_unaligned_t(__le32, cpu_to_le32(val), p);
+}
+
+static inline void put_unaligned_le64(u64 val, void *p)
+{
+	__put_unaligned_t(__le64, cpu_to_le64(val), p);
+}
+
+static inline u16 get_unaligned_be16(const void *p)
+{
+	return be16_to_cpu(__get_unaligned_t(__be16, p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+	return be32_to_cpu(__get_unaligned_t(__be32, p));
+}
+
+static inline u64 get_unaligned_be64(const void *p)
+{
+	return be64_to_cpu(__get_unaligned_t(__be64, p));
+}
+
+static inline void put_unaligned_be16(u16 val, void *p)
+{
+	__put_unaligned_t(__be16, cpu_to_be16(val), p);
+}
+
+static inline void put_unaligned_be32(u32 val, void *p)
+{
+	__put_unaligned_t(__be32, cpu_to_be32(val), p);
+}
+
+static inline void put_unaligned_be64(u64 val, void *p)
+{
+	__put_unaligned_t(__be64, cpu_to_be64(val), p);
+}
+
+static inline u32 __get_unaligned_be24(const u8 *p)
+{
+	return p[0] << 16 | p[1] << 8 | p[2];
+}
+
+static inline u32 get_unaligned_be24(const void *p)
+{
+	return __get_unaligned_be24(p);
+}
+
+static inline u32 __get_unaligned_le24(const u8 *p)
+{
+	return p[0] | p[1] << 8 | p[2] << 16;
+}
+
+static inline u32 get_unaligned_le24(const void *p)
+{
+	return __get_unaligned_le24(p);
+}
+
+static inline void __put_unaligned_be24(const u32 val, u8 *p)
+{
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline void put_unaligned_be24(const u32 val, void *p)
+{
+	__put_unaligned_be24(val, p);
+}
+
+static inline void __put_unaligned_le24(const u32 val, u8 *p)
+{
+	*p++ = val;
+	*p++ = val >> 8;
+	*p++ = val >> 16;
+}
+
+static inline void put_unaligned_le24(const u32 val, void *p)
+{
+	__put_unaligned_le24(val, p);
+}
+
+static inline void __put_unaligned_be48(const u64 val, u8 *p)
+{
+	*p++ = val >> 40;
+	*p++ = val >> 32;
+	*p++ = val >> 24;
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline void put_unaligned_be48(const u64 val, void *p)
+{
+	__put_unaligned_be48(val, p);
+}
+
+static inline u64 __get_unaligned_be48(const u8 *p)
+{
+	return (u64)p[0] << 40 | (u64)p[1] << 32 | (u64)p[2] << 24 |
+		p[3] << 16 | p[4] << 8 | p[5];
+}
+
+static inline u64 get_unaligned_be48(const void *p)
+{
+	return __get_unaligned_be48(p);
+}
+#pragma GCC diagnostic pop
 
+#endif /* __ASM_GENERIC_UNALIGNED_H */
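The 24- and 48-bit helpers are plain shift-and-or compositions, so their byte ordering can be cross-checked in a few lines of Python (an illustrative sanity check, not part of the commit):

    def get_unaligned_be24(b: bytes) -> int:
        # Mirrors __get_unaligned_be24(): p[0] << 16 | p[1] << 8 | p[2]
        return b[0] << 16 | b[1] << 8 | b[2]

    def get_unaligned_be48(b: bytes) -> int:
        # Mirrors __get_unaligned_be48(): six bytes, most significant first.
        return int.from_bytes(b[:6], 'big')

    buf = bytes([0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc])
    assert get_unaligned_be24(buf) == 0x123456
    assert get_unaligned_be48(buf) == 0x123456789abc
    # The little-endian variants read the same bytes in reverse significance.
    assert int.from_bytes(buf[:3], 'little') == 0x563412
    print("byte-order checks pass")

Note that in the C version of __get_unaligned_be48() only the three high bytes need u64 casts; shifts of the low three bytes cannot overflow a 32-bit int.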

tools/lib/api/io.h

Lines changed: 1 addition & 0 deletions

@@ -180,6 +180,7 @@ static inline ssize_t io__getline(struct io *io, char **line_out, size_t *line_l
 	return line_len;
 err_out:
 	free(line);
+	*line_out = NULL;
 	return -ENOMEM;
 }

tools/lib/perf/evlist.c

Lines changed: 9 additions & 0 deletions

@@ -738,3 +738,12 @@ int perf_evlist__nr_groups(struct perf_evlist *evlist)
 	}
 	return nr_groups;
 }
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel)
+{
+	if (!evsel->system_wide) {
+		evsel->system_wide = true;
+		if (evlist->needs_map_propagation)
+			__perf_evlist__propagate_maps(evlist, evsel);
+	}
+}

tools/lib/perf/include/internal/evlist.h

Lines changed: 2 additions & 0 deletions

@@ -135,4 +135,6 @@ int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 void perf_evlist__reset_id_hash(struct perf_evlist *evlist);
 
 void __perf_evlist__set_leader(struct list_head *list, struct perf_evsel *leader);
+
+void perf_evlist__go_system_wide(struct perf_evlist *evlist, struct perf_evsel *evsel);
 #endif /* __LIBPERF_INTERNAL_EVLIST_H */

tools/lib/perf/include/internal/rc_check.h

Lines changed: 12 additions & 1 deletion

@@ -9,8 +9,12 @@
  * Enable reference count checking implicitly with leak checking, which is
  * integrated into address sanitizer.
  */
-#if defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
+#if defined(__SANITIZE_ADDRESS__) || defined(LEAK_SANITIZER) || defined(ADDRESS_SANITIZER)
 #define REFCNT_CHECKING 1
+#elif defined(__has_feature)
+#if __has_feature(address_sanitizer) || __has_feature(leak_sanitizer)
+#define REFCNT_CHECKING 1
+#endif
 #endif
 
 /*
@@ -50,6 +54,9 @@
 /* A put operation removing the indirection layer. */
 #define RC_CHK_PUT(object) {}
 
+/* Pointer equality when the indirection may or may not be there. */
+#define RC_CHK_EQUAL(object1, object2) (object1 == object2)
+
 #else
 
 /* Replaces "struct foo" so that the pointer may be interposed. */
@@ -97,6 +104,10 @@
 	} \
 } while(0)
 
+/* Pointer equality when the indirection may or may not be there. */
+#define RC_CHK_EQUAL(object1, object2) (object1 == object2 || \
+	(object1 && object2 && object1->orig == object2->orig))
+
 #endif
 
 #endif /* __LIBPERF_INTERNAL_RC_CHECK_H */
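RC_CHK_EQUAL has to treat two distinct wrapper allocations as equal when they point at the same underlying object. A toy Python model of the checked variant (the class and function names are invented for illustration):

    class RcWrapper:
        # Stands in for the indirection struct; 'orig' is the real object.
        def __init__(self, orig):
            self.orig = orig

    def rc_chk_equal(a, b):
        # Mirrors the REFCNT_CHECKING definition of RC_CHK_EQUAL.
        return a is b or (a is not None and b is not None
                          and a.orig is b.orig)

    obj = object()
    w1, w2 = RcWrapper(obj), RcWrapper(obj)
    assert w1 is not w2           # two distinct wrappers...
    assert rc_chk_equal(w1, w2)   # ...compare equal through the indirection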

tools/perf/Documentation/perf-bench.txt

Lines changed: 19 additions & 0 deletions

@@ -124,6 +124,14 @@ Options of *pipe*
 --loop=::
 	Specify number of loops.
 
+-G::
+--cgroups=::
+	Names of cgroups for sender and receiver, separated by a comma.
+	This is useful to check cgroup context switching overhead.
+	Note that perf doesn't create nor delete the cgroups, so users should
+	make sure that the cgroups exist and are accessible before use.
+
+
 Example of *pipe*
 ^^^^^^^^^^^^^^^^^
 
@@ -141,6 +149,17 @@ Example of *pipe*
 Total time:0.016 sec
 	16.948000 usecs/op
 	59004 ops/sec
+
+% perf bench sched pipe -G AAA,BBB
+(executing 1000000 pipe operations between cgroups)
+# Running 'sched/pipe' benchmark:
+# Executed 1000000 pipe operations between two processes
+
+     Total time: 6.886 [sec]
+
+       6.886208 usecs/op
+         145217 ops/sec
+
 ---------------------
 
 SUITES FOR 'syscall'
