Commit 57d1737

Merge tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:

 "New features:

   - Add 'trace' subcommand for 'perf ftrace', setting the stage for
     more 'perf ftrace' subcommands. Not using a subcommand yields the
     previous behaviour of 'perf ftrace'.

   - Add 'latency' subcommand to 'perf ftrace', that can use the
     function graph tracer or a BPF optimized one, via the -b/--use-bpf
     option.

     E.g.:

	$ sudo perf ftrace latency -a -T mutex_lock sleep 1
	#   DURATION     |      COUNT | GRAPH                          |
	     0 - 1    us |       4596 | ########################       |
	     1 - 2    us |       1680 | #########                      |
	     2 - 4    us |       1106 | #####                          |
	     4 - 8    us |        546 | ##                             |
	     8 - 16   us |        562 | ###                            |
	    16 - 32   us |          1 |                                |
	    32 - 64   us |          0 |                                |
	    64 - 128  us |          0 |                                |
	   128 - 256  us |          0 |                                |
	   256 - 512  us |          0 |                                |
	   512 - 1024 us |          0 |                                |
	     1 - 2    ms |          0 |                                |
	     2 - 4    ms |          0 |                                |
	     4 - 8    ms |          0 |                                |
	     8 - 16   ms |          0 |                                |
	    16 - 32   ms |          0 |                                |
	    32 - 64   ms |          0 |                                |
	    64 - 128  ms |          0 |                                |
	   128 - 256  ms |          0 |                                |
	   256 - 512  ms |          0 |                                |
	   512 - 1024 ms |          0 |                                |
	     1 - ...   s |          0 |                                |

     The original implementation of this command was in the bcc tool.

   - Support --cputype option for hybrid events in 'perf stat'.

  Improvements:

   - Call chain improvements for ARM64.

   - No need to do any affinity setup when profiling pids.

   - Reduce multiplexing with duration_time in 'perf stat' metrics.

   - Improve error message for uncore events, stating that some event
     groups can only be used in system wide (-a) mode.

   - perf stat metric group leader fixes/improvements, including arch
     specific changes to better support Intel topdown events.

   - Probe non-deprecated sysfs path first, i.e. try the path
     /sys/devices/system/cpu/cpuN/topology/core_cpus first, then the
     deprecated /sys/devices/system/cpu/cpuN/topology/thread_siblings.

   - Disable debuginfod by default in 'perf record', to avoid stalls on
     distros such as Fedora 35.

   - Use unbuffered output in 'perf bench' when pipe/tee'ing to a file.

   - Enable ignore_missing_thread in 'perf trace'.

  Fixes:

   - Avoid TUI crash when navigating in the annotation of recursive
     functions.

   - Fix hex dump character output in 'perf script'.

   - Fix JSON indentation to 4 spaces standard in the ARM vendor event
     files.

   - Fix use after free in metric__new().

   - Fix IS_ERR_OR_NULL() usage in the perf BPF loader.

   - Fix up cross-arch register support, i.e. when printing register
     names take into account the architecture where the perf.data file
     was collected.

   - Fix SMT fallback with large core counts.

   - Don't lower case MetricExpr when parsing JSON files so as not to
     lose info such as the ":G" event modifier in metrics.

  perf test:

   - Add basic stress test for sigtrap handling to 'perf test'.

   - Fix 'perf test' failures on s/390.

   - Enable system wide for metricgroups test in 'perf test'.

   - Use 3 digits for test numbering now we can have more tests.

  Arch specific:

   - Add events for Arm Neoverse N2 in the ARM JSON vendor event files.

   - Support PERF_MEM_LVLNUM encodings in powerpc, that came from a
     single patch series, where I incorrectly merged the kernel bits,
     that were then reverted after coordination with Michael Ellerman
     and Stephen Rothwell.

   - Add ARM SPE total latency as PERF_SAMPLE_WEIGHT.

   - Update AMD documentation, with info on raw event encoding.

   - Add support for global and local variants of the "p_stage_cyc"
     sort key, applicable to perf.data files collected on powerpc.

   - Remove duplicate and incorrect aux size checks in the ARM
     CoreSight ETM code.

  Refactorings:

   - Add a perf_cpu abstraction to disambiguate CPUs and CPU map
     indexes, fixing problems along the way.

   - Document CPU map methods.

  UAPI sync:

   - Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench
     mem memcpy'.

   - Sync UAPI files with the kernel sources: drm, msr-index,
     cpufeatures.

  Build system:

   - Enable warnings through HOSTCFLAGS.

   - Drop requirement for libstdc++.so for libopencsd check.

  libperf:

   - Make libperf adopt perf_counts_values__scale() from
     tools/perf/util/.

   - Add a stat multiplexing test to libperf"

* tag 'perf-tools-for-v5.17-2022-01-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (115 commits)
  perf record: Disable debuginfod by default
  perf evlist: No need to do any affinity setup when profiling pids
  perf cpumap: Add is_dummy() method
  perf metric: Fix metric_leader
  perf cputopo: Fix CPU topology reading on s/390
  perf metricgroup: Fix use after free in metric__new()
  libperf tests: Update a use of the new cpumap API
  perf arm: Fix off-by-one directory path
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Update tools's copy of drm.h header
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
  perf pmu-events: Don't lower case MetricExpr
  perf expr: Add debug logging for literals
  perf tools: Probe non-deprecated sysfs path 1st
  perf tools: Fix SMT fallback with large core counts
  perf cpumap: Give CPUs their own type
  perf stat: Correct first_shadow_cpu to return index
  perf script: Fix flipped index and cpu
  perf c2c: Use more intention revealing iterator
  ...
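
The 'perf ftrace latency' output in the message above groups durations into power-of-two buckets. The sketch below shows one way such bucketing could be done. It is not perf's implementation: the names latency_bucket(), record_latency() and NUM_BUCKETS are hypothetical, and it assumes durations arrive as whole microseconds, that bucket 0 covers [0, 1) us, that bucket i covers [2^(i-1), 2^i) us, and that the last bucket collects everything larger.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 22 buckets, matching the 22 rows of the sample output. */
#define NUM_BUCKETS 22

/* Map a duration in microseconds to a power-of-two bucket index. */
int latency_bucket(uint64_t usecs)
{
	int idx = 0;

	/*
	 * Count how many times the value can be halved before reaching
	 * zero. 0 stays in bucket 0, 1 lands in bucket 1, 2..3 in
	 * bucket 2, and so on. The last bucket is open-ended.
	 */
	while (usecs > 0 && idx < NUM_BUCKETS - 1) {
		usecs >>= 1;
		idx++;
	}
	return idx;
}

/* Account one measured duration in a caller-owned histogram. */
void record_latency(uint64_t counts[NUM_BUCKETS], uint64_t usecs)
{
	int idx = latency_bucket(usecs);

	assert(idx >= 0 && idx < NUM_BUCKETS);
	counts[idx]++;
}
```

With these assumptions a 1023 us sample falls into the "512 - 1024 us" row and a 1024 us sample into the "1 - 2 ms" row, which is why the millisecond rows are only approximately aligned to 1000 us.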
2 parents f003368 + 9bce13e commit 57d1737

153 files changed: +4685 -2175 lines changed

tools/arch/x86/include/asm/cpufeatures.h

Lines changed: 1 addition & 0 deletions
@@ -315,6 +315,7 @@
 #define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */
 #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */
 #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
+#define X86_FEATURE_CPPC (13*32+27) /* Collaborative Processor Performance Control */
 
 /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */
 #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */
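
The feature constants in this header are written as word*32+bit, so X86_FEATURE_CPPC names bit 27 of capability word 13. A minimal sketch of how such a value can be split back into its two parts follows. The helpers feature_word(), feature_bit() and feature_is_set() are hypothetical names for illustration, not kernel API, and the capability array is assumed to be a plain array of 32-bit words supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hunk above: word 13, bit 27. */
#define X86_FEATURE_CPPC (13*32+27)

/* Hypothetical helper: which 32-bit capability word holds the feature. */
unsigned int feature_word(unsigned int feature)
{
	return feature / 32;
}

/* Hypothetical helper: which bit inside that word. */
unsigned int feature_bit(unsigned int feature)
{
	return feature % 32;
}

/* Test a feature in a caller-supplied array with one uint32_t per word. */
int feature_is_set(const uint32_t *caps, unsigned int nwords,
		   unsigned int feature)
{
	unsigned int word = feature_word(feature);

	assert(word < nwords);
	return (caps[word] >> feature_bit(feature)) & 1u;
}
```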

tools/arch/x86/include/asm/msr-index.h

Lines changed: 17 additions & 0 deletions
@@ -486,6 +486,23 @@
 
 #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f
 
+/* AMD Collaborative Processor Performance Control MSRs */
+#define MSR_AMD_CPPC_CAP1 0xc00102b0
+#define MSR_AMD_CPPC_ENABLE 0xc00102b1
+#define MSR_AMD_CPPC_CAP2 0xc00102b2
+#define MSR_AMD_CPPC_REQ 0xc00102b3
+#define MSR_AMD_CPPC_STATUS 0xc00102b4
+
+#define AMD_CPPC_LOWEST_PERF(x) (((x) >> 0) & 0xff)
+#define AMD_CPPC_LOWNONLIN_PERF(x) (((x) >> 8) & 0xff)
+#define AMD_CPPC_NOMINAL_PERF(x) (((x) >> 16) & 0xff)
+#define AMD_CPPC_HIGHEST_PERF(x) (((x) >> 24) & 0xff)
+
+#define AMD_CPPC_MAX_PERF(x) (((x) & 0xff) << 0)
+#define AMD_CPPC_MIN_PERF(x) (((x) & 0xff) << 8)
+#define AMD_CPPC_DES_PERF(x) (((x) & 0xff) << 16)
+#define AMD_CPPC_ENERGY_PERF_PREF(x) (((x) & 0xff) << 24)
+
 /* Fam 17h MSRs */
 #define MSR_F17H_IRPERF 0xc00000e9
 
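
The two macro groups added here work in opposite directions: the first group extracts 8-bit fields from a register value, the second shifts 8-bit values into position to build one. The sketch below illustrates both, assuming the extractors apply to a value read from MSR_AMD_CPPC_CAP1 and the builders to a value destined for MSR_AMD_CPPC_REQ. The helper cppc_build_request() is hypothetical, and using the nominal level as the desired level is an arbitrary choice made for the example only.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors, copied from the hunk above. */
#define AMD_CPPC_LOWEST_PERF(x)		(((x) >> 0) & 0xff)
#define AMD_CPPC_LOWNONLIN_PERF(x)	(((x) >> 8) & 0xff)
#define AMD_CPPC_NOMINAL_PERF(x)	(((x) >> 16) & 0xff)
#define AMD_CPPC_HIGHEST_PERF(x)	(((x) >> 24) & 0xff)

/* Field builders, copied from the hunk above. */
#define AMD_CPPC_MAX_PERF(x)		(((x) & 0xff) << 0)
#define AMD_CPPC_MIN_PERF(x)		(((x) & 0xff) << 8)
#define AMD_CPPC_DES_PERF(x)		(((x) & 0xff) << 16)
#define AMD_CPPC_ENERGY_PERF_PREF(x)	(((x) & 0xff) << 24)

/*
 * Hypothetical helper: derive a request value from a capability value.
 * All intermediate values are uint64_t so the shifts never overflow int.
 */
uint64_t cppc_build_request(uint64_t cap1, uint8_t epp)
{
	uint64_t highest = AMD_CPPC_HIGHEST_PERF(cap1);
	uint64_t nominal = AMD_CPPC_NOMINAL_PERF(cap1);
	uint64_t lowest = AMD_CPPC_LOWEST_PERF(cap1);

	/* A sane capability value never reports lowest above highest. */
	assert(lowest <= highest);

	return AMD_CPPC_MAX_PERF(highest) |
	       AMD_CPPC_MIN_PERF(lowest) |
	       AMD_CPPC_DES_PERF(nominal) |
	       AMD_CPPC_ENERGY_PERF_PREF((uint64_t)epp);
}
```

Note that the field order differs between the two layouts: the highest level sits in bits 24-31 of the capability value but is written to bits 0-7 of the request.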

tools/arch/x86/lib/memcpy_64.S

Lines changed: 6 additions & 6 deletions
@@ -39,7 +39,7 @@ SYM_FUNC_START_WEAK(memcpy)
 	rep movsq
 	movl %edx, %ecx
 	rep movsb
-	ret
+	RET
 SYM_FUNC_END(memcpy)
 SYM_FUNC_END_ALIAS(__memcpy)
 EXPORT_SYMBOL(memcpy)
@@ -53,7 +53,7 @@ SYM_FUNC_START_LOCAL(memcpy_erms)
 	movq %rdi, %rax
 	movq %rdx, %rcx
 	rep movsb
-	ret
+	RET
 SYM_FUNC_END(memcpy_erms)
 
 SYM_FUNC_START_LOCAL(memcpy_orig)
@@ -137,7 +137,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 	movq %r9, 1*8(%rdi)
 	movq %r10, -2*8(%rdi, %rdx)
 	movq %r11, -1*8(%rdi, %rdx)
-	retq
+	RET
 	.p2align 4
 .Lless_16bytes:
 	cmpl $8, %edx
@@ -149,7 +149,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 	movq -1*8(%rsi, %rdx), %r9
 	movq %r8, 0*8(%rdi)
 	movq %r9, -1*8(%rdi, %rdx)
-	retq
+	RET
 	.p2align 4
 .Lless_8bytes:
 	cmpl $4, %edx
@@ -162,7 +162,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 	movl -4(%rsi, %rdx), %r8d
 	movl %ecx, (%rdi)
 	movl %r8d, -4(%rdi, %rdx)
-	retq
+	RET
 	.p2align 4
 .Lless_3bytes:
 	subl $1, %edx
@@ -180,7 +180,7 @@ SYM_FUNC_START_LOCAL(memcpy_orig)
 	movb %cl, (%rdi)
 
 .Lend:
-	retq
+	RET
 SYM_FUNC_END(memcpy_orig)
 
 .popsection

tools/arch/x86/lib/memset_64.S

Lines changed: 3 additions & 3 deletions
@@ -40,7 +40,7 @@ SYM_FUNC_START(__memset)
 	movl %edx,%ecx
 	rep stosb
 	movq %r9,%rax
-	ret
+	RET
 SYM_FUNC_END(__memset)
 SYM_FUNC_END_ALIAS(memset)
 EXPORT_SYMBOL(memset)
@@ -63,7 +63,7 @@ SYM_FUNC_START_LOCAL(memset_erms)
 	movq %rdx,%rcx
 	rep stosb
 	movq %r9,%rax
-	ret
+	RET
 SYM_FUNC_END(memset_erms)
 
 SYM_FUNC_START_LOCAL(memset_orig)
@@ -125,7 +125,7 @@ SYM_FUNC_START_LOCAL(memset_orig)
 
 .Lende:
 	movq %r10,%rax
-	ret
+	RET
 
 .Lbad_alignment:
 	cmpq $7,%rdx

tools/build/Build.include

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXX
 ###
 ## HOSTCC C flags
 
-host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
+host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
 
 # output directory for tests below
 TMPOUT = .tmp_$$$$

tools/include/uapi/drm/drm.h

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1096,6 +1096,24 @@ extern "C" {
10961096
#define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer)
10971097
#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL DRM_IOWR(0xCD, struct drm_syncobj_timeline_array)
10981098

1099+
/**
1100+
* DRM_IOCTL_MODE_GETFB2 - Get framebuffer metadata.
1101+
*
1102+
* This queries metadata about a framebuffer. User-space fills
1103+
* &drm_mode_fb_cmd2.fb_id as the input, and the kernels fills the rest of the
1104+
* struct as the output.
1105+
*
1106+
* If the client is DRM master or has &CAP_SYS_ADMIN, &drm_mode_fb_cmd2.handles
1107+
* will be filled with GEM buffer handles. Planes are valid until one has a
1108+
* zero handle -- this can be used to compute the number of planes.
1109+
*
1110+
* Otherwise, &drm_mode_fb_cmd2.handles will be zeroed and planes are valid
1111+
* until one has a zero &drm_mode_fb_cmd2.pitches.
1112+
*
1113+
* If the framebuffer has a format modifier, &DRM_MODE_FB_MODIFIERS will be set
1114+
* in &drm_mode_fb_cmd2.flags and &drm_mode_fb_cmd2.modifier will contain the
1115+
* modifier. Otherwise, user-space must ignore &drm_mode_fb_cmd2.modifier.
1116+
*/
10991117
#define DRM_IOCTL_MODE_GETFB2 DRM_IOWR(0xCE, struct drm_mode_fb_cmd2)
11001118

11011119
/*
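
The new comment describes how user-space can work out the number of planes from the GETFB2 result: scan the handles until one is zero, or the pitches when the handles were zeroed for an unprivileged client. A minimal sketch of that rule follows. To stay self-contained it uses struct fb_info, a hypothetical stand-in carrying only four-entry handles and pitches arrays like those of struct drm_mode_fb_cmd2, and it performs no ioctl. The function name fb_count_planes() is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* struct drm_mode_fb_cmd2 uses four-entry per-plane arrays. */
#define MAX_PLANES 4

/* Hypothetical stand-in for the two fields the comment talks about. */
struct fb_info {
	uint32_t handles[MAX_PLANES];
	uint32_t pitches[MAX_PLANES];
};

/*
 * Count valid planes. have_handles is non-zero when the caller was
 * privileged enough to receive GEM handles; otherwise the handles are
 * all zero and the pitches have to be used instead.
 */
unsigned int fb_count_planes(const struct fb_info *fb, int have_handles)
{
	const uint32_t *arr;
	unsigned int n = 0;

	assert(fb != NULL);
	arr = have_handles ? fb->handles : fb->pitches;

	/* Planes are valid until the first zero entry. */
	while (n < MAX_PLANES && arr[n] != 0)
		n++;
	return n;
}
```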

tools/include/uapi/linux/perf_event.h

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1332,7 +1332,10 @@ union perf_mem_data_src {
13321332

13331333
/* hop level */
13341334
#define PERF_MEM_HOPS_0 0x01 /* remote core, same node */
1335-
/* 2-7 available */
1335+
#define PERF_MEM_HOPS_1 0x02 /* remote node, same socket */
1336+
#define PERF_MEM_HOPS_2 0x03 /* remote socket, same board */
1337+
#define PERF_MEM_HOPS_3 0x04 /* remote board */
1338+
/* 5-7 available */
13361339
#define PERF_MEM_HOPS_SHIFT 43
13371340

13381341
#define PERF_MEM_S(a, s) \
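
The hop-level values are small integers that are shifted by PERF_MEM_HOPS_SHIFT into a 64-bit data source word. The sketch below shows packing and unpacking such a value. The helpers set_hops(), get_hops() and hops_name(), and the HOPS_MASK constant, are hypothetical. The 3-bit field width is an assumption inferred from the "5-7 available" comment, not something stated in this hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hunk above. */
#define PERF_MEM_HOPS_0		0x01 /* remote core, same node */
#define PERF_MEM_HOPS_1		0x02 /* remote node, same socket */
#define PERF_MEM_HOPS_2		0x03 /* remote socket, same board */
#define PERF_MEM_HOPS_3		0x04 /* remote board */
#define PERF_MEM_HOPS_SHIFT	43

/* Assumption: the hop level occupies three bits (values 0-7). */
#define HOPS_MASK 0x7ULL

/* Replace the hop-level field of a data source word. */
uint64_t set_hops(uint64_t data_src, uint64_t hops)
{
	assert(hops <= HOPS_MASK);
	data_src &= ~(HOPS_MASK << PERF_MEM_HOPS_SHIFT);
	return data_src | (hops << PERF_MEM_HOPS_SHIFT);
}

/* Read the hop-level field back out. */
unsigned int get_hops(uint64_t data_src)
{
	return (unsigned int)((data_src >> PERF_MEM_HOPS_SHIFT) & HOPS_MASK);
}

/* Human-readable name, using the wording of the header comments. */
const char *hops_name(unsigned int hops)
{
	switch (hops) {
	case PERF_MEM_HOPS_0: return "remote core, same node";
	case PERF_MEM_HOPS_1: return "remote node, same socket";
	case PERF_MEM_HOPS_2: return "remote socket, same board";
	case PERF_MEM_HOPS_3: return "remote board";
	default:              return "unknown";
	}
}
```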

tools/lib/perf/Documentation/libperf.txt

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -48,6 +48,7 @@ SYNOPSIS
4848
int perf_cpu_map__nr(const struct perf_cpu_map *cpus);
4949
bool perf_cpu_map__empty(const struct perf_cpu_map *map);
5050
int perf_cpu_map__max(struct perf_cpu_map *map);
51+
bool perf_cpu_map__has(const struct perf_cpu_map *map, int cpu);
5152

5253
#define perf_cpu_map__for_each_cpu(cpu, idx, cpus)
5354
--
@@ -135,16 +136,16 @@ SYNOPSIS
135136
int perf_evsel__open(struct perf_evsel *evsel, struct perf_cpu_map *cpus,
136137
struct perf_thread_map *threads);
137138
void perf_evsel__close(struct perf_evsel *evsel);
138-
void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
139+
void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu_map_idx);
139140
int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
140141
void perf_evsel__munmap(struct perf_evsel *evsel);
141-
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
142-
int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
142+
void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu_map_idx, int thread);
143+
int perf_evsel__read(struct perf_evsel *evsel, int cpu_map_idx, int thread,
143144
struct perf_counts_values *count);
144145
int perf_evsel__enable(struct perf_evsel *evsel);
145-
int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu);
146+
int perf_evsel__enable_cpu(struct perf_evsel *evsel, int cpu_map_idx);
146147
int perf_evsel__disable(struct perf_evsel *evsel);
147-
int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu);
148+
int perf_evsel__disable_cpu(struct perf_evsel *evsel, int cpu_map_idx);
148149
struct perf_cpu_map *perf_evsel__cpus(struct perf_evsel *evsel);
149150
struct perf_thread_map *perf_evsel__threads(struct perf_evsel *evsel);
150151
struct perf_event_attr *perf_evsel__attr(struct perf_evsel *evsel);
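
The parameter renames from cpu to cpu_map_idx document that these functions take a position within the CPU map rather than a CPU number. The two only coincide when the map starts at CPU 0 and has no gaps. The sketch below illustrates the difference with a toy map. It deliberately does not use libperf: struct toy_cpu_map and both helpers are hypothetical names that exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CPUS 8

/* Toy stand-in for a CPU map: an ordered list of CPU numbers. */
struct toy_cpu_map {
	int nr;
	int cpus[TOY_MAX_CPUS];
};

/* Map index -> CPU number; returns -1 if idx is out of range. */
int toy_cpu_map__cpu(const struct toy_cpu_map *map, int idx)
{
	assert(map != NULL);
	if (idx < 0 || idx >= map->nr)
		return -1;
	return map->cpus[idx];
}

/* CPU number -> map index; returns -1 if the CPU is not in the map. */
int toy_cpu_map__idx(const struct toy_cpu_map *map, int cpu)
{
	assert(map != NULL);
	for (int i = 0; i < map->nr; i++) {
		if (map->cpus[i] == cpu)
			return i;
	}
	return -1;
}
```

For a map holding CPUs 2, 5 and 7, index 1 refers to CPU 5, while CPU 1 is not in the map at all. Passing a CPU number where an index is expected would therefore address the wrong entry or none.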
