Commit 1dd3393

Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (29 commits)
  perf docs: arm_spe: Document new discard mode
  perf: arm_spe: Add format option for discard mode
  MAINTAINERS: Add perf list for drivers/perf/
  drivers/perf: apple_m1: Map generic branch events
  drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
  perf: imx9_perf: Introduce AXI filter version to refactor the driver and better extension
  perf/arm-cmn: Permit more exhaustive groups
  perf/dwc_pcie: Qualify RAS DES VSEC Capability by Vendor, Revision
  drivers/perf: hisi: Delete redundant blank line of DDRC PMU
  drivers/perf: hisi: Fix incorrect variable name "hha_pmu" in DDRC PMU driver
  drivers/perf: hisi: Export associated CPUs of each PMU through sysfs
  drivers/perf: hisi: Provide a generic implementation of cpumask/identifier
  drivers/perf: hisi: Add a common function to retrieve topology from firmware
  drivers/perf: hisi: Extract topology information to a separate structure
  drivers/perf: hisi: Refactor the detection of associated CPUs
  drivers/perf: hisi: Migrate to one online CPU if no associated one online
  drivers/perf: hisi: Don't update the associated_cpus on CPU offline
  drivers/perf: hisi: Define a symbol namespace for HiSilicon Uncore PMUs
  perf/marvell: Odyssey LLC-TAD performance monitor support
  perf/marvell: Refactor to extract platform data
  ...
2 parents 602ffd4 + ba113ec commit 1dd3393

25 files changed: +1059 -499 lines changed

Documentation/admin-guide/perf/dwc_pcie_pmu.rst

Lines changed: 3 additions & 3 deletions
@@ -60,7 +60,7 @@ description of available events and configuration options in sysfs, see
 The "format" directory describes format of the config fields of the
 perf_event_attr structure. The "events" directory provides configuration
 templates for all documented events. For example,
-"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+"rx_pcie_tlp_data_payload" is an equivalent of "eventid=0x21,type=0x0".

 The "perf list" command shall list the available events from sysfs, e.g.::

@@ -79,8 +79,8 @@ Example usage of counting PCIe RX TLP data payload (Units of bytes)::

 The average RX/TX bandwidth can be calculated using the following formula:

-    PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
-    PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+    PCIe RX Bandwidth = rx_pcie_tlp_data_payload / Measure_Time_Window
+    PCIe TX Bandwidth = tx_pcie_tlp_data_payload / Measure_Time_Window

 Lane Event Usage
 -------------------------------
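
Note on the bandwidth formula in the dwc_pcie_pmu.rst hunk: the calculation is a plain division of the counted payload bytes by the measurement window. A minimal Python sketch follows, assuming the counter value has already been read from `perf stat` output. The function name and the sample numbers are hypothetical and are not part of the commit.

```python
def pcie_bandwidth(payload_bytes: int, window_seconds: float) -> float:
    """Average bandwidth in bytes per second.

    payload_bytes: value of rx_pcie_tlp_data_payload or
                   tx_pcie_tlp_data_payload (counted in bytes).
    window_seconds: Measure_Time_Window, the duration of the measurement.
    """
    if window_seconds <= 0:
        raise ValueError("measurement window must be positive")
    return payload_bytes / window_seconds


# Hypothetical reading: 4,000,000,000 payload bytes counted over 2 seconds.
rx_bw = pcie_bandwidth(4_000_000_000, 2.0)
print(f"PCIe RX bandwidth: {rx_bw / 1e9:.2f} GB/s")
```

The same helper applies to the TX direction by passing the tx_pcie_tlp_data_payload count instead.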

Documentation/admin-guide/perf/hisi-pmu.rst

Lines changed: 4 additions & 1 deletion
@@ -35,7 +35,10 @@ e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
 SCCL ID #1.

 The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
+ID used to count the uncore PMU event. An "associated_cpus" sysfs attribute is
+also provided to show the CPUs associated with this PMU. The "cpumask" indicates
+the CPUs to open the events, usually as a hint for userspaces tools like perf.
+It only contains one associated CPU from the "associated_cpus".

 Example usage of perf::
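
Note on the relationship between "cpumask" and "associated_cpus" described in the hisi-pmu.rst hunk: the sketch below checks that the counting CPU is one of the associated CPUs. It assumes both attributes are printed in the usual Linux CPU list format (for example "0-3,8"), which the commit text does not state. The helper name and the sample strings are hypothetical, and strided ranges such as "0-7:2" are not handled.

```python
def parse_cpulist(text: str) -> set[int]:
    """Parse a CPU list string such as "0-3,8" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. an attribute that reads as ""
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


# Hypothetical contents of the two sysfs attributes for one PMU.
associated = parse_cpulist("0-3,8\n")   # associated_cpus
counting = parse_cpulist("2\n")         # cpumask

print(sorted(associated))
print(counting <= associated)  # True when cpumask is drawn from associated_cpus
```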

Documentation/admin-guide/perf/index.rst

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ Performance monitor support
    qcom_l2_pmu
    qcom_l3_pmu
    starfive_starlink_pmu
+   mrvl-odyssey-ddr-pmu
+   mrvl-odyssey-tad-pmu
    arm-ccn
    arm-cmn
    arm-ni

Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+===================================================================
+Marvell Odyssey DDR PMU Performance Monitoring Unit (PMU UNCORE)
+===================================================================
+
+Odyssey DRAM Subsystem supports eight counters for monitoring performance
+and software can program those counters to monitor any of the defined
+performance events. Supported performance events include those counted
+at the interface between the DDR controller and the PHY, interface between
+the DDR Controller and the CHI interconnect, or within the DDR Controller.
+
+Additionally DSS also supports two fixed performance event counters, one
+for ddr reads and the other for ddr writes.
+
+The counter will be operating in either manual or auto mode.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+        /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/events/
+        /sys/bus/event_source/devices/mrvl_ddr_pmu_<>/format/
+
+Examples::
+
+        $ perf list | grep ddr
+        mrvl_ddr_pmu_<>/ddr_act_bypass_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_bsm_alloc/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_bsm_starvation/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_active_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_mwr/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_rd_active_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_rd_or_wr_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_read/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_wr_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_cam_write/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_capar_error/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_crit_ref/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_ddr_reads/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_ddr_writes/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dfi_cmd_is_retry/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dfi_cycles/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dfi_parity_poison/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dfi_rd_data_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dfi_wr_data_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dqsosc_mpc/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_dqsosc_mrr/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_enter_mpsm/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_enter_powerdown/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_enter_selfref/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hif_pri_rdaccess/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hif_rd_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hif_rd_or_wr_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hif_rmw_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hif_wr_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_hpri_sched_rd_crit_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_load_mode/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_lpri_sched_rd_crit_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_precharge/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_precharge_for_other/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_precharge_for_rdwr/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_raw_hazard/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_rd_bypass_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_rd_crc_error/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_rd_uc_ecc_error/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_rdwr_transitions/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_refresh/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_retry_fifo_full/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_spec_ref/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_tcr_mrr/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_war_hazard/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_waw_hazard/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_win_limit_reached_rd/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_win_limit_reached_wr/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_wr_crc_error/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_wr_trxn_crit_access/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_write_combine/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_zqcl/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_zqlatch/ [Kernel PMU event]
+        mrvl_ddr_pmu_<>/ddr_zqstart/ [Kernel PMU event]
+
+        $ perf stat -e ddr_cam_read,ddr_cam_write,ddr_cam_active_access,ddr_cam
+          rd_or_wr_access,ddr_cam_rd_active_access,ddr_cam_mwr <workload>

Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+====================================================================
+Marvell Odyssey LLC-TAD Performance Monitoring Unit (PMU UNCORE)
+====================================================================
+
+Each TAD provides eight 64-bit counters for monitoring
+cache behavior.The driver always configures the same counter for
+all the TADs. The user would end up effectively reserving one of
+eight counters in every TAD to look across all TADs.
+The occurrences of events are aggregated and presented to the user
+at the end of running the workload. The driver does not provide a
+way for the user to partition TADs so that different TADs are used for
+different applications.
+
+The performance events reflect various internal or interface activities.
+By combining the values from multiple performance counters, cache
+performance can be measured in terms such as: cache miss rate, cache
+allocations, interface retry rate, internal resource occupancy, etc.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+        /sys/bus/event_source/devices/tad/events/
+        /sys/bus/event_source/devices/tad/format/
+
+Examples::
+
+        $ perf list | grep tad
+        tad/tad_alloc_any/ [Kernel PMU event]
+        tad/tad_alloc_dtg/ [Kernel PMU event]
+        tad/tad_alloc_ltg/ [Kernel PMU event]
+        tad/tad_hit_any/ [Kernel PMU event]
+        tad/tad_hit_dtg/ [Kernel PMU event]
+        tad/tad_hit_ltg/ [Kernel PMU event]
+        tad/tad_req_msh_in_exlmn/ [Kernel PMU event]
+        tad/tad_tag_rd/ [Kernel PMU event]
+        tad/tad_tot_cycle/ [Kernel PMU event]
+
+        $ perf stat -e tad_alloc_dtg,tad_alloc_ltg,tad_alloc_any,tad_hit_dtg,tad_hit_ltg,tad_hit_any,tad_tag_rd <workload>

Documentation/admin-guide/perf/nvidia-pmu.rst

Lines changed: 43 additions & 9 deletions
@@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
 traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.

 Example usage:

@@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.

 Example usage:

@@ -86,6 +86,22 @@ Example usage:

    perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/

+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+   perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+   perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
+
 NVLink-C2C1 PMU
 -------------------

@@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.

 Example usage:

@@ -116,6 +132,22 @@ Example usage:

    perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/

+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+   perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+   perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
+
 CNVLink PMU
 ---------------

@@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
 for more info about the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.

 Each SoC socket can be connected to one or more sockets via CNVLink. The user can
 use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
 Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
-socket 1 to 3.
-/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+socket 1 to 3. The PMU will monitor all remote sockets by default if not
+specified.
+/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
 shows the valid bits that can be set in the "rem_socket" parameter.

 The PMU can not distinguish the remote traffic initiator, therefore it does not
@@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
 for more info about the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.

 Each SoC socket can support multiple root ports. The user can use
 "root_port" bitmap parameter to select the port(s) to monitor, i.e.
-"root_port=0xF" corresponds to root port 0 to 3.
-/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
+ports by default if not specified.
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
 shows the valid bits that can be set in the "root_port" parameter.

 Example usage:
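
Note on the bitmap filters in the nvidia-pmu.rst hunks ("port", "rem_socket", "root_port"): each set bit selects the port or socket with that number. The Python sketch below shows how the documented values 0x1, 0x3, 0xE and 0xF arise and how such a value slots into a perf event specifier. The helper names `bitmap` and `event_spec` are hypothetical conveniences and are not part of perf or the driver.

```python
def bitmap(*bits: int) -> str:
    """Return a hex bitmap string with the given bit positions set."""
    mask = 0
    for bit in bits:
        if bit < 0:
            raise ValueError("bit positions must be non-negative")
        mask |= 1 << bit
    return f"0x{mask:X}"


def event_spec(pmu: str, event: int, **filters: str) -> str:
    """Format a perf event specifier such as pmu/event=0x0,port=0x1/."""
    terms = [f"event=0x{event:X}"] + [f"{k}={v}" for k, v in filters.items()]
    return f"{pmu}/{','.join(terms)}/"


print(bitmap(0))           # port 0 only
print(bitmap(0, 1))        # ports 0 and 1
print(bitmap(1, 2, 3))     # remote sockets 1 to 3
print(bitmap(0, 1, 2, 3))  # root ports 0 to 3
print(event_spec("nvidia_nvlink_c2c0_pmu_0", 0x0, port=bitmap(0, 1)))
```

The last line produces the same event string that the second port-filtering example passes to `perf stat -a -e`.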

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -1920,6 +1920,7 @@ ARM PMU PROFILING AND DEBUGGING
 M: Will Deacon <will@kernel.org>
 M: Mark Rutland <mark.rutland@arm.com>
 L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L: linux-perf-users@vger.kernel.org
 S: Maintained
 F: Documentation/devicetree/bindings/arm/pmu.yaml
 F: Documentation/devicetree/bindings/perf/

drivers/perf/apple_m1_cpu_pmu.c

Lines changed: 2 additions & 0 deletions
@@ -168,6 +168,8 @@ static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = {
 	PERF_MAP_ALL_UNSUPPORTED,
 	[PERF_COUNT_HW_CPU_CYCLES] = M1_PMU_PERFCTR_CORE_ACTIVE_CYCLE,
 	[PERF_COUNT_HW_INSTRUCTIONS] = M1_PMU_PERFCTR_INST_ALL,
+	[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = M1_PMU_PERFCTR_INST_BRANCH,
+	[PERF_COUNT_HW_BRANCH_MISSES] = M1_PMU_PERFCTR_BRANCH_MISPRED_NONSPEC,
 };

 /* sysfs definitions */

drivers/perf/arm-cmn.c

Lines changed: 2 additions & 2 deletions
@@ -1713,8 +1713,8 @@ static int arm_cmn_validate_group(struct arm_cmn *cmn, struct perf_event *event)
 		goto done;
 	}

-	for (i = 0; i < CMN_MAX_DTCS; i++)
-		if (val->dtc_count[i] == CMN_DT_NUM_COUNTERS)
+	for_each_hw_dtc_idx(hw, dtc, idx)
+		if (val->dtc_count[dtc] == CMN_DT_NUM_COUNTERS)
 			goto done;

 	for_each_hw_dn(hw, dn, i) {
