
Commit 9ad09c4

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:

"We've got a little less than normal thanks to the holidays in December, but there's the usual summary below. The highlight is probably the 52-bit physical addressing (LPA2) clean-up from Ard.

Confidential Computing:

- Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules

CPU Features:

- Update a bunch of system register definitions to pick up new field encodings from the architectural documentation
- Add hwcaps and selftests for the new (2024) dpISA extensions

Documentation:

- Update EL3 (firmware) requirements for booting Linux on modern arm64 designs
- Remove stale information about the kernel virtual memory map

Miscellaneous:

- Minor cleanups and typo fixes

Memory management:

- Fix vmemmap_check_pmd() to look at the PMD type bits
- LPA2 (52-bit physical addressing) cleanups and minor fixes
- Adjust physical address space depending upon whether or not LPA2 is enabled

Perf and PMUs:

- Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU
- Extend AXI filtering support for the DDR PMU on NXP IMX SoCs
- Fix Designware PCIe PMU event numbering
- Add generic branch events for the Apple M1 CPU PMU
- Add support for Marvell Odyssey DDR and LLC-TAD PMUs
- Cleanups to the Hisilicon DDRC and Uncore PMU code
- Advertise discard mode for the SPE PMU
- Add the perf users mailing list to our MAINTAINERS entry"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits)
  Documentation: arm64: Remove stale and redundant virtual memory diagrams
  perf docs: arm_spe: Document new discard mode
  perf: arm_spe: Add format option for discard mode
  MAINTAINERS: Add perf list for drivers/perf/
  arm64: Remove duplicate included header
  drivers/perf: apple_m1: Map generic branch events
  arm64: rsi: Add automatic arm-cca-guest module loading
  kselftest/arm64: Add 2024 dpISA extensions to hwcap test
  KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
  arm64/hwcap: Describe 2024 dpISA extensions to userspace
  arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
  arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
  drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association
  arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu()
  arm64: mm: Test for pmd_sect() in vmemmap_check_pmd()
  arm64/mm: Replace open encodings with PXD_TABLE_BIT
  arm64/mm: Rename pte_mkpresent() as pte_mkvalid()
  arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09
  arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09
  ...
2 parents e7244cc + 1dd3393 commit 9ad09c4


70 files changed (+1755, -751 lines)

Documentation/admin-guide/perf/dwc_pcie_pmu.rst

Lines changed: 3 additions & 3 deletions
```diff
@@ -60,7 +60,7 @@ description of available events and configuration options in sysfs, see
 The "format" directory describes format of the config fields of the
 perf_event_attr structure. The "events" directory provides configuration
 templates for all documented events. For example,
-"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
+"rx_pcie_tlp_data_payload" is an equivalent of "eventid=0x21,type=0x0".

 The "perf list" command shall list the available events from sysfs, e.g.::

@@ -79,8 +79,8 @@ Example usage of counting PCIe RX TLP data payload (Units of bytes)::

 The average RX/TX bandwidth can be calculated using the following formula:

-PCIe RX Bandwidth = Rx_PCIe_TLP_Data_Payload / Measure_Time_Window
-PCIe TX Bandwidth = Tx_PCIe_TLP_Data_Payload / Measure_Time_Window
+PCIe RX Bandwidth = rx_pcie_tlp_data_payload / Measure_Time_Window
+PCIe TX Bandwidth = tx_pcie_tlp_data_payload / Measure_Time_Window

 Lane Event Usage
 -------------------------------
```
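The bandwidth formula in dwc_pcie_pmu.rst divides a payload byte count by the length of the measurement window. A minimal sketch of that arithmetic, assuming the rx_pcie_tlp_data_payload / tx_pcie_tlp_data_payload counts are in bytes (as the hunk header "Units of bytes" states) and that Measure_Time_Window is expressed in seconds; the function name and the counter values are hypothetical, not measurements:

```python
def pcie_bandwidth(payload_bytes: int, window_s: float) -> float:
    """Average bandwidth in bytes per second over the measurement window."""
    if window_s <= 0:
        raise ValueError("measurement window must be positive")
    return payload_bytes / window_s


# Illustrative counts for a hypothetical 2-second `perf stat` run.
rx_bytes = 4_000_000_000  # value of rx_pcie_tlp_data_payload
tx_bytes = 1_000_000_000  # value of tx_pcie_tlp_data_payload
window = 2.0              # Measure_Time_Window in seconds (assumed unit)

rx_bw = pcie_bandwidth(rx_bytes, window)
tx_bw = pcie_bandwidth(tx_bytes, window)
print(f"RX: {rx_bw / 1e9:.2f} GB/s, TX: {tx_bw / 1e9:.2f} GB/s")
# prints "RX: 2.00 GB/s, TX: 0.50 GB/s"
```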

Documentation/admin-guide/perf/hisi-pmu.rst

Lines changed: 4 additions & 1 deletion
```diff
@@ -35,7 +35,10 @@ e.g. hisi_sccl1_hha0/rx_operations is RX_OPERATIONS event of HHA index #0 in
 SCCL ID #1.

 The driver also provides a "cpumask" sysfs attribute, which shows the CPU core
-ID used to count the uncore PMU event.
+ID used to count the uncore PMU event. An "associated_cpus" sysfs attribute is
+also provided to show the CPUs associated with this PMU. The "cpumask" indicates
+the CPUs to open the events, usually as a hint for userspaces tools like perf.
+It only contains one associated CPU from the "associated_cpus".

 Example usage of perf::
```
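The hisi-pmu.rst change describes a relationship between two sysfs attributes: "cpumask" holds exactly one CPU taken from "associated_cpus". The sketch below checks that relationship, assuming both attributes use the kernel's cpulist text format (for example "4" or "4-7,12"); the helper names are hypothetical and not part of the driver or perf.

```python
def parse_cpulist(text: str) -> set[int]:
    """Parse a kernel CPU list such as "0-3,8" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty attributes / trailing commas
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_hisi_pmu_masks(cpumask: str, associated_cpus: str) -> bool:
    """True if cpumask names exactly one CPU drawn from associated_cpus."""
    chosen = parse_cpulist(cpumask)
    associated = parse_cpulist(associated_cpus)
    return len(chosen) == 1 and chosen <= associated


print(check_hisi_pmu_masks("4\n", "4-7\n"))  # True
print(check_hisi_pmu_masks("0\n", "4-7\n"))  # False: CPU 0 is not associated
```

On a real system the two strings would be read from files such as /sys/bus/event_source/devices/hisi_sccl1_hha0/cpumask and .../associated_cpus (the PMU instance name here is taken from the example in the hunk header).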
Documentation/admin-guide/perf/index.rst

Lines changed: 2 additions & 0 deletions
```diff
@@ -14,6 +14,8 @@ Performance monitor support
 qcom_l2_pmu
 qcom_l3_pmu
 starfive_starlink_pmu
+mrvl-odyssey-ddr-pmu
+mrvl-odyssey-tad-pmu
 arm-ccn
 arm-cmn
 arm-ni
```

Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst

Lines changed: 80 additions & 0 deletions

```diff
@@ -0,0 +1,80 @@
+===================================================================
+Marvell Odyssey DDR PMU Performance Monitoring Unit (PMU UNCORE)
+===================================================================
+
+Odyssey DRAM Subsystem supports eight counters for monitoring performance
+and software can program those counters to monitor any of the defined
+performance events. Supported performance events include those counted
+at the interface between the DDR controller and the PHY, interface between
+the DDR Controller and the CHI interconnect, or within the DDR Controller.
+
+Additionally DSS also supports two fixed performance event counters, one
+for ddr reads and the other for ddr writes.
+
+The counter will be operating in either manual or auto mode.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+/sys/bus/event_source/devices/mrvl_ddr_pmu_<>/events/
+/sys/bus/event_source/devices/mrvl_ddr_pmu_<>/format/
+
+Examples::
+
+$ perf list | grep ddr
+mrvl_ddr_pmu_<>/ddr_act_bypass_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_bsm_alloc/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_bsm_starvation/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_active_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_mwr/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_rd_active_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_rd_or_wr_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_read/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_wr_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_cam_write/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_capar_error/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_crit_ref/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_ddr_reads/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_ddr_writes/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dfi_cmd_is_retry/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dfi_cycles/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dfi_parity_poison/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dfi_rd_data_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dfi_wr_data_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dqsosc_mpc/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_dqsosc_mrr/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_enter_mpsm/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_enter_powerdown/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_enter_selfref/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hif_pri_rdaccess/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hif_rd_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hif_rd_or_wr_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hif_rmw_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hif_wr_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_hpri_sched_rd_crit_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_load_mode/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_lpri_sched_rd_crit_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_precharge/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_precharge_for_other/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_precharge_for_rdwr/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_raw_hazard/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_rd_bypass_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_rd_crc_error/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_rd_uc_ecc_error/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_rdwr_transitions/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_refresh/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_retry_fifo_full/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_spec_ref/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_tcr_mrr/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_war_hazard/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_waw_hazard/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_win_limit_reached_rd/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_win_limit_reached_wr/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_wr_crc_error/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_wr_trxn_crit_access/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_write_combine/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_zqcl/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_zqlatch/ [Kernel PMU event]
+mrvl_ddr_pmu_<>/ddr_zqstart/ [Kernel PMU event]
+
+$ perf stat -e ddr_cam_read,ddr_cam_write,ddr_cam_active_access,ddr_cam
+rd_or_wr_access,ddr_cam_rd_active_access,ddr_cam_mwr <workload>
```
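The DDR PMU document gives its sysfs paths with the instance suffix elided (mrvl_ddr_pmu_<>). Rather than guessing that suffix, a small script can discover the instances by globbing the event_source directory and then build fully qualified `pmu/event/` terms for `perf stat -e`. This is a sketch; the function names are hypothetical, and it assumes only the `<pmu>/events/` directory layout shown in the document.

```python
from pathlib import Path

SYSFS_PMU_ROOT = "/sys/bus/event_source/devices"


def list_pmu_events(prefix: str, root: str = SYSFS_PMU_ROOT) -> dict[str, list[str]]:
    """Map every PMU instance whose name starts with `prefix` to its event names."""
    found: dict[str, list[str]] = {}
    for pmu_dir in sorted(Path(root).glob(prefix + "*")):
        events_dir = pmu_dir / "events"
        if events_dir.is_dir():
            found[pmu_dir.name] = sorted(p.name for p in events_dir.iterdir())
    return found


def perf_event_arg(pmu: str, events: list[str]) -> str:
    """Join events into one -e argument of the form pmu/ev1/,pmu/ev2/,..."""
    return ",".join(f"{pmu}/{ev}/" for ev in events)


if __name__ == "__main__":
    cam_events = ["ddr_cam_read", "ddr_cam_write", "ddr_cam_rd_or_wr_access"]
    # Prints nothing on machines without an Odyssey DDR PMU.
    for pmu, events in list_pmu_events("mrvl_ddr_pmu_").items():
        wanted = [ev for ev in cam_events if ev in events]
        print(f"perf stat -e {perf_event_arg(pmu, wanted)} <workload>")
```

Qualifying each event with its PMU name keeps the command unambiguous if several DDR PMU instances are present.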

Documentation/admin-guide/perf/mrvl-odyssey-tad-pmu.rst

Lines changed: 37 additions & 0 deletions

```diff
@@ -0,0 +1,37 @@
+====================================================================
+Marvell Odyssey LLC-TAD Performance Monitoring Unit (PMU UNCORE)
+====================================================================
+
+Each TAD provides eight 64-bit counters for monitoring
+cache behavior.The driver always configures the same counter for
+all the TADs. The user would end up effectively reserving one of
+eight counters in every TAD to look across all TADs.
+The occurrences of events are aggregated and presented to the user
+at the end of running the workload. The driver does not provide a
+way for the user to partition TADs so that different TADs are used for
+different applications.
+
+The performance events reflect various internal or interface activities.
+By combining the values from multiple performance counters, cache
+performance can be measured in terms such as: cache miss rate, cache
+allocations, interface retry rate, internal resource occupancy, etc.
+
+The PMU driver exposes the available events and format options under sysfs::
+
+/sys/bus/event_source/devices/tad/events/
+/sys/bus/event_source/devices/tad/format/
+
+Examples::
+
+$ perf list | grep tad
+tad/tad_alloc_any/ [Kernel PMU event]
+tad/tad_alloc_dtg/ [Kernel PMU event]
+tad/tad_alloc_ltg/ [Kernel PMU event]
+tad/tad_hit_any/ [Kernel PMU event]
+tad/tad_hit_dtg/ [Kernel PMU event]
+tad/tad_hit_ltg/ [Kernel PMU event]
+tad/tad_req_msh_in_exlmn/ [Kernel PMU event]
+tad/tad_tag_rd/ [Kernel PMU event]
+tad/tad_tot_cycle/ [Kernel PMU event]
+
+$ perf stat -e tad_alloc_dtg,tad_alloc_ltg,tad_alloc_any,tad_hit_dtg,tad_hit_ltg,tad_hit_any,tad_tag_rd <workload>
```
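The TAD document says cache performance is measured "by combining the values from multiple performance counters". One way to do that offline is to run `perf stat -x,` (CSV output) and form ratios from the parsed counts. The sketch assumes CSV lines begin with value, unit, event name, which holds when no `-I`, `-A` or `--per-*` option adds leading columns. The sample output and the tad_hit_any / tad_tag_rd pairing are hypothetical illustrations, not a formula documented by Marvell; confirm event semantics against the hardware documentation before interpreting any ratio.

```python
SAMPLE = """\
750000,,tad/tad_hit_any/,1000000000,100.00,,
1000000,,tad/tad_tag_rd/,1000000000,100.00,,
<not counted>,,tad/tad_alloc_dtg/,0,0.00,,
"""


def parse_perf_csv(text: str) -> dict[str, float]:
    """Parse `perf stat -x,` lines into {event_name: count}."""
    counts: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        if value.startswith("<"):
            continue  # "<not counted>" or "<not supported>"
        # Normalise "tad/tad_hit_any/" and "tad_hit_any" to "tad_hit_any".
        name = event.strip("/").split("/")[-1]
        # Sum repeated entries (e.g. several PMU instances) into one total.
        counts[name] = counts.get(name, 0.0) + float(value)
    return counts


def ratio(counts: dict[str, float], numerator: str, denominator: str) -> float:
    """Divide two parsed counts, refusing a zero or missing denominator."""
    denom = counts.get(denominator, 0.0)
    if denom == 0:
        raise ZeroDivisionError(f"{denominator} is zero or missing")
    return counts.get(numerator, 0.0) / denom


counts = parse_perf_csv(SAMPLE)
print(f"tad_hit_any / tad_tag_rd = {ratio(counts, 'tad_hit_any', 'tad_tag_rd'):.2f}")
# prints "tad_hit_any / tad_tag_rd = 0.75"
```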

Documentation/admin-guide/perf/nvidia-pmu.rst

Lines changed: 43 additions & 9 deletions
```diff
@@ -34,7 +34,7 @@ strongly-ordered (SO) PCIE write traffic to local/remote memory. Please see
 traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.

 Example usage:

@@ -66,7 +66,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c0_pmu_<socket-id>.

 Example usage:

@@ -86,6 +86,22 @@ Example usage:

 perf stat -a -e nvidia_nvlink_c2c0_pmu_3/event=0x0/

+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+perf stat -a -e nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x3/
+
 NVLink-C2C1 PMU
 -------------------

@@ -96,7 +112,7 @@ Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section` for more info about
 the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_nvlink_c2c1_pmu_<socket-id>.

 Example usage:

@@ -116,6 +132,22 @@ Example usage:

 perf stat -a -e nvidia_nvlink_c2c1_pmu_3/event=0x0/

+The NVLink-C2C has two ports that can be connected to one GPU (occupying both
+ports) or to two GPUs (one GPU per port). The user can use "port" bitmap
+parameter to select the port(s) to monitor. Each bit represents the port number,
+e.g. "port=0x1" corresponds to port 0 and "port=0x3" is for port 0 and 1. The
+PMU will monitor both ports by default if not specified.
+
+Example for port filtering:
+
+* Count event id 0x0 from the GPU connected with socket 0 on port 0::
+
+perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x1/
+
+* Count event id 0x0 from the GPUs connected with socket 0 on port 0 and port 1::
+
+perf stat -a -e nvidia_nvlink_c2c1_pmu_0/event=0x0,port=0x3/
+
 CNVLink PMU
 ---------------

@@ -125,13 +157,14 @@ to local memory. For PCIE traffic, this PMU captures read and relaxed ordered
 for more info about the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>.

 Each SoC socket can be connected to one or more sockets via CNVLink. The user can
 use "rem_socket" bitmap parameter to select the remote socket(s) to monitor.
 Each bit represents the socket number, e.g. "rem_socket=0xE" corresponds to
-socket 1 to 3.
-/sys/bus/event_sources/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
+socket 1 to 3. The PMU will monitor all remote sockets by default if not
+specified.
+/sys/bus/event_source/devices/nvidia_cnvlink_pmu_<socket-id>/format/rem_socket
 shows the valid bits that can be set in the "rem_socket" parameter.

 The PMU can not distinguish the remote traffic initiator, therefore it does not
@@ -165,12 +198,13 @@ local/remote memory. Please see :ref:`NVIDIA_Uncore_PMU_Traffic_Coverage_Section
 for more info about the PMU traffic coverage.

 The events and configuration options of this PMU device are described in sysfs,
-see /sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>.
+see /sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>.

 Each SoC socket can support multiple root ports. The user can use
 "root_port" bitmap parameter to select the port(s) to monitor, i.e.
-"root_port=0xF" corresponds to root port 0 to 3.
-/sys/bus/event_sources/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
+"root_port=0xF" corresponds to root port 0 to 3. The PMU will monitor all root
+ports by default if not specified.
+/sys/bus/event_source/devices/nvidia_pcie_pmu_<socket-id>/format/root_port
 shows the valid bits that can be set in the "root_port" parameter.

 Example usage:
```
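The "port", "rem_socket" and "root_port" parameters in nvidia-pmu.rst all use the same encoding: bit N selects item N, and a sysfs format file shows which bits are valid. The sketch below builds such bitmaps and checks them against a format string. It assumes the usual perf sysfs format syntax (`configN:lo-hi`, optionally comma-separated ranges); the spec "config1:0-3" is a hypothetical stand-in, since the real contents of these format files are not shown in the commit, and the helper names are illustrative only.

```python
import re


def bitmap(indices) -> int:
    """Build a selection bitmap in which bit N selects port/socket/root port N."""
    value = 0
    for i in indices:
        if i < 0:
            raise ValueError("index must be non-negative")
        value |= 1 << i
    return value


def parse_format_field(spec: str) -> tuple[str, int]:
    """Parse a perf format spec such as "config1:0-3" or "config:0-1,8".

    Returns the config word name and the mask of config bits the field uses.
    """
    word, _, ranges = spec.strip().partition(":")
    mask = 0
    for part in ranges.split(","):
        m = re.fullmatch(r"(\d+)(?:-(\d+))?", part)
        if m is None:
            raise ValueError(f"unexpected format spec: {spec!r}")
        lo = int(m.group(1))
        hi = int(m.group(2)) if m.group(2) else lo
        mask |= ((1 << (hi - lo + 1)) - 1) << lo
    return word, mask


def fits_field(value: int, spec: str) -> bool:
    """True if `value` fits in a field as wide as the bits the spec lists."""
    _, mask = parse_format_field(spec)
    width = bin(mask).count("1")
    return value >> width == 0


# The values used in nvidia-pmu.rst:
port_one = bitmap([0])           # port=0x1: port 0
port_both = bitmap([0, 1])       # port=0x3: ports 0 and 1
rem_sockets = bitmap([1, 2, 3])  # rem_socket=0xE: sockets 1 to 3
root_ports = bitmap(range(4))    # root_port=0xF: root ports 0 to 3

print(fits_field(rem_sockets, "config1:0-3"))  # hypothetical spec: True
print(f"nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x{port_one:x}/")
# prints "nvidia_nvlink_c2c0_pmu_0/event=0x0,port=0x1/"
```

Building the bitmap from a list of indices avoids hand-computing hex masks such as 0xE, and checking it against the format file catches out-of-range selections before perf rejects them.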

Documentation/arch/arm64/booting.rst

Lines changed: 12 additions & 0 deletions
```diff
@@ -449,6 +449,18 @@ Before jumping into the kernel, the following conditions must be met:

 - HFGWTR_EL2.nGCS_EL0 (bit 52) must be initialised to 0b1.

+- For CPUs with debug architecture i.e FEAT_Debugv8pN (all versions):
+
+- If EL3 is present:
+
+- MDCR_EL3.TDA (bit 9) must be initialized to 0b0
+
+- For CPUs with FEAT_PMUv3:
+
+- If EL3 is present:
+
+- MDCR_EL3.TPM (bit 6) must be initialized to 0b0
+
 The requirements described above for CPU mode, caches, MMUs, architected
 timers, coherency and system registers apply to all CPUs. All CPUs must
 enter the kernel in the same exception level. Where the values documented
```
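The new booting.rst requirements reduce to two bit tests on MDCR_EL3: TDA (bit 9) must be clear on CPUs with the debug architecture, and TPM (bit 6) must be clear on CPUs with FEAT_PMUv3, in both cases only when EL3 is present. The kernel itself cannot read an EL3 register, so this host-side sketch is meant for checking a value taken from firmware sources or a debugger dump; the function and parameter names are hypothetical.

```python
MDCR_EL3_TPM = 1 << 6  # TPM: traps Performance Monitors accesses to EL3
MDCR_EL3_TDA = 1 << 9  # TDA: traps debug register accesses to EL3


def mdcr_el3_violations(mdcr_el3: int, has_debug: bool = True,
                        has_pmuv3: bool = True) -> list[str]:
    """List the booting.rst MDCR_EL3 requirements that a given value breaks.

    Only meaningful when EL3 is present; `has_debug` and `has_pmuv3` stand
    for FEAT_Debugv8pN and FEAT_PMUv3 support on the CPU being checked.
    """
    problems = []
    if has_debug and mdcr_el3 & MDCR_EL3_TDA:
        problems.append("MDCR_EL3.TDA (bit 9) must be 0b0")
    if has_pmuv3 and mdcr_el3 & MDCR_EL3_TPM:
        problems.append("MDCR_EL3.TPM (bit 6) must be 0b0")
    return problems


print(mdcr_el3_violations(0x0))    # []
print(mdcr_el3_violations(0x240))  # bits 9 and 6 set: two violations
```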
