Commit 25dfc9e

Kan Liang authored and KAGA-KOKO committed
perf/x86/intel: Limit the period on Haswell
Running the ltp test cve-2015-3290 concurrently reports the following
warnings.

  perfevents: irq loop stuck!
  WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174
    intel_pmu_handle_irq+0x285/0x370
  Call Trace:
   <NMI>
   ? __warn+0xa4/0x220
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? __report_bug+0x123/0x130
   ? intel_pmu_handle_irq+0x285/0x370
   ? report_bug+0x3e/0xa0
   ? handle_bug+0x3c/0x70
   ? exc_invalid_op+0x18/0x50
   ? asm_exc_invalid_op+0x1a/0x20
   ? irq_work_claim+0x1e/0x40
   ? intel_pmu_handle_irq+0x285/0x370
   perf_event_nmi_handler+0x3d/0x60
   nmi_handle+0x104/0x330

Thanks to Thomas Gleixner's analysis, the issue is caused by the low
initial period (1) of the frequency estimation algorithm, which triggers
hardware defects, specifically errata HSW11 and HSW143. (For details, see
https://lore.kernel.org/lkml/87plq9l5d2.ffs@tglx/)

HSW11 requires a period larger than 100 for the INST_RETIRED.ALL event,
but the initial period in freq mode is 1. The erratum is the same as
BDM11, which is already handled in the kernel. A minimum period of 128 is
enforced on HSW as well.

HSW143 states that fixed counter 1 may overcount by 32 when
Hyper-Threading is enabled. However, testing shows that the hardware has
more issues than the erratum describes. Besides fixed counter 1, the
message 'interrupt took too long' can be observed on any counter which
was armed with a period < 32 when two events expired in the same NMI. A
minimum period of 32 is enforced for the rest of the events.

The recommended workaround code for HSW143 is not implemented, because it
only addresses the issue for the fixed counter and it adds overhead
through extra MSR writes. No related overcounting issue has been reported
so far.
Fixes: 3a632cb ("perf/x86/intel: Add simple Haswell PMU support")
Reported-by: Li Huafei <lihuafei1@huawei.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240819183004.3132920-1-kan.liang@linux.intel.com
Closes: https://lore.kernel.org/lkml/20240729223328.327835-1-lihuafei1@huawei.com/
1 parent 5be63fc commit 25dfc9e
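
The clamping policy described in the commit message can be tried outside the kernel with a small user-space sketch. This is an illustration under stated assumptions, not the kernel code: `ARCH_EVENT_MASK`, `CONFIG()`, `is_inst_retired_all()`, `hsw_min_period()` and `hsw_period_examples()` are hypothetical names, and the bit layout (event select in bits 0-7, unit mask in bits 8-15) is assumed to mirror what `INTEL_ARCH_EVENT_MASK` and `X86_CONFIG()` select in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel macros; the layout is an assumption. */
#define ARCH_EVENT_MASK   0xffffULL   /* event select (bits 0-7) + umask (bits 8-15) */
#define CONFIG(ev, um)    ((uint64_t)(ev) | ((uint64_t)(um) << 8))
#define INST_RETIRED_ALL  CONFIG(0xc0, 0x01)

/* True when the event is INST_RETIRED.ALL, the event covered by HSW11. */
bool is_inst_retired_all(uint64_t config)
{
	return (config & ARCH_EVENT_MASK) == INST_RETIRED_ALL;
}

/* Raise a requested period to the floor from the commit message:
 * 128 for INST_RETIRED.ALL, 32 for every other event. */
int64_t hsw_min_period(uint64_t config, int64_t left)
{
	int64_t floor = is_inst_retired_all(config) ? 128 : 32;

	return left > floor ? left : floor;
}

/* Worked examples: freq mode starts with a period of 1. */
void hsw_period_examples(void)
{
	assert(hsw_min_period(INST_RETIRED_ALL, 1) == 128);    /* HSW11 floor */
	assert(hsw_min_period(CONFIG(0x3c, 0x00), 1) == 32);   /* any other event */
	assert(hsw_min_period(CONFIG(0x3c, 0x00), 1000) == 1000); /* large periods pass through */
}
```

Only periods below the floor are changed, so events that already sample at a coarse period should behave exactly as before.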

1 file changed: +21 -2 lines changed

arch/x86/events/intel/core.c

Lines changed: 21 additions & 2 deletions
@@ -4589,6 +4589,25 @@ static enum hybrid_cpu_type adl_get_hybrid_cpu_type(void)
 	return HYBRID_INTEL_CORE;
 }
 
+static inline bool erratum_hsw11(struct perf_event *event)
+{
+	return (event->hw.config & INTEL_ARCH_EVENT_MASK) ==
+		X86_CONFIG(.event=0xc0, .umask=0x01);
+}
+
+/*
+ * The HSW11 requires a period larger than 100 which is the same as the BDM11.
+ * A minimum period of 128 is enforced as well for the INST_RETIRED.ALL.
+ *
+ * The message 'interrupt took too long' can be observed on any counter which
+ * was armed with a period < 32 and two events expired in the same NMI.
+ * A minimum period of 32 is enforced for the rest of the events.
+ */
+static void hsw_limit_period(struct perf_event *event, s64 *left)
+{
+	*left = max(*left, erratum_hsw11(event) ? 128 : 32);
+}
+
 /*
  * Broadwell:
  *
@@ -4606,8 +4625,7 @@ static enum hybrid_cpu_type adl_get_hybrid_cpu_type(void)
  */
 static void bdw_limit_period(struct perf_event *event, s64 *left)
 {
-	if ((event->hw.config & INTEL_ARCH_EVENT_MASK) ==
-			X86_CONFIG(.event=0xc0, .umask=0x01)) {
+	if (erratum_hsw11(event)) {
 		if (*left < 128)
 			*left = 128;
 		*left &= ~0x3fULL;
@@ -6766,6 +6784,7 @@ __init int intel_pmu_init(void)
 
 		x86_pmu.hw_config = hsw_hw_config;
 		x86_pmu.get_event_constraints = hsw_get_event_constraints;
+		x86_pmu.limit_period = hsw_limit_period;
 		x86_pmu.lbr_double_abort = true;
 		extra_attr = boot_cpu_has(X86_FEATURE_RTM) ?
 			hsw_format_attr : nhm_format_attr;
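
To see how the new Haswell callback differs from the Broadwell one it shares a helper with, the two limits can be modelled side by side in user space. This is a rough sketch under assumptions: `struct pmu_model`, `arm_period()`, `hsw_limit()` and `bdw_limit()` are hypothetical names, the optional function pointer only approximates how a `limit_period` hook might be consulted when a counter is armed, and `bdw_limit()` reproduces only the part of `bdw_limit_period()` visible in the hunk above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: event select in bits 0-7, unit mask in bits 8-15. */
#define ARCH_EVENT_MASK   0xffffULL
#define INST_RETIRED_ALL  0x01c0ULL   /* event 0xc0, umask 0x01 */

typedef void (*limit_period_fn)(uint64_t config, int64_t *left);

/* Hypothetical model of a PMU description with an optional period hook. */
struct pmu_model {
	limit_period_fn limit_period;   /* NULL means no model-specific limit */
};

static int is_inst_retired_all(uint64_t config)
{
	return (config & ARCH_EVENT_MASK) == INST_RETIRED_ALL;
}

/* Haswell: floor of 128 for INST_RETIRED.ALL, floor of 32 for all other events. */
void hsw_limit(uint64_t config, int64_t *left)
{
	int64_t floor = is_inst_retired_all(config) ? 128 : 32;

	if (*left < floor)
		*left = floor;
}

/* Broadwell, as far as the hunk shows: only INST_RETIRED.ALL is adjusted,
 * with a floor of 128 and the low 6 bits of the period cleared. */
void bdw_limit(uint64_t config, int64_t *left)
{
	if (is_inst_retired_all(config)) {
		if (*left < 128)
			*left = 128;
		*left &= ~0x3fLL;
	}
}

/* Apply the model-specific limit, if any, before arming a counter. */
int64_t arm_period(const struct pmu_model *pmu, uint64_t config, int64_t left)
{
	if (pmu->limit_period)
		pmu->limit_period(config, &left);
	return left;
}
```

In this model a requested period of 200 for INST_RETIRED.ALL stays 200 with the Haswell hook but becomes 192 with the Broadwell one, and a period of 1 on any other event is raised to 32 only by the Haswell hook.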
