Commit 636f64d

Merge tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Borislav Petkov:

 - More noinstr fixes

 - Add an erratum workaround for Intel CPUs which, in certain
   circumstances, end up consuming an unrelated uncorrectable memory
   error when using fast string copy insns

 - Remove the MCE tolerance level control as it is not really needed or
   used anymore

* tag 'ras_core_for_v5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Remove the tolerance level control
  x86/mce: Work around an erratum on fast string copy instructions
  x86/mce: Use arch atomic and bit helpers
2 parents ebcb577 + 7f1b8e0 commit 636f64d

7 files changed, 177 additions and 132 deletions

Documentation/ABI/removed/sysfs-mce

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+What:           /sys/devices/system/machinecheck/machinecheckX/tolerant
+Contact:        Borislav Petkov <bp@suse.de>
+Date:           Dec, 2021
+Description:
+                Unused and obsolete after the advent of recoverable machine
+                checks (see last sentence below) and those are present since
+                2010 (Nehalem).
+
+                Original description:
+
+                The entries appear for each CPU, but they are truly shared
+                between all CPUs.
+
+                Tolerance level. When a machine check exception occurs for a
+                non corrected machine check the kernel can take different
+                actions.
+
+                Since machine check exceptions can happen any time it is
+                sometimes risky for the kernel to kill a process because it
+                defies normal kernel locking rules. The tolerance level
+                configures how hard the kernel tries to recover even at some
+                risk of deadlock. Higher tolerant values trade potentially
+                better uptime with the risk of a crash or even corruption
+                (for tolerant >= 3).
+
+                == ===========================================================
+                 0 always panic on uncorrected errors, log corrected errors
+                 1 panic or SIGBUS on uncorrected errors, log corrected errors
+                 2 SIGBUS or log uncorrected errors, log corrected errors
+                 3 never panic or SIGBUS, log all errors (for testing only)
+                == ===========================================================
+
+                Default: 1
+
+                Note this only makes a difference if the CPU allows recovery
+                from a machine check exception. Current x86 CPUs generally
+                do not.
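
For readers who never used the knob, the table in the removed description can be read as a lookup from level to policy. The Python sketch below only restates that table. The names `TOLERANT_POLICY`, `DEFAULT_TOLERANT` and `describe_tolerant` are hypothetical, and nothing here reflects how the kernel implemented the setting.

```python
# Illustrative restatement of the documented 'tolerant' levels.
# All names are hypothetical; this is not kernel code.
TOLERANT_POLICY = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (for testing only)",
}

DEFAULT_TOLERANT = 1  # "Default: 1" in the ABI description


def describe_tolerant(level: int = DEFAULT_TOLERANT) -> str:
    """Return the documented behaviour for a tolerance level."""
    if level not in TOLERANT_POLICY:
        raise ValueError(f"documented levels are 0 to 3, got {level}")
    return TOLERANT_POLICY[level]
```

Calling `describe_tolerant()` with no argument returns the entry for level 1, the documented default.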

Documentation/ABI/testing/sysfs-mce

Lines changed: 0 additions & 32 deletions
@@ -53,38 +53,6 @@ Description:
                 (but some corrected errors might be still reported
                 in other ways)
 
-What:           /sys/devices/system/machinecheck/machinecheckX/tolerant
-Contact:        Andi Kleen <ak@linux.intel.com>
-Date:           Feb, 2007
-Description:
-                The entries appear for each CPU, but they are truly shared
-                between all CPUs.
-
-                Tolerance level. When a machine check exception occurs for a
-                non corrected machine check the kernel can take different
-                actions.
-
-                Since machine check exceptions can happen any time it is
-                sometimes risky for the kernel to kill a process because it
-                defies normal kernel locking rules. The tolerance level
-                configures how hard the kernel tries to recover even at some
-                risk of deadlock. Higher tolerant values trade potentially
-                better uptime with the risk of a crash or even corruption
-                (for tolerant >= 3).
-
-                == ===========================================================
-                 0 always panic on uncorrected errors, log corrected errors
-                 1 panic or SIGBUS on uncorrected errors, log corrected errors
-                 2 SIGBUS or log uncorrected errors, log corrected errors
-                 3 never panic or SIGBUS, log all errors (for testing only)
-                == ===========================================================
-
-                Default: 1
-
-                Note this only makes a difference if the CPU allows recovery
-                from a machine check exception. Current x86 CPUs generally
-                do not.
-
 What:           /sys/devices/system/machinecheck/machinecheckX/trigger
 Contact:        Andi Kleen <ak@linux.intel.com>
 Date:           Feb, 2007
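
Since the `tolerant` entry moves from `testing` to `removed`, tooling that reads or writes it may want to check whether the file still exists on a given system. A minimal sketch, assuming the sysfs path pattern from the ABI entry; the function name `cpus_exposing_tolerant` is hypothetical, and the `base` parameter exists only so the check can be pointed at a test directory.

```python
from pathlib import Path
from typing import List

# Directory taken from the ABI entry above; the helper itself is hypothetical.
MACHINECHECK_DIR = Path("/sys/devices/system/machinecheck")


def cpus_exposing_tolerant(base: Path = MACHINECHECK_DIR) -> List[str]:
    """List machinecheckX directories under base that contain a 'tolerant' file."""
    if not base.is_dir():
        return []
    return sorted(
        entry.name
        for entry in base.iterdir()
        if entry.name.startswith("machinecheck") and (entry / "tolerant").exists()
    )
```

An empty list means no `machinecheckX` directory exposes the attribute, which is the expected outcome on a kernel that includes this change.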

Documentation/vm/hwpoison.rst

Lines changed: 0 additions & 2 deletions
@@ -60,8 +60,6 @@ There are two (actually three) modes memory failure recovery can be in:
 
 vm.memory_failure_recovery sysctl set to zero:
         All memory failures cause a panic. Do not attempt recovery.
-        (on x86 this can be also affected by the tolerant level of the
-        MCE subsystem)
 
 early kill
         (can be controlled globally and per process)

Documentation/x86/x86_64/boot-options.rst

Lines changed: 1 addition & 8 deletions
@@ -47,14 +47,7 @@ Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
                 in a reboot. On Intel systems it is enabled by default.
    mce=nobootlog
                 Disable boot machine check logging.
-   mce=tolerancelevel[,monarchtimeout] (number,number)
-                tolerance levels:
-                0: always panic on uncorrected errors, log corrected errors
-                1: panic or SIGBUS on uncorrected errors, log corrected errors
-                2: SIGBUS or log uncorrected errors, log corrected errors
-                3: never panic or SIGBUS, log all errors (for testing only)
-                Default is 1
-                Can be also set using sysfs which is preferable.
+   mce=monarchtimeout (number)
                 monarchtimeout:
                 Sets the time in us to wait for other CPUs on machine checks. 0
                 to disable.
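
The change to the numeric form of `mce=` is easy to miss: the first number used to be the tolerance level, and it is now the monarch timeout. The sketch below contrasts the two documented syntaxes. It is read from the documentation text alone; `parse_mce_numeric` and the dictionary keys are hypothetical names, and it does not mirror the kernel's own parser.

```python
from typing import Dict


def parse_mce_numeric(value: str, legacy: bool = False) -> Dict[str, int]:
    """Interpret the numeric form of mce= as the documentation describes it.

    legacy=True  reads the old syntax: tolerancelevel[,monarchtimeout]
    legacy=False reads the new syntax: monarchtimeout
    Hypothetical helper for illustration only.
    """
    numbers = [int(part) for part in value.split(",")]
    if legacy:
        if not 1 <= len(numbers) <= 2:
            raise ValueError("expected tolerancelevel[,monarchtimeout]")
        result = {"tolerant": numbers[0]}
        if len(numbers) == 2:
            result["monarch_timeout_us"] = numbers[1]
        return result
    if len(numbers) != 1:
        raise ValueError("expected a single monarchtimeout value")
    return {"monarch_timeout_us": numbers[0]}
```

Taking the documentation at face value, a command line that still passes `mce=2` to select tolerance level 2 would now be read as a monarch timeout of 2 us, so existing boot configurations are worth reviewing.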
