Commit 75f4d93

Merge branch 'for-6.9/cxl-einj' into for-6.9/cxl
Pick up support for injecting errors via ACPI EINJ into the CXL protocol for v6.9.
2 parents d5c0078 + a0563f5 commit 75f4d93

10 files changed: +397 -21 lines changed

Documentation/ABI/testing/debugfs-cxl

Lines changed: 30 additions & 0 deletions
@@ -33,3 +33,33 @@ Description:
 		device cannot clear poison from the address, -ENXIO is returned.
 		The clear_poison attribute is only visible for devices
 		supporting the capability.
+
+What:		/sys/kernel/debug/cxl/einj_types
+Date:		January, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Prints the CXL protocol error types made available by
+		the platform in the format "0x<error number> <error type>".
+		The possible error types are (as of ACPI v6.5):
+			0x1000	CXL.cache Protocol Correctable
+			0x2000	CXL.cache Protocol Uncorrectable non-fatal
+			0x4000	CXL.cache Protocol Uncorrectable fatal
+			0x8000	CXL.mem Protocol Correctable
+			0x10000	CXL.mem Protocol Uncorrectable non-fatal
+			0x20000	CXL.mem Protocol Uncorrectable fatal
+
+		The <error number> can be written to einj_inject to inject
+		<error type> into a chosen dport.
+
+What:		/sys/kernel/debug/cxl/$dport_dev/einj_inject
+Date:		January, 2024
+KernelVersion:	v6.9
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(WO) Writing an integer to this file injects the corresponding
+		CXL protocol error into $dport_dev ($dport_dev will be a device
+		name from /sys/bus/pci/devices). The integer to type mapping for
+		injection can be found by reading from einj_types. If the dport
+		was enumerated in RCH mode, a CXL 1.1 error is injected, otherwise
+		a CXL 2.0 error is injected.
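
Note on consuming einj_types from user space: the ABI text above fixes the line format as "0x<error number> <error type>". The following is a minimal sketch of how a test tool might split such a line. The function name parse_einj_type_line() and its error convention are hypothetical and not part of this commit.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space helper (not part of this commit): split one
 * line of einj_types output, for example
 * "0x00008000 CXL.mem Protocol Correctable", into the error number and
 * the error type description.
 *
 * Returns 0 on success, or -EINVAL if the line does not start with a
 * hexadecimal number that fits in 32 bits, if the description is
 * missing, or if the description does not fit in desc.
 */
int parse_einj_type_line(const char *line, uint32_t *num,
			 char *desc, size_t desc_len)
{
	unsigned long long val;
	char *end;
	size_t len;

	if (!line || !num || !desc || desc_len == 0)
		return -EINVAL;

	errno = 0;
	val = strtoull(line, &end, 16);	/* base 16 accepts the 0x prefix */
	if (end == line || errno || val > UINT32_MAX)
		return -EINVAL;

	/* Skip the whitespace between the number and the description. */
	while (*end == ' ' || *end == '\t')
		end++;

	/* The description runs up to the end of the line. */
	len = strcspn(end, "\r\n");
	if (len == 0 || len >= desc_len)
		return -EINVAL;

	memcpy(desc, end, len);
	desc[len] = '\0';
	*num = (uint32_t)val;
	return 0;
}
```

The parsed number is the value that would then be written to $dport_dev/einj_inject.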

Documentation/firmware-guide/acpi/apei/einj.rst

Lines changed: 34 additions & 0 deletions
@@ -32,6 +32,10 @@ configuration::
   CONFIG_ACPI_APEI
   CONFIG_ACPI_APEI_EINJ
 
+...and to (optionally) enable CXL protocol error injection set::
+
+  CONFIG_ACPI_APEI_EINJ_CXL
+
 The EINJ user interface is in <debugfs mount point>/apei/einj.
 
 The following files belong to it:
@@ -118,6 +122,24 @@ The following files belong to it:
   this actually works depends on what operations the BIOS actually
   includes in the trigger phase.
 
+CXL error types are supported from ACPI 6.5 onwards (given a CXL port
+is present). The EINJ user interface for CXL error types is at
+<debugfs mount point>/cxl. The following files belong to it:
+
+- einj_types:
+
+  Provides the same functionality as available_error_types above, but
+  for CXL error types
+
+- $dport_dev/einj_inject:
+
+  Injects a CXL error type into the CXL port represented by $dport_dev,
+  where $dport_dev is the name of the CXL port (usually a PCIe device name).
+  Error injections targeting a CXL 2.0+ port can use the legacy interface
+  under <debugfs mount point>/apei/einj, while CXL 1.1/1.0 port injections
+  must use this file.
+
+
 BIOS versions based on the ACPI 4.0 specification have limited options
 in controlling where the errors are injected. Your BIOS may support an
 extension (enabled with the param_extension=1 module parameter, or boot
@@ -181,6 +203,18 @@ You should see something like this in dmesg::
   [22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
   [22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
 
+A CXL error injection example with $dport_dev=0000:e0:01.1::
+
+    # cd /sys/kernel/debug/cxl/
+    # ls
+    0000:e0:01.1 0000:0c:00.0
+    # cat einj_types                # See which errors can be injected
+    0x00008000  CXL.mem Protocol Correctable
+    0x00010000  CXL.mem Protocol Uncorrectable non-fatal
+    0x00020000  CXL.mem Protocol Uncorrectable fatal
+    # cd 0000:e0:01.1               # Navigate to dport to inject into
+    # echo 0x8000 > einj_inject     # Inject error
+
 Special notes for injection into SGX enclaves:
 
 There may be a separate BIOS setup option to enable SGX injection.
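
Note on scripting the CXL injection example from this file: the shell session in the documentation writes an error number to <debugfs mount point>/cxl/$dport_dev/einj_inject. A rough C equivalent for a test harness might look like the sketch below. The helper names einj_inject_path() and einj_write_type() are hypothetical, and the debugfs mount point is passed in by the caller because it is an assumption about the system (commonly /sys/kernel/debug).

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the path of the einj_inject file for one
 * dport. debugfs_root is supplied by the caller, for example
 * "/sys/kernel/debug".
 *
 * Returns 0 on success or -ENAMETOOLONG if buf is too small.
 */
int einj_inject_path(char *buf, size_t len, const char *debugfs_root,
		     const char *dport_dev)
{
	int n = snprintf(buf, len, "%s/cxl/%s/einj_inject",
			 debugfs_root, dport_dev);

	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Hypothetical helper: write the error number in the same "0x..." form
 * that "echo 0x8000 > einj_inject" produces.
 *
 * Returns 0 on success or a negative errno value.
 */
int einj_write_type(const char *path, uint32_t type)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "0x%x\n", (unsigned int)type) < 0)
		rc = -EIO;

	/* The write may only reach the kernel when the stream is flushed. */
	if (fclose(f) != 0 && rc == 0)
		rc = -errno;

	return rc;
}
```

With the values from the documentation example, the path would be built from "0000:e0:01.1" and the type would be 0x8000. Running this needs root and a platform that advertises the error type in einj_types.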

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -5289,6 +5289,7 @@ M: Dan Williams <dan.j.williams@intel.com>
 L:	linux-cxl@vger.kernel.org
 S:	Maintained
 F:	drivers/cxl/
+F:	include/linux/cxl-einj.h
 F:	include/linux/cxl-event.h
 F:	include/uapi/linux/cxl_mem.h
 F:	tools/testing/cxl/

drivers/acpi/apei/Kconfig

Lines changed: 13 additions & 0 deletions
@@ -60,6 +60,19 @@ config ACPI_APEI_EINJ
 	  mainly used for debugging and testing the other parts of
 	  APEI and some other RAS features.
 
+config ACPI_APEI_EINJ_CXL
+	bool "CXL Error INJection Support"
+	default ACPI_APEI_EINJ
+	depends on ACPI_APEI_EINJ
+	depends on CXL_BUS && CXL_BUS <= ACPI_APEI_EINJ
+	help
+	  Support for CXL protocol Error INJection through debugfs/cxl.
+	  Availability and which errors are supported is dependent on
+	  the host platform. Look to ACPI v6.5 section 18.6.4 and kernel
+	  EINJ documentation for more information.
+
+	  If unsure say 'n'
+
 config ACPI_APEI_ERST_DEBUG
 	tristate "APEI Error Record Serialization Table (ERST) Debug Support"
 	depends on ACPI_APEI

drivers/acpi/apei/Makefile

Lines changed: 2 additions & 0 deletions
@@ -2,6 +2,8 @@
 obj-$(CONFIG_ACPI_APEI) += apei.o
 obj-$(CONFIG_ACPI_APEI_GHES) += ghes.o
 obj-$(CONFIG_ACPI_APEI_EINJ) += einj.o
+einj-y := einj-core.o
+einj-$(CONFIG_ACPI_APEI_EINJ_CXL) += einj-cxl.o
 obj-$(CONFIG_ACPI_APEI_ERST_DEBUG) += erst-dbg.o
 
 apei-y := apei-base.o hest.o erst.o bert.o

drivers/acpi/apei/apei-internal.h

Lines changed: 18 additions & 0 deletions
@@ -130,4 +130,22 @@ static inline u32 cper_estatus_len(struct acpi_hest_generic_status *estatus)
 }
 
 int apei_osc_setup(void);
+
+int einj_get_available_error_type(u32 *type);
+int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
+		      u64 param4);
+int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
+			      u64 param3, u64 param4);
+bool einj_is_cxl_error_type(u64 type);
+int einj_validate_error_type(u64 type);
+
+#ifndef ACPI_EINJ_CXL_CACHE_CORRECTABLE
+#define ACPI_EINJ_CXL_CACHE_CORRECTABLE	BIT(12)
+#define ACPI_EINJ_CXL_CACHE_UNCORRECTABLE	BIT(13)
+#define ACPI_EINJ_CXL_CACHE_FATAL	BIT(14)
+#define ACPI_EINJ_CXL_MEM_CORRECTABLE	BIT(15)
+#define ACPI_EINJ_CXL_MEM_UNCORRECTABLE	BIT(16)
+#define ACPI_EINJ_CXL_MEM_FATAL	BIT(17)
+#endif
+
 #endif
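
Note on the bit assignments: BIT(12) through BIT(17) correspond to the values 0x1000 through 0x20000 listed in the debugfs-cxl ABI entry. As an illustration only, a user-space lookup from error number to description could be sketched as below. The table and the function cxl_einj_type_name() are hypothetical. The strings are the ones this commit removes from einj_error_type_string[] in einj-core.c.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical table; bit positions mirror the ACPI_EINJ_CXL_* defines. */
struct cxl_einj_name {
	uint32_t mask;
	const char *str;
};

static const struct cxl_einj_name cxl_einj_names[] = {
	{ BIT(12), "CXL.cache Protocol Correctable" },
	{ BIT(13), "CXL.cache Protocol Uncorrectable non-fatal" },
	{ BIT(14), "CXL.cache Protocol Uncorrectable fatal" },
	{ BIT(15), "CXL.mem Protocol Correctable" },
	{ BIT(16), "CXL.mem Protocol Uncorrectable non-fatal" },
	{ BIT(17), "CXL.mem Protocol Uncorrectable fatal" },
};

/*
 * Return the description for a single CXL error type, or NULL if the
 * value is not exactly one of the six CXL protocol error bits.
 */
const char *cxl_einj_type_name(uint32_t type)
{
	size_t i;

	for (i = 0; i < sizeof(cxl_einj_names) / sizeof(cxl_einj_names[0]); i++) {
		if (cxl_einj_names[i].mask == type)
			return cxl_einj_names[i].str;
	}
	return NULL;
}
```

The exact-match comparison is a design choice of this sketch: a value with two bits set returns NULL, in line with the "only one error type can be specified" rule enforced by einj_validate_error_type().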

drivers/acpi/apei/einj.c renamed to drivers/acpi/apei/einj-core.c

Lines changed: 101 additions & 21 deletions
@@ -21,6 +21,7 @@
 #include <linux/nmi.h>
 #include <linux/delay.h>
 #include <linux/mm.h>
+#include <linux/platform_device.h>
 #include <asm/unaligned.h>
 
 #include "apei-internal.h"
@@ -36,6 +37,12 @@
 #define MEM_ERROR_MASK		(ACPI_EINJ_MEMORY_CORRECTABLE | \
 				ACPI_EINJ_MEMORY_UNCORRECTABLE | \
 				ACPI_EINJ_MEMORY_FATAL)
+#define CXL_ERROR_MASK		(ACPI_EINJ_CXL_CACHE_CORRECTABLE | \
+				ACPI_EINJ_CXL_CACHE_UNCORRECTABLE | \
+				ACPI_EINJ_CXL_CACHE_FATAL | \
+				ACPI_EINJ_CXL_MEM_CORRECTABLE | \
+				ACPI_EINJ_CXL_MEM_UNCORRECTABLE | \
+				ACPI_EINJ_CXL_MEM_FATAL)
 
 /*
  * ACPI version 5 provides a SET_ERROR_TYPE_WITH_ADDRESS action.
@@ -137,6 +144,11 @@ static struct apei_exec_ins_type einj_ins_type[] = {
  */
 static DEFINE_MUTEX(einj_mutex);
 
+/*
+ * Exported APIs use this flag to exit early if einj_probe() failed.
+ */
+bool einj_initialized __ro_after_init;
+
 static void *einj_param;
 
 static void einj_exec_ctx_init(struct apei_exec_context *ctx)
@@ -160,7 +172,7 @@ static int __einj_get_available_error_type(u32 *type)
 }
 
 /* Get error injection capabilities of the platform */
-static int einj_get_available_error_type(u32 *type)
+int einj_get_available_error_type(u32 *type)
 {
 	int rc;

@@ -530,8 +542,8 @@ static int __einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 }
 
 /* Inject the specified hardware error */
-static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
-			     u64 param3, u64 param4)
+int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2, u64 param3,
+		      u64 param4)
 {
 	int rc;
 	u64 base_addr, size;
@@ -554,8 +566,17 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	if (type & ACPI5_VENDOR_BIT) {
 		if (vendor_flags != SETWA_FLAGS_MEM)
 			goto inject;
-	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM))
+	} else if (!(type & MEM_ERROR_MASK) && !(flags & SETWA_FLAGS_MEM)) {
 		goto inject;
+	}
+
+	/*
+	 * Injections targeting a CXL 1.0/1.1 port have to be injected
+	 * via the einj_cxl_rch_error_inject() path as that does the proper
+	 * validation of the given RCRB base (MMIO) address.
+	 */
+	if (einj_is_cxl_error_type(type) && (flags & SETWA_FLAGS_MEM))
+		return -EINVAL;
 
 	/*
 	 * Disallow crazy address masks that give BIOS leeway to pick
@@ -587,6 +608,21 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
 	return rc;
 }
 
+int einj_cxl_rch_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
+			      u64 param3, u64 param4)
+{
+	int rc;
+
+	if (!(einj_is_cxl_error_type(type) && (flags & SETWA_FLAGS_MEM)))
+		return -EINVAL;
+
+	mutex_lock(&einj_mutex);
+	rc = __einj_error_inject(type, flags, param1, param2, param3, param4);
+	mutex_unlock(&einj_mutex);
+
+	return rc;
+}
+
 static u32 error_type;
 static u32 error_flags;
 static u64 error_param1;
@@ -607,12 +643,6 @@ static struct { u32 mask; const char *str; } const einj_error_type_string[] = {
 	{ BIT(9), "Platform Correctable" },
 	{ BIT(10), "Platform Uncorrectable non-fatal" },
 	{ BIT(11), "Platform Uncorrectable fatal"},
-	{ BIT(12), "CXL.cache Protocol Correctable" },
-	{ BIT(13), "CXL.cache Protocol Uncorrectable non-fatal" },
-	{ BIT(14), "CXL.cache Protocol Uncorrectable fatal" },
-	{ BIT(15), "CXL.mem Protocol Correctable" },
-	{ BIT(16), "CXL.mem Protocol Uncorrectable non-fatal" },
-	{ BIT(17), "CXL.mem Protocol Uncorrectable fatal" },
 	{ BIT(31), "Vendor Defined Error Types" },
 };

@@ -641,22 +671,26 @@ static int error_type_get(void *data, u64 *val)
 	return 0;
 }
 
-static int error_type_set(void *data, u64 val)
+bool einj_is_cxl_error_type(u64 type)
 {
+	return (type & CXL_ERROR_MASK) && (!(type & ACPI5_VENDOR_BIT));
+}
+
+int einj_validate_error_type(u64 type)
+{
+	u32 tval, vendor, available_error_type = 0;
 	int rc;
-	u32 available_error_type = 0;
-	u32 tval, vendor;
 
 	/* Only low 32 bits for error type are valid */
-	if (val & GENMASK_ULL(63, 32))
+	if (type & GENMASK_ULL(63, 32))
 		return -EINVAL;
 
 	/*
 	 * Vendor defined types have 0x80000000 bit set, and
 	 * are not enumerated by ACPI_EINJ_GET_ERROR_TYPE
 	 */
-	vendor = val & ACPI5_VENDOR_BIT;
-	tval = val & 0x7fffffff;
+	vendor = type & ACPI5_VENDOR_BIT;
+	tval = type & GENMASK(30, 0);
 
 	/* Only one error type can be specified */
 	if (tval & (tval - 1))
@@ -665,9 +699,21 @@ static int error_type_set(void *data, u64 val)
 		rc = einj_get_available_error_type(&available_error_type);
 		if (rc)
 			return rc;
-		if (!(val & available_error_type))
+		if (!(type & available_error_type))
 			return -EINVAL;
 	}
+
+	return 0;
+}
+
+static int error_type_set(void *data, u64 val)
+{
+	int rc;
+
+	rc = einj_validate_error_type(val);
+	if (rc)
+		return rc;
+
 	error_type = val;
 
 	return 0;
@@ -703,21 +749,21 @@ static int einj_check_table(struct acpi_table_einj *einj_tab)
 	return 0;
 }
 
-static int __init einj_init(void)
+static int __init einj_probe(struct platform_device *pdev)
 {
 	int rc;
 	acpi_status status;
 	struct apei_exec_context ctx;
 
 	if (acpi_disabled) {
-		pr_info("ACPI disabled.\n");
+		pr_debug("ACPI disabled.\n");
 		return -ENODEV;
 	}
 
 	status = acpi_get_table(ACPI_SIG_EINJ, 0,
 				(struct acpi_table_header **)&einj_tab);
 	if (status == AE_NOT_FOUND) {
-		pr_warn("EINJ table not found.\n");
+		pr_debug("EINJ table not found.\n");
 		return -ENODEV;
 	} else if (ACPI_FAILURE(status)) {
 		pr_err("Failed to get EINJ table: %s\n",
@@ -805,7 +851,7 @@ static int __init einj_init(void)
 	return rc;
 }
 
-static void __exit einj_exit(void)
+static void __exit einj_remove(struct platform_device *pdev)
 {
 	struct apei_exec_context ctx;

@@ -826,6 +872,40 @@ static void __exit einj_exit(void)
 	acpi_put_table((struct acpi_table_header *)einj_tab);
 }
 
+static struct platform_device *einj_dev;
+static struct platform_driver einj_driver = {
+	.remove_new = einj_remove,
+	.driver = {
+		.name = "acpi-einj",
+	},
+};
+
+static int __init einj_init(void)
+{
+	struct platform_device_info einj_dev_info = {
+		.name = "acpi-einj",
+		.id = -1,
+	};
+	int rc;
+
+	einj_dev = platform_device_register_full(&einj_dev_info);
+	if (IS_ERR(einj_dev))
+		return PTR_ERR(einj_dev);
+
+	rc = platform_driver_probe(&einj_driver, einj_probe);
+	einj_initialized = rc == 0;
+
+	return 0;
+}
+
+static void __exit einj_exit(void)
+{
+	if (einj_initialized)
+		platform_driver_unregister(&einj_driver);
+
+	platform_device_del(einj_dev);
+}
+
 module_init(einj_init);
 module_exit(einj_exit);
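
Note on the validation rules in einj-core.c: the checks split out into einj_is_cxl_error_type() and einj_validate_error_type() can be exercised outside the kernel with a small model. The sketch below is a user-space approximation, not the kernel code. The names are hypothetical, the platform's available types are passed in as a parameter instead of being read through einj_get_available_error_type(), and the guard around the availability check is assumed to be "not vendor defined" because that condition sits outside the hunks shown here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VENDOR_BIT	UINT32_C(0x80000000)	/* models ACPI5_VENDOR_BIT */
#define CXL_ERROR_MASK	UINT32_C(0x0003f000)	/* BIT(12) through BIT(17) */

/* Model of einj_is_cxl_error_type(): a CXL bit is set, vendor bit is not. */
bool model_is_cxl_error_type(uint64_t type)
{
	return (type & CXL_ERROR_MASK) && !(type & VENDOR_BIT);
}

/*
 * Model of einj_validate_error_type(). "available" stands in for the
 * bitmask the platform reports. Returns 0 if the type is acceptable
 * and -EINVAL otherwise.
 */
int model_validate_error_type(uint64_t type, uint32_t available)
{
	uint32_t tval, vendor;

	/* Only the low 32 bits are valid. */
	if (type & UINT64_C(0xffffffff00000000))
		return -EINVAL;

	vendor = (uint32_t)type & VENDOR_BIT;
	tval = (uint32_t)type & UINT32_C(0x7fffffff);

	/* At most one non-vendor bit may be set. */
	if (tval & (tval - 1))
		return -EINVAL;

	/* Assumption: vendor types skip the availability check. */
	if (!vendor && !(type & available))
		return -EINVAL;

	return 0;
}
```

For example, with available set to 0x38000 (the three CXL.mem types from the documentation example), 0x8000 is accepted, while 0x18000 is rejected for having two bits set and 0x1000 is rejected as unavailable.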
