Commit bcbd069

shijujose4 authored and bp3tk0v committed
EDAC: Add a Error Check Scrub control feature
Add an Error Check Scrub (ECS) control to manage a memory device's ECS feature.

ECS is a feature defined in the JEDEC DDR5 SDRAM Specification (JESD79-5). It allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts.

The DDR5 device contains a number of memory media Field Replaceable Units (FRU) per device. The DDR5 ECS feature, and thus the ECS control driver, support configuring the ECS parameters per FRU.

Memory devices that support the ECS feature register with the EDAC device driver, which retrieves the ECS descriptor from the EDAC ECS driver. This driver exposes sysfs ECS control attributes to userspace via /sys/bus/edac/devices/<dev-name>/ecs_fruX/.

The common sysfs ECS control interface abstracts the control of an arbitrary ECS functionality to a common set of functions.

Support for the ECS feature is added separately because the control attributes of the DDR5 ECS feature differ from those of the scrub feature.

The sysfs ECS attribute nodes are only present if the client driver has implemented the corresponding attribute callback function and passed the necessary operations to the EDAC RAS feature driver during registration.

[ bp: Massage, fixup edac_dev_register() retvals. ]

Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Link: https://lore.kernel.org/r/20250212143654.1893-4-shiju.jose@huawei.com
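
The sysfs layout named in the commit message can be sketched from userspace. The following is a minimal illustration only: the helper name `ecs_attr_path` and the device name used in the usage note are hypothetical and are not part of the kernel or of any library. It assumes nothing beyond the `/sys/bus/edac/devices/<dev-name>/ecs_fruX/` pattern stated above.

```c
#include <stddef.h>
#include <stdio.h>

/* Base directory under which EDAC devices appear, per the commit message. */
#define EDAC_SYSFS_BASE "/sys/bus/edac/devices"

/*
 * Hypothetical helper: build the path of one ECS attribute of one media FRU,
 * for example /sys/bus/edac/devices/<dev-name>/ecs_fru0/threshold.
 *
 * Returns 0 on success, or -1 if an argument is invalid or the buffer is
 * too small to hold the full path.
 */
int ecs_attr_path(char *buf, size_t size, const char *dev_name,
		  unsigned int fru, const char *attr)
{
	int n;

	if (!buf || !size || !dev_name || !attr)
		return -1;

	n = snprintf(buf, size, "%s/%s/ecs_fru%u/%s",
		     EDAC_SYSFS_BASE, dev_name, fru, attr);
	if (n < 0 || (size_t)n >= size)
		return -1;

	return 0;
}
```

Calling `ecs_attr_path(buf, sizeof(buf), "cxl_mem0", 2, "mode")` with a made-up device name `cxl_mem0` would fill `buf` with the path of the `mode` attribute of FRU 2. Whether that node exists depends on the callbacks the client driver registered.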
1 parent f90b738 commit bcbd069

7 files changed, +356 -2 lines changed

7 files changed

+356
-2
lines changed
Documentation/ABI/testing/sysfs-edac-ecs

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX
+Date:		March 2025
+KernelVersion:	6.15
+Contact:	linux-edac@vger.kernel.org
+Description:
+		The sysfs EDAC bus devices /<dev-name>/ecs_fruX subdirectory
+		pertains to the memory media ECS (Error Check Scrub) control
+		feature, where <dev-name> directory corresponds to a device
+		registered with the EDAC device driver for the ECS feature.
+		/ecs_fruX belongs to the media FRUs (Field Replaceable Unit)
+		under the memory device.
+
+		The sysfs ECS attr nodes are only present if the parent
+		driver has implemented the corresponding attr callback
+		function and provided the necessary operations to the EDAC
+		device driver during registration.
+
+What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/log_entry_type
+Date:		March 2025
+KernelVersion:	6.15
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) The log entry type of how the DDR5 ECS log is reported.
+
+		- 0 - per DRAM.
+
+		- 1 - per memory media FRU.
+
+		- All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/mode
+Date:		March 2025
+KernelVersion:	6.15
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) The mode of how the DDR5 ECS counts the errors.
+		Error count is tracked based on two different modes
+		selected by DDR5 ECS Control Feature - Codeword mode and
+		Row Count mode. If the ECS is under Codeword mode, then
+		the error count increments each time a codeword with check
+		bit errors is detected. If the ECS is under Row Count mode,
+		then the error counter increments each time a row with
+		check bit errors is detected.
+
+		- 0 - ECS counts rows in the memory media that have ECC errors.
+
+		- 1 - ECS counts codewords with errors, specifically, it counts
+		  the number of ECC-detected errors in the memory media.
+
+		- All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/reset
+Date:		March 2025
+KernelVersion:	6.15
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(WO) ECS reset ECC counter.
+
+		- 1 - reset ECC counter to the default value.
+
+		- All other values are reserved.
+
+What:		/sys/bus/edac/devices/<dev-name>/ecs_fruX/threshold
+Date:		March 2025
+KernelVersion:	6.15
+Contact:	linux-edac@vger.kernel.org
+Description:
+		(RW) DDR5 ECS threshold count per gigabits of memory cells.
+		The ECS error count is subject to the ECS Threshold count
+		per Gbit, which masks error counts less than the Threshold.
+
+		Supported values are 256, 1024 and 4096.
+
+		All other values are reserved.
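
The value ranges documented in the ABI file above can be captured in a few userspace checks. This is a sketch under stated assumptions: the enum and function names are invented for illustration and do not appear in the kernel sources shown in this commit. A tool might run such checks before writing to the attributes.

```c
#include <stdbool.h>

/* Illustrative names for the documented log_entry_type values. */
enum ecs_log_entry_type {
	ECS_LOG_PER_DRAM = 0,		/* log reported per DRAM */
	ECS_LOG_PER_MEDIA_FRU = 1,	/* log reported per memory media FRU */
};

/* Illustrative names for the documented mode values. */
enum ecs_count_mode {
	ECS_MODE_COUNTS_ROWS = 0,	/* counts rows that have ECC errors */
	ECS_MODE_COUNTS_CODEWORDS = 1,	/* counts codewords with errors */
};

/* log_entry_type: 0 and 1 are defined, everything else is reserved. */
bool ecs_log_entry_type_is_valid(unsigned long val)
{
	return val == ECS_LOG_PER_DRAM || val == ECS_LOG_PER_MEDIA_FRU;
}

/* mode: 0 and 1 are defined, everything else is reserved. */
bool ecs_mode_is_valid(unsigned long val)
{
	return val == ECS_MODE_COUNTS_ROWS || val == ECS_MODE_COUNTS_CODEWORDS;
}

/* reset: only 1 is defined. */
bool ecs_reset_is_valid(unsigned long val)
{
	return val == 1;
}

/* threshold: only 256, 1024 and 4096 (counts per Gbit) are supported. */
bool ecs_threshold_is_valid(unsigned long val)
{
	return val == 256 || val == 1024 || val == 4096;
}
```

Note that the generic store handlers in drivers/edac/ecs.c only parse the number with kstrtoul() and pass it to the client driver's callback, so rejecting reserved values is left to that callback. A userspace pre-check like this is a convenience, not a substitute.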

Documentation/edac/scrub.rst

Lines changed: 2 additions & 0 deletions
@@ -262,3 +262,5 @@ sysfs
 
 Sysfs files are documented in
 `Documentation/ABI/testing/sysfs-edac-scrub`
+
+`Documentation/ABI/testing/sysfs-edac-ecs`

drivers/edac/Kconfig

Lines changed: 9 additions & 0 deletions
@@ -84,6 +84,15 @@ config EDAC_SCRUB
 	  into a unified set of functions.
 	  Say 'y/n' to enable/disable EDAC scrub feature.
 
+config EDAC_ECS
+	bool "EDAC ECS (Error Check Scrub) feature"
+	help
+	  The EDAC ECS feature is optional and is designed to control on-die
+	  error check scrub (e.g., DDR5 ECS) in the system. The common sysfs
+	  ECS interface abstracts the control of various ECS functionalities
+	  into a unified set of functions.
+	  Say 'y/n' to enable/disable EDAC ECS feature.
+
 config EDAC_AMD64
 	tristate "AMD64 (Opteron, Athlon64)"
 	depends on AMD_NB && EDAC_DECODE_MCE

drivers/edac/Makefile

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ edac_core-y += edac_module.o edac_device_sysfs.o wq.o
 
 edac_core-$(CONFIG_EDAC_DEBUG) += debugfs.o
 edac_core-$(CONFIG_EDAC_SCRUB) += scrub.o
+edac_core-$(CONFIG_EDAC_ECS) += ecs.o
 
 ifdef CONFIG_PCI
 edac_core-y += edac_pci.o edac_pci_sysfs.o

drivers/edac/ecs.c

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * The generic ECS driver is designed to support control of on-die error
+ * check scrub (e.g., DDR5 ECS). The common sysfs ECS interface abstracts
+ * the control of various ECS functionalities into a unified set of functions.
+ *
+ * Copyright (c) 2024-2025 HiSilicon Limited.
+ */
+
+#include <linux/edac.h>
+
+#define EDAC_ECS_FRU_NAME "ecs_fru"
+
+enum edac_ecs_attributes {
+	ECS_LOG_ENTRY_TYPE,
+	ECS_MODE,
+	ECS_RESET,
+	ECS_THRESHOLD,
+	ECS_MAX_ATTRS
+};
+
+struct edac_ecs_dev_attr {
+	struct device_attribute dev_attr;
+	int fru_id;
+};
+
+struct edac_ecs_fru_context {
+	char name[EDAC_FEAT_NAME_LEN];
+	struct edac_ecs_dev_attr dev_attr[ECS_MAX_ATTRS];
+	struct attribute *ecs_attrs[ECS_MAX_ATTRS + 1];
+	struct attribute_group group;
+};
+
+struct edac_ecs_context {
+	u16 num_media_frus;
+	struct edac_ecs_fru_context *fru_ctxs;
+};
+
+#define TO_ECS_DEV_ATTR(_dev_attr)	\
+	container_of(_dev_attr, struct edac_ecs_dev_attr, dev_attr)
+
+#define EDAC_ECS_ATTR_SHOW(attrib, cb, type, format)				\
+static ssize_t attrib##_show(struct device *ras_feat_dev,			\
+			     struct device_attribute *attr, char *buf)		\
+{										\
+	struct edac_ecs_dev_attr *dev_attr = TO_ECS_DEV_ATTR(attr);		\
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
+	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;			\
+	type data;								\
+	int ret;								\
+										\
+	ret = ops->cb(ras_feat_dev->parent, ctx->ecs.private,			\
+		      dev_attr->fru_id, &data);					\
+	if (ret)								\
+		return ret;							\
+										\
+	return sysfs_emit(buf, format, data);					\
+}
+
+EDAC_ECS_ATTR_SHOW(log_entry_type, get_log_entry_type, u32, "%u\n")
+EDAC_ECS_ATTR_SHOW(mode, get_mode, u32, "%u\n")
+EDAC_ECS_ATTR_SHOW(threshold, get_threshold, u32, "%u\n")
+
+#define EDAC_ECS_ATTR_STORE(attrib, cb, type, conv_func)			\
+static ssize_t attrib##_store(struct device *ras_feat_dev,			\
+			      struct device_attribute *attr,			\
+			      const char *buf, size_t len)			\
+{										\
+	struct edac_ecs_dev_attr *dev_attr = TO_ECS_DEV_ATTR(attr);		\
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);		\
+	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;			\
+	type data;								\
+	int ret;								\
+										\
+	ret = conv_func(buf, 0, &data);						\
+	if (ret < 0)								\
+		return ret;							\
+										\
+	ret = ops->cb(ras_feat_dev->parent, ctx->ecs.private,			\
+		      dev_attr->fru_id, data);					\
+	if (ret)								\
+		return ret;							\
+										\
+	return len;								\
+}
+
+EDAC_ECS_ATTR_STORE(log_entry_type, set_log_entry_type, unsigned long, kstrtoul)
+EDAC_ECS_ATTR_STORE(mode, set_mode, unsigned long, kstrtoul)
+EDAC_ECS_ATTR_STORE(reset, reset, unsigned long, kstrtoul)
+EDAC_ECS_ATTR_STORE(threshold, set_threshold, unsigned long, kstrtoul)
+
+static umode_t ecs_attr_visible(struct kobject *kobj, struct attribute *a, int attr_id)
+{
+	struct device *ras_feat_dev = kobj_to_dev(kobj);
+	struct edac_dev_feat_ctx *ctx = dev_get_drvdata(ras_feat_dev);
+	const struct edac_ecs_ops *ops = ctx->ecs.ecs_ops;
+
+	switch (attr_id) {
+	case ECS_LOG_ENTRY_TYPE:
+		if (ops->get_log_entry_type) {
+			if (ops->set_log_entry_type)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case ECS_MODE:
+		if (ops->get_mode) {
+			if (ops->set_mode)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	case ECS_RESET:
+		if (ops->reset)
+			return a->mode;
+		break;
+	case ECS_THRESHOLD:
+		if (ops->get_threshold) {
+			if (ops->set_threshold)
+				return a->mode;
+			else
+				return 0444;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+}
+
+#define EDAC_ECS_ATTR_RO(_name, _fru_id)	\
+	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RO(_name),	\
+				      .fru_id = _fru_id })
+
+#define EDAC_ECS_ATTR_WO(_name, _fru_id)	\
+	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_WO(_name),	\
+				      .fru_id = _fru_id })
+
+#define EDAC_ECS_ATTR_RW(_name, _fru_id)	\
+	((struct edac_ecs_dev_attr) { .dev_attr = __ATTR_RW(_name),	\
+				      .fru_id = _fru_id })
+
+static int ecs_create_desc(struct device *ecs_dev, const struct attribute_group **attr_groups,
+			   u16 num_media_frus)
+{
+	struct edac_ecs_context *ecs_ctx;
+	u32 fru;
+
+	ecs_ctx = devm_kzalloc(ecs_dev, sizeof(*ecs_ctx), GFP_KERNEL);
+	if (!ecs_ctx)
+		return -ENOMEM;
+
+	ecs_ctx->num_media_frus = num_media_frus;
+	ecs_ctx->fru_ctxs = devm_kcalloc(ecs_dev, num_media_frus,
+					 sizeof(*ecs_ctx->fru_ctxs),
+					 GFP_KERNEL);
+	if (!ecs_ctx->fru_ctxs)
+		return -ENOMEM;
+
+	for (fru = 0; fru < num_media_frus; fru++) {
+		struct edac_ecs_fru_context *fru_ctx = &ecs_ctx->fru_ctxs[fru];
+		struct attribute_group *group = &fru_ctx->group;
+		int i;
+
+		fru_ctx->dev_attr[ECS_LOG_ENTRY_TYPE] = EDAC_ECS_ATTR_RW(log_entry_type, fru);
+		fru_ctx->dev_attr[ECS_MODE] = EDAC_ECS_ATTR_RW(mode, fru);
+		fru_ctx->dev_attr[ECS_RESET] = EDAC_ECS_ATTR_WO(reset, fru);
+		fru_ctx->dev_attr[ECS_THRESHOLD] = EDAC_ECS_ATTR_RW(threshold, fru);
+
+		for (i = 0; i < ECS_MAX_ATTRS; i++)
+			fru_ctx->ecs_attrs[i] = &fru_ctx->dev_attr[i].dev_attr.attr;
+
+		sprintf(fru_ctx->name, "%s%d", EDAC_ECS_FRU_NAME, fru);
+		group->name = fru_ctx->name;
+		group->attrs = fru_ctx->ecs_attrs;
+		group->is_visible = ecs_attr_visible;
+
+		attr_groups[fru] = group;
+	}
+
+	return 0;
+}
+
+/**
+ * edac_ecs_get_desc - get EDAC ECS descriptors
+ * @ecs_dev: client device, supports ECS feature
+ * @attr_groups: pointer to attribute group container
+ * @num_media_frus: number of media FRUs in the device
+ *
+ * Return:
+ * * %0 - Success.
+ * * %-EINVAL - Invalid parameters passed.
+ * * %-ENOMEM - Dynamic memory allocation failed.
+ */
+int edac_ecs_get_desc(struct device *ecs_dev,
+		      const struct attribute_group **attr_groups, u16 num_media_frus)
+{
+	if (!ecs_dev || !attr_groups || !num_media_frus)
+		return -EINVAL;
+
+	return ecs_create_desc(ecs_dev, attr_groups, num_media_frus);
+}
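
The visibility rule implemented by ecs_attr_visible() in the file above can be modelled outside the kernel. The sketch below is a userspace approximation, not the kernel code: `struct ecs_ops_model`, `ecs_attr_visible_model`, and `ecs_dummy_cb` are hypothetical names, and the callbacks are reduced to bare function pointers because only their presence matters for the decision. The caller passes the attribute's default mode (0644 for the read-write attributes created with __ATTR_RW, 0200 for `reset` created with __ATTR_WO).

```c
#include <stddef.h>

/* Stand-in callback type: the model only checks whether a pointer is set. */
typedef int (*ecs_cb_t)(void);

/* Hypothetical mirror of the callback table a client driver would provide. */
struct ecs_ops_model {
	ecs_cb_t get_log_entry_type;
	ecs_cb_t set_log_entry_type;
	ecs_cb_t get_mode;
	ecs_cb_t set_mode;
	ecs_cb_t reset;
	ecs_cb_t get_threshold;
	ecs_cb_t set_threshold;
};

enum ecs_attr_id {
	ATTR_LOG_ENTRY_TYPE,
	ATTR_MODE,
	ATTR_RESET,
	ATTR_THRESHOLD,
};

/* Getter/setter rule: hidden without a getter, read-only without a setter. */
static unsigned int rw_visible(ecs_cb_t get, ecs_cb_t set,
			       unsigned int default_mode)
{
	if (!get)
		return 0;
	return set ? default_mode : 0444;
}

/* Returns the file mode of the attribute, or 0 if it would not be created. */
unsigned int ecs_attr_visible_model(const struct ecs_ops_model *ops,
				    enum ecs_attr_id id,
				    unsigned int default_mode)
{
	switch (id) {
	case ATTR_LOG_ENTRY_TYPE:
		return rw_visible(ops->get_log_entry_type,
				  ops->set_log_entry_type, default_mode);
	case ATTR_MODE:
		return rw_visible(ops->get_mode, ops->set_mode, default_mode);
	case ATTR_RESET:
		/* Write-only attribute: visible only if the callback exists. */
		return ops->reset ? default_mode : 0;
	case ATTR_THRESHOLD:
		return rw_visible(ops->get_threshold, ops->set_threshold,
				  default_mode);
	}
	return 0;
}

/* Placeholder callback used only to mark an operation as implemented. */
int ecs_dummy_cb(void)
{
	return 0;
}
```

One consequence that follows from the switch statement in the diff: a driver that supplies only a setter, without the matching getter, gets no attribute node at all for `log_entry_type`, `mode`, or `threshold`.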

drivers/edac/edac_device.c

Lines changed: 19 additions & 0 deletions
@@ -628,6 +628,9 @@ int edac_dev_register(struct device *parent, char *name,
 			attr_gcnt++;
 			scrub_cnt++;
 			break;
+		case RAS_FEAT_ECS:
+			attr_gcnt += ras_features[feat].ecs_info.num_media_frus;
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -669,6 +672,22 @@ int edac_dev_register(struct device *parent, char *name,
 			scrub_cnt++;
 			attr_gcnt++;
 			break;
+		case RAS_FEAT_ECS:
+			if (!ras_features->ecs_ops) {
+				ret = -EINVAL;
+				goto data_mem_free;
+			}
+
+			dev_data = &ctx->ecs;
+			dev_data->ecs_ops = ras_features->ecs_ops;
+			dev_data->private = ras_features->ctx;
+			ret = edac_ecs_get_desc(parent, &ras_attr_groups[attr_gcnt],
+						ras_features->ecs_info.num_media_frus);
+			if (ret)
+				goto data_mem_free;
+
+			attr_gcnt += ras_features->ecs_info.num_media_frus;
+			break;
 		default:
 			ret = -EINVAL;
 			goto data_mem_free;
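
The first hunk above sizes the attribute group array: each scrub feature contributes one group, while an ECS feature contributes one group per media FRU. A simplified userspace model of that counting pass is sketched here. The type and function names (`ras_feature_model`, `count_attr_groups`) are hypothetical, the model covers only the two feature types visible in this diff, and it returns -1 where the kernel code returns -EINVAL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two feature types seen in the hunk. */
enum ras_feat_model {
	FEAT_SCRUB,
	FEAT_ECS,
};

struct ras_feature_model {
	enum ras_feat_model type;
	unsigned short num_media_frus;	/* only meaningful for FEAT_ECS */
};

/*
 * Count how many sysfs attribute groups the feature list needs:
 * one per scrub feature, num_media_frus per ECS feature.
 * Returns -1 on an unknown feature type.
 */
int count_attr_groups(const struct ras_feature_model *feats, size_t n)
{
	int attr_gcnt = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (feats[i].type) {
		case FEAT_SCRUB:
			attr_gcnt++;
			break;
		case FEAT_ECS:
			attr_gcnt += feats[i].num_media_frus;
			break;
		default:
			return -1;
		}
	}

	return attr_gcnt;
}
```

For a device registering one scrub feature and one ECS feature with four media FRUs, this model yields five groups, which matches the per-FRU `ecs_fruX` directories the commit message describes.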
