Commit b040240

Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
2 parents 1f75619 + af65545 commit b040240

33 files changed, 5165 insertions(+), 336 deletions(-)
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Address translation
+===================
+
+x86 AMD
+-------
+
+Zen-based AMD systems include a Data Fabric that manages the layout of
+physical memory. Devices attached to the Fabric, like memory controllers,
+I/O, etc., may not have a complete view of the system physical memory map.
+These devices may provide a "normalized", i.e. device physical, address
+when reporting memory errors. Normalized addresses must be translated to
+a system physical address for the kernel to action on the memory.
+
+AMD Address Translation Library (CONFIG_AMD_ATL) provides translation for
+this case.
+
+Glossary of acronyms used in address translation for Zen-based systems
+
+* CCM = Cache Coherent Moderator
+* COD = Cluster-on-Die
+* COH_ST = Coherent Station
+* DF = Data Fabric
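
The documentation added above says that a device may report a "normalized" (device physical) address that has to be translated to a system physical address. As a rough illustration of that idea only, the C sketch below models the simplest case one could imagine: a region interleaved across a power-of-two number of channels at a single bit position. This is not the AMD ATL code. The names `toy_map` and `toy_norm_to_sys_addr` and every field are hypothetical, and the sketch assumes no hashing, no DRAM hole, and no non-power-of-two interleaving, all of which a real translation would have to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one interleaved memory region (toy model). */
struct toy_map {
	uint64_t base;           /* system physical base address of the region */
	unsigned int intlv_bit;  /* lowest address bit used to select a channel */
	unsigned int intlv_bits; /* number of channel-select bits (log2 of channels) */
};

/*
 * Rebuild a system physical address from a normalized address.
 *
 * The device sees an address with the channel-select bits squeezed out.
 * To undo that, split the normalized address at intlv_bit, shift the upper
 * part left to make room, put the channel ID back in, and add the base.
 */
static uint64_t toy_norm_to_sys_addr(const struct toy_map *map,
				     unsigned int chan_id,
				     uint64_t norm_addr)
{
	uint64_t low_mask, low, high;

	/* Assumed preconditions of this toy model. */
	assert(map->intlv_bit + map->intlv_bits < 64);
	assert(chan_id < (1ULL << map->intlv_bits));

	low_mask = (1ULL << map->intlv_bit) - 1;
	low  = norm_addr & low_mask;                        /* bits below the interleave point */
	high = (norm_addr & ~low_mask) << map->intlv_bits;  /* bits above, shifted up */

	return map->base + (high | ((uint64_t)chan_id << map->intlv_bit) | low);
}
```

With a two-channel region based at 4 GiB and interleaved on bit 8, normalized address 0x1234 reported by channel 1 maps to 0x100002534 in this model: the low byte 0x34 is kept, the channel ID lands in bit 8, and the remaining bits move up by one position.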

Documentation/RAS/ras.rst renamed to Documentation/admin-guide/RAS/error-decoding.rst

Lines changed: 3 additions & 8 deletions
@@ -1,15 +1,10 @@
 .. SPDX-License-Identifier: GPL-2.0
 
-Reliability, Availability and Serviceability features
-=====================================================
-
-This documents different aspects of the RAS functionality present in the
-kernel.
-
 Error decoding
----------------
+==============
 
-* x86
+x86
+---
 
 Error decoding on AMD systems should be done using the rasdaemon tool:
 https://github.com/mchehab/rasdaemon/
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. toctree::
+   :maxdepth: 2
+
+   main
+   error-decoding
+   address-translation

Documentation/admin-guide/ras.rst renamed to Documentation/admin-guide/RAS/main.rst

Lines changed: 7 additions & 3 deletions
@@ -1,8 +1,12 @@
+.. SPDX-License-Identifier: GPL-2.0
 .. include:: <isonum.txt>
 
-============================================
-Reliability, Availability and Serviceability
-============================================
+==================================================
+Reliability, Availability and Serviceability (RAS)
+==================================================
+
+This documents different aspects of the RAS functionality present in the
+kernel.
 
 RAS concepts
 ************

Documentation/admin-guide/index.rst

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ configure specific aspects of kernel behavior to your liking.
    pmf
    pnp
    rapidio
-   ras
+   RAS/index
    rtc
    serial-console
    svga

Documentation/index.rst

Lines changed: 0 additions & 1 deletion
@@ -113,7 +113,6 @@ to ReStructured Text format, or are simply too old.
    :maxdepth: 1
 
    staging/index
-   RAS/ras
 
 
 Translations

MAINTAINERS

Lines changed: 13 additions & 2 deletions
@@ -897,6 +897,12 @@ Q: https://patchwork.kernel.org/project/linux-rdma/list/
 F: drivers/infiniband/hw/efa/
 F: include/uapi/rdma/efa-abi.h
 
+AMD ADDRESS TRANSLATION LIBRARY (ATL)
+M: Yazen Ghannam <Yazen.Ghannam@amd.com>
+L: linux-edac@vger.kernel.org
+S: Supported
+F: drivers/ras/amd/atl/*
+
 AMD AXI W1 DRIVER
 M: Kris Chaplin <kris.chaplin@amd.com>
 R: Thomas Delev <thomas.delev@amd.com>
@@ -7583,7 +7589,6 @@ R: Robert Richter <rric@kernel.org>
 L: linux-edac@vger.kernel.org
 S: Supported
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
-F: Documentation/admin-guide/ras.rst
 F: Documentation/driver-api/edac.rst
 F: drivers/edac/
 F: include/linux/edac.h
@@ -18379,11 +18384,17 @@ M: Tony Luck <tony.luck@intel.com>
 M: Borislav Petkov <bp@alien8.de>
 L: linux-edac@vger.kernel.org
 S: Maintained
-F: Documentation/admin-guide/ras.rst
+F: Documentation/admin-guide/RAS
 F: drivers/ras/
 F: include/linux/ras.h
 F: include/ras/ras_event.h
 
+RAS FRU MEMORY POISON MANAGER (FMPM)
+M: Yazen Ghannam <Yazen.Ghannam@amd.com>
+L: linux-edac@vger.kernel.org
+S: Maintained
+F: drivers/ras/amd/fmpm.c
+
 RC-CORE / LIRC FRAMEWORK
 M: Sean Young <sean@mess.org>
 L: linux-media@vger.kernel.org

arch/x86/include/asm/topology.h

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
 static inline bool topology_is_primary_thread(unsigned int cpu) { return true; }
-static inline unsigned int topology_amd_nodes_per_pkg(void) { return 0; };
+static inline unsigned int topology_amd_nodes_per_pkg(void) { return 1; }
 #endif /* !CONFIG_SMP */
 
 static inline void arch_fix_phys_package_id(int num, u32 slot)

drivers/edac/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -78,6 +78,7 @@ config EDAC_GHES
 config EDAC_AMD64
 	tristate "AMD64 (Opteron, Athlon64)"
 	depends on AMD_NB && EDAC_DECODE_MCE
+	imply AMD_ATL
 	help
 	  Support for error detection and correction of DRAM ECC errors on
 	  the AMD64 families (>= K8) of memory controllers.
