--layout reports incorrect memory layout for EPYC 7xx2 CPU #64

Open
jhoblitt opened this issue May 17, 2022 · 3 comments

@jhoblitt

[jhoblitt@pillan06 rasdaemon]$ sudo ras-mc-ctl --layout
          +-----------------------------------------------------------------------------------------------+
          |                                              mc0                                              |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |  csrow4   |  csrow5   |  csrow6   |  csrow7   |
----------+-----------------------------------------------------------------------------------------------+
channel7: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel6: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
channel5: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel4: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-----------------------------------------------------------------------------------------------+
channel3: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel2: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-------------------------------------------------------------------------------------------------+
channel1: |     0 MB  |     0 MB  |  32767 MB  |  32767 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
channel0: |     0 MB  |     0 MB  |  32767 MB  |  32767 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----------+-------------------------------------------------------------------------------------------------+
[jhoblitt@pillan06 rasdaemon]$ free -g
              total        used        free      shared  buff/cache   available
Mem:            251          93          42           0         115         156
Swap:             0           0           0
[jhoblitt@pillan06 rasdaemon]$ sudo dmidecode --type 4
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x0029, DMI type 4, 48 bytes
Processor Information
	Socket Designation: CPU
	Type: Central Processor
	Family: Zen
	Manufacturer: Advanced Micro Devices, Inc.
	ID: 10 0F 83 00 FF FB 8B 17
	Signature: Family 23, Model 49, Stepping 0
	Flags:
		FPU (Floating-point unit on-chip)
		VME (Virtual mode extension)
		DE (Debugging extension)
		PSE (Page size extension)
		TSC (Time stamp counter)
		MSR (Model specific registers)
		PAE (Physical address extension)
		MCE (Machine check exception)
		CX8 (CMPXCHG8 instruction supported)
		APIC (On-chip APIC hardware supported)
		SEP (Fast system call)
		MTRR (Memory type range registers)
		PGE (Page global enable)
		MCA (Machine check architecture)
		CMOV (Conditional move instruction supported)
		PAT (Page attribute table)
		PSE-36 (36-bit page size extension)
		CLFSH (CLFLUSH instruction supported)
		MMX (MMX technology supported)
		FXSR (FXSAVE and FXSTOR instructions supported)
		SSE (Streaming SIMD extensions)
		SSE2 (Streaming SIMD extensions 2)
		HTT (Multi-threading)
	Version: AMD EPYC 7502P 32-Core Processor               
	Voltage: 1.1 V
	External Clock: 100 MHz
	Max Speed: 3350 MHz
	Current Speed: 2500 MHz
	Status: Populated, Enabled
	Upgrade: Socket SP3
	L1 Cache Handle: 0x0026
	L2 Cache Handle: 0x0027
	L3 Cache Handle: 0x0028
	Serial Number: Unknown
	Asset Tag: Unknown
	Part Number: Unknown
	Core Count: 32
	Core Enabled: 32
	Thread Count: 64
	Characteristics:
		64-bit capable
		Multi-Core
		Hardware Thread
		Execute Protection
		Enhanced Virtualization
		Power/Performance Control

[jhoblitt@pillan06 rasdaemon]$ sudo dmidecode --type 17
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 3.2.0 present.

Handle 0x002B, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x002A
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMA1
	Bank Locator: P0_Node0_Channel0_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x002D, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x002C
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMA2
	Bank Locator: P0_Node0_Channel0_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE3B4
	Asset Tag: DIMMA2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x0030, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x002F
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMB1
	Bank Locator: P0_Node0_Channel1_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x0032, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0031
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMB2
	Bank Locator: P0_Node0_Channel1_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE54C
	Asset Tag: DIMMB2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x0035, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0034
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMC1
	Bank Locator: P0_Node0_Channel2_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x0037, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0036
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMC2
	Bank Locator: P0_Node0_Channel2_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE495
	Asset Tag: DIMMC2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x003A, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0039
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMD1
	Bank Locator: P0_Node0_Channel3_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x003C, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x003B
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMD2
	Bank Locator: P0_Node0_Channel3_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE716
	Asset Tag: DIMMD2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x003F, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x003E
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMME1
	Bank Locator: P0_Node0_Channel4_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x0041, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0040
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMME2
	Bank Locator: P0_Node0_Channel4_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE698
	Asset Tag: DIMME2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x0044, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0043
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMF1
	Bank Locator: P0_Node0_Channel5_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x0046, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0045
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMF2
	Bank Locator: P0_Node0_Channel5_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948EFE3B8
	Asset Tag: DIMMF2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x0049, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x0048
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMG1
	Bank Locator: P0_Node0_Channel6_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x004B, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x004A
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMG2
	Bank Locator: P0_Node0_Channel6_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948F02273
	Asset Tag: DIMMG2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None

Handle 0x004E, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x004D
	Total Width: Unknown
	Data Width: Unknown
	Size: No Module Installed
	Form Factor: Unknown
	Set: None
	Locator: DIMMH1
	Bank Locator: P0_Node0_Channel7_Dimm0
	Type: Unknown
	Type Detail: Unknown

Handle 0x0050, DMI type 17, 84 bytes
Memory Device
	Array Handle: 0x0023
	Error Information Handle: 0x004F
	Total Width: 72 bits
	Data Width: 64 bits
	Size: 32 GB
	Form Factor: DIMM
	Set: None
	Locator: DIMMH2
	Bank Locator: P0_Node0_Channel7_Dimm1
	Type: DDR4
	Type Detail: Synchronous Registered (Buffered)
	Speed: 3200 MT/s
	Manufacturer: Samsung
	Serial Number: T0FN00014948F02271
	Asset Tag: DIMMH2_AssetTag (date:21/49)
	Part Number: M393A4K40EB3-CWE    
	Rank: 2
	Configured Memory Speed: 3200 MT/s
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
	Memory Technology: DRAM
	Memory Operating Mode Capability: Volatile memory
	Firmware Version: M393A4K40EB3-CWE    
	Module Manufacturer ID: Bank 1, Hex 0xCE
	Module Product ID: Unknown
	Memory Subsystem Controller Manufacturer ID: Unknown
	Memory Subsystem Controller Product ID: Unknown
	Non-Volatile Size: None
	Volatile Size: 32 GB
	Cache Size: None
	Logical Size: None
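
For comparison with the `--layout` table, the installed capacity that dmidecode reports can be tallied with a short script. This is only a minimal sketch: the function name `installed_mb` is hypothetical, and it assumes the `Size:` fields are formatted as in the output above (`32 GB`, or `No Module Installed` for empty slots).

```python
import re

# Match only the plain "Size:" field of a Memory Device record.
# "Volatile Size:", "Cache Size:" etc. are skipped because nothing but
# whitespace may precede "Size:" on the line.
SIZE_RE = re.compile(r"^[ \t]*Size:[ \t]+(\d+)[ \t]+(MB|GB)[ \t]*$", re.MULTILINE)


def installed_mb(dmidecode_text: str) -> int:
    """Sum the sizes of populated slots in `dmidecode --type 17` text, in MB."""
    total = 0
    for value, unit in SIZE_RE.findall(dmidecode_text):
        total += int(value) * (1024 if unit == "GB" else 1)
    return total


# Two records abbreviated from the output above: one empty slot, one 32 GB DIMM.
sample = "\tSize: No Module Installed\n\tSize: 32 GB\n\tVolatile Size: 32 GB\n"
print(installed_mb(sample), "MB")
```

Applied to the full output above, the eight 32 GB modules add up to 262144 MB, while the four populated cells in the `--layout` table add up to only 4 × 32767 MB = 131068 MB.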
@jhoblitt
Author

The layout exposed through sysfs is wrong too, so this may be more of a kernel issue than anything else.

@mchehab
Owner

mchehab commented May 18, 2022

First of all, the BIOS information is not always correct. It is common for the same BIOS to be used on different machines that have different motherboard silk screens and/or different numbers of sockets. So neither the kernel nor rasdaemon relies on it.

If the BIOS is reliable enough, one could use:

ras-mc-ctl --guess-labels
memory stick 'ChannelA-DIMM0' is located at 'BANK 0'
memory stick 'ChannelB-DIMM0' is located at 'BANK 2'

With that information (which comes from DMI decoding), the layouts inside the labels/ directory can be updated.

@jhoblitt
Author

--guess-labels looks promising, but it seems to write to stdout only. Is there a way to machine-generate the label db?

[root@pillan06 ~]# ras-mc-ctl --guess-labels
memory stick 'DIMMA1' is located at 'P0_Node0_Channel0_Dimm0'
memory stick 'DIMMA2' is located at 'P0_Node0_Channel0_Dimm1'
memory stick 'DIMMB1' is located at 'P0_Node0_Channel1_Dimm0'
memory stick 'DIMMB2' is located at 'P0_Node0_Channel1_Dimm1'
memory stick 'DIMMC1' is located at 'P0_Node0_Channel2_Dimm0'
memory stick 'DIMMC2' is located at 'P0_Node0_Channel2_Dimm1'
memory stick 'DIMMD1' is located at 'P0_Node0_Channel3_Dimm0'
memory stick 'DIMMD2' is located at 'P0_Node0_Channel3_Dimm1'
memory stick 'DIMME1' is located at 'P0_Node0_Channel4_Dimm0'
memory stick 'DIMME2' is located at 'P0_Node0_Channel4_Dimm1'
memory stick 'DIMMF1' is located at 'P0_Node0_Channel5_Dimm0'
memory stick 'DIMMF2' is located at 'P0_Node0_Channel5_Dimm1'
memory stick 'DIMMG1' is located at 'P0_Node0_Channel6_Dimm0'
memory stick 'DIMMG2' is located at 'P0_Node0_Channel6_Dimm1'
memory stick 'DIMMH1' is located at 'P0_Node0_Channel7_Dimm0'
memory stick 'DIMMH2' is located at 'P0_Node0_Channel7_Dimm1'
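
As a starting point for scripting this, the stdout lines can at least be parsed into structured records. The following is a minimal sketch, not a rasdaemon feature: `parse_guess_labels` is a hypothetical helper, and the `Channel<N>_Dimm<M>` pattern is an assumption based on the bank locators this particular BIOS reports. It does not write a label db entry, because that still requires knowing how each slot maps to the EDAC mc/csrow/channel coordinates, which this output does not provide.

```python
import re

# One line of `ras-mc-ctl --guess-labels` output, as shown in this thread.
LINE_RE = re.compile(
    r"^memory stick '(?P<label>[^']*)' is located at '(?P<location>[^']*)'$"
)
# Assumption: bank locators look like "P0_Node0_Channel3_Dimm1" (this machine).
LOC_RE = re.compile(r"Channel(?P<channel>\d+)_Dimm(?P<dimm>\d+)$")


def parse_guess_labels(text: str) -> list[dict]:
    """Turn --guess-labels stdout into a list of records, one per slot."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a "memory stick" line
        row = {
            "label": m["label"],
            "location": m["location"],
            "channel": None,
            "dimm": None,
        }
        loc = LOC_RE.search(m["location"])
        if loc:  # other BIOSes (e.g. 'BANK 0') leave these as None
            row["channel"] = int(loc["channel"])
            row["dimm"] = int(loc["dimm"])
        rows.append(row)
    return rows


sample = (
    "memory stick 'DIMMA1' is located at 'P0_Node0_Channel0_Dimm0'\n"
    "memory stick 'DIMMA2' is located at 'P0_Node0_Channel0_Dimm1'\n"
)
for row in parse_guess_labels(sample):
    print(row)
```

The records could then be combined with a hand-verified slot-to-EDAC mapping to emit entries for the labels/ directory.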
