mce detected memory error

Hi Experts,
I found an interesting issue in /var/log/messages. I installed SLES 12 SP3 for SAP on a Lenovo X3850 X6.

2019-10-17T18:02:15.441498+07:00 hostname mcelog[5163]: Running trigger `socket-memory-error-trigger'
2019-10-17T18:02:15.441503+07:00 hostname mcelog[5163]: Hardware event. This is not a software error.
2019-10-17T18:02:15.441548+07:00 hostname mcelog[5163]: Corrected error
2019-10-17T18:02:15.441575+07:00 hostname mcelog[5163]: Transaction: Memory read error
2019-10-17T18:02:15.441579+07:00 hostname mcelog[5163]: MemCtrl: Corrected memory read error

After searching for a solution, I found this (https://www.suse.com/support/kb/doc/?id=7022118). I added the kernel option it describes (mce=ignore_ce).
After that, the following messages appeared in the log:

2019-10-29T00:33:32.041734+07:00 hostname kernel: [ 9.938351] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
2019-10-29T00:33:32.041973+07:00 hostname kernel: [ 15.612800] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 0.
2019-10-29T00:33:32.041974+07:00 hostname kernel: [ 15.612801] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 1.
2019-10-29T00:33:32.041975+07:00 hostname kernel: [ 15.612801] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2.
2019-10-29T00:33:32.041980+07:00 hostname kernel: [ 15.612802] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3.
2019-10-29T00:33:32.041981+07:00 hostname kernel: [ 15.612803] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4.
2019-10-29T00:33:32.041982+07:00 hostname kernel: [ 15.612804] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5.
2019-10-29T00:33:32.041983+07:00 hostname kernel: [ 15.612805] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6.
2019-10-29T00:33:32.041984+07:00 hostname kernel: [ 15.612806] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7.
2019-10-29T00:33:32.041985+07:00 hostname kernel: [ 15.612806] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.
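
Before chasing the new messages, it may be worth confirming that the option actually reached the running kernel. The following is only a minimal sketch: the helper names `mce_options` and `ignore_ce_active` are made up for illustration, and it assumes each `mce=` parameter appears as its own whitespace-separated token on the kernel command line.

```python
def mce_options(cmdline):
    """Return the value of every mce= parameter found on a kernel command line."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("mce=")
    ]


def ignore_ce_active(cmdline):
    """True if mce=ignore_ce is present among the boot parameters."""
    return "ignore_ce" in mce_options(cmdline)
```

On the affected host, the command line of the running kernel can be read from /proc/cmdline, for example `ignore_ce_active(open("/proc/cmdline").read())`.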

How can I solve this? :confused:

Thanks and regards

Thomson Malau

Hi
Are you sure it’s not a real hardware problem with the RAM? Have you tested the RAM or reseated it?

Hi Malcolm,
Thanks for your reply.

[QUOTE=malcolmlewis;58738]Hi
Are you sure it’s not a real hardware problem with the RAM?[/QUOTE]

When we checked the hardware log, there were no errors about memory or any other failure.

I have already tested the system with a stress-test tool (stress-ng), and it ran fine.

Thomson Malau

Hi
Then perhaps you can go in and tweak the trigger?

The configuration files are in /etc/mcelog/, and the man pages describe the options… man mcelog.triggers, man mcelog.conf etc.
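
Before changing the trigger, it could help to know how often the corrected errors really occur. The sketch below tallies mcelog "Corrected error" lines per day, assuming the syslog line format shown in the first post; the function name `corrected_errors_per_day` is hypothetical and the pattern may need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches lines such as:
# 2019-10-17T18:02:15.441548+07:00 hostname mcelog[5163]: Corrected error
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T\S+\s+\S+\s+mcelog\[\d+\]:\s+(.*)$"
)


def corrected_errors_per_day(lines):
    """Count mcelog 'Corrected error' events per calendar day."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        # Only the bare "Corrected error" line is counted, so each
        # multi-line mcelog report contributes exactly one event.
        if match and match.group(2).strip() == "Corrected error":
            counts[match.group(1)] += 1
    return counts
```

To run it against the real log, pass an open file object, for example `corrected_errors_per_day(open("/var/log/messages", errors="replace"))`; reading that file usually requires root. A handful of events per day and a steady stream of them would call for different threshold settings.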