Steam for Linux

tuxdelux Apr 15, 2022 @ 12:39pm
Hardware Error causing computer restart during gameplay
Can someone help me troubleshoot my gaming desktop? Every 8-10 hours of gaming, the system crashes and reboots. I ran MemTest86 from a boot disk, and it completed 100% of its tests with no errors. Where do I go from here to diagnose the issue?

I had two crashes over today and yesterday. The kern.log file had this error. I am not sure how to decode the information, past the CPU references. Any help would be appreciated.

Apr 14
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1649788097 SOCKET 0 APIC 0 microcode 8701021
mce: [Hardware Error]: TSC 0 ADDR 7fdf18c609a2 MISC d012000100000000 SYND 4d000000 IPID 500b000000000

Apr 15
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 22: baa000000002010b
mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000000 IPID 1813e17000
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1650050419 SOCKET 0 APIC 0 microcode 8701021
mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: TSC 0 ADDR 8e647a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1650050419 SOCKET 0 APIC 9 microcode 8701021
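
For anyone who wants to read more out of these records than the CPU numbers, here is a minimal Python sketch that splits the 64-bit status value after "Bank N:" into its architectural flag bits and renders the TIME field as a date. It assumes the standard x86 machine-check layout for bits 63 to 57 and that TIME is Unix epoch seconds; the vendor-specific bits in between are deliberately left undecoded, and the function names are made up for this example.

```python
from datetime import datetime, timezone

# Architectural flag bits in the upper part of an x86 MCA status register.
# Vendor-specific bits (below bit 57) are intentionally not decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}


def decode_status(status):
    """Split a 64-bit MCA status value into its set flags and low error code."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


def decode_time(epoch_seconds):
    """Render the TIME field (assumed Unix epoch seconds) as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Status and TIME values copied from the log above.
    for raw in ("bea0000000000108", "baa000000002010b"):
        info = decode_status(int(raw, 16))
        print(raw, info["flags"], hex(info["error_code"]))
    for stamp in (1649788097, 1650050419):
        print(stamp, decode_time(stamp))
```

Both status values decode with UC and PCC set, so these were uncorrected errors. The Bank 5 value has ADDRV set and the Bank 22 value does not, which matches the log: only the Bank 5 records carry an ADDR field.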
RTheren Apr 15, 2022 @ 12:40pm 
I suppose you have the latest BIOS and microcode?
Zyro Apr 15, 2022 @ 1:05pm 
If you can borrow some parts, swapping them in for a while would be interesting.
Aoi Blue Apr 15, 2022 @ 1:09pm 
Double check that your BIOS is up to date, and also check if you are overclocking your CPU.

It would be nice to know what type of processor you have.

Some CPUs had issues that have been patched in both BIOS and updates to the Linux Kernel, so you should probably update your kernel too.

The address is not consistent, but the microcode is. This means it is likely an overclocking issue, a voltage regulator issue, a microcode issue, or a CPU erratum. All of these can be addressed by BIOS updates, and the latter two by kernel updates as well.
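
The comparison above (which fields repeat across crashes and which do not) can be sketched in Python. This is only an illustration under assumptions: the record format is taken from the three events quoted in this thread, and `parse_events` and `split_fields` are hypothetical helper names, not part of any tool.

```python
import re

# Matches one kernel log record and captures the text after the prefix.
RECORD = re.compile(r"mce: \[Hardware Error\]: (.*)")
# Keys printed as "KEY value" pairs in the records quoted in this thread.
KEYS = ("ADDR", "MISC", "SYND", "IPID", "PROCESSOR", "microcode")

SAMPLE_LOG = """\
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1649788097 SOCKET 0 APIC 0 microcode 8701021
mce: [Hardware Error]: TSC 0 ADDR 7fdf18c609a2 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 22: baa000000002010b
mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000000 IPID 1813e17000
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1650050419 SOCKET 0 APIC 0 microcode 8701021
mce: [Hardware Error]: CPU 12: Machine Check: 0 Bank 5: bea0000000000108
mce: [Hardware Error]: TSC 0 ADDR 8e647a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1650050419 SOCKET 0 APIC 9 microcode 8701021
"""


def parse_events(log_text):
    """Group mce records into events; a 'Machine Check' record starts a new one."""
    events = []
    for line in log_text.splitlines():
        match = RECORD.search(line)
        if not match:
            continue
        body = match.group(1)
        if "Machine Check" in body:
            bank = re.search(r"Bank (\d+): ([0-9a-f]+)", body)
            if bank:
                events.append({"BANK": bank.group(1), "STATUS": bank.group(2)})
            continue
        if not events:
            continue  # skip records that precede any event header
        for key in KEYS:
            found = re.search(rf"\b{key} (\S+)", body)
            if found:
                events[-1][key] = found.group(1)
    return events


def split_fields(events):
    """Return (constant, varying) field names across all events."""
    names = set().union(*events)
    constant = {n for n in names if len({e.get(n) for e in events}) == 1}
    return constant, names - constant


if __name__ == "__main__":
    constant, varying = split_fields(parse_events(SAMPLE_LOG))
    print("constant:", sorted(constant))
    print("varying: ", sorted(varying))
```

A field that is missing from one event (such as ADDR in the Bank 22 record) counts as varying here, which is a design choice for this sketch rather than a rule.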
Last edited by Aoi Blue; Apr 15, 2022 @ 9:55pm
tuxdelux Apr 15, 2022 @ 1:17pm 
The BIOS is not up-to-date, as the desktop system was purchased only last year. The motherboard (Biostar B450MH) is running a BIOS from 07-15-2020, and the processor is an AMD Ryzen 7 3700X 3.6GHz

CPU*: patch_level=0x08701021

Is there any reason to suspect problems with the video card? (Crashes always happen when playing games.) Or the power supply? (It is not running off a UPS.)

Not doing (never done) any over-clocking.
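
As a way to re-check the microcode revision after a BIOS or kernel update, here is a small Python sketch that reads the `microcode` lines from `/proc/cpuinfo`. It assumes an x86 Linux system where that field is present; `microcode_revisions` is a name invented for this example.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Collect the distinct 'microcode' values listed in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "microcode":
            # Values look like "0x8701021"; int() accepts the 0x prefix in base 16.
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print({hex(rev) for rev in microcode_revisions(cpuinfo.read_text())})
```

If the revision printed after the update is the same as the 8701021 shown in the log, the update did not change the microcode that is actually loaded.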
Last edited by tuxdelux; Apr 18, 2022 @ 7:37am
meheezen Apr 15, 2022 @ 4:46pm 
brand and model of the PSU?
The author of this thread has indicated that this post answers the original topic.
Aoi Blue Apr 15, 2022 @ 10:08pm 
Originally posted by tuxdelux:
The BIOS is not up-to-date, as the desktop system was purchased only last year. The motherboard (Biostar B450MH) is running a BIOS from 07-15-2020, and the processor is an AMD Ryzen 7 3700X 3.6GHz

Is there any reason to suspect problems with the video card? (Crashes always happen when playing games.) Or the power supply? (It is not running off a UPS.)

Not doing (never done) any over-clocking.
Yeah, the B450 motherboards have some errata bugs in the interface to the CPU and in the default CPU microcode. You should definitely update the BIOS.

The error is definitely in the CPU, at the microcode level.

Also double check your power supply is sufficient. Insufficient power supply can cause this issue.

Keep in mind that some motherboards automatically do some lightweight above-spec behavior that isn't considered overclocking per se. You might want to turn that off. The most common examples are extending the maximum boost cycle on the CPU and raising the CPU's power draw limiter, TDP, and thermal cap limits by up to +5-10%.

Also check your CPU for overheating. I probably should have mentioned this earlier. If your thermal paste contact is poor or your airflow is bad, it can cause a failure in the most heat-sensitive circuit in your CPU when processing the most heat-sensitive instruction (typically the instruction with the tightest timing margin). This is because temperature affects circuit timing.

Also, what is your distro? Double check that there isn't a more recent major version, and double check that you are using the latest kernel.
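
For the overheating check, one rough way to watch temperatures while a game is running is to poll the kernel's hwmon sysfs files. The Python sketch below assumes the usual `/sys/class/hwmon/hwmon*/temp*_input` layout, where values are in millidegrees Celsius; on AMD CPUs the chip is usually named `k10temp`. The function name is made up for this example.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Map 'chip/sensor' to degrees Celsius for every readable temp*_input file."""
    temps = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip does not expose a name.
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors exist but cannot be read
            temps[f"{chip_name}/{sensor.name}"] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for sensor, celsius in read_hwmon_temps().items():
        print(f"{sensor}: {celsius:.1f} C")
```

Running it in a loop during gameplay and noting the last reading before a crash would show whether the reboots line up with high temperatures.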
tuxdelux Apr 16, 2022 @ 2:58am 
The computer is running a 600W power supply (info from PC store website, sorry cannot see make-model info, because the case internal design hides this information)

Running Ubuntu 22.04 (pre-release) but I have also been having this issue with Ubuntu 21.10 and 21.04

Thank you all for your responses. Thank you so much Aoi Blue for your detailed explanation. I will update the BIOS right away, and I will reply here if the problem happens again.
Last edited by tuxdelux; Apr 16, 2022 @ 3:13am
Aoi Blue Apr 16, 2022 @ 8:06pm 
Check the recommended power supply for your GPU/CPU combination. It is usually a lot higher than the TDP wattage. The TDP is the heat produced on sustained draw, but the recommended wattage is burst draw, which can be much higher.

There are often settings in the motherboard firmware to compensate by preemptively supplying more voltage to the CPU, so that when it draws too much it has some leeway in the voltage regulator caps on the motherboard.

I tend to always buy higher end motherboards because they make sure to have extra power draw spike handling, reducing excess load on the power supply.
meheezen Apr 17, 2022 @ 4:08am 
also, unless you got an 80+ Bronze or better power supply unit from a known brand, assume the PSU has around 60% efficiency; meaning a 600W off-brand PSU will deliver up to 360W reliably, which seems to be a bit low when playing games (due to the bursts/surges of power draw)
Aoi Blue Apr 17, 2022 @ 5:08am 
Originally posted by meheezen:
also, unless you got an 80+ Bronze or better power supply unit from a known brand, assume the PSU has around 60% efficiency; meaning a 600W off-brand PSU will deliver up to 360W reliably, which seems to be a bit low when playing games (due to the bursts/surges of power draw)
Additionally this capacity reduces with age. I forgot to mention that part.
Scrab Apr 17, 2022 @ 5:49am 
wtf
-Ebalo) Apr 22, 2022 @ 2:32am 
wtf
Stin Apr 22, 2022 @ 5:34am 
I had similar MCEs from the kernel. Mine would randomly lock up or load corrupt data. Turned out to be a dead DMA controller. It's OK so long as I don't use the SATA controller lol

I hope you find out what was wrong :)

Also, I always thought that PSU efficiency had to do with the ratio of power drawn from the wall to power output to the PC, not just against rated power output.

For example, a PSU which pulls 700 Watts from the wall, at just 60% efficiency would produce 420w of useable output to the PC. And it would have a rating of 420w peak?

I always thought it was an input:output ratio.
Last edited by Stin; Apr 22, 2022 @ 5:38am
Zyro Apr 22, 2022 @ 6:07am 
Originally posted by Stin:
For example, a PSU which pulls 700 Watts from the wall, at just 60% efficiency would produce 420w of useable output to the PC. And it would have a rating of 420w peak?

I think you're right: What's written on its box is what the PSU can deliver to the system, not what it pulls from the outside world.
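
The input:output reading of efficiency discussed above works out as follows. This is a small arithmetic sketch in Python using the figures from this thread (700 W from the wall, a 600 W label, 60% efficiency); the function names are made up for the example, and 60% is the thread's assumed figure, not a measured one.

```python
def usable_output_watts(wall_draw_watts, efficiency):
    """Power delivered to the PC for a given draw from the wall."""
    return wall_draw_watts * efficiency


def wall_draw_watts(output_watts, efficiency):
    """Power pulled from the wall to deliver a given output to the PC."""
    return output_watts / efficiency


if __name__ == "__main__":
    # Stin's example: 700 W from the wall at 60% efficiency.
    print(round(usable_output_watts(700, 0.60)), "W delivered to the PC")
    # A PSU delivering its full 600 W label at 60% efficiency.
    print(round(wall_draw_watts(600, 0.60)), "W pulled from the wall")
```

Under this reading, low efficiency raises the draw from the wall (and the waste heat) rather than shrinking the rated output on the label.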

Date Posted: Apr 15, 2022 @ 12:39pm
Posts: 14