Rare crash w/ screen going black along with buzzing noise, then reboots
This is probably my 3rd time now having this crash and it worries me since most of my components are only a year old, except the PSU which is about 4 years old.

The crash seems to happen on average once every two weeks. I play Fortnite often and it has happened every single time on that game, to my knowledge.

No weird under or overclocks going on. Only XMP enabled for my RAM. No real noticeable settings or lack of updates that would be an immediate culprit.

My temps are fine. Only thing in the Event Viewer logs I can find says critical error Event 41, Kernel-Power: "The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly."

Specs:

CPU: Ryzen 5 5600X

GPU: RX 6600 XT

Mobo: MSI MPG B550 Gaming Plus

RAM: G.Skill Ripjaws 16GBx2 DDR4

PSU: EVGA 600W GD+

OS: Win 10
first grab memtest86+
then set bios to defaults, and disconnect all drives
and boot from the usb stick to test
then enable xmp/docp to see if thats the cause
Last edited by _I_; Jan 21, 2024 at 9:34
Originally posted by Illusion of Progress:
Yeah, if a crash is rare, troubleshooting it is... "fun".

The only thing with following what you find others had suggested for an issue that seems similar is...

1. It presumes what was suggested solved the issue (sometimes it has, others it was suggested and unknown).

2. More importantly, it presumes your issue is the same.

Not all BSODs are caused by the same thing. Not all machine check exceptions are caused by the same thing.

I also had a similar symptom (minus the audio buzzing; in my case the audio was fine until the restart, and the only anomaly was that it might go "stop and go" after the screen went black but before the restart, if the restart took a while to actually occur). In my case it was the graphics card, but it's not going to be the same for everyone.

That's not to talk you out of trying the PSU. On the contrary, the PSU is one of the first couple of things I'd consider, so if that's the place you're most willing to start, then start there. I'm just adding that there are multiple sources it could be coming from, so keep in mind that if thing A fails to resolve it, be prepared and have a thing B to move on to trying.

You could lower the clock speeds/power consumption of the GPU in Adrenalin and see if that lessens or gets rid of the issue (hard to tell maybe if it's infrequent, I know). If it does occur less when the graphics card is only able to use less power, it would give support to the idea the PSU isn't coping.

On the other hand, you can do the opposite and try and force as much power consumption by running a CPU test in OCCT or Prime 95 to load the CPU, and use Furmark to load the GPU. If your system doesn't crash with the power draw of that, it's unlikely any game is making it pull more. But I'd really do both independently too in order to see if the crash is more likely with one as opposed to the other.

Testing RAM should be done too. Run MemTest86 overnight if need be.

One thing I forgot to mention is that I tune my GPU settings a bit: https://imgur.com/UVGoK6V

Normally, the default minimum frequency is set to only something like 600 MHz, and the recommendation is to raise it to about 100 MHz below the max frequency. I also tend to lower the voltage from 1150 mV to 1100 mV; however, I'm leaving that unchanged currently in case that was a cause of the issue (sudden loss of power?).

For VRAM tuning: Memory timing I set to fast and Max Frequency I set to 2180 MHz from 2000 MHz.

I don't think these should cause the issue, because it's barely an overclock, but rather just tuning. When my computer crashes like it does, it reverts the tuning controls to default, which is why I was suspicious of this as well.
Adrenalin always sets itself to stock when there's a system crash (or at least when there's a GPU caused crash). Mine was doing that every time too even if I didn't change anything. The notice was there that it was reverted due to a system recovery or whatever.

Try setting Adrenalin back to stock. My 7800 XT is new to me so I'm not familiar with what are common sweet spot values for tuning these RDNA2/3 chips, but one of the first rules of troubleshooting is putting things back to stock/default to see if it's a variable. That's where you're at here. You need to reduce variables to remove the guesswork as to what's causing it. RAM is included in this "set it to stock" part, so I agree with the above advice to test it, perhaps with XMP disabled.

Set Adrenalin to default. You could even play with uninstalling it, running DDU in safe mode, and then reinstalling the drivers/Adrenalin. Wouldn't be a bad idea just to rule that out while you're at it.

Test RAM with MemTest86.

Maybe disable XMP.

These would be your first steps. You can also do further stuff like running OCCT tests or try some combination of the OCCT CPU test or Prime 95 (one or the other for the CPU) along with Furmark (for the GPU).
Hmm okay. Should I bother upgrading to windows 11 or nah? I've kept it on 10 the past few years, but I can always just switch over for free.

I had someone suggest clearing CMOS in BIOS but I'd have to remove the CMOS battery from the motherboard which sounds like a pain.

I'm pretty confident my RAM is good, but I can always run MemTest86 to be sure. Maybe I can do some sort of stress test that draws a lot of power from the PSU? I just need a way to replicate the crash, since once every 2 weeks or so is not often enough to diagnose the issue.
Your call on Windows 11. If you reinstall the OS and choose either 10 or 11 it'd let you rule out a lot of variables, although I wouldn't expect that to fix your issue (that's merely a personal guess and not a "you shouldn't try what you're willing to" statement though). A machine check exception is usually a hardware side fault, rarely software side. Although "software side" things that tune the hardware side things (clock speeds, voltages, etc.) can certainly play into that.

Of all the things mentioned, I think setting Adrenalin to default and testing the RAM (trying XMP off if need be) are your most efficient first and second steps.

It's a process of elimination.

There are methods to replicate demanding conditions. They aren't exact but they might speed it up. That's why I mentioned OCCT (tests for a lot of parts), Prime 95 (CPU loading) and Furmark (GPU loading).
Right. Because a failing GPU I'd imagine would be a bit more obvious or at least more impactful on my experience. I had an old GPU years ago and things like visual artifacts/visual screen cracks with bright pink/black textures would occur as well as crashing of course.

Is there a way I can monitor the PSU? I heard I can download a program somewhere to test for the voltage fluctuations and whatnot.

Again, swapping out parts or doing anything major like that may not be worth it at the moment, since the crash happens rarely. Maybe I can monitor some things in the mean time.

EDIT: Also, I wanted to ask - could PBO be a possible cause? Like I said, it's enabled by default since it's the 5600X.

Online I've read people suggest that it could be how the voltages are set in BIOS for your CPU, etc.

Also, I have some old 2TB harddisk drive attached to my mobo. I really only use it for old games that don't benefit from an SSD... could this cause the issue too? I just heard somewhere that jank HDDs can cause problems.
Last edited by Hoppled; Jan 21, 2024 at 14:45
Don't rule out the graphics card. You would think it would be more consistent or obvious but depending on how it fails, it might not be.

I changed to a 7800 XT and it was pretty good, except after a few days my display went black, sound continued for a few seconds, then went stop and go, and then the PC was back at the POST screen restarting. Sometimes it would work fine for days, other times it would crash a few times a day. When it worked, there were zero issues. No artifacts, temperatures were low, absolutely nothing looked wrong.

I even doubted it was the graphics card because of that, and because playing with CPU or RAM things made it more or less severe (but never made it go away). But removing the RX 7800 XT made it go away, and the system was stable otherwise.

Did an RMA and so far this new one hasn't done it once.

This isn't to say yours is the graphics card. This is merely saying don't entirely rule it out because it seems to work fine when it does work.

To reiterate myself from earlier, if the issue persists with things at stock, then I think the PSU and graphics card are the first two I'd blind guess at. RAM and then motherboard after that.

PBO can be involved, yes. That affects the voltage/frequency curve of the CPU. Something I forgot to mention earlier when asking if you had Event ID 18 was to also ask you to post a few of them. If it's reporting the cache hierarchy error as yours is, then see if the SAME core (APIC ID) is listed. If it's the same one, it strongly suggests you have a particularly unstable CPU core. If the APIC ID is always different, however, then it's probably NOT the CPU (but still can be) and it's just different cores catching the machine check exception condition. That Event ID means "this CPU core caught an uncorrectable hardware error (MCE)".
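
If you end up with more than a handful of Event ID 18 entries, comparing the APIC IDs by eye gets tedious. Here's a rough Python sketch of the "same core or different cores?" check. It assumes you've copied the event text out of Event Viewer and that each entry contains a line like "Processor APIC ID: 0"; the sample text and the function names are made up for illustration, so adjust the pattern if your entries are worded differently.

```python
import re
from collections import Counter

# Assumed wording of the field in the pasted event text.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_text: str) -> Counter:
    """Count how often each Processor APIC ID appears in the pasted text."""
    return Counter(int(apic_id) for apic_id in APIC_RE.findall(event_text))


def summarize(counts: Counter) -> str:
    """Turn the tally into the rule of thumb described above."""
    total = sum(counts.values())
    if total < 2:
        return "not enough events to compare"
    if len(counts) == 1:
        apic_id = next(iter(counts))
        return f"same APIC ID ({apic_id}) in all {total} events: suspect that core"
    return "APIC IDs differ: a single bad core is less likely"


# Hypothetical sample modeled on two events with APIC IDs 0 and 6.
sample = """
Error Type: Cache Hierarchy Error
Processor APIC ID: 0

Error Type: Cache Hierarchy Error
Processor APIC ID: 6
"""

counts = tally_apic_ids(sample)
print(dict(counts))
print(summarize(counts))
```

Treat the result as a hint only. As said above, differing IDs make the CPU less likely, not impossible.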

Disabling PBO and XMP will help rule out things like CPU, Infinity Fabric, or just bad RAM. Again, all to stock.

You can use HwInfo64 or GPU-Z to monitor voltages in the background while gaming. I suggested that in an earlier reply. I did this when troubleshooting my own issue and the voltages stayed consistent which further led me back to suspecting the GPU. The only thing is the voltages might drop too fast to be caught before the CPU does its thing and triggers the restart but it's still worth doing because if it does report they drop, it's a good sign the PSU is a likelier suspect.
Last edited by Illusion of Progress; Jan 21, 2024 at 15:20
Right. In your case, it sounds like the crashes at least happened more frequently. It has obviously not crashed again since making this post and I've yet to turn everything to stock yet. If it happened more often, it may be easier to narrow down.

There were 2 event 18 ID when the crash occurred the other day, both having different Processor APIC IDs - one is 0 the other is 6. I can't go back further than 7 days to fetch the last crash that happened a few weeks ago, but I remember checking Processor APIC ID in the past and it was always different.

Also, you mentioned using HwInfo64 and GPU-Z to monitor voltages - is there a way I can have it save as logs? That way, I can go into HwInfo64 or GPU-Z and check what happened before the crash -- similar to how Event Viewer works. I'm also not sure if it's possible to monitor a PSU's output.

Not saying it can't be the GPU, yeah. If anything, it's less of a pain in the ass replacing a GPU rather than mobo or CPU...

One thing I will mention: Before I upgraded to this PC over a year ago, it was a pre-built PC. It came with an ASRock B450M/ac mobo, 8GB ADATA DDR4 (which I expanded), RX 580 GPU, Ryzen 5 3600 CPU, 240GB SATA SSD (which I still use, but is not my boot drive).

Over time, I replaced ALL of the components in the case, EXCEPT the PSU, Case and the remaining harddrives I use just for extra storage - boot drive is a new m.2 NVMe. Keeping the same case and PSU made building the PC and new mobo super easy. The mobo was the very last part I installed.

However, I actually had an issue where I would still bluescreen occasionally (literal BSOD error message, not like this black screen I'm getting), but it would happen infrequently like it does now. It happened with the pre-built from the very beginning, up until I had replaced everything but the mobo. Replacing the mobo made me think "okay, that was probably the issue." So, in a way, I narrowed it down to the PSU... BUT the crash happens in a different way and I cannot remember what those event ID errors were back then, 3-4 years ago. It just seems quite unlikely since it's a good brand (EVGA) and I made sure it is a real 80+ Gold unit.
Yes, HwInfo64 and GPU-Z can both log. I'm not sure offhand the exact steps to either of them but it should be rather straightforward.
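
Once you have a log saved as CSV, something like the Python sketch below could scan it for readings outside the usual ATX tolerance of +/-5% on the +12V rail (11.4 V to 12.6 V). The column name "+12V [V]", the "Time" column, and the sample rows are assumptions for illustration; check the header row of your own log for the real names.

```python
import csv
import io

NOMINAL_VOLTS = 12.0
TOLERANCE = 0.05  # ATX allows +/-5% on the +12V rail


def find_out_of_range(lines, column, nominal=NOMINAL_VOLTS, tolerance=TOLERANCE):
    """Return (time, value) pairs where the logged voltage leaves the allowed band."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    flagged = []
    for row in csv.DictReader(lines):
        try:
            value = float(row[column])
        except (KeyError, TypeError, ValueError):
            # Skip rows without a numeric reading (e.g. a repeated header line).
            continue
        if not low <= value <= high:
            flagged.append((row.get("Time", "?"), value))
    return flagged


# Made-up sample data; "+12V [V]" is a hypothetical column name.
SAMPLE_LOG = (
    "Date,Time,+12V [V]\n"
    "21.1.2024,14:00:01,12.096\n"
    "21.1.2024,14:00:03,11.232\n"
    "Date,Time,+12V [V]\n"
)

print(find_out_of_range(io.StringIO(SAMPLE_LOG), "+12V [V]"))
```

For a real file, pass an open file object instead of the `io.StringIO` sample. Keep in mind that these readings come from the motherboard's sensor chip, so they're approximate, and a clean log doesn't clear the PSU since a drop can happen too fast to be recorded.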

I wouldn't conflate the earlier BSOD with the seemingly MCE issue you're having now. They seem to be different things and saying "I had a different issue once upon a time and it could be related to one I'm having now" doesn't help. Believe me, I know that all too well. Just focus on the issue you're having now. Even if there's a link, it's better to focus on addressing the here and now issue as it is currently presenting itself.
Last edited by Illusion of Progress; Jan 21, 2024 at 15:57
Date posted: Jan 20, 2024 at 8:28
Posts: 24