A fatal hardware error has occurred (Event ID 18) :: Hardware and Operating Systems

Illusion of Progress 12 Oct 2023, 19:40

A fatal hardware error has occurred (Event ID 18)

Edit: I've rewritten this post, both to update it and make it more concise to the primary issue.

I've been dealing with an issue lately (Edit: there were two, but one has been solved). This started shortly after I changed my video card (GTX 1060 to RX 7800 XT).

--------------------------------------------------

Summary of the issue:

The display sometimes goes black (the display backlight stays on, though), and then the PC restarts on its own (most of the time) or stays that way (one or two times). The time between the screen turning black and the PC restarting varies.

These are not BSODs. I'm not seeing one (I have automatic restart on BSOD disabled), nor does anything show up in the way of minidumps, memory dumps, or event logs indicating one. What I am getting is a log for Event ID 18 every time (a machine check exception reporting "a fatal hardware error has occurred"), and nothing else (besides the expected Event ID 41 and Event ID 6008, which are merely byproducts of the unexpected shutdown). Details about this issue are below.
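As a rough way to check the "Event ID 18 every time" pattern, the sketch below pairs each Event ID 41 entry with any Event ID 18 logged close to it. It is only an illustration: the function name, the five-minute window, and the sample timestamps are all made up, and it assumes the events have already been exported from Event Viewer into (timestamp, event ID) pairs.

```python
from datetime import datetime, timedelta


def crashes_with_mce(events, window=timedelta(minutes=5)):
    """For every Event ID 41 entry, report whether an Event ID 18 was logged nearby.

    events: iterable of (timestamp, event_id) pairs (hypothetical export format).
    Returns a list of (timestamp_of_event_41, has_event_18_within_window).
    """
    events = list(events)
    mce_times = [t for t, event_id in events if event_id == 18]
    results = []
    for t, event_id in events:
        if event_id == 41:
            has_mce = any(abs(m - t) <= window for m in mce_times)
            results.append((t, has_mce))
    return results


# Made-up sample data, not taken from the PC in this thread.
sample = [
    (datetime(2023, 1, 1, 10, 0, 5), 41),
    (datetime(2023, 1, 1, 10, 0, 6), 18),
    (datetime(2023, 1, 1, 10, 0, 9), 6008),
    (datetime(2023, 1, 2, 18, 30, 0), 41),  # an unexpected shutdown with no Event ID 18
]

for when, has_mce in crashes_with_mce(sample):
    print(when, "machine check logged" if has_mce else "no machine check logged")
```

Any Event ID 41 without a matching Event ID 18 would point to a different kind of shutdown (for example, a plain power loss) and would be worth tracking separately.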

--------------------------------------------------

PC Specifications:

https://valid.x86.fr/306yr6

PSU: EVGA SuperNova G5 750W
CPU: Ryzen 7 5800X3D
CPU cooling: Be Quiet Dark Rock Pro 4
Motherboard: MSI MAG X570S Tomahawk Max WiFi (7D54v17 BIOS)
RAM: 64 GB (4x 16 GB) G.Skill Ripjaws V 3,600 MHz 16-19-19-39 1.35V
GPU: Sapphire Nitro+ RX 7800 XT (23.10.2 drivers)
SSD(s): 2x Western Digital Black SN850X 2 TB (latest firmware on both)
HDD(s): 1x Western Digital Black 5 TB, 2x Western Digital Blue 8 TB
Display: Dell U2410 24" 1920 x 1200/60 Hz (connected via DisplayPort and HDMI)

Everything is at stock, with the exception of the "XMP" RAM profile speed being enabled.

--------------------------------------------------

Detailed description of the issue:

For a week or so after I added the video card, things were fine.

I tried undervolting my CPU (all-core offset of -30, and then -20). Each setting passed some initial stress tests but failed in real-world scenarios: the screen would go black and the PC would restart. This was the first time the issue occurred. Both times I was met with Event ID 18 in the Event Viewer.

"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 14

The details view of this entry contains further information."

The APIC number varies each time I get this issue, as this corresponds to the logical processor that threw the exception.
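To make the APIC ID to logical processor relationship concrete, here is a small sketch that splits an APIC ID into a core index and an SMT thread index. It rests on an assumption I have not verified for this system: that IDs are numbered consecutively and the two threads of one core sit next to each other. The function name is hypothetical.

```python
def apic_to_core_thread(apic_id, threads_per_core=2):
    """Split an APIC ID into (physical core index, SMT thread index).

    ASSUMPTION: IDs are consecutive and the SMT threads of one core are
    adjacent, so the core is the quotient and the thread is the remainder.
    """
    return divmod(apic_id, threads_per_core)


# 14 is the ID from the log above; 2 shows up in a later crash in this thread.
for apic_id in (14, 2):
    core, thread = apic_to_core_thread(apic_id)
    print(f"APIC ID {apic_id}: core {core}, thread {thread}")
```

Under that assumption the two crashes would land on different physical cores, which fits the observation that the ID varies from crash to crash.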

I chalked it up to instability, and set the CPU back to stock.

Not long after, it happened again... and it irked me, but I thought maybe it was a one off and waited to see if it would continue.

For another week or so, things were again stable.

I then got another one (this was the point at which I initially made this thread). And then another. And then another. And... I'm losing track. And they seem to be escalating.

--------------------------------------------------

Troubleshooting I've attempted:

1. I've updated the motherboard BIOS. V1.5, V1.7, and V1.8.

2. Windows 10 is up to date.

3. AMD chipset drivers are up to date. Audio drivers are up to date. Ethernet drivers are up to date. Bluetooth and WiFi drivers are up to date. Etc.

4. I've updated video card drivers as new ones have become available. Both issues have persisted on all drivers I've tried, including 23.9.1, 23.9.3, 23.10.1, 23.10.2, and 23.11.1. I'm not even getting any event logs about the drivers crashing and recovering.

5. I've used DDU to uninstall and reinstall the video drivers. Yes, I used safe mode. Yes, I disconnected the internet.

6. I've reset the BIOS or otherwise done things (see number 13 below) that leave me with a reset BIOS who knows how many times.

7. I've disabled XMP (it seems like it may have made things worse, but that might just be coincidental), and I've also set XMP but scaled back the RAM frequency/IF clock a bit to 3,200 MHz/1,600 MHz respectively. So it doesn't matter whether RAM/IF is set to 2,133 MHz (JEDEC default)/1,066 MHz, 3,200 MHz/1,600 MHz, or 3,600 MHz/1,800 MHz; they all have the issue. This seems to rule out RAM or Infinity Fabric instability?

8. I've run stress tests galore. Windows Memory Diagnostic (might not be very conclusive on its own but I did it), MemTest86+, Prime95, BurnInTest, and the majority of the OCCT suite. All passed, with the exception of the "GPU variable" test in OCCT, which immediately caused the crash the first time I attempted it, but then succeeded on a subsequent attempt (at first I was happy, ironically, that I may have found a reproducible cause, but it seems I didn't).

9. I've tried connecting the DP cable to both output ports on the video card (mine has two DP and two HDMI instead of three DP and one HDMI).

10. I've tried HDMI.

11. I've adjusted the ASPM setting (PCI Express > Link State Power Management > Off).

12. I've completely reinstalled Windows 10!

13. I've completely, and I mean completely, taken my PC apart down to the individual parts, cleaned it (though it was already rather clean), and reassembled it. This was to rule out a bad connection anywhere. I even swapped the RAM around, and the CPU was also reseated.

14. The video card is a Sapphire Nitro+ RX 7800 XT, which has a BIOS switch with three positions (one performance BIOS, one silent BIOS, and a third mode that lets you change it on the fly with the Sapphire TRIXX software). I've tried both BIOSes/all three positions.

15. I've used "Driver Verifier", which is something Windows includes, and followed the instructions here (answers.microsoft.com) to stress test the drivers. This was inconclusive, but not useless. Since the issue doesn't yet have a known, reproducible, on-demand cause, I have to wait, but this tends to cause it to occur sooner. Unfortunately, Driver Verifier does not catch anything or give me a notice of any violations it detected. Maybe that's because the drivers are fine and the issue isn't the drivers but the hardware itself. From what I'm reading, machine check exceptions are, as a rule, almost always hardware and not software.

16. I've found some people saying they suspect the issue is the card boosting above where it should. I've tried limiting the boost to 2,500 MHz (default maximum is 2,565 MHz), but it doesn't seem to truly respect this. Nonetheless, it made no difference. Along with this, I tried disabling "Zero Fan" (this stops the fans when the temperature is below a certain threshold) as someone suggested, and this also made no difference.

17. I've tried disabling ULPS.

18. I've tried disabling MPO.

19. I've tried a 3700X in place of the 5800X3D. It happens on both.

None of these troubleshooting steps have resolved the issue.
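For the RAM/IF pairs in step 7, the arithmetic can be sketched as follows. This assumes the quoted "MHz" RAM figures are DDR4 effective data rates and that the Infinity Fabric runs 1:1 with the memory clock, as the listed pairs suggest; the function name and the `fclk_ratio` parameter are only illustrative.

```python
def clocks_for_data_rate(data_rate_mts, fclk_ratio=1.0):
    """Return (memory clock, fabric clock) in MHz for a DDR data rate.

    DDR memory transfers twice per clock, so the real memory clock is half
    the advertised rate. ASSUMPTION: the fabric clock tracks it at fclk_ratio.
    """
    mclk = data_rate_mts / 2
    fclk = mclk * fclk_ratio
    return mclk, fclk


for rate in (2133, 3200, 3600):  # the three settings tried in step 7
    mclk, fclk = clocks_for_data_rate(rate)
    print(f"DDR4-{rate}: memory clock {mclk} MHz, fabric clock {fclk} MHz")
```

The 2,133 case works out to 1,066.5 MHz, which matches the rounded 1,066 MHz figure above.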

--------------------------------------------------

Troubleshooting I still need to do since the above failed, which I want to try before deciding to proceed down any RMA path:

1. Try my old video card to see if the issue indeed goes away.

Last edited by Illusion of Progress; 8 Nov 2023, 13:08

Showing 1-15 of 149 comments

A&A 12 Oct 2023, 20:26

About the video signal loss problem:
If updating the drivers makes no difference, then I would do this: if you have another DisplayPort cable, test with it. If you don't, test with HDMI, and if you don't have that either, testing with the older GPU should show whether or not the GPU is the problem.

About the crashing/PC restarting issue:
It's very interesting what causes a Cache Hierarchy Error.

Disabling XMP should reduce the Infinity Fabric frequency. Also, can the core multiplier be reduced to "increase stability"?

Last edited by A&A; 12 Oct 2023, 20:45

Illusion of Progress 12 Oct 2023, 21:58

Yes, I immediately thought afterwards that I also have HDMI cables I can test, but I'll hold off for now as I'll try to get another DP cable tomorrow. If I still have issues with another DP cable, I can probably rule out a bad DP cable, but at that point I can also fall back on HDMI to test, as this would also rule out a bad DP port on the display side (the display has two HDMI inputs but just one DP).

In the meantime I've done the other three things.

BIOS is updated from 1.50 to 1.70.

Drivers are updated from 23.9.3 to 23.10.1.

XMP is off (sadly).

Will be testing and letting time show results on both issues.

Originally posted by A&A ✠:
About the crashing/PC restarting issue:
It's very interesting what causes a Cache Hierarchy Error.

The way I first took that was "faulty CPU cache", but that might be a red herring, so to speak. It just means whichever numbered logical processor is listed threw a machine check exception because it found an issue in some memory address space, likely in the CPU cache, but that doesn't mean the CPU cache itself is bad. It just means the results it was reading from it were "bad" (so it could be bad software?).

So it's inconclusive what the cause is, but I'm open to suggestions on that.

There's a plethora of those WHEA errors since I updated my motherboard, but they are all warnings only. Changed the video card and now they are errors that crash the PC.

Originally posted by A&A ✠:
Disabling XMP should reduce the Infinity Fabric frequency. Also, can the core multiplier be reduced to "increase stability"?

I don't want to mess with manual CPU settings too much honestly, especially since it's a 5800X3D, but also since I'm trying to troubleshoot so just ensuring stock settings work is the goal for now.

Last edited by Illusion of Progress; 12 Oct 2023, 21:59

emoticorpse 12 Oct 2023, 22:28

I'm assuming you changed the video card and went straight to DisplayPort, rather than changing the video card WHILE still being on DVI for a while with everything working fine, and only THEN changing to DisplayPort and having it start acting up.

I think in your very first summary of the display issue, it'd be a good idea to add that you changed the display connection type. It says the issue started after you changed video cards, but in reality it also started after you changed the video connection type. Subtle difference, but I do think it's crucial to note in your case. Especially after only recently learning (after reading up for someone else) about the DisplayPort design not passing a signal for power or something like that, which is apparently why some monitors can be turned off without killing the connection between them and the PC, while others drop off the system entirely when powered off or put to sleep. Something like that. Kind of confusing, but I think it's good to know this. Not that it's necessarily your problem, because I don't know if that is happening.

Have you tried simply jiggling the cable or replugging it more firmly (I'm assuming this is an easy yes, but asking anyways).

Have you reset the BIOS settings to default and left them there to check for stability (tweaking certain things is fine, but leaving the performance/RAM/CPU stuff alone)?

Last edited by emoticorpse; 12 Oct 2023, 22:31

A&A 12 Oct 2023, 23:02

After a BIOS update, it is normal for all settings to reset themselves to factory defaults.

It's not that it couldn't possibly be bad software, but then wouldn't there have to be a BSOD? The reason I looked at the core multiplier is that the L1 and L2 caches have their speed affected by the core clock, while L3 is affected by the fastest core. A lower core clock speed reached in some other way should still affect it.

Also, every motherboard differs somewhat in the stock voltage at which it runs the processor. It is worth checking.

Last edited by A&A; 12 Oct 2023, 23:42

Illusion of Progress 13 Oct 2023, 3:07

The crash issue occurred again. And it happened in a different scenario than any of the others (I was just playing League of Legends).

I guess a silver lining is that it happened so fast I can sort of rule out the old BIOS, the old video card drivers, or the XMP speeds as the cause.

"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2

The details view of this entry contains further information."

If these Event Viewer entries hold further information, please let me know how to find it.

At this point I'm thinking I have a bad part that is worsening fast. I don't know if it's the CPU or GPU, or even something else. The RAM passed a stress test but so did the CPU. I'm totally lost at this point but I'm wondering about the GPU, since this all started after that changed.

And to add insult to injury, I found out the Minecraft world I was playing when it crashed yesterday is corrupt as a result. Might be able to recover from that one, and I have a backup from around a month ago, but... just another part about this whole thing that's annoying me now.

Originally posted by emoticorpse:
I think in your very first summary of the display issue, it'd be a good idea to add that you changed the display connection type. It says the issue started after you changed video cards, but in reality it also started after you changed the video connection type. Subtle difference, but I do think it's crucial to note in your case.

Have you tried simply jiggling the cable or replugging it more firmly (I'm assuming this is an easy yes, but asking anyways).

Have you reset the BIOS settings to default and left them there to check for stability (tweaking certain things is fine, but leaving the performance/RAM/CPU stuff alone)?

I did note that I changed connection type at the same time of the video card change though, didn't I?

"I was, however, using DVI before the change whereas I'm using display port now, so the cable is a variable that changed. I've tried disconnecting and reconnecting the DP cable, and swapping which end gets plugged into the video card and display. It didn't change the issue."

I mention that the connection type/cable changed and are a variable. For sure, this might be causing the display losing signal but not the crashing issue.

I changed from DVI to DP because modern video cards don't have DVI.

I never changed to DP before that because I had no inherent reason to.

And yes, I've tried unplugging and plugging in the cable and even reversing the ends. I'm going to try ruling out the cable with either another DP cable or an HDMI cable though.

BIOS settings are already mostly stock. The only things I really change are disabling the motherboard RGB, turning off scroll lock on boot, and other small stuff. The only major non-stock thing I changed before was that I was using RAM profile speeds, which I've since taken out of the equation by using default JEDEC speeds for now.

Originally posted by A&A ✠:
It's not that it couldn't possibly be bad software, but then wouldn't there have to be a BSOD? The reason I looked at the core multiplier is that the L1 and L2 caches have their speed affected by the core clock, while L3 is affected by the fastest core. A lower core clock speed reached in some other way should still affect it.

I'm not sure if there would be a BSOD or not.

Are you thinking the CPU may be bad then? I don't necessarily disagree. I'm just wondering if that's what you're thinking.

emoticorpse 13 Oct 2023, 3:22

Originally posted by Illusion of Progress:
I did note that I changed connection type at the same time of the video card change though, didn't I?

"I was, however, using DVI before the change whereas I'm using display port now, so the cable is a variable that changed. I've tried disconnecting and reconnecting the DP cable, and swapping which end gets plugged into the video card and display. It didn't change the issue."

I mention that the connection type/cable changed and are a variable. For sure, this might be causing the display losing signal but not the crashing issue.

Yeah, you did mention that information later on which is how I ended up getting it. I just think it'd be worth placing in the initial sum-up of the problem way earlier where you mentioned the problem started when you changed cards.

Originally posted by Illusion of Progress:
I changed from DVI to DP because modern video cards don't have DVI.

I never changed to DP before that because I had no inherent reason to.

I figured this was the case. I'm not trying to criticize the change in any way or question why you did it.

Originally posted by Illusion of Progress:
BIOS settings are already mostly stock. The only things I really change are disabling the motherboard RGB, turning off scroll lock on boot, and other small stuff. The only major non-stock thing I changed before was that I was using RAM profile speeds, which I've since taken out of the equation by using default JEDEC speeds for now.

So with the RAM configuration in the BIOS, have you made ANY CHANGES AT ALL? Or just left the settings totally alone, as they are at stock?

I'm wondering this because, funnily enough, just today I was messing around in my BIOS, and long story short I started getting freezes and blue screens with any sort of change in my RAM config, where I never had that before. Currently I have the RAM at total defaults because it's the only way I don't get those freezes/blue screens. I'll probably figure it out eventually, but for the moment I don't feel like messing with it.

I don't know what changed and why the RAM profiles are giving me such a hard time now, but oh well. I do know I went through a serious Windows 10 update recently; maybe it's something with Windows. Not sure.

I'm wondering if what you and I are experiencing are a parallel. Who knows though.

I personally wouldn't blame hardware yet though.

A&A 13 Oct 2023, 5:48

OK, I have a guess.

There is a "cache hierarchy error" for a second time and the ID is different. These are probably local APIC IDs that represent actual cores, but there is an ID 14, so maybe they represent all the logical cores in the system.

So it must be something like this???
ID: 0 = CPU0
ID: 1 = CPU1
etc...

If HT is disabled, will there be any ID higher than 7?

Last edited by A&A; 13 Oct 2023, 6:22

Agent 13 Oct 2023, 6:13

Sounds to me like it could be a power delivery problem. To confirm this, I would undervolt your GPU with MSI Afterburner. If your PC is able to go on for longer, or the problem is even eliminated after this, then that's the issue.

pasa 13 Oct 2023, 6:29

First google hit: https://www.reddit.com/r/AMDHelp/comments/hq7jcu/cache_hierarchy_error/ a nice read like Algernon... no solution, but reasonable thought about power spikes. We've known about those for some time, and indeed IMO they could manifest in such behavior, with no realistic chance of proving it without a scope (or similar external HW) attached. Well, maybe if you run a voltage logger and get a recoverable variant of the fault, it may show a sudden drop. But most loggers take samples far too rarely for that.
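The point about sampling too rarely can be put into numbers. The sketch below estimates the chance that a software logger polling at a fixed interval takes at least one sample during a single short voltage dip, assuming the dip starts at a random moment relative to the polling. The durations used are invented round numbers, not measurements from any PSU or GPU.

```python
def catch_probability(transient_s, sample_period_s):
    """Chance that periodic sampling lands at least once inside one transient.

    With a random phase between the transient and the sampler, this is the
    transient length divided by the sampling period, capped at 1.
    """
    return min(1.0, transient_s / sample_period_s)


# Hypothetical numbers: a 100 microsecond dip, logged once per second
# versus once per millisecond.
for period in (1.0, 0.001):
    p = catch_probability(100e-6, period)
    print(f"sampling every {period} s: {p:.2%} chance of seeing the dip")
```

With these made-up figures a once-per-second logger would see the dip about one time in ten thousand, which is why a scope is the realistic tool here.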

If you find a way to trigger the problem in a somewhat consistent way, you may experiment by forcing the GPU to do less work and see if that has any influence.

GOD RAYS ON ULTRA™ 13 Oct 2023, 6:46

The 3600 RAM speed isn't officially supported by the AMD 5800X3D. AMD only supports 3200. I would set it to default or at least 3200, 16, 18, 18, 38

GOOD LUCK!

pasa 13 Oct 2023, 7:10

Also, the problems are detected through MCA. So maybe you can use mcat or a similar utility to get more details on what is actually detected, and on whether it's consistent or goes all over the place because it's a secondary symptom.
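If a raw machine check status value can be obtained from such a utility, its top flag bits can also be decoded by hand. The sketch below uses the bit positions of the x86 machine check status register (MCi_STATUS) as I understand them; treat the layout as an assumption to check against the processor manual. The sample value is invented for illustration and does not come from this PC.

```python
# Flag bit positions in the 64-bit machine check status register.
# ASSUMPTION: standard x86 MCA layout; verify against the CPU vendor's manual.
FLAG_BITS = {
    "VAL": 63,    # the register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled
    "MISCV": 59,  # the companion MISC register is valid
    "ADDRV": 58,  # the companion ADDR register is valid
    "PCC": 57,    # processor context may be corrupt
}


def decode_mc_status(status):
    """Return the flag bits and the low 16-bit error code of a status value."""
    decoded = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    decoded["error_code"] = status & 0xFFFF
    return decoded


example = (0xBE << 56) | 0x0135  # invented value, for illustration only
for field, value in decode_mc_status(example).items():
    print(field, hex(value) if field == "error_code" else value)
```

Comparing these fields across several crashes is one way to judge whether the detected error is consistent or "goes all over the place".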

I would run experiments disabling cores. 4 at a time in different patterns. Maybe you luck out and one of those is actually bad.
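One way to pick the "4 at a time" patterns is a binary search over core numbers, sketched below. It assumes an 8-core CPU, exactly one faulty core, and a crash that reproduces reliably within each test run; the varying APIC IDs reported in this thread suggest those assumptions may not hold, so read it as an illustration only. Both function names are hypothetical.

```python
def disable_patterns(num_cores=8):
    """One pattern per bit of the core index: the cores to disable in that run."""
    bits = (num_cores - 1).bit_length()
    return [[c for c in range(num_cores) if (c >> k) & 1] for k in range(bits)]


def locate_bad_core(crash_stopped):
    """Rebuild the bad core's index from the test outcomes.

    crash_stopped[k] is True if the crashes went away while pattern k was
    disabled, meaning bit k of the bad core's index is 1.
    ASSUMPTION: exactly one bad core and reliable reproduction.
    """
    return sum(1 << k for k, stopped in enumerate(crash_stopped) if stopped)


for k, pattern in enumerate(disable_patterns()):
    print(f"run {k}: disable cores {pattern}")

# Example outcome: crashes continued in run 0 but stopped in runs 1 and 2.
print("suspect core:", locate_bad_core([False, True, True]))
```

For 8 cores this needs only three runs of four disabled cores each, instead of testing every core on its own.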

A&A 13 Oct 2023, 8:08

My guess may be correct, because nowadays every CPU core has its own APIC.

A "cache hierarchy issue" could be caused by driver errors, but you updated the drivers (how about the chipset?), and after replacing the motherboard, did you reinstall Windows? The other possibility is that the core voltage is not sufficient while the cache is tied to the expected core clock; this can put it out of sync, and then the result can be address problems, register errors, or interrupt vector errors.

Last edited by A&A; 13 Oct 2023, 8:10

Illusion of Progress 13 Oct 2023, 9:03

Originally posted by emoticorpse:
So with the RAM configuration in the BIOS, have you made ANY CHANGES AT ALL? Or just left the settings totally alone, as they are at stock?

I was running with "XMP" enabled before. I am running with XMP disabled (in other words, at default now). Errors show up both ways.

Originally posted by emoticorpse:
I'm wondering this because, funnily enough, just today I was messing around in my BIOS, and long story short I started getting freezes and blue screens with any sort of change in my RAM config, where I never had that before. Currently I have the RAM at total defaults because it's the only way I don't get those freezes/blue screens. I'll probably figure it out eventually, but for the moment I don't feel like messing with it.

That sounds like it isn't stable with the Expo profile, which would be interesting.

Originally posted by emoticorpse:
I personally wouldn't blame hardware yet though.

I didn't want to at first, but it's happened six times now, four of them being "unaccounted for". I'm even getting data loss (I mean you "technically" always do with any crash but this is actually risking setting me back as a result). That is unacceptable; the PC is unstable.

The video card and video drivers were the only things that changed, and I know people meme on AMD's video drivers but I doubt the four most recent versions are all expected to behave this way. I'm not saying it's the video card, but it does at least seem most logical to start there? Have to start somewhere, and I've exhausted trying different video drivers anyway.

Originally posted by A&A ✠:
OK, I have a guess.

There is a "cache hierarchy error" for a second time and the ID is different. These are probably local APIC IDs that represent actual cores, but there is an ID 14, so maybe they represent all the logical cores in the system.

So it must be something like this???
ID: 0 = CPU0
ID: 1 = CPU1
etc...

If HT is disabled, will there be any ID higher than 7?

Your guess is correct. I mentioned this in my first post. The APIC ID corresponds to the logical processor. As far as I know, a "machine check exception" error is always "made" by the CPU, which is why I said it could be a red herring and might not indicate a bad CPU. Or it might.

Searching "Error Type: Cache Hierarchy Error" on Google, though, is a... wild rabbit hole. From what I've found out so far...

That's Event ID 18.

It's always an AMD CPU involved, and namely it seems to be Ryzen 5000 series CPUs it occurs with. I found one result from the early 2010s pre-Ryzen days with it, though.

The causes seem to vary to, well... everything.

Some changed CPU and it fixed it. Some RMA'd CPUs and had no change.

Some had it with AMD GPUs, some with nVidia. Others had it fixed with changing to a different GPU, or inversely and like me, had it show up only after a GPU change.

Others changed one of a dozen voltages and had it fixed. Some of them even claimed bad/low voltage VRAM. Others changed multiple voltages and had no fix.

Others changed RAM and fixed it. Others set to JEDEC and had no fix like me.

Others reinstalled Windows (haven't found a single one that stated this resolved the issue though).

Wild goose chase this is sounding like. But I'm going to keep it simple. It showed up when I changed the GPU, so I should maybe approach it with that in mind and presume GPU first. I did have Event ID 19 (the warnings) in the event log before... but no crashes. Event ID 19 itself turns up even on Intel platforms, but for Event ID 18 I'm only finding results on AMD.

Originally posted by ULTRAWOKETRYHARDCRINGE:
The 3600 RAM speed isn't officially supported by the AMD 5800X3D. AMD only supports 3200. I would set it to default or at least 3200, 16, 18, 18, 38

GOOD LUCK!

I appreciate the well wishes. I need them right now.

This is true, because AMD doesn't support above 3,200 MHz officially on anything AM4. Same as Intel for that time period. And I do know I have a heavy RAM configuration (four DIMMs, 3,600 MHz speed, and all dual rank DIMMs). So the thought entered my mind.

However, I'm still having the issue at stock JEDEC speeds of 2,133 MHz.

Originally posted by pasa:
First google hit: https://www.reddit.com/r/AMDHelp/comments/hq7jcu/cache_hierarchy_error/ a nice read like Algernon... no solution, but reasonable thought about power spikes.

If you find a way to trigger the problem in a somewhat consistent way, you may experiment by forcing the GPU to do less work and see if that has any influence.

Yeah, I searched Google and mentioned my findings above; a definite rabbit hole this is looking to be.

I don't think it's wattage related. I can play Minecraft (and with shaders this can push it near 100% use) and it never crashes. Never. Once I press F11 to switch it to windowed mode and then go to open a File Explorer window, it falls apart.

It also crashed in League of Legends, and that's very undemanding both graphically and wattage-wise, so it doesn't seem to follow a rule of crashing under load. I'm suspecting the video card maybe, but not the PSU yet. It doesn't follow a trend of being fine at low load and crashing under high load. More the opposite.

Last edited by Illusion of Progress; 13 Oct 2023, 9:07

emoticorpse 13 Oct 2023, 9:18

Well, I asked the question specifically the way I did for accuracy, even though I knew it'd be confusing to see where I was going with it. Just recently, with what I was describing, I had gone into the overclock settings and changed a RAM setting even though XMP was shown to be off, which was weird to me. So I was simply trying to ask if EVERY SINGLE RAM setting is at default, because even if the XMP profile is disabled there might be some kind of RAM setting that is still tweaked.

But no big deal; I will assume for now that your RAM settings in the BIOS are at complete stock settings and we can check something else.

I would try reinstalling Windows in this case to see if it helps. But before that, I would like to ask if you've tried sticking around in the BIOS for a while and noticed these same issues happening there? At the same time, I'd try booting up with a live OS of some kind and seeing if you notice any of the issues.

I know that might kind of suck, because you'd be sitting inside an environment that's practically useless to you the entire time and not actually using your PC. So I would actually opt for a fresh Windows install; even if you do still get the crashes/issues with that, you can at least confirm it's purely hardware and nothing to do with the OS itself.

GOD RAYS ON ULTRA™ 13 Oct 2023, 9:19

So it started right after the GPU install? Does the GPU go into the PCIe slot all the way? I know this sounds dumb, but I've had a GPU not fully seat into the slot, particularly at the back where the I/O shield is.

Other than that, if your PSU is modular you can remove the GPU cable and inspect it or try another one.

So basically it's the GPU then? Since that's when it started? Is it running at x16 lanes? If not, maybe the GPU doesn't make a full connection and is running at x8 or x4. This would indicate the GPU is not fully seated.

Hopefully we'll get it solved. It sounds GPU related.
