If there is no difference after updating the drivers, then I would do this: if you have another DisplayPort cable, test with that. If you don't, test with HDMI, and if you don't have that either, testing with the older GPU should show whether the problem is or isn't the GPU.
Descriptions of the crashing/PC restarting issue:
Very interesting. I wonder what causes the Cache Hierarchy Error.
Disabling XMP should reduce the Infinity Fabric frequency. Also, can the core multiplier be reduced to "increase stability"?
In the meantime I've done the other three things.
BIOS is updated from 1.50 to 1.70.
Drivers are updated from 23.9.3 to 23.10.1.
XMP is off (sadly).
Will be testing and letting time show results on both issues.
My first reading of that was "faulty CPU cache", but that might be a red herring, so to speak. It just means whatever numbered logical processor is listed threw a machine check exception because it found an issue in some memory address space, likely in the CPU cache, but that doesn't mean the CPU cache itself is bad. It just means the results it was reading from it were "bad" (so could be bad software?).
So it's inconclusive what the cause is, but I'm open to suggestions on that.
There's a plethora of those WHEA errors since I updated my motherboard, but they are all warnings only. Changed the video card and now they are errors that crash the PC.
I don't want to mess with manual CPU settings too much honestly, especially since it's a 5800X3D, but also since I'm trying to troubleshoot so just ensuring stock settings work is the goal for now.
I think in your very first summary of the display issue, it'd be a good idea to add that you changed the display connection type. It says the issue started after you changed video cards, but in reality it also started after you changed the video connection type. Subtle difference, but I do think it's crucial to note in your case. Especially after only recently learning (after reading up for someone else) about the DisplayPort design not passing a signal for power or something like that, which is apparently why some monitors can be turned off without killing the connection to the PC while others drop off the system entirely when they're powered off or put to sleep. Something like that. Kind of confusing, but I think it's good to know this. Not that it's necessarily your problem, because I don't know if that is happening.
I'm assuming you did change the video card and went straight to display port and you didn't change the video card WHILE still being on dvi for a while and it worked fine and THEN changed to display port and then it started acting up.
Have you tried simply jiggling the cable or replugging it more firmly (I'm assuming this is an easy yes, but asking anyways).
Have you reset bios settings to default and left them there to check for stability (Tweaking certain things is fine but leaving the performance/ram/cpu stuff alone)?
It's not like it couldn't possibly be bad software, but then there would have to be a BSOD? The reason I looked at the core multiplier is that the L1 and L2 caches have their speed affected by the core clock, while L3 follows the fastest core. A lower core clock speed reached in some other way should still affect it.
Also every motherboard has some difference in the stock voltage they run the processors. It is worth checking.
I guess a silver lining is it happened so fast I can sort of rule out something with the old BIOS, video card drivers, or the XMP speeds as the cause.
"A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2
The details view of this entry contains further information."
If these Event Viewer entries hold further information, please let me know how to find it.
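For comparing several of these entries side by side, here is a minimal Python sketch that splits the message text quoted above into named fields. The function name `parse_whea_message` is made up for illustration, and the sketch assumes the text was copied out of Event Viewer by hand. It only reads the "Key: value" lines and does not decode anything beyond what the message already says.

```python
def parse_whea_message(text):
    """Turn the 'Key: value' lines of a WHEA event message into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Keep only lines that really look like 'Key: value'.
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


# Message text as quoted in the post above.
sample = """A fatal hardware error has occurred.
Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 2
The details view of this entry contains further information."""

info = parse_whea_message(sample)
print(info["Error Type"], "on APIC ID", int(info["Processor APIC ID"]))
```

Running this over each crash entry makes it easy to see whether the same APIC ID keeps showing up or whether it moves around.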
At this point I'm thinking I have a bad part that is worsening fast. I don't know if it's the CPU or GPU, or even something else. The RAM passed a stress test but so did the CPU. I'm totally lost at this point but I'm wondering about the GPU, since this all started after that changed.
And to add insult to injury, I found out the Minecraft world I was playing when it crashed yesterday is corrupt as a result. Might be able to recover from that one, and I have a backup from around a month ago, but... just another part about this whole thing that's annoying me now.
I did note that I changed connection type at the same time of the video card change though, didn't I?
"I was, however, using DVI before the change whereas I'm using display port now, so the cable is a variable that changed. I've tried disconnecting and reconnecting the DP cable, and swapping which end gets plugged into the video card and display. It didn't change the issue."
I mentioned that the connection type/cable changed and is a variable. Sure, this might be causing the display to lose signal, but not the crashing issue.
I changed from DVI to DP because modern video cards don't have DVI.
I never changed to DP before that because I had no inherent reason to.
And yes, I've tried unplugging and plugging in the cable and even reversing the ends. I'm going to try ruling out the cable with either another DP cable or an HDMI cable though.
BIOS settings are already mostly stock. The only things I really change are disabling the motherboard RGB, turning off scroll lock on boot, and other small stuff. The only major non-stock thing I changed before was that I was using RAM profile speeds, which I've since taken out of the equation by using default JEDEC speeds for now.
I'm not sure if there would be a BSOD or not.
Are you thinking the CPU may be bad then? I don't necessarily disagree. I'm just wondering if that's what you're thinking.
Yeah, you did mention that information later on which is how I ended up getting it. I just think it'd be worth placing in the initial sum-up of the problem way earlier where you mentioned the problem started when you changed cards.
I figured this was the case. I'm not trying to in any way criticize the change or question why you did it.
So with the RAM configuration in BIOS, have you made ANY CHANGES AT ALL? Or just left them totally alone how they are at stock?
I'm wondering this because funnily enough, just today I was messing around in my BIOS, and long story short I started getting freezes and blue screens with any sort of change in my RAM config, where I never had that before. Currently I have the RAM at total defaults because it's the only way I don't get those freezes/blue screens. I'll probably figure it out eventually, but for the moment I don't feel like messing with it.
I don't know what changed and why the ram profiles are giving me such a hard time now but oh well. I do know I went through a serious Windows 10 update recently, maybe it's something with Windows. Not sure.
I'm wondering if what you and I are experiencing are a parallel. Who knows though.
I personally wouldn't blame hardware yet though.
There is a "cache hierarchy error" for a second time and the ID is different, and probably these are local APIC IDs that represent actual cores, but there is an ID 14, so maybe they represent all the logical cores in the system.
So it must be something like this???
ID: 0 = CPU0
ID: 1 = CPU1
etc...
If HT is disabled, will there be any ID above 7?
If you find a way to trigger the problem in a somewhat consistent way, you may experiment by forcing the GPU to do less work and see if that has any influence.
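The guess above can be sketched in a few lines of Python. This assumes (and it is only an assumption, not something confirmed in this thread) that the APIC IDs are numbered consecutively with two logical processors per physical core, so IDs 0 and 1 sit on core 0, IDs 2 and 3 on core 1, and so on. The function name `apic_to_core` is made up for illustration, and the real numbering depends on the CPU and firmware.

```python
def apic_to_core(apic_id, threads_per_core=2):
    """Map an APIC ID to (physical core, thread on that core).

    Assumes consecutive numbering with a fixed number of threads per core.
    """
    core, thread = divmod(apic_id, threads_per_core)
    return core, thread


# The two IDs reported so far in this thread.
for apic_id in (2, 14):
    core, thread = apic_to_core(apic_id)
    print(f"APIC ID {apic_id}: core {core}, thread {thread}")
```

Under that assumption, ID 2 would be core 1 and ID 14 would be core 7, meaning two different physical cores reported the error. What the numbering looks like with HT/SMT turned off is not something this sketch answers.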
GOOD LUCK!
I would run experiments disabling cores. 4 at a time in different patterns. Maybe you luck out and one of those is actually bad.
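One possible way to pick the "different patterns" is sketched below in Python. It assumes 8 physical cores numbered 0 to 7, a BIOS that allows arbitrary cores to be disabled, exactly one bad core, and a crash that reproduces reliably. None of that is confirmed here, and the function names are made up for illustration. Each pattern disables the four cores that have a given bit set in their index, so three test runs are enough to single out one core.

```python
CORES = 8  # assumption: 8 physical cores, numbered 0-7


def disable_patterns(cores=CORES):
    """One pattern per index bit: the cores to disable in that test run."""
    bits = (cores - 1).bit_length()
    return [[c for c in range(cores) if (c >> b) & 1] for b in range(bits)]


def suspect_core(still_crashed):
    """still_crashed[b] is True if the PC still crashed with pattern b disabled."""
    core = 0
    for b, crashed in enumerate(still_crashed):
        if not crashed:
            # The crash went away, so the bad core was among the disabled ones.
            core |= 1 << b
    return core


for b, pattern in enumerate(disable_patterns()):
    print(f"run {b}: disable cores {pattern}")
```

For example, if the crash goes away in runs 0 and 2 but still happens in run 1, `suspect_core([False, True, False])` points at core 5. If all three runs still crash, the result is core 0, which cannot be told apart from "it is not a single bad core at all", so that outcome would need a separate check.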
When it's a "cache hierarchy issue" that means it could be caused by driver errors, but you updated the drivers (how about the chipset?). After replacing the motherboard, did you reinstall Windows? The other possibility is that the core voltage is not enough: the cache is tied to the expected core clock, so this can put it out of sync, and then the issue can show up as address problems, register errors, or interrupt vector errors.
That sounds like it isn't stable with the Expo profile, which would be interesting.
I didn't want to at first, but it's happened six times now, four of them being "unaccounted for". I'm even getting data loss (I mean you "technically" always do with any crash but this is actually risking setting me back as a result). That is unacceptable; the PC is unstable.
The video card and video drivers were the only things that changed, and I know people meme on AMD's video drivers but I doubt the four most recent versions are all expected to behave this way. I'm not saying it's the video card, but it does at least seem most logical to start there? Have to start somewhere, and I've exhausted trying different video drivers anyway.
Your guess is correct. I mentioned this in my first post. The APIC ID corresponds to the logical processor. As far as I know, a "machine check exception" error is always "made" by the CPU, which is why I said it could be a red herring and might not indicate a bad CPU. Or it might.
Searching "Error Type: Cache Hierarchy Error" on Google though is a ... wild rabbit hole. From what I've found out from this so far...
That's Event ID 18.
It's always an AMD CPU involved, and namely it seems to be Ryzen 5000 series CPUs it occurs with. I found one result from the early 2010s pre-Ryzen days with it, though.
The causes seem to vary to, well... everything.
Some changed CPU and it fixed it. Some RMA'd CPUs and had no change.
Some had it with AMD GPUs, some with nVidia. Others had it fixed with changing to a different GPU, or inversely and like me, had it show up only after a GPU change.
Others changed one of a dozen voltages and had it fixed. Some of them even claimed bad/low-voltage VRAM. Others changed multiple voltages and had no fix.
Others changed RAM and fixed it. Others set to JEDEC and had no fix like me.
Others reinstalled Windows (haven't found a single one that stated this resolved the issue though).
Wild goose chase this is sounding like. But I'm going to keep it simple. It showed up when I changed the GPU, so I should maybe approach it with that in mind and presume GPU first. I did have Event ID 19 (the warnings) in the event log before... but no crashes. Event ID 19 itself turns up even on Intel platforms, but for Event ID 18 I'm only finding results on AMD.
I appreciate the well wishes. I need them right now.
This is true, because AMD doesn't support above 3,200 MHz officially on anything AM4. Same as Intel for that time period. And I do know I have a heavy RAM configuration (four DIMMs, 3,600 MHz speed, and all dual rank DIMMs). So the thought entered my mind.
However, I'm still having the issue at stock JEDEC speeds of 2,133 MHz.
Yeah, I searched Google and mentioned my findings above; definite rabbit hole this is looking to be.
I don't think it's wattage related. I can play Minecraft (and with shaders this can push it near 100% use) and it never crashes. Never. Once I press F11 to shift it to window mode, and then go to open a file explorer window, it falls apart.
It also crashed on League of Legends, and that's very undemanding both video and wattage wise, so it doesn't seem to follow a rule of crashing under load. I'm suspecting the video card maybe, but not the PSU yet. It doesn't follow a trend of being fine at low load and crashing under high load. More the opposite.
Well, I asked the question specifically the way I did for accuracy even though I know it'd be confusing to see where I'm going with it. Just recently with what I was describing I had gone into the overclock settings and like changed a ram setting even though XMP was shown to be off which was weird to me. So, I was trying to simply ask if EVERY SINGLE ram setting is default because even if XMP profile is disabled there might be some kind of ram setting that is still tweaked.
But no big deal, I will assume right now your ram settings in bios are at complete stock settings and we can check something else.
I would try re-installing Windows in this case to see if it helps. But before that, I would like to ask if you tried sticking around in bios for a while and noticed these same issues happening within that? At the same time I'd try booting up with some kind of live os of any kind and seeing if you notice any of the issues.
I know that might kind of suck because you'd be sitting there inside an environment that's practically useless to you the entire time and not actually utilizing your PC though. So, I would actually opt for a fresh Windows install, and even if you do still get the crashes/issues within that, you can still at least confirm it's purely hardware and nothing to do with the OS itself.
Other than that, if your PSU is modular you can remove the GPU cable and inspect it or try another one.
So basically it's the GPU then? Since that's when it started? Is it running at x16 lanes? Or if not maybe GPU doesn't make full connection and is running at x8 or x4? This would indicate GPU not fully seated.
Hopefully we'll get it solved. It sounds GPU related.