Another idea I had was external interference, such as from a mobile phone, though that would not explain the correlation with pressing F11.
My primary suspect here would be the motherboard, but I guess it's not possible to test the rest of the hardware with a different one.
Did you reboot into safe mode to perform the DDU?
Did you enable the option to disable the Windows automatic driver install via Windows Update?
Did you disconnect from the internet when performing the DDU?
Also, the loss of display signal is almost certainly a byproduct of this same issue: you experience a recoverable driver crash, the driver resets and blanks the display, and the signal returns when the driver recovers. Other causes are possible, but given the circumstances I'd strongly suspect it is directly related to the GPU change and possible driver issues.
Not directly applicable, but here is an Intel document on how APIC enumeration should occur[cdrdv2-public.intel.com] on Intel 64 and IA-32 CPUs. I don't know of a similar AMD whitepaper, so I'm just posting this for conceptual purposes given the related discussion about APIC.
The APIC ID in those errors indicates the logical core that was handling the interrupt when the error happened. In other words, it tells you which logical core the crashing thread was running on. Don't get too hung up on that ID unless you are seeing consistent crashes from the same specific logical core, and bear in mind that enumeration occurs at each boot, so the mapping between IDs and physical cores may change after a crash and reboot.
With Zen 3 / Ryzen 5000, the errors you've noted make me lean toward an unstable Infinity Fabric. Its clock (FCLK) is likely set to "Auto" in your UEFI/BIOS by default; it may change when you apply the XMP profile and then not change back when you disable the XMP profile. (CPU-Z should show what your FCLK is currently running at.)
Try:
NOTE: If you are using Secure Boot with fTPM and with CSM disabled, you will need to reconfigure all of those settings in order to restore booting functionality after resetting UEFI/BIOS as described below.
You should be running at the default memory timings and the SPD-defined memory speed for your memory kit at this point. Run the CPU-Z burn-in for about 45 minutes and keep an eye on HWMonitor for the temperatures and voltages of the CPU package. If things appear stable, try to reproduce the issue by doing the things you were doing when you encountered it previously.
If that is stable, then try manually tuning your memory to 3200 MT/s and manually setting the FCLK to 1600 MHz, but we can cross that bridge when we get there. Also keep in mind that these errors could be a knock-on effect of the GPU change, so try to resolve that first.
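For anyone wondering where the 1600 MHz figure comes from, here is a minimal sketch of the arithmetic, assuming the usual AM4 goal of running FCLK 1:1 with the memory clock. The function name is made up for illustration; it is not from any tool.

```python
def fclk_for_1to1(ddr_rate_mts: float) -> float:
    """Return the FCLK in MHz that matches the memory clock 1:1.

    Assumption: DDR memory transfers data twice per clock, so the real
    memory clock (MCLK) is half of the advertised MT/s rating.
    """
    return ddr_rate_mts / 2


if __name__ == "__main__":
    # JEDEC default, the suggested manual target, and the kit's XMP rating.
    for rate in (2133, 3200, 3600):
        print(f"DDR4-{rate}: MCLK and FCLK at {fclk_for_1to1(rate)} MHz")
```

Firmware usually shows the 2133 case rounded to 1066 or 1067 MHz, so treat the result as a target to compare against what CPU-Z reports rather than an exact value.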
His link to the validation is already up there
https://community.amd.com/t5/drivers-software/setting-link-state-power-management-to-off-fixed-my-crashes/td-p/294814
Lol yeah I'm just blind... and I looked twice for it since I'd expect him to post it with how detailed he tends to be. We are all human I guess :)
Yeah, I miss things too. With me it's laziness and rushing it I think.
Yes, this "started" after the change of graphics card. I quote started because I recently noticed I had WHEA warning level logs (Event ID 19) going back to around the time I changed my motherboard (AM4 to AM4), but the error level logs (Event ID 18) resulting in crashing started with the video card change.
For those unaware, Event ID 19 is "a corrected hardware error has occurred", and event ID 18 is "a fatal hardware error has occurred" (and while fatal hardware errors can occur on any platform, Event ID 18 seems to show up only with AMD, and more specifically with some generations of CPUs like mine). Same thing, of sorts, but different severity. One is so bad there's no coming back from it, so it crashes.
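If you want to pull just those entries out of the System log instead of scrolling through Event Viewer, something like the following might help. It is only a sketch: it builds a `wevtutil` query for the `Microsoft-Windows-WHEA-Logger` provider filtered to Event IDs 18 and 19. The helper names and the default of 50 events are my own choices, and it only does anything useful on Windows.

```python
import subprocess

# XPath filter: WHEA-Logger events with ID 18 (fatal) or 19 (corrected).
WHEA_QUERY = (
    "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger'] "
    "and (EventID=18 or EventID=19)]]"
)


def build_whea_command(max_events: int = 50) -> list[str]:
    """Build a wevtutil command that lists the newest WHEA events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        "/f:text",           # plain-text output instead of XML
        "/rd:true",          # read direction: newest events first
        f"/c:{max_events}",  # cap the number of events returned
    ]


def run_whea_query(max_events: int = 50) -> str:
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_whea_command(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Example (on Windows): print(run_whea_query(20))
```

Comparing the dates in that output against when the motherboard and the graphics card were swapped should show whether the two event types really started at different times.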
This is basically where I'm at now. I'm having Event ID 18 crashes and I have a trail of Event ID 19 warnings from the last year. So it seems I was already close to having issues before, and the GPU change pushed it over the edge?
Or it's coincidental timing. It's also possible they are two different things: I'm still prone to the warnings, and these recent error-level crashes are something else.
Therefore, it's hard for me to say where the real issue is (system side or GPU side).
But my logical thinking is "new behavior since graphics card change equals start there" and then maybe look into the remaining warning level issue later, instead of conflating both issues and making it harder on myself?
I did test running at default BIOS/stock/no XMP/no high Infinity Fabric speeds/etc. My first rule of "I have an issue" is to try stock/default settings. As it is, I pretty much already run at default everything anyway, the only change being that I use the XMP profile (which results in the Infinity Fabric being set to match). Even at stock RAM/Infinity Fabric speeds (2133 MT/s and 1066 MHz respectively), I not only still had issues, but they showed up sooner and in situations where I didn't have them before! League of Legends crashed now, for goodness' sake, and not even a day after running that way. At least before I was going days/weeks without issue, and then it'd only crash when, like, running Minecraft for hours, making a recording, and then pressing F11 to switch states and start another task.
To the person who advised OCCT: I did that as one of the initial stress tests while undervolting, to test said undervolt. When I changed the graphics card, it was fine for a week or so, and so I decided to try to undervolt the CPU (purely to bring temperatures down some, because now my GPU was actually running cooler than it). I even did the "core cycler" method to shift a single-core load across random threads. For those who don't know, this method helps catch instability when switching load states, since a lot of the time you can be stable at idle or full load, but not when switching one way or the other. Result? OCCT passed. Prime95 passed. Real world was crashing, and fast. I figured it wasn't stable, so I stopped undervolting and set it back to stock. But then I had another such crash a few days later, and then it was again fine for a week. Then another issue. That's when I knew it wasn't a one-off.
When I used DDU when changing from nVidia to AMD, yes, I did it in safe mode and with the internet disconnected. I didn't touch anything with Windows update, but the OS didn't install any drivers on its own, and the drivers I tried installing went fine (I've read of a certain issue with AMD driver installation itself being tricky because Windows does something to conflict with it, and I didn't seem to have any such issues).
Besides these issues, which I know is ironic to say given what's going on, the GPU drivers and behavior/performance in games have been better than expected, but I've also not gotten very far in testing multiple things, nor had it very long.
When my display loses signal/enters power save mode in a loop, the drivers are not crashing during this time. There's nothing (and I mean nothing) in Event Viewer during these times, and nothing stating the drivers crashed and recovered either. Speaking of which, I have not had this lesser display signal loss issue since I updated BIOS/drivers, but... it's only been two days, so I'll keep you updated on whether it occurs again. The crashes remain, though.
Again, if you said something and want me to respond to it specifically, please ask again. I'm not trying to ignore anything but a lot was said.
Do you still have the 1060? Are you able to perform a DDU as you've noted previously (but in the advanced options, check the box to disable automatic driver install), then shut down, reinstall the 1060, and install the latest GeForce drivers to test whether you are still having the issues (apart from the WHEA warnings that you later found had been occurring before the GPU change)?
Do you have a spare disk you can use temporarily, so you can disconnect all of your other disks, do a clean install of Windows 10, and do a clean install of the motherboard drivers and GPU drivers to try to rule out software as the issue?
Do you have MSI Center installed? If so, have you tried removing it and retesting?
Do you have any RGB control software installed? If so, have you tried removing it and retesting?
Do you have the Ryzen Master software installed? Do you have PBO enabled?
Double check in BIOS/UEFI that MSI's "Gameboost" or "Creator Genie" isn't enabled/active by default.
Try manually configuring memory and FCLK. (For post-change burn-in-testing outside of your installed OS, here is a link to the PassMark Burn-In-Test WinPE Builder Guide[www.passmark.com])
For hardware just to fully understand the state of things:
Do you have both the 8pin EPS and the 4pin P4 CPU_PWR1 and CPU_PWR2 connections fed from your PSU?
Are there any other PCIe Add-in-cards installed or just the GPU?
I also noticed the RAM has two XMP profiles, and I can't see much difference between them, as both are listed as 3,600 MHz at 16-19-19-39 timings, but trying to use the second profile just results in it running at 2,133 MHz instead. Not too important though.
I thought of mentioning this earlier but I've already been including lots of information so I've been trying to keep it strictly to what's relevant.
I still have the GTX 1060, yes. I have the feeling putting it back into the system will result in the issue going away, but that's merely a feeling, so I guess there's no place for those here. Unfortunately, given I can go up to a week or more without the issue, it's... hard to troubleshoot. And other than the crash in League of Legends at stock JEDEC RAM settings, it only seems to happen circumstantially with Minecraft. Namely (and this is referring to behavior that occurred even on the previous GTX 1060), I've noticed that if I record for too long, then stop, and then press F11 to switch to windowed mode, a couple of seconds later I would get a game crash instead of a system crash, with an exit code (-1073740791[bugs.mojang.com]) that referred to nVidia 36x.xx era drivers (way old ones) as the cause, despite me getting that crash code even on nVidia's latest drivers, and then the resulting recording (despite being stopped before the crash) was unreadable by any video player. I thought maybe 700 GB+ videos were just... too much or whatever, and I should just avoid those situations, but maybe I'm digressing now. Point is, my first thought when I had this crash was just "it's a situational Minecraft thing rather than a system issue, I don't record long videos often, and nVidia just crashed more 'gracefully' in that specific situation", so I didn't think much of it... until it started happening rather often, just when doing "light" play, and now, since making this thread, in another game entirely. So it's not just a Minecraft thing, clearly.
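On that exit code: Windows reports the process exit status as a signed 32-bit number, so it is easier to look up once it is rewritten as an unsigned hex NTSTATUS value. A minimal sketch of that conversion (the function name is made up for illustration):

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    # Masking with 0xFFFFFFFF keeps the low 32 bits, which turns the
    # negative two's-complement number into its unsigned equivalent.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    print(to_ntstatus_hex(-1073740791))  # prints 0xC0000409
```

0xC0000409 is `STATUS_STACK_BUFFER_OVERRUN`, which is also what a process reports when it deliberately fast-fails, so on its own it probably says little about which component was at fault.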
Anyway, back to the relevant stuff...
I also have a number of other things that might prove useful here if necessary. Those are my previous motherboard (Asus ROG Strix B550-F Gaming), my previous AM4 CPU (3700X), and a SATA SSD. I do not have another PSU nor DDR4 RAM for any testing.
I'm hesitant to bring the other motherboard in particular into the equation though, since I had these weird "random" restart issues with the combination of it (with both the 3700X and 5800X3D) and my RAM. It was BIOS dependent to a point, because a certain BIOS version in particular caused it, and on that version it was like playing Russian roulette on startup as to whether I'd have a spontaneous restart about 30 seconds after loading into the Windows 10 desktop (and only then; the BIOS was fine to sit in). If it passed that point, it never restarted... until it finally started doing that on an even later BIOS. Around that same time, I bought M.2 drives and started using them, only to find out the bottom M.2 slot was faulty outright, so I ultimately RMA'd the board (this was also when I bought the new motherboard, partly to avoid downtime and partly to get X570 and PCI Express 4 speeds for the second M.2). So the spare motherboard I have is an RMA replacement, and while I have tested it for initial functionality, I don't know its level of operation beyond that, so to speak. Since changing to the new motherboard, none of those random restarts (or, near the end, the crashes with the DRAM light on) have been a thing.
Hm, writing that out has me questioning the RAM now. Especially since that above post asked if the voltage is the same at JEDEC speeds, and no it's lower (which makes sense as it should need less) and it crashed sooner there.
I have tested RAM and it supposedly passes, though if you have suspicions/suggestions here, I'd be open to them.
I do.
I have an old 1 TB SATA SSD I could use, or I could even clear one of my 2 TB M.2 SSDs temporarily.
Funny thing about this is I was considering moving to Windows 11 soon as well. I was using it not long after it launched, but I think the issues AMD platforms were having with it for a short time made me retreat to my Windows 10 install for the time, and I simply never moved back to it yet out of procrastination, which is funny considering I liked the look and layout of Windows 11 better than Windows 10. Didn't like the right click changes or extra mess of file associations though (not sure of the state of those today though).
No to most/all of these, unless Sapphire TRIXX (or whatever it's called) counts, which I needed in order to disable the RGB on the video card. But as far as I can tell, that doesn't "run", since it seems like it may have set the RGB state on a firmware level (?): the RGB never comes on at all anymore, not even in the BIOS, and there's no sign of TRIXX ever running.
For PBO I'm not sure. I use a 5800X3D and the motherboard BIOS just says "auto" for it.
I used to use Ryzen Master, but it would randomly throw up a command prompt/terminal (or whatever you want to call it) window when checking for updates, which would steal focus from whatever I was doing, and I got annoyed with that and thus uninstalled it. I wasn't using it for anything really important.
I can confirm this is off, with the exception of when XMP is enabled, since it highlights the RAM spot for this when it is. But it's off for the CPU.
I'll try this and report back. Also, thanks so much for the exhaustive list of things to check/try.
Yes, all spots for connectors in the top left of the motherboard are supplied with cables from the PSU.
No, only the graphics card.