A fatal hardware error has occurred (Event ID 18)
Edit: I've rewritten this post, both to update it and to keep it focused on the primary issue.

I've been dealing with an issue lately (Edit: originally two issues, but one has been solved). This started shortly after I changed my video card (GTX 1060 to RX 7800 XT).

--------------------------------------------------

Summary of the issue:


The display sometimes goes black (the display backlight stays on, though), and then the PC restarts on its own (most of the time) or stays that way (one or two times). The time between the screen turning black and the PC restarting varies.

These are not BSODs. I'm not seeing one (I have automatic restart on BSOD disabled) nor does anything show up insofar as minidumps, memory dumps, or event logs indicating such. What I am getting is a log for Event ID 18 every time (which is a machine check exception of "a fatal hardware error has occurred"), and nothing else (besides the expected Event ID 41 and Event ID 6008, which are merely byproducts of the unexpected shutdown). Details about this issue are below.
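For anyone wanting to pull the same three events out of the System log in one go, here is a minimal sketch. It only builds a command line for wevtutil (the event log tool that ships with Windows). The helper names build_query, build_command, and run_query and the default count of 20 are made up for illustration, and nothing in it is specific to this PC.

```python
import subprocess

# Event IDs named in the post: 18 (the machine check report),
# 41 and 6008 (byproducts of the unexpected shutdown).
EVENT_IDS = (18, 41, 6008)


def build_query(event_ids):
    """Build an XPath filter that matches any of the given event IDs."""
    clause = " or ".join(f"EventID={eid}" for eid in event_ids)
    return f"*[System[({clause})]]"


def build_command(event_ids, count=20):
    """Return a wevtutil command that lists the newest matching System events."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query(event_ids)}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # read newest events first
        "/f:text",       # plain-text output instead of XML
    ]


def run_query(event_ids, count=20):
    """Run the query and return its text output (Windows only)."""
    result = subprocess.run(
        build_command(event_ids, count), capture_output=True, text=True
    )
    return result.stdout


# Printing the command is safe on any OS; call run_query() on Windows.
print(" ".join(build_command(EVENT_IDS)))
```

On the affected machine, run_query(EVENT_IDS) should return the same entries Event Viewer shows, which makes it easier to line up each Event ID 18 with the Event ID 41 and 6008 entries that follow it.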

--------------------------------------------------

PC Specifications:


https://valid.x86.fr/306yr6

PSU: EVGA SuperNova G5 750W
CPU: Ryzen 7 5800X3D
CPU cooling: Be Quiet Dark Rock Pro 4
Motherboard: MSI MAG X570S Tomahawk Max WiFi (7D54v17 BIOS)
RAM: 64 GB (4x 16 GB) G.Skill Ripjaws V 3,600 MHz 16-19-19-39 1.35V
GPU: Sapphire Nitro RX 7800 XT (23.10.2 drivers)
SSD(s): 2x Western Digital Black SN850X 2 TB (latest firmware on both)
HDD(s): 1x Western Digital Black 5 TB
2x Western Digital Blue 8 TB
Display: Dell U2410 24" 1920 x 1200/60 Hz (connected via DisplayPort and HDMI)

Everything is at stock, with the exception of the "XMP" RAM profile speed being enabled.

--------------------------------------------------

Detailed description of the issue:


For a week or so after I added the video card, things were fine.

I tried undervolting my CPU (all-core offset of -30, and then -20). Both settings passed some initial stress tests but failed in real-world use. The screen would go black and the PC would restart; this was the first time the issue occurred. Both times I was met with Event ID 18 in the Event Viewer.

"A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 14

The details view of this entry contains further information."


The APIC number varies each time I get this issue, as this corresponds to the logical processor that threw the exception.
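As a rough illustration of how an APIC ID maps to a logical processor, here is a small sketch. It assumes an 8-core chip with SMT where the IDs run 0 to 15 and the lowest bit selects the SMT thread within a core. That layout is an assumption on my part, not something read from the event details, and the function name decode_apic_id is hypothetical.

```python
def decode_apic_id(apic_id, threads_per_core=2):
    """Split an APIC ID into (core index, thread index).

    Assumes the IDs are contiguous and that sibling threads of one core
    have adjacent IDs, which is an assumption about the enumeration.
    """
    if apic_id < 0:
        raise ValueError("APIC ID must be non-negative")
    core, thread = divmod(apic_id, threads_per_core)
    return core, thread


# The event quoted above reported APIC ID 14.
core, thread = decode_apic_id(14)
print(f"APIC ID 14 -> core {core}, thread {thread}")
```

Under that assumption, APIC ID 14 would be the first thread of the last core (core 7). If the IDs in the logs keep landing on different cores, that argues against one specific bad core.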

I chalked it up to instability, and set the CPU back to stock.

Not long after, it happened again... and it irked me, but I thought maybe it was a one off and waited to see if it would continue.

For another week or so, things were again stable.

I then got another one (this was the point I initially made this thread). And then another. And then another. And... I'm losing track. And they seem to be escalating.

--------------------------------------------------

Troubleshooting I've attempted:


1. I've updated the motherboard BIOS. V1.5, V1.7, and V1.8.

2. Windows 10 is up to date.

3. AMD chipset drivers are up to date. Audio drivers are up to date. Ethernet drivers are up to date. Bluetooth and WiFi drivers are up to date. Etc.

4. I've updated video card drivers as new ones have become available. Both issues have persisted on all drivers I've tried, including 23.9.1, 23.9.3, 23.10.1, 23.10.2, and 23.11.1. I'm not even getting any event logs about the drivers crashing and recovering.

5. I've used DDU to uninstall and reinstall the video drivers. Yes, I used safe mode. Yes, I disconnected the internet.

6. I've reset the BIOS, or done things that leave it reset (see number 13 below), who knows how many times.

7. I've disabled XMP (seems like it may have made it worse, but that might just be coincidental), and I've set XMP but scaled back the RAM frequency/IF clock a bit to 3,200 MHz/1,600 MHz respectively. So it doesn't matter whether RAM/IF is set to 2,133 MHz (JEDEC default)/1,066 MHz, 3,200 MHz/1,600 MHz, or 3,600 MHz/1,800 MHz respectively; they all have the issue. This seems to rule out RAM or Infinity Fabric instability?

8. I've run stress tests galore. Windows memory diagnostic (might not be very conclusive on its own but I did it), MemTest86+, Prime 95, BurnInTest, and the majority of the OCCT suite. All passed, with the exception of the "GPU variable" test in OCCT, which immediately caused the crash the first time I attempted it, but then succeeded on a subsequent attempt (at first I was happy, ironically, that I may have found a reproducible cause, but it seems I didn't).

9. I've tried connecting the DP cable to both output ports on the video card (mine has two DP and two HDMI instead of three DP and one HDMI).

10. I've tried HDMI.

11. I've adjusted the ASPM setting (PCI Express > Link State Power Management > Off).

12. I've completely reinstalled Windows 10!

13. I've completely, and I mean completely, taken my PC apart down to the individual parts, cleaned it (though it was already rather clean), and reassembled it. This was to rule out a bad connection anywhere. I even swapped RAM around, and the CPU was also reseated.

14. The video card is a Sapphire Nitro+ RX 7800 XT, which has a BIOS switch with three positions (one performance BIOS, one silent BIOS, and a third mode that lets you change it on the fly with the Sapphire TRIXX software). I've tried both BIOSes/all three positions.

15. I've used "Driver Verifier", which is something Windows includes, and followed the instructions here (answers.microsoft.com) to stress test the drivers. This was inconclusive, but not useless. Since the issue doesn't yet have a known reproducible, on-demand cause, I have to wait, but this tends to cause it to occur sooner. Unfortunately, Driver Verifier does not catch anything or give me notice of any violations. Maybe that's because the drivers are fine and the issue is the hardware itself. I'm reading that machine check exceptions are, as a rule, almost always hardware and not software.

16. I've found some people saying they suspect the issue is the card boosting above where it should. I've tried limiting the boost to 2,500 MHz (default maximum is 2,565 MHz) but it doesn't seem to truly respect this. Nonetheless, it made no difference. Along with this, I tried disabling "Zero Fan" (this stops the fan when the temperature is below a certain threshold) as someone suggested, and this also made no difference.

17. I've tried disabling ULPS.

18. I've tried disabling MPO.

19. I've tried a 3700X in place of the 5800X3D. It happens on both.

None of these troubleshooting steps have resolved the issue.
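To make the RAM/IF pairs from step 7 easier to sanity check, here is a short sketch of the arithmetic. It relies on the DDR rule that the memory clock is half the rated transfer speed, and it assumes the Infinity Fabric clock is run 1:1 with the memory clock. The names mclk_for and is_one_to_one and the 1 MHz rounding tolerance are mine.

```python
# (RAM speed, IF clock) pairs listed in step 7, both in MHz as written there.
TESTED = [(2133, 1066), (3200, 1600), (3600, 1800)]


def mclk_for(ddr_rate):
    """DDR transfers twice per clock, so the real memory clock is half the rating."""
    return ddr_rate / 2


def is_one_to_one(ddr_rate, fclk, tolerance=1.0):
    """True if the fabric clock matches the memory clock within rounding."""
    return abs(mclk_for(ddr_rate) - fclk) <= tolerance


for ddr_rate, fclk in TESTED:
    print(ddr_rate, fclk, mclk_for(ddr_rate), is_one_to_one(ddr_rate, fclk))
```

All three tested pairs come out as 1:1 under that assumption, so the crash showing up on each of them is at least consistent with the RAM/IF ratio not being the trigger.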

--------------------------------------------------

Troubleshooting I'm needing to do since the above failed, and I think I want to try both first before deciding to proceed down any RMA path:


1. Try my old video card to see if the issue indeed goes away.
Last edited by Illusion of Progress; Nov 8, 2023 @ 1:08pm
emoticorpse Nov 9, 2023 @ 5:48pm 
Originally posted by Illusion of Progress:
Well, when I've just about exhausted ruling out everything else, the timing of the arrival of the behavior matches up exactly with a given hardware part being changed, and there's more than an expected amount of reports for that exact hardware part turning up describing the same behavior on different CPUs, different motherboards, different RAM, different PSUs, and many of them also exhausted just about everything else... that seems super suggestive as to what is most likely, but maybe that's just me.

Take a read through some of these and tell me it doesn't sound like exact deja vu of my entire thread here?

https://www.reddit.com/r/AMDHelp/comments/17r69gz/the_7800_xt_experience/

https://www.reddit.com/r/AMDHelp/comments/17r5e0o/defective_7800xt_graphics_card/

https://www.reddit.com/r/AMDHelp/comments/17qyvn6/just_installed_brand_new_7800_xt_and_it_crashes/

https://www.reddit.com/r/AMDHelp/comments/17qh91m/sudden_pc_reboot/

https://www.reddit.com/r/AMDHelp/comments/17pehz5/random_crash_with_whea_logger_event_18/

https://www.reddit.com/r/AMDHelp/comments/17p9gdc/nitro_7800xt_black_screen/

https://www.reddit.com/r/AMDHelp/comments/17o6awq/is_my_gpu_faulty/

https://www.reddit.com/r/AMDHelp/comments/17mjsix/7800xt_cs2_black_screen/

Here's a couple with videos.

https://www.reddit.com/r/AMDHelp/comments/17q7v5r/brandnew_rx7800xt_crashes_after_a_few_minutes_to/

https://www.reddit.com/r/AMDHelp/comments/17quauf/7800xt_black_screen_video/

There's some with 7900 XT/XTXs reporting it too. Not 7800 XT, but RDNA 3 as well.

https://www.reddit.com/r/AMDHelp/comments/17m8bct/7900xtx_randomly_causing_my_computer_to_restart/

https://www.reddit.com/r/AMDHelp/comments/17lpjwk/please_help_started_getting_driver_timeouts/

https://www.reddit.com/r/AMDHelp/comments/17lzw0m/7900xtx_black_screens_fans_at_100_wont_be/

Different CPUs, different motherboards, different RAM, different PSUs, people trying to change those anyway and the issue remaining, basically trying everything I did here (Windows reinstall, DDU, ensuring BIOS/drivers are up to date, XMP on and off, disabling ULPS and MPO, messing with clock speeds and/or voltages of CPU, RAM, and GPU [it should never come to this!], you name it), and the common factor always seems to be the 7800 XT (across all brands too) resulting in these black screen to reboot behaviors.

No, the 7800 XT isn't the only GPU to have ever had this issue, but it's really looking like there may be something to the 7800 XT in particular having this issue right now. Keep in mind this is a relatively new GPU, and for a brand that's less popular (although this one is certainly selling well), yet these reports are all over. And that above was found by simply looking through the last week alone on one Reddit community; it's not like I went and did a complete web search or looked across a wide time range here.

Sure you could argue there's "sampling bias" to some degree because the Reddit is called "AMD Help" after all so of course there's going to be only issues there, but I can't just ignore that this very behavior that showed up for me with a given GPU change sure seems to have a lot of feedback for the very same behavior.

Think what you want, but it seems awfully suggestive to me.

Maybe high demand led to a bunch of bad batches of 7800 XTs? Maybe AMD doesn't quite have this GPU/driver combination stable? Who knows. But something like that sounds more reasonable than "my previously stable system just happened to go unstable with an issue that so many others are having with the same GPU despite other hardware and a ton of software variables being ruled out".

It's just way, way, way too damning of evidence, if you will.

I would not find it hard to think AMD's latest stuff is too buggy to be worth dealing with, which is one of the main reasons I went with Nvidia (even though I actually feel all their GPUs across the board have high rates of problematic behavior). But, I really haven't said that or don't want to say it because I remember you giving them the benefit of the doubt and that those stories were just anecdotal and I still don't want to risk increasing tension because I'm not trying to berate them. Just agreeing with their history.

But, I will say this in defense of AMD. I do feel standardization of hardware is a thing and that only so many configurations of hardware can really be focused on and optimized. That being said, I wouldn't blame them if they didn't catch a system with four RAM slots occupied, as many drives as you have, that screen resolution, and your GPU plus the 5800X3D in the mix, and fix something about it that caused a 7800 XT to bug out.

I don't think your issue is typical of a bug caused by things as simple as an odd screen res, or just having more than the typical number of RAM slots occupied, or anything simple like that, but I do think it is possible.

This is just things I'm thinking right now.
Last edited by emoticorpse; Nov 9, 2023 @ 5:49pm
Sure, I mentioned that sampling bias could be part of this.

I also mentioned this is a relatively young GPU from a less popular brand, and those results all turned up from a short time span on one web site alone.

Contrast that to the RTX 4070 Ti which has been out longer and likely has more users (could be wrong on that second part?). Now also remove the number of incidents where it actually is some system issue, unlike all these reports that basically describe ruling out every variable but the video card and are left without resolution.

How do they compare then?

We don't know.

My point is I'm not trying to make a factual statement about the frequency of issues with the 7800 XT so much as I'm saying "I've personally tried almost everything else, evidence points to GPU, and now also there's a lot of other deja vu stories". It's just... pretty suggestive of what the likely cause of my issue is, is all I'm saying.

You suggested it yourself, right? How long should I be expected to "continue to let myself to live among the issues" troubleshooting before I just rid myself of what introduced those issues and go with something else?
Originally posted by emoticorpse:
But, I really haven't said that or don't want to say it because I remember you giving them the benefit of the doubt and that those stories were just anecdotal and I still don't want to risk increasing tension because I'm not trying to berate them. Just agreeing with their history.
I indeed gave them the benefit of the doubt because that's how I felt at the time; I feel everything deserves a chance. I haven't had any issues with ATI or AMD GPUs in the past, but it's been a while since I used one in my primary PC. And I think even if drivers are involved here, the 7800 XT might be a bit more problematic than usual, even for AMD's reputation. But that certainly could be coming from a place of being someone dealing with it. Maybe it gets resolved in time. Maybe the GPU or drivers are not even the issue and it's something else for me (I'm really doubting this more and more though).

Either way, I'm certainly suffering an issue that really seems to be the GPU, and it's pretty eye opening to see there are others with stories that almost mirror mine.
Originally posted by emoticorpse:
I don't think your issue is typical of a bug caused by things as simple as an odd screen res, or just having more than the typical number of RAM slots occupied, or anything simple like that, but I do think it is possible.
Those don't seem to be a common variable. Others with fewer DIMMs, less storage, and typical aspect ratios are also having the issue. The 7800 XT is the constant. Everything else varies. (Edit: Actually I just noticed there could be one other constant as this seems to most commonly be happening on Ryzen platforms too... that would be really strange if an AMD GPU is free of issues on an Intel platform, or an nVidia GPU is free of issues on an AMD platform, but AMD paired with itself isn't?)

Of course I can't speak from a place of fact; I don't have the numbers of these GPUs sold versus people having issues so I can't say. But I see no reason those things in particular (the DIMM count, number of storage drives, or screen resolution) have any bearing here.
Last edited by Illusion of Progress; Nov 9, 2023 @ 6:18pm
pasa Nov 9, 2023 @ 6:51pm 
I sampled your links: indeed similar stories, and the comments are also similar to what we have here. And the usefulness is moot: didn't see a single one that provided a solution or followup of the story. (That actually may be just due to timing and will get there later, but I have my doubts.)

Particularly, did you find even one story "I rma'd the card, got it fixed/swapped, now everything is dandy"? Or even one that actually swapped it to nvidia, not only wrote about the wish?

The engineering way is to look at working solutions, not the problems -- reports are easy to submit, especially on the internet and only create a crowd, not progress.

Also media amplifies like crazy. I recall some fancy card with vapor chamber issue that was supposed to kill the provider for good. It was all around with videos and everything. Few weeks later the world was still around -- also some real number rolled in on frequency that was in the lottery win range. Some random batch was not properly filled. those got replaced and no one heard about it since.

Some bad hw happens no matter the brand. Driver issues are common, but the common manifestations that are driver related usually get fixed in a few months' time. Maybe gaining new ones.

Back in time (~2010) I was strictly "nvidia only" for the driver reasons -- not so much the drivers themselves, but it was crystal clear that game studios test exclusively on nvidia cards and everything else is up to luck. But that very soon changed: nvidia went full ♥♥♥♥♥♥♥ mode around when the 10xx series went out. And the field got pretty even. The legends keep up in heads, but I don't see any reality behind it. I definitely would have stayed with nvidia if I thought there was any remaining edge.

The rx7800 drivers are expected to improve, being so fresh -- if the black screen is pure software, which is in doubt, it may go away. But you can't trigger machine check exception from gpu driver in any way. With some rowhammer-like attack you might cause some bit errors and wild crashing, but not cache hierarchy error and not consistently. Well, certainly not counting indirect effect from just using the engines in the gpu for work and so consuming power.

And bad interactions just happen. I doubt samsung makes worse ram than hx, yet early ryzens had lots of issues with the former (while the intel didn't care). And some edge still lingers. When we push everything to the edge it is probably expected -- and falling back is not necessarily a solution.
You're not entirely wrong. A lot of the reasoning you're putting forward is precisely why I've been trying to exhaust as many things as possible to try and find a solution instead of jumping at returning the card for a full refund at the first sign of a slight issue.

I'm not new to dealing with PCs or having to troubleshoot with them at times, and I'm certainly not new to dealing with lesser issues. My keyboard has a broken "Tab" key, so that key and the "|" were switched since I never use the latter. The LEDs for the WASD and arrow keys started intermittently going out, and now the key that controls their brightness has also stopped working, so when they do rarely come on, they are blindingly bright because they are stuck at the highest brightness level, whereas I use either the dimmest or second dimmest for the rest. My speakers sound like they are raising the dead if I adjust the volume while they are on. They are also getting harder to turn on and keep on (the first time turning them on might need a few attempts, or they go off by themselves). My display has its quirks too. On and on. This is sort of why it upset me when you accused me of creating my own issues by not dealing with this. I "settle" probably more than many people here do. And this is certainly a show stopping issue, not a minor one I can just deal with (certainly not longer term at least). So I definitely "did my dues" to try and work through this first.

I just hope that doesn't come back to make things worse for me in the end, because if I had returned it while there was a chance, then if this actually is an issue with the graphics cards or drivers, then I wouldn't be at the mercy of hoping they improve just to get a properly functioning card. And reading people who went into the RX 6000 series say they stuck with it for two years and are still suffering is... not reassuring on that front.

But hindsight is 20-20, as they say. You won't know unless you try, and sometimes you win the bad experience lottery, even with otherwise good parts.

In any case, by referencing these other issues, I'm not so much as trying to proclaim "there's an objective issue with these and this proves it" so much as I'm saying "I have an issue, the behavior and the things I've ruled out on my own already heavily suggest one particular thing, and these other happenings sure seem to match my issue well and support the same". I hope you can understand the difference.
Originally posted by pasa:
Particularly, did you find even one story "I rma'd the card, got it fixed/swapped, now everything is dandy"? Or even one that actually swapped it to nvidia, not only wrote about the wish?
It's still early so I haven't seen anyone mention a successful RMA and it being a solution to this problem yet, no.

But then again, that wouldn't solve it if the issue was the drivers anyway. That would solve it only if that specific GPU sample had a hardware fault. So that not being reported as a fix doesn't absolve the drivers as being a possible cause.

One person apparently got turned away by Sapphire and directed to take their issues to the AMD forums because their issue sounds like driver issues to Sapphire instead of a hardware issue.

I did see others saying they didn't have the issue on a prior GPU (from both AMD and nVidia), then had it on the 7800 XT they switched to, and either went back to the prior GPU or just switched/were intending to switch to something else entirely. And yes, some of those said that did indeed stop the issue. This also matches my experience, where my GTX 1060 never showed the issue before the change and has yet to afterwards (though I still need to confirm the issue hasn't followed me back to the GTX 1060, which is what I'm going to be in the process of trying to rule out now, but my guess is that it won't).
Originally posted by pasa:
But you can't trigger machine check exception from gpu driver in any way. With some rowhammer-like attack you might cause some bit errors and wild crashing, but not cache hierarchy error and not consistently. Well, certainly not counting indirect effect from just using the engines in the gpu for work and so consuming power.
How certain are you of this?

From what I know, machine check exceptions do indeed seem to usually be a hardware issue as opposed to a software one, which is why I was doubtful most of my software troubleshooting would get me anywhere (but I did it anyway to formally rule it out and not have to question it). But while that's usually the norm, I'm finding it's mentioned that drivers can sometimes be a cause of them?

The Watchdog logs are giving me these.

VIDEO_ENGINE_TIMEOUT_DETECTED (141)

VIDEO_TDR_TIMEOUT_DETECTED (117)

VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP (1b8)

Those seem to point to the GPU and/or the drivers? Ergo, maybe a fault with the GPU or drivers is why the machine check exception/fatal hardware exception is getting thrown?
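To keep those three Watchdog entries straight, here is a minimal sketch that parses the "NAME (code)" lines above into names and numeric codes. It assumes the value in parentheses is hexadecimal, which is how these codes are normally written. The names WATCHDOG_LINES, PATTERN, and parse_entry are made up for illustration.

```python
import re

# The three lines exactly as listed from the Watchdog logs above.
WATCHDOG_LINES = [
    "VIDEO_ENGINE_TIMEOUT_DETECTED (141)",
    "VIDEO_TDR_TIMEOUT_DETECTED (117)",
    "VIDEO_MINIPORT_BLACK_SCREEN_LIVEDUMP (1b8)",
]

# NAME in capitals/underscores, then a code in parentheses (assumed hex).
PATTERN = re.compile(r"^(?P<name>[A-Z0-9_]+) \((?P<code>[0-9a-fA-F]+)\)$")


def parse_entry(line):
    """Return (name, code) for one 'NAME (hex)' line."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    return match["name"], int(match["code"], 16)


entries = dict(parse_entry(line) for line in WATCHDOG_LINES)
for name, code in entries.items():
    print(f"0x{code:03X}  {name}")

# All three names share the VIDEO_ prefix, i.e. the display stack.
print(all(name.startswith("VIDEO_") for name in entries))
```

This only shows that all three entries are display-related by name. It does not show whether the GPU or its driver is what triggers the machine check exception, which is still the open question.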
While I have made some further progress on this (if it can be called that) since the last time I posted, I don't think there's yet anything conclusive to say, so I was waiting on that before I made a formal update myself.

But since I was asked about it elsewhere and didn't want to post about my issues in someone else's thread, here's an inconclusive (perhaps key phrase) update.
Originally posted by emoticorpse:
Hey did you ever get your computer fixed? I was wondering that the other day.
After some more troubleshooting (and making another thread on another forum), I did end up at the eventual step of sending the RX 7800 XT out for RMA. I got it back a few days ago. The return from RMA was labeled a "replacement" and has a new serial number so I presume that means Sapphire found something wrong with the one I sent in, but they didn't state that nor what it might have been.

I haven't yet had enough time with it to conclusively see if it resolved the black screen to restart issue (less than a week, and it's a busier holiday time), and I feel like I'd need up to a solid month or two, maybe even three, to fully be assured of that particular behavior being gone anyway, but I've seen some other concerning things in the little time I have had to use it.

1. I'm now having some TDR/driver crashes where I wasn't having any before. Thus far these are limited to one game, and it's one that a lot of its players back in the day said had crashing issues on AMD GPU hardware/drivers. That was years and years ago, but it's not a good sign.

2. I'm also experiencing reduced performance, at least in Minecraft. It's basically lower performance across the board and has a lot more stuttering and hitching. I noticed utilization is also strangely staying around 80% (plus or minus some) almost all of the time, regardless of the scene (this seems somewhat odd...), and it wasn't like that before. It seldom drops below this, and if it does, it's not often nor by much.

I could greatly elaborate on the details with Minecraft but I won't. I don't need to. The key thing is it's just basically much worse performance now, with an oddly "nearly locked" utilization level. It's either the newest 23.12.1 drivers with Minecraft (I never tried these before as they released after I sent it out for RMA), or there's something different wrong with this individual RX 7800 XT, because those were the only two changes relative to before.

It's at the point where even if the black screen to restart issues are gone (which is still not verified), I'm concerned and upset between the TDR/driver crash issues appearing and the massively lower performance in a major game that I play. It makes me want to just consider taking a massive loss on it by selling it second hand, and taking a further loss by going with whatever highway robbery nVidia ends up charging for their cheapest upcoming RTX 40 series Super variant that has 16 GB VRAM. I was originally considering buying a new PSU before giving up on the graphics card, but that was if the replacement 7800 XT behaved exactly as the first. With these changes in behavior with the second 7800 XT, I'm lost.

If there's one word to describe this experience, it's "exhausting". That's it. It's just tiring. I'm four months in, I'm ~$700 in, and there's even been some "lost" data, and I'm still in a position where something isn't measuring up or working properly and I'm looking at more wasted money and time ahead? In my nearly twenty or so years at this, I've never had as bad of an experience as this one. Never. I'm not sure if I'm one of the ones Radeon just doesn't work for, for whatever reason, or if there's something else really deep going on with my PC somewhere that only showed up with the graphics card change. Either way, it got frustrating long ago.
plat Dec 24, 2023 @ 1:03pm 
Well, this is frustrating! I think you've invested enough time, energy and patience with this whole thing. Is there any way to return this model entirely and exchange it for another? I didn't read enough to where it was mentioned where you bought this so can't comment on any return/refund/exchange policy but the expectation that these devices should work the first time around is very, very real.

No more compromises.
emoticorpse Dec 24, 2023 @ 1:59pm 
That sucks. At this point, would you consider just dropping it off at a pro (not to imply you don't know what you're doing, because I know you do). But, just that might be worth the money for them to diagnose it, give it a fresh look maybe from a perspective you haven't seen already and they can tell you what's wrong and you pay for the diagnosis and maybe they can figure it out and tell you what you need to do and maybe they can even fix it without changing parts so that you don't have to return the gpu if you don't need to?
Originally posted by plat:
Well, this is frustrating! I think you've invested enough time, energy and patience with this whole thing. Is there any way to return this model entirely and exchange it for another? I didn't read enough to where it was mentioned where you bought this so can't comment on any return/refund/exchange policy but the expectation that these devices should work the first time around is very, very real.

No more compromises.
Unfortunately, no, I am well past that point. It was purchased from Newegg in the middle of September and by the time I was far enough into knowing I had an issue and that it might just be a "this hardware part isn't working for me" situation, it was past the return point. I initiated the RMA late November/early December.

But thank you for the acknowledgement I've put enough time into this! I didn't want to return it right away because I felt like that might be overreacting to what could be a smaller issue, but... nobody can say I didn't put the proper time and effort into it after all this.

Unfortunately, I might have to sell it second hand at a loss and go back to nVidia. And oh joy, those power cables on the RTX 40 series! Out of the frying pan and into the fire...

Someone please put me out of my misery now. And some people have the gall to say the graphics card market has never been better for consumers...
Originally posted by emoticorpse:
That sucks. At this point, would you consider just dropping it off at a pro (not to imply you don't know what you're doing, because I know you do). But, just that might be worth the money for them to diagnose it, give it a fresh look maybe from a perspective you haven't seen already and they can tell you what's wrong and you pay for the diagnosis and maybe they can figure it out and tell you what you need to do and maybe they can even fix it without changing parts so that you don't have to return the gpu if you don't need to?
No offense taken, so don't worry. If anything, I've been questioning my ability to diagnose this the further it goes on because like I said, I've never dealt with anything like this before.

I don't think that's worthwhile at this point though. Partly because it's going to come down to "swap parts until it gets working right" either way, and the rest of the stuff is known to work when the 7800 XT/AMD's drivers are removed from the equation. I've already put a lot of money into this, like ~$700 counting the graphics card, tax, and shipping for the RMA. Then there's the possibility I'm looking at the better part of a grand (!) if I have to go back to nVidia to rectify this. Alternatively, I could try buying another PSU if I wanted to jump to something other than the graphics card, but if that fails, I'm then that much more in the hole. Either way, either of those makes more sense than paying someone the better half of what a PSU would cost to try it myself.
Last edited by Illusion of Progress; Dec 24, 2023 @ 2:59pm
Informal mini update since the above itself was an early informal update...

The strange utilization (being "stuck near 80%") in Minecraft is gone, hopefully for good. I noticed yesterday that this "stuck utilization" appeared to be what Afterburner was showing, but Minecraft itself showed a more fluctuating utilization. So maybe it was just some driver or Afterburner thing. Note that Minecraft's utilization itself is known to be wrong more often than not.

What I did was reinstall the drivers. I wanted to try to go back to 23.11.1, but the installer I downloaded was for 23.12.1 I guess. Oh well, I went with the latest again but it worked.

The whole reason I reinstalled them now? I saw this on startup...

https://i.imgur.com/3hlMxqC.png

https://www.amd.com/en/support/kb/faq/gpu-sacerkl

I have no idea what that means.

During driver install, my keyboard/mouse completely cut out (keyboard LEDs went off and my mouse which is connected to the keyboard passthrough also stopped working) and the display lost signal for about half a minute, then all came back. That never happened before, but hopefully there's nothing more to it.

Upon restart, the wallpaper was missing and the right click context menu took some seconds to show up instead of being instant, and this "fixed itself" when I brought the personalization menu up which made the wallpaper show up.

So... drivers reinstalled, had some weird behavior, utilization is now fixed.

Still don't know if Minecraft's performance is back to normal or not (Edit: Performance still seems lower based on an informal "recording performance test"), and the previous TDR issues need to be checked. Still need to see if the Black screen to restart issues are gone as well. Will update as I know more, and happy holidays everyone. Hope they're going well.
Last edited by Illusion of Progress; Dec 25, 2023 @ 11:15am
emoticorpse Dec 25, 2023 @ 1:49pm 
Originally posted by Illusion of Progress:
Informal mini update since the above itself was an early informal update...

The strange utilization (being "stuck near 80%") in Minecraft is gone, hopefully for good. I noticed yesterday that this "stuck utilization" appeared to be what Afterburner was showing, but Minecraft itself showed a more fluctuating utilization. So maybe it was just some driver or Afterburner thing. Note that Minecraft's utilization itself is known to be wrong more often than not.

What I did was reinstall the drivers. I wanted to try to go back to 23.11.1, but the installer I downloaded was for 23.12.1 I guess. Oh well, I went with the latest again but it worked.

The whole reason I reinstalled them now? I saw this on startup...

https://i.imgur.com/3hlMxqC.png

https://www.amd.com/en/support/kb/faq/gpu-sacerkl

I have no idea what that means.

During driver install, my keyboard/mouse completely cut out (keyboard LEDs went off and my mouse, which is connected to the keyboard passthrough, also stopped working) and the display lost signal for about half a minute, then all came back. That never happened before, but hopefully there's nothing more to it.

Upon restart, the wallpaper was missing and the right click context menu took some seconds to show up instead of being instant, and this "fixed itself" when I brought the personalization menu up which made the wallpaper show up.

So... drivers reinstalled, had some weird behavior, utilization is now fixed.

Still don't know if Minecraft's performance is back to normal or not (Edit: Performance still seems lower based on an informal "recording performance test"), and the previous TDR issues need to be checked. Still need to see if the Black screen to restart issues are gone as well. Will update as I know more, and happy holidays everyone. Hope they're going well.

Do you always have afterburner running? Like all the time? Is it on your startup and it was present through all these issues or it was for the most part closed and you only run it rarely?
AmaiAmai Dec 25, 2023 @ 2:53pm 
Originally posted by Illusion of Progress:

The whole reason I reinstalled them now? I saw this on startup...

https://i.imgur.com/3hlMxqC.png

https://www.amd.com/en/support/kb/faq/gpu-sacerkl

I have no idea what that means.

It means that something updated your drivers and reverted them to an older version of the driver or a version that wasn't compatible with Adrenalin (basically a driver installed without your permission that is not the one you installed). Some software like MB software, Windows Update, benchmarking software, or other tools might update your driver automatically.

Make sure you disable driver downloads from Windows Update. That is important on AMD because if you don't, sometimes Windows will replace the driver with the one in the Windows Update branch -- that one can be a very old basic driver, or a testing / debug driver if you are in a Windows Insider build.

It's a very annoying bug and issue to pin down sometimes because even motherboard software or software like HP OMEN GAMING HUB (even if you have it for your keyboard) might modify your GPU drivers without telling you.

For Windows:

https://www.makeuseof.com/windows-stop-automatic-driver-updates/

In some cases the above may not work. If so, you can post here

https://answers.microsoft.com/en-us

and have the problem looked into further.
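
For reference, the Windows Update setting mentioned above can also be checked from a script. This is only a minimal Python sketch, assuming the Group Policy "Do not include drivers with Windows Updates" is the mechanism in use; that policy stores an `ExcludeWUDriversInQualityUpdate` DWORD under the Windows Update policies key. The function names are made up for illustration, and other ways of blocking driver updates (such as the device installation settings in Control Panel) will not show up in this check.

```python
import sys

# Registry location used by the "Do not include drivers with Windows Updates" policy.
POLICY_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
POLICY_VALUE = "ExcludeWUDriversInQualityUpdate"


def read_policy_value():
    """Return the policy DWORD, or None if it is not set or this is not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, POLICY_VALUE)
            return value
    except OSError:
        # Key or value missing: the policy has not been configured.
        return None


def drivers_excluded(value):
    """True only when the policy value is explicitly set to 1."""
    return value == 1


if __name__ == "__main__":
    if drivers_excluded(read_policy_value()):
        print("Policy is set: Windows Update should skip driver updates.")
    else:
        print("Policy is not set: Windows Update may still install drivers.")
```

If this reports the policy as not set even though driver updates were disabled some other way (for example through DDU's option), that alone does not mean Windows Update will replace the driver; it only means this particular policy is not what is blocking it.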

But what is strange is this:

Originally posted by Illusion of Progress:
During driver install, my keyboard/mouse completely cut out (keyboard LEDs went off and my mouse, which is connected to the keyboard passthrough, also stopped working) and the display lost signal for about half a minute, then all came back. That never happened before, but hopefully there's nothing more to it.

Upon restart, the wallpaper was missing and the right click context menu took some seconds to show up instead of being instant, and this "fixed itself" when I brought the personalization menu up which made the wallpaper show up.

Never heard of the wallpaper going missing or seen that happen as the driver should never touch it, at least I have never seen any report of it. There may be other software that is interfering with the driver download that resulted in that behavior (modified start with basic driver and no personalization services -- then load with GPU driver + UI services on trigger).

But if it only happened once, perhaps it is nothing to worry about.
Originally posted by emoticorpse:
Do you always have afterburner running? Like all the time? Is it on your startup and it was present through all these issues or it was for the most part closed and you only run it rarely?
No, I don't, but I like to use it while playing Minecraft.

Also, the stuck at ~80% utilization returned right after I posted that. I notice it only occurs in full screen mode and not in windowed mode.

I guess I'm not too concerned with it. It's the TDR issues, the maybe-gone-maybe-still-there Black screen of death issue, and the seemingly reduced performance issues with Minecraft (23.12.1 lowered it maybe?) that I'm more concerned with.
Originally posted by AmaiAmai:
It means that something updated your drivers and reverted them to an older version of the driver or a version that wasn't compatible with Adrenalin (basically a driver installed without your permission that is not the one you installed). Some software like MB software, Windows Update, benchmarking software, or other tools might update your driver automatically.

Make sure you disable driver downloads from Windows Update. That is important on AMD because if you don't, sometimes Windows will replace the driver with the one in the Windows Update branch -- that one can be a very old basic driver, or a testing / debug driver if you are in a Windows Insider build.

It's a very annoying bug and issue to pin down sometimes because even motherboard software or software like HP OMEN GAMING HUB (even if you have it for your keyboard) might modify your GPU drivers without telling you.

For Windows:

https://www.makeuseof.com/windows-stop-automatic-driver-updates/

In some cases the above may not work. If so, you can post here

https://answers.microsoft.com/en-us

and have the problem looked into further.
Thanks for the information. If it returns, I'll look further into it. Everything seemed to work fine, and I initially installed the drivers with a DDU run beforehand (and I had it set to disable Windows installing the drivers). Adrenalin itself opened and worked fine and didn't complain about a version mismatch.

Originally posted by AmaiAmai:
But what is strange is this:
Originally posted by Illusion of Progress:
During driver install, my keyboard/mouse completely cut out (keyboard LEDs went off and my mouse, which is connected to the keyboard passthrough, also stopped working) and the display lost signal for about half a minute, then all came back. That never happened before, but hopefully there's nothing more to it.

Upon restart, the wallpaper was missing and the right click context menu took some seconds to show up instead of being instant, and this "fixed itself" when I brought the personalization menu up which made the wallpaper show up.
Never heard of the wallpaper going missing or seen that happen as the driver should never touch it, at least I have never seen any report of it. There may be other software that is interfering with the driver download that resulted in that behavior (modified start with basic driver and no personalization services -- then load with GPU driver + UI services on trigger).

But if it only happened once, perhaps it is nothing to worry about.
Yeah, it was strange to me too as I never saw it, but if it was a one off then oh well. I was mostly just noting it.

When the keyboard first went out I actually thought it was the Black screen restart issue, because my display did lose signal and start restarting on repeat. This has been another issue I have had with it. The wallpaper was fine until I restarted, and then it was Black (the "background color" I have chosen) with a delayed context menu response, and then when I chose to go into "personalize" it went back to how it should be.

I've restarted the PC twice since then and it hasn't done it.

Small stuff like this is normally a non-issue for me if it happens once so it's not a big deal, but with everything else going on it has me second guessing everything. Like my thoughts race on "what could cause this" when stuff happens now. My thought was the graphics drivers initializing, which happens through the PCI Express stuff on the CPU, caused a CPU-side instability and it brought the USB down for a moment? Like I said I am basically second guessing my whole PC ever since I tried upgrading my freaking graphics card. Because the issue came with the GPU but changing platform side stuff (basically XMP on versus off) impacted it. It was unstable either way but worse with XMP off. With the old graphics card it's stable with XMP on or off.

Seriously makes me want to cry, throw the whole PC out, and start over if these issues remain. This stuff is getting old.

*sigh* Sorry for ranting, just going to take it a day at a time now. If the Black screen reboot issues don't return, that's a big improvement right there and signifies the first 7800 XT just had something wrong. Then I can go from there.
Small update and I'm freaking happy about it (I'm sticking with my story of "blaming" emoticorpse for drawing an earlier than conclusive update out of me, haha). I went one driver version back to 23.11.1 and all the differences with Minecraft disappeared. Like that. Gone. It seems 23.12.1 was causing me some issues with Minecraft? (I will mention that the awkward dropping of USB stuff while installing the drivers happened again though so that isn't a 23.12.1 thing but something only occurring with the new 7800 XT, but as odd as it is, if it only happens during those times, then I'm not sure if I should be concerned?)

Time will tell if I end up encountering the Black screen to restart issues, and I also need to figure out what the TDR crashes in the other game were (also 23.12.1?), but Minecraft is a big thing I do so that alone was a pleasant improvement. Hopefully whatever changed things in 23.12.1 for me doesn't continue to be the case in future drivers because I hate being locked to an older one (but I ended up stuck with older drivers for many years with nVidia after I got my GTX 1060, so... it's whatever?).

If the Black screen to restart issues return, I will go back to 23.12.1 and see if that resolves them (and if not it's new PSU time). If it does resolve them, then I'll finally have a conclusion to this whole ordeal. So, time to let time do its thing and see if the issue is still there or if it's gone.
Last edited by Illusion of Progress; Dec 27, 2023 @ 5:24pm

Date Posted: Oct 12, 2023 @ 7:40pm
Posts: 149