From my blog 'The Pile Of' - CPU Overheating in Legacy Mode

Started by Jeremy Collake, September 07, 2012, 11:56:49 AM


Jeremy Collake

CPU Overheating in Legacy Mode

When in Legacy (non-protected) mode, it seems some CPUs are particularly susceptible to overheating. This is before they start frequency scaling, sometimes before dynamic fan speed control, perhaps before use of the HLT instruction, and definitely before use of any core parking. Thus, these mitigation strategies may be all that keeps some CPUs from overheating. I've always noticed an increase in thermal emissions when my CPU is running full speed in legacy mode, but today is the first time I accidentally spent too much time in legacy mode, causing my system to shut down pre-boot due to the CPU temperature (which had risen to 74C, though post-boot, with mitigation strategies in place, it is back at 45C).

Now, this is actually an indication of some trouble on this PC, despite the heavy load I place on it. Either I must allow for such extreme temperatures, or perhaps re-seat the heatsink. Still, on a PC with a heavy load that has no issues running continuously at 100% CPU frequency in High Performance mode, this is a surprising - and scary - discovery. I can't help but wonder, how long would your PC last in legacy mode?

UPDATE: In my case, since I was in a pre-boot RAID configuration tool, I believe the failure of the CPU fan speed to ramp up may have been the primary cause.

Src: http://thepileof.blogspot.com/2012/09/cpu-overheating-in-legacy-mode.html
Software Engineer. Bitsum LLC.

edkiefer

I have noticed higher temps in the BIOS than in Windows, and can only think it is either a difference in measuring software or all the power-saving modes that the CPU has in the OS.
Bitsum QA Engineer

Jeremy Collake

There is no difference in measurement; it is indeed the mitigation strategies present once the processor enters 'protected mode', plus those applied pre-boot by the BIOS. When in a control panel that runs before BIOS/UEFI initialization is complete, as I was in the eSATA card configuration, problems are really compounded, because then even the BIOS's dynamic fan speed control may be inoperable, or only partially operable. One thing that is also *for certain* is that in legacy mode there is no frequency scaling, so the CPU is running all-out, with 100% of CPU cycles used (usually of just one core), at 100% frequency.

IMHO, the processors that 'push the thermal limit' these days are simply not designed to operate outside protected mode and/or the BIOS (or UEFI) does not have adequate safeguards to prevent overheating while in legacy mode. The OS has many additional mitigation methods, including the aforementioned frequency & voltage scaling (top one), the HLT instruction issued during IDLE loops, and of course CPU core parking.

*IF* I had set my CPU fan to run continuously at maximum speed, I think I would have been ok, but my setting is the same as most people's -- default to fewer RPMs at lower temperatures, and ramp up from there. If this dynamic fan control is not operable, then the fan should go to full speed. On this PC, at least, it does NOT. It instead sits at minimum speed, while the CPU 'cooks' in legacy mode.
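As an illustration of the fail-safe behavior described above (ramp with temperature, but go to full speed whenever dynamic control is not operable), here is a minimal Python sketch. It is not how any particular BIOS/UEFI implements fan control; the function name `fan_duty`, the duty-cycle floor, and the ramp temperatures are all hypothetical values chosen for the example.

```python
from typing import Optional

# All thresholds below are hypothetical, picked only for illustration.
MIN_DUTY = 30         # percent: quiet floor used at low temperatures
MAX_DUTY = 100        # percent: full speed
RAMP_START_C = 40.0   # below this, stay at MIN_DUTY
RAMP_END_C = 70.0     # at or above this, run at MAX_DUTY


def fan_duty(temp_c: Optional[float], controller_ok: bool = True) -> int:
    """Return the fan duty cycle (percent) for a CPU temperature in Celsius."""
    # Fail-safe: if dynamic control is not running, or there is no sensor
    # reading, never sit at the minimum speed. Go to full speed instead.
    if not controller_ok or temp_c is None:
        return MAX_DUTY
    if temp_c <= RAMP_START_C:
        return MIN_DUTY
    if temp_c >= RAMP_END_C:
        return MAX_DUTY
    # Linear ramp between the two thresholds.
    fraction = (temp_c - RAMP_START_C) / (RAMP_END_C - RAMP_START_C)
    return round(MIN_DUTY + fraction * (MAX_DUTY - MIN_DUTY))
```

With these assumed numbers, a partially initialized firmware would call `fan_duty(None, controller_ok=False)` and get 100, rather than leaving the fan at its floor.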
Software Engineer. Bitsum LLC.

BenYeeHua

Ya, the BIOS is the most important thing for protecting the hardware.
If the BIOS is damaged, get ready for BBQ  ;D

Jeremy Collake

Or partially inactive, as it was (largely) in the UEFI initialization chain as I was configuring the PCIx eSATA RAID card I picked up for a few bucks since mine didn't support port multiplication.

OT: Needless to say, I *must* keep a reliable and full backup of all my data. I still don't trust the cloud that much, though I do use it, even for sensitive documents. I also periodically upload an encrypted archive to it with some of my other data. However, to trust it with everything, I just don't know. It seems like putting my data on a credit card (the same level of exposure to whoever). It really puts it 'out there' for *easy access* now ... or later - and who knows how the world, corporations, or governments will change in the future. If ANYONE was serious about giving us SECURE cloud services, they would give us a CLIENT CONTROLLED encryption key. Instead, as is, the big players want to make sure they can see our data, if they ever need to, and/or are just too lazy to try to tackle the challenge of giving us client-side encryption while offering web-based file access. While Carbonite offers a client-side encryption key, it isn't one of the big cloud systems now emerging (e.g. SkyDrive and gDrive).
Software Engineer. Bitsum LLC.

edkiefer

It's hard to tell in the BIOS, but the dynamic fan speed seems to be working here, though it's hard to say, as there's not much you can do in the BIOS (CPU%-wise).
The longest I have been in there is like 10-15 min, looking around at options. My CPU temps were low, like 87F (below 90F) or so if I remember right, but that's not with a retail HS/fan; I am using a Cooler Master Hyper 212 EVO, which is a PWM type.

PS: I have the case covers off right now, so that's going to affect it too.
Bitsum QA Engineer

Jeremy Collake

I also run with my cover off. As mentioned, it may also be one thing to run IDLE, say, in the BIOS, or run IDLE *before* the BIOS starts, or somewhere prior to full initialization of the BIOS (or UEFI). That is where I was stuck, configuring my RAID card, in some stage of initialization as the UEFI tried to scan for available devices and initialize them. Thus, in that stage, my CPU apparently just 'cooked' .. until it did hit its thermal limit and was shut down, but ONLY after exiting the RAID Controller configuration, and returning control back to the UEFI to continue initialization, at which point the UEFI said "STOP!! IMMEDIATE SHUTDOWN!!" ;p.
Software Engineer. Bitsum LLC.

Jeremy Collake

Anyway, the big problem I described is that, regardless of your cooling system, your CPU will rarely be subjected to worse conditions than when in legacy mode. There, a single core is getting 'fried' running at 100% of its frequency and 100% of its capacity. Unlike an OS running in protected mode, pre-boot code typically does not issue a HLT instruction in its IDLE loops, afaik, so it just keeps executing code, in a polling-like fashion really (continually polling for new I/O).
Software Engineer. Bitsum LLC.

BenYeeHua

And which one is the cause, BIOS or the RAID card?
----
Unless you have a bad cooling system (a laptop, for example)
----OT
Ya, and who knows whether the cloud service will be closed down, and whether after that they will sell the data?

Jeremy Collake

The cause was that the UEFI (new BIOS gen) was not fully initialized. It was partly initialized. Thus, it did not have its dynamic fan speed control in operation, as I didn't hear my fan 'spin up' as it should when my CPU started to overheat. The cause of the heat was simply being in legacy mode. Any time you 'sit' in legacy mode, you are going to create a condition of high thermal emissions.

Legacy mode is old-school stuff, but still there. It is important to understand what it is, but hard for me to put in a sentence. Protected mode is the mode that the OS runs in, and it allows for things like virtual memory, and the basic security structures that isolate each process to its own virtual memory address space. Further, it sets up the basis for kernel mode (ring0) vs user mode (ring3), etc... Legacy mode is what DOS used to run in. The CPU 'just ran', continuously, in a polling fashion... just waiting for a keystroke or other input, looking, looking, interrupt handling, looking, interrupt handling, looking, etc... ;). There is no 'idle' state, essentially .. even if the user is idle. Thus, the CPU will reach a high temperature.
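To make the polling vs. idle distinction concrete, here is a rough user-mode analogy in Python. It is only an analogy: HLT itself cannot be issued from ordinary user code, so the sketch compares a busy-poll loop (checking a flag over and over, like the pre-boot loop described above) with a blocking wait (the thread sleeps until woken, which is what lets an OS idle the core). The helper names `busy_poll` and `blocking_wait` are made up for this example.

```python
import threading
import time


def busy_poll(event: threading.Event, timeout_s: float) -> float:
    """Spin on the flag until it is set or the timeout expires.

    Returns the CPU seconds this process burned while 'waiting'.
    """
    start_cpu = time.process_time()
    deadline = time.monotonic() + timeout_s
    while not event.is_set() and time.monotonic() < deadline:
        pass  # looking, looking, looking...
    return time.process_time() - start_cpu


def blocking_wait(event: threading.Event, timeout_s: float) -> float:
    """Sleep until the flag is set or the timeout expires.

    Returns the CPU seconds this process burned while waiting.
    """
    start_cpu = time.process_time()
    event.wait(timeout_s)  # the thread is descheduled; the core can idle
    return time.process_time() - start_cpu


if __name__ == "__main__":
    never_set = threading.Event()  # stands in for input that never arrives
    print("busy poll CPU seconds:    ", busy_poll(never_set, 0.5))
    print("blocking wait CPU seconds:", blocking_wait(never_set, 0.5))
```

Both calls wait the same wall-clock time for input that never comes, but the polling version keeps a core fully busy for that whole time while the blocking version uses almost no CPU.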
Software Engineer. Bitsum LLC.

Jeremy Collake

The scary thing, of course, is how quickly it happened. I was not in that control panel for the eSATA card long. Maybe a minute. It takes *very little* time for a CPU to overheat... which is why it is important that your dynamic fan speed control be ACTIVE, *OR* your CPU fan remains at 100%. This is why I now believe this to be a bug in my UEFI, though I doubt it is uncommon :o. It is more likely that this condition, a device initializing that requires user input, was not something they tested extensively, *or* they figured the CPU surely wouldn't overheat in such a small amount of time (wrong!).
Software Engineer. Bitsum LLC.

BenYeeHua

And the final way to fix it is to report it to the developers.  ;)
----
And I learned something new now. :)

Jeremy Collake

Another nice thing would be if the CPU *defaulted* to running at its lowest multiplier (lowest frequency and voltage state) when in legacy mode. After all, that's all we need in legacy mode (or most people anyway). However, since *some* do need full computing capacity in legacy mode, this will likely never happen. Then we have these BIOSs / UEFI that want to 'overclock' your system for you ....
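For a rough sense of how much heat the lowest multiplier could save, the usual first-order approximation is that dynamic CPU power scales with frequency times voltage squared (P ~ C * V^2 * f). The Python sketch below applies that approximation; the frequencies and voltages are hypothetical example values, not measurements from this PC, and leakage (static) power is ignored.

```python
def relative_dynamic_power(freq_ghz: float, volts: float,
                           ref_freq_ghz: float, ref_volts: float) -> float:
    """Dynamic power at (freq_ghz, volts) as a fraction of the reference state.

    Uses the first-order approximation P ~ C * V^2 * f; the capacitance
    term C cancels out in the ratio. Static/leakage power is ignored.
    """
    return (freq_ghz / ref_freq_ghz) * (volts / ref_volts) ** 2


if __name__ == "__main__":
    # Hypothetical states: full speed 3.4 GHz @ 1.20 V, lowest 1.6 GHz @ 0.90 V
    ratio = relative_dynamic_power(1.6, 0.90, ref_freq_ghz=3.4, ref_volts=1.20)
    print(f"lowest state uses about {ratio:.0%} of full-speed dynamic power")
```

Under those assumed numbers the lowest state would draw roughly a quarter of the full-speed dynamic power, which is why a low default multiplier pre-boot would help so much.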
Software Engineer. Bitsum LLC.

BenYeeHua


Jeremy Collake

LOL, indeed ;). My office room actually does have a 5 degree F variance from the rest of the house, due to thermal emissions from the PCs and other electronics. I am sure many others are the same ;p.
Software Engineer. Bitsum LLC.

BenYeeHua

I vote "No", as right now I am at the counter. ;D
The space is too big, and the thermal emissions from the computer + CRT TV (for CCTV) are not enough to make it any warmer ;)