Some results from HT testing

Started by Mark_Kratzer, May 23, 2015, 05:12:50 PM

Much of the Internet regards HT core pairs as being symmetrical.  Discussions as to why they may be bad or should be turned off in the BIOS are due to:

* Because HT splits each core, workloads may be assigned poorly: the resources available to each logical core aren't what they would be on a non-HT CPU.

* A segment of the OCing community feels that enabling HT creates additional heat and therefore reduces the headroom for OCing.  As many OCers are gamers, they are more concerned with the max number of cycles than with parallelism.

You stated above:

HyperThreading is a feature introduced by Intel, and is exclusive to Intel processors. It splits a real CPU (a core) into 2. One is the real core, called the physical core. The other is just a secondary core, called the logical core. This logical core can't do much, but it does provide a little increased parallelism. It is far from being a real core. In fact, it offers 10-20% (est., likely less) the performance of a real physical core. That's right, barely any computing power.

As best I can tell from my testing "your real core" and "your logical core" are symmetrical.  There isn't a fast one and a slow one.

The test was performed on an i7-5930K.

The methodology was to use a chess benchmarking engine.  It is all compute bound and produces a number of nodes/second, meaning the number of board positions which can be evaluated per second.

Assigned to P10:  3027/8
Assigned to P11:  3027/8

The results were very reproducible.
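
For anyone repeating the test: pinning a process to specific logical processors is usually done with an affinity bitmask, where bit n selects logical processor n. A minimal sketch in Python, assuming the labels P10 and P11 correspond to zero-based logical processor indices 10 and 11 (the helper name `affinity_mask` is made up for illustration):

```python
def affinity_mask(cpus):
    """Return an integer bitmask with bit n set for each logical CPU index n."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


# Assumption: P10 and P11 are zero-based logical CPU indices 10 and 11.
print(hex(affinity_mask([10])))      # one logical CPU
print(hex(affinity_mask([10, 11])))  # both logical CPUs of the assumed pair
```

A hexadecimal mask like this is the form that Windows' `start /affinity` takes; on Linux, `os.sched_setaffinity(0, {10, 11})` accepts the set of indices directly.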

And as an FYI, the same test run:

Assigned to P10-P11:  3751 (result varied by around +/- 10 nodes/sec) for a boost of around 24%.

My point is that your literature is at worst misleading and at best ambiguous.


Try something more repeatable like Prime95 or SuperPI.
HT will allow a second thread, but that's it.

Edit: So you're getting +24% by having HT enabled in that particular app. That sounds about right; Intel states 0-30% depending on the app.

Maybe the doc could be worded slightly differently, but it's just a rough average; we know it will depend on the app type.

Another interesting link, as most info on HT dates from when it was first introduced
Bitsum QA Engineer

Jeremy Collake

That article is old, but still true, even if the numbers have changed a bit. I am not a technical writer, but believe there are plenty of citations on the web to back up the article.

Intel's new architectures are apparently more like AMD's, but still there is the fundamental truth here.

Perhaps the performance of Intel's HyperThreaded logical cores has improved when the other core in the pair is idle, but it's still sub-par to a true physical core.

Notice how P10-P11 didn't double the results --- that kind of says it all - it only added 24%. If they were two fully distinct cores, then the computational capabilities would be doubled.
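
The arithmetic behind that comparison, as a small sketch using the figures posted earlier in the thread. It assumes 3027 nodes/sec for a single logical core (the lower end of the reported "3027/8") and 3751 nodes/sec for the P10-P11 pair; the variable and function names are just for illustration:

```python
single = 3027.0  # nodes/sec on one logical core (assumed: lower end of "3027/8")
paired = 3751.0  # nodes/sec with P10 and P11 used together


def relative_gain(baseline, measured):
    """Fractional improvement of measured over baseline."""
    return (measured - baseline) / baseline


gain = relative_gain(single, paired)  # what the second logical core added
ideal = 2 * single                    # throughput if the pair were two full cores
efficiency = paired / ideal           # share of that ideal actually reached

print(f"gain from the second logical core: {gain:.1%}")
print(f"throughput if doubled: {ideal:.0f} nodes/sec")
print(f"fraction of doubled throughput reached: {efficiency:.1%}")
```

This only restates the posted numbers; it says nothing about other CPUs or other benchmarks.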

Try P10 and P12, see how those results go ;). They should be dramatically higher than P10 and P11, because you then have two full CPUs involved.

What I am trying to explain is that it depends on whether the shared part of the computational pair is utilized. If the other paired logical core is idle, then the HT core may get full use of it.

AMD's platforms are different, but similar - their logical cores are more like two paired physical cores that share one or more computational units. Now, if a shared unit is used entirely by one of the two in the pair, then the other will have to wait. Again, a bottleneck due to this CPU-splitting.

Properly performed tests will always yield this to be true.

(talking about old servers where the OS was unaware of HyperThreading, as I also mention in my follow-up post):
"It is critical hyper-threading be turned off for BizTalk Server computers. This is a BIOS setting, typically found in the Processor settings of the BIOS setup. Hyper-threading makes the server appear to have more processors/processor cores than it actually does; however hyper-threaded processors typically provide between 20 and 30% of the performance of a physical processor/processor core. When BizTalk Server counts the number of processors to adjust its self-tuning algorithms; the hyper-threaded processors cause these adjustments to be skewed which is detrimental to overall performance." - Microsoft on 'Optimizing Operating System Performance'.

Lastly, benchmarks performed on your particular CPU aren't universally true. How about everyone with an Intel CPU run their own tests, feel free to publish them. The results will vary wildly, depending on specific CPU model and type of benchmark (how much use of FPU, etc..).

If I find any documents to be inaccurate, I'll happily edit them. It is hard to write good documentation that is applicable to everyone, and my writing was worse then than it is now -- but the point is still true, at least for CPUs I'm familiar with.

But, if you're here to 'refute' and 'debate', I will leave it to users and everyone, and stay focused on creating software that lets users do whatever they please, and I'll publish the best information I have to date. Users can make up their own minds.

Wikipedia on HT performance:
Software Engineer. Bitsum LLC.

Jeremy Collake

Response again consolidated and summarized. I will edit the page to increase the performance numbers, but it all depends on whether the paired core is fully utilized.

Whether Intel or AMD:

Cores 0 and 3 combined will be *much* faster than Cores 0 and 1 combined. That's because you have two fully distinct physical CPUs at work with 0 and 3, whereas 0 and 1 share computational capabilities.
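
A quick way to check whether two logical CPUs are a pair, as a sketch only. It assumes the numbering implied in this thread, where consecutive indices (0 and 1, 10 and 11) belong to the same physical core. That is an assumption, not a rule: some systems enumerate siblings differently, so check the real topology before relying on it. The function names are hypothetical:

```python
def physical_core(logical_cpu, threads_per_core=2):
    """Physical core id, ASSUMING siblings have consecutive logical indices."""
    return logical_cpu // threads_per_core


def share_core(a, b, threads_per_core=2):
    """True if logical CPUs a and b sit on the same physical core."""
    return physical_core(a, threads_per_core) == physical_core(b, threads_per_core)


print(share_core(0, 1))    # the pair discussed above
print(share_core(0, 3))    # two different physical cores under this numbering
print(share_core(10, 11))  # the P10-P11 pair from the benchmark
print(share_core(10, 12))  # the suggested P10 and P12 comparison
```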

Now, it will depend on the benchmark type. How much integer math? How much FPU? Etc...

The writing can be improved, it was written so many years ago, for a different generation of processors --- just as HT was emerging really. Just take it easy, there's no conspiracy here or anything ;).

In the other thread, I just meant to say that when you micro-manage CPU affinities, you take away the OS's ability to be aware of these paired logical cores and appropriately move the thread workloads. That's why it's appropriate *sometimes*, but not for everybody, and not all the time.
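
To illustrate what pairing-aware placement means, here is a minimal sketch in Python. It is not how any real OS scheduler is implemented; the function name `pick_logical_cpus` and the four-CPU topology are made up for the example. It hands out one logical CPU per physical core before doubling up on any core:

```python
def pick_logical_cpus(n_threads, core_of):
    """Choose logical CPUs for n_threads, spreading over physical cores first.

    core_of maps each logical CPU index to its physical core id.
    """
    by_core = {}
    for cpu in sorted(core_of):
        by_core.setdefault(core_of[cpu], []).append(cpu)

    chosen = []
    depth = 0
    # Round-robin: first sibling of every core, then second sibling, and so on.
    while len(chosen) < n_threads:
        added = False
        for core in sorted(by_core):
            siblings = by_core[core]
            if depth < len(siblings) and len(chosen) < n_threads:
                chosen.append(siblings[depth])
                added = True
        if not added:
            break  # more threads requested than logical CPUs available
        depth += 1
    return chosen


# Hypothetical topology: two physical cores, each with two logical CPUs.
topology = {0: 0, 1: 0, 2: 1, 3: 1}
print(pick_logical_cpus(2, topology))  # two threads land on different cores
print(pick_logical_cpus(3, topology))  # the third thread has to share a core
```

Pinning two busy threads to logical CPUs 0 and 1 by hand overrides this kind of spreading, which is the situation the paragraph above warns about.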

From the Wikipedia article:

"It is possible to optimize operating system behavior on multi-processor hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four logical processors the same. If only two threads are eligible to run, it might choose to schedule those threads on the two logical processors that happen to belong to the same physical processor; that processor would become extremely busy while the other would idle, leading to poorer performance than is possible by scheduling the threads onto different physical processors."
Software Engineer. Bitsum LLC.



I apologize for using the word "refute".  We are not having an argument, and I am not hostile.  I am very impressed with your product even if I am not taking all your recommendations.

My point about your article is that it leads one to believe that there is a senior core and a junior core in each pair.  If something runs on the senior core, then it gets a lot more cycles of power.  If that same thing runs on the junior core, then it gets very limited cycles of power.  I could not get numbers to reflect that for my specific PC.  Or perhaps the article is unclear and I am misunderstanding what the true point is.  My testing showed that there is neither a senior nor a junior; just a pair of twins which make available a common pool of resources.

In any case, I mean no offense to you or your company.  I am sorry for using the word "refute".  As you are an ADMIN, please feel free to rename the thread or even delete it if it is not in the spirit for which this board is intended.

Thank you.

Jeremy Collake

(as I just emailed)

It is no problem at all.

Yes, it appears that Intel's newer generation processors are now more like AMD's design that I described on that page, in which case it doesn't matter which core of the pair is referenced, as either logical core has full access to the physical CPU -- BUT, if you reference *both* at the same time, this is sub-optimal compared to referencing two logical cores from different physical CPUs.

This makes perfect sense; it's just a change from the original Intel HT design that I described on that page, and it changed the behavior. That made my docs, and Microsoft's, and Intel's, antiquated. Tech changes fast, sometimes such things happen ;).

Thanks for letting me know about this change... I've got to identify when Intel changed their architecture so I can correct the page, something I'll be researching.
Software Engineer. Bitsum LLC.