Conflicting CPU affinity rules and discussion of logical cores vs physical cores

Started by Mark_Kratzer, May 21, 2015, 07:54:46 AM

Previous topic - Next topic

Mark_Kratzer

I have 6 cores.

So, I wanted PL to generally manage the first 3 cores.  The 4th would be managed by me for key applications.  The 5th and 6th would be reserved for either single-threaded or hyper-threaded games.  They would also be overclocked.

Now if you look at my Default Affinities:

DefaultAffinities=
*.exe*,0-5,
svchost.exe*,0-5,
windowmanager.exe,6-7,
windowmanager64.exe,6-7,
pdexplonxp.exe,6-7,
pwmixer.exe,6-7,
taskmgr.exe,6-7,
processgovernor.exe,6-7,
processlasso.exe,6-7,
firefox.exe,4-7,
thunderbird.exe,6-7,
startupcoppro.exe,6-7,
winbatch.exe,6-7,
autohotkey.exe,6-7,
lcore.exe,6-7,
trueimage.exe, 6-11,
trueimagehomeservice.exe, 6-11,
trueimagehomenotify.exe, 6-11


It appears (from the log file) that the Governor is waking up every 100ms and reapplying these.  So, first everything is shifted to 0-5 due to the wildcard.  Then, specific processes are switched out of 0-5 due to their affinity assignments.

As I understand it, this is very bad: I am thrashing cores, which means I am incurring additional overhead such as rebuilding the instruction cache.  So, how do I fix this?  I want to default everything to the first three cores and tell ProBalance that it may only operate on processes running there.

Now, I could set:

UpdateSpeedCore=60000

Thus, it would apply the rules once per minute.  It wouldn't really solve the problem, but it would reduce the damage.  Is there a better option?

Thanks!

edkiefer

Could you post a section of the log?

The default refresh for the Governor is 1 sec (1000 ms); there are predefined settings under Options > General Settings > PL and Governor refresh rate.
You could also edit these manually to whatever you want.

I would remove the wildcard and retry. I would also let Windows handle as much as possible; what you're trying to do is going to be tough, IMO.
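For example, something like this (just a rough sketch reusing a few of your own entries, not a complete config) keeps only the explicit assignments and drops the two catch-all 0-5 lines, so the rules no longer overlap:

DefaultAffinities=
firefox.exe,4-7,
thunderbird.exe,6-7,
taskmgr.exe,6-7,
trueimage.exe,6-11

Everything without a rule is then left for Windows to place, instead of being pulled to 0-5 and immediately moved again.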
Bitsum QA Engineer

Jeremy Collake

#2
You are right that core thrashing will reduce performance.

It's thrashing because the default CPU affinity rules you created overlap. You should create rules that don't overlap.

If you reduce the refresh rate of the core engine (what that setting does), you will effectively reduce how often it makes these changes, yes.

It will also prevent ProBalance and other features from working correctly though.

Whether these changes are ideal, I'll leave for you to decide. For any readers, I do not recommend this level of customization, nor fully inclusive wildcard rules, unless you are an expert, and can recover from a very bad situation (e.g. boot into safe mode and repair the config or uninstall Lasso).
Software Engineer. Bitsum LLC.

Mark_Kratzer

#3
I definitely did not have force mode on, but it kept reapplying the settings.

I was able to resolve this by using a different utility to set the affinities, since PL does not touch affinities you haven't configured.

However, your documentation states that for processes with non-normal priorities (yes, I verified that was set), ProBalance will not issue restraint actions.  In fact, it appeared to ignore that.  I had to go in and exclude the processes I wanted manually.

But I have finally got what I wanted:

3 processors managed by ProBalance; everything except the tasks I explicitly configured will end up there.

1 processor for high-priority tasks, and otherwise used for various applications.

2 processors completely reserved for single-threaded and multi-threaded games.  These will be overclocked, and my motherboard's auto-detection will decide whether the overclocked profile kicks in for specific games.

It took some work, but now I think it is working.

Jeremy Collake

Lasso is very configurable, so what ProBalance ignores depends on the config. Some docs are a bit outdated, but should be accurate for default settings.

I will be curious to hear how this 'core partitioning' configuration works out for you. Keep us updated!

Software Engineer. Bitsum LLC.

Mark_Kratzer

#5
Jeremy,

Ha! Ha!

You could tell from my above message that it was 4am and I was totally exhausted.

Well, I have the tools to measure performance, starting from a baseline of nothing but raw Windows running (meaning no tweaks to the dynamic environment).

And with the tools (Something and Process Lasso), I can also run metrics to see how particular core strategies work out in terms of performance and heat.  I have various benchmarking and stress tools such as FurMark and IntelBurn.

For the type of games I play (mainly compute-bound), two fast cores should give the greatest value.  As for everything else, like web surfing, email, and so on: unless there is a rogue process (like your demo), this machine is simply fast enough for routine activities.

I did a lot of reading of your documentation and references yesterday.  I was in the computer field for about 25 years.  I am shocked that in many ways the Windows executive/scheduler has yet to reach the level of sophistication of the algorithms that DEC and DG were using many years ago in their proprietary minicomputer operating systems.  It is also disappointing how little coordination there is between the hardware features Intel provides and Windows.  It seems clear that all those cores are exploited mainly at the application level, by specific vendors who see a big payoff from such coding, like chess engines or video compression software.

Thanks for your help!

BenYeeHua

QuoteFor the type of games I play (mainly compute-bound), two fast cores should give the greatest value.  As for everything else, like web surfing, email, and so on: unless there is a rogue process (like your demo), this machine is simply fast enough for routine activities.
Yup, so far I have only seen multi-platform games like Warframe and PlanetSide 2 improve their multi-core performance, after they added PS4 support (which is 8 cores, with 4+2 cores for games). ;)

I hope DX12 with Windows 10 will greatly improve gaming performance, as the GPU can do many things but is still waiting on the CPU (unless you are mostly using shaders and post-processing). ;)

Mark_Kratzer

This is one reason why, despite my dislike of Apple's closed approach, one can do a better job if one controls both software and hardware.

This was very much the norm in the early days of minicomputers.  Hardware and software were very closely integrated.  If someone needed something special for a linker, compiler, database, and so on, they would often get it implemented directly at the machine level, as instructions in the microcode.  Very fast and efficient.

These days, you buy hardware and run Windows, which is years behind your machine.

---

BTW, the idea of ProBalance is that by controlling process/thread priorities, you can keep the machine responsive.  I suppose that for most real-world cases this is true.

However, that only covers work that is created by, and charged to, the user process.

A long, long time ago, when I was a systems programmer, we had an easy demo of how you could bring a minicomputer to its knees while process priorities were of absolutely no help.  You could create a huge two-dimensional array that was almost entirely virtual storage.  Then, use an index to step through the array page by page.  On each page, just fill in one cell.

This took hardly any CPU, but it would drive the OS crazy having to fault each page in and page it back out to disk.  All of that work was billed to the OS, not the user, so process priorities had no impact.  As long as the user process got any CPU at all, the OS would grind to a halt: it was so busy that it rarely had time to run the scheduler and let users do anything.  My guess is that one could probably write a CPU Eater for Windows that did something similar, and ProBalance would be helpless to free up resources.
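Something along these lines would reproduce the idea (just a sketch; the array size is an arbitrary number that only needs to exceed physical RAM, and a 64-bit build is assumed):

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE   ((size_t)4096)
#define TOTAL_BYTES ((size_t)64 * 1024 * 1024 * 1024)  /* must exceed physical RAM */

int main(void)
{
    /* Reserve a huge, mostly-virtual buffer. */
    unsigned char *buf = calloc(TOTAL_BYTES, 1);
    if (buf == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    /* Touch one cell per page, forever.  Almost no CPU is used, but the
       OS must fault each page in and eventually page it back out. */
    for (;;) {
        for (size_t off = 0; off < TOTAL_BYTES; off += PAGE_SIZE) {
            buf[off]++;
        }
    }
}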

edkiefer

#8
Quote from: Mark_Kratzer on May 22, 2015, 11:35:52 PM
This is one reason why, despite my dislike of Apple's closed approach, one can do a better job if one controls both software and hardware.

This was very much the norm in the early days of minicomputers.  Hardware and software were very closely integrated.  If someone needed something special for a linker, compiler, database, and so on, they would often get it implemented directly at the machine level, as instructions in the microcode.  Very fast and efficient.

These days, you buy hardware and run Windows, which is years behind your machine.

---

BTW, the idea of ProBalance is that by controlling process/thread priorities, you can keep the machine responsive.  I suppose that for most real-world cases this is true.

However, that only covers work that is created by, and charged to, the user process.

A long, long time ago, when I was a systems programmer, we had an easy demo of how you could bring a minicomputer to its knees while process priorities were of absolutely no help.  You could create a huge two-dimensional array that was almost entirely virtual storage.  Then, use an index to step through the array page by page.  On each page, just fill in one cell.

This took hardly any CPU, but it would drive the OS crazy having to fault each page in and page it back out to disk.  All of that work was billed to the OS, not the user, so process priorities had no impact.  As long as the user process got any CPU at all, the OS would grind to a halt: it was so busy that it rarely had time to run the scheduler and let users do anything.  My guess is that one could probably write a CPU Eater for Windows that did something similar, and ProBalance would be helpless to free up resources.
I am not familiar with Apple's systems, but it sounds like the demo was I/O bound, and yes, for that type of issue CPU affinity and CPU priority won't help much. You could only lower the I/O priority, but that is not as flexible as the other priorities; then again, what alternative is there?
For many cases, ProBalance lowering the background processes so the foreground stays responsive does work.

I'd be interested in your outcome results and how you tested, what you used to get repeatable load results, etc.
Bitsum QA Engineer

BenYeeHua

QuoteThis was very much the norm in the early days of minicomputers.  Hardware and software were very closely integrated.  If someone needed something special for a linker, compiler, database, and so on, they would often get it implemented directly at the machine level, as instructions in the microcode.  Very fast and efficient.

These days, you buy hardware and run Windows, which is years behind your machine.
Yup, I still remember when the bundled software and games were very old; most of them were just emulators packaged with many games. ;)

And yeah, even the Windows 10 Start menu is not written in C++ or anything like that, if I am right. :)
QuoteAnd guess what... We're using the same tools available to third party app developers, in an effort to ensure the dev platform is rock solid. The Start menu is just a universal XAML app. We don't have to do any special optimization on top of what the framework offers.
https://news.ycombinator.com/item?id=9555628

Jeremy Collake

Quote from: Mark_Kratzer on May 22, 2015, 11:35:52 PM
BTW, the idea of ProBalance is that by controlling process/thread priorities, you can keep the machine responsive.  I suppose that for most real-world cases this is true.

I am glad you acknowledge that it is true, as we've demo'd many times over. A single thread at normal priority, in a normal-priority-class process, can bring Windows to a near stall on a single-CPU system. For multi-core, increase the number of threads to achieve a similar effect (though even a single CPU-bound thread can still be very detrimental to responsiveness on multi-core systems).

As for CPU affinity adjustments, as you can see, ProBalance supports them, but I'm not going to comment on that right now. This has been on our radar for years, specifically what you are doing.

To clarify, I discourage this kind of CPU partitioning because most users don't have enough cores yet. With 4 or 8 cores, you really need to let the OS handle it. For server systems with 16+ cores it makes more sense, and it is being done. There is a reason we've added all this support for CPU affinity adjustments and such ;).

And absolutely you could write a demo that could bring a multi-core system to its knees with a single thread, but it would have to be specially crafted to exploit deficiencies in the OS. In the case of our demo, we've just got a simple while(1) loop, simulating any prolonged CPU-bound condition or infinite loop.
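In other words, a sketch of the idea (not our actual demo source) is nothing more than:

int main(void)
{
    /* A single normal-priority, CPU-bound thread. */
    volatile unsigned long counter = 0;

    while (1) {
        counter++;  /* spin forever, simulating any prolonged CPU-bound condition */
    }
}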

Lastly, let us be careful not to compare apples to oranges. Each OS has a different CPU scheduler and software ecosystem.
Software Engineer. Bitsum LLC.

Jeremy Collake

#11
Oh, and let us not forget that most users aren't aware of the difference between logical and physical cores. Some cores will offer a lot less performance than others, especially on Intel platforms, since every other logical core is a hyper-threaded one. AMD is similar, with adjacent logical cores paired together, sharing certain computational units.

So, for older Intel CPUs:
Core 0 = real CPU
Core 1 = Hyper-threaded core (much less performance)
Core 2 = real CPU
Core 3 = Hyper-threaded core (much less performance)

For AMD and newer Intel CPUs:
Core 0 \
           shared computational units
Core 1 /
Core 2 \
           shared computational units
Core 3 /

These issues, and the user education required, have put a bit of a hold on any general recommendation to do what you are doing.

The OS scheduler is aware of the CPU design, so it tries to keep threads on the appropriate cores. If you're doing the same thing manually, just make sure you are also aware of the CPU architecture.
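If you want to check the pairing programmatically before hand-picking affinities, a rough sketch using the Windows GetLogicalProcessorInformation API (not Lasso code, just an illustration) would be:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    DWORD len = 0;

    /* First call fails with ERROR_INSUFFICIENT_BUFFER and reports the required size. */
    GetLogicalProcessorInformation(NULL, &len);

    SYSTEM_LOGICAL_PROCESSOR_INFORMATION *info = malloc(len);
    if (info == NULL || !GetLogicalProcessorInformation(info, &len)) {
        fprintf(stderr, "query failed\n");
        return 1;
    }

    /* Each RelationProcessorCore entry is one physical core; its mask lists
       the logical processors (two, with Hyper-Threading) that belong to it. */
    for (DWORD i = 0; i < len / sizeof(*info); i++) {
        if (info[i].Relationship == RelationProcessorCore) {
            printf("physical core -> logical CPU mask 0x%llx\n",
                   (unsigned long long)info[i].ProcessorMask);
        }
    }

    free(info);
    return 0;
}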
Software Engineer. Bitsum LLC.

BenYeeHua

Shouldn't it be like this? ???

For Intel:
Core 0 \
           Hyper-threaded core
Core 1 /
Core 2 \
           Hyper-threaded core
Core 3 /

I think there is no priority as to which thread (core 0 or 1) it should process first... :P

Jeremy Collake

#13
Visualize it whichever way you prefer, but I think my description makes the point more clearly, in that AMD's logical cores are a bit more 'real' than Intel's hyper-threaded ones.

Not every logical core offers the same performance, since not all of them are physical cores.

Therefore, one has to keep this in mind when doing CPU affinity adjustments. The OS is aware; just make sure you are too (speaking to users). :)

Please let the OP respond before commenting further; I don't want to confuse this thread. I am already planning to split it.
Software Engineer. Bitsum LLC.

edkiefer

Right, I wouldn't use an HT core by itself; I would group it with its parent core if you want to use it.

And of course on something like an i5, which has no HT, every core is real (0-3).

AMD is more of a modular design, with some sharing between pairs of cores, but the cores are the same size.
Bitsum QA Engineer


BenYeeHua

Quote from: edkiefer on May 23, 2015, 05:25:43 PM
Right, I wouldn't use an HT core by itself; I would group it with its parent core if you want to use it.

And of course on something like an i5, which has no HT, every core is real (0-3).

AMD is more of a modular design, with some sharing between pairs of cores, but the cores are the same size.
But the i5 is not 4 cores / 4 threads when it comes to the laptop versions. ;D

Anyway, I wonder whether AMD performance increased in Windows 10 compared with Windows 8.1. It sounds like AMD hasn't said anything about Windows 10, so I guess their module design is well understood by the scheduler now.

edkiefer

You are right, I was talking desktop. I hate how Intel and Nvidia use numbers/names on portables.
Bitsum QA Engineer

BenYeeHua

And also graphics cards across generations, where the core never actually changed... (520M, 610M, etc.)
(They are called 'vest cards' in Chinese, meaning they just change the vest/coat, but the core stays the same.) :P

edkiefer

Bitsum QA Engineer

Jeremy Collake

I am working on a change that will prevent this thrashing effect when conflicting/overriding CPU affinity rules are specified. It may not make it into the next final release, as that is coming very shortly, but it will be in the beta series that follows.

My prior post has been updated to distinguish older Intel CPUs from newer Intel CPUs. It seems Intel's Hyper-Threading arrangement changed while I wasn't looking. Blink and you can so easily miss tech changes/advances.
Software Engineer. Bitsum LLC.