My E-cores make me crazy

Started by Manuel, May 17, 2024, 12:12:12 PM

Manuel

Dear all,
I use a workstation with an i9-13900K processor, and its 16 E-cores give me a lot of problems. The problem is well known: Windows 11 has trouble handling some applications that run in the background or hidden, and as a result some software loses a lot of performance.
I use the workstation for analysis software. To give an example, if I launch a calculation with all 32 threads available on a model I use as a reference, it takes about 9 minutes. If I disable the E-cores in the BIOS, all the P-cores work at 100% and the time drops to about 1 minute 30 seconds! I don't know what Windows does, but it's a mess!!!

The current solution is to disable the E-cores in the BIOS, but that irritates me, because 16 cores are switched off, while in other tests, such as Cinebench R23, all cores work smoothly and perform very well (I calculated that the 16 E-cores contribute about 40% of the performance).

I have a Process Lasso license with which I hoped to solve the problem, but I have tried everything and the results are always the same. The process performing the calculations runs for a few seconds at 100% on all 32 threads of the 13900K and then "sits" at about 35%, so everything works a little, but badly... (below is a screenshot of an example analysis).

Graph.png

I tried the settings below in various combinations, but nothing ever changed:
- Exclude from ProBalance
- Affinity - only P-cores
- Efficiency Mode off
- I/O priority high
- Windows dynamic thread priority boost off
- Induce Performance Mode on

Do you have any suggestions?
I am going crazy!

Thanks in advance
Manuel

Jeremy Collake

#1
What analysis software is this?

First, let's ensure you've checked menu item 'Options / Forced Mode (continuously reapply settings)', in case that software is managing its own affinity.

With the E-cores removed by affinity, the total CPU % is going to be limited since only the P-cores can be used. In contrast, when you disabled the E-cores in the BIOS, the CPU consumption could reach 100% since the E-cores were not included in the total available capacity.

You should also check for any thread count setting in the analysis software and set it to the total number of P-core threads you have available (2x the P-cores). Otherwise, it may launch too many threads for the constrained CPU affinity to cope with.
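
As a rough illustration of the two points above, the following Python sketch computes the affinity bitmask, the thread count to configure, and the ceiling on total CPU % when a process is restricted to the P-core threads of an i9-13900K. It assumes that the 16 P-core hardware threads are enumerated as logical processors 0-15 and the 16 E-cores as 16-31; that layout is typical for this CPU but should be confirmed on the actual machine. The function name and dictionary keys are illustrative only.

```python
def p_core_affinity(p_cores: int, e_cores: int, smt: int = 2) -> dict:
    """Summarize a P-core-only affinity for a hybrid CPU.

    Assumption: P-core hardware threads occupy the lowest logical
    processor indices (0 .. p_cores*smt - 1), followed by the E-cores.
    """
    p_threads = p_cores * smt              # e.g. 8 P-cores x 2 = 16 threads
    total_threads = p_threads + e_cores    # each E-core has a single thread
    mask = (1 << p_threads) - 1            # lowest p_threads bits set
    return {
        "p_threads": p_threads,
        "total_threads": total_threads,
        "affinity_mask_hex": f"{mask:X}",
        # Total CPU % is measured against all logical processors, so a
        # process pinned to the P-core threads cannot exceed this value.
        "max_total_cpu_percent": 100.0 * p_threads / total_threads,
    }


info = p_core_affinity(p_cores=8, e_cores=16)
print(info)
```

For the 13900K this gives 16 threads, the mask FFFF, and a ceiling of 50%, which is why a P-core-only run reports well under 100% total CPU even when the P-cores themselves are saturated.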

How is the analysis time after the rules you set with Process Lasso?
Software Engineer. Bitsum LLC.

Manuel

Quote from: Jeremy Collake on May 17, 2024, 03:00:07 PM
What analysis software is this?

Thank you for your quick response!
My analysis software is "SCIA Engineer" and the process that does the calculations is "DesignForms_CalcExe.exe".

Quote: First, let's ensure you've checked menu item 'Options / Forced Mode (continuously reapply settings)', in case that software is managing its own affinity.

I tried it as you suggested, but it didn't seem to change anything...

Quote: With the E-cores removed by affinity, the total CPU % is going to be limited since only the P-cores can be used. In contrast, when you disabled the E-cores in the BIOS, the CPU consumption could reach 100% since the E-cores were not included in the total available capacity.

This aspect is clear, thank you!

Quote: You should also check for any thread count setting in the analysis software and set it to the total number of P-core threads you have available (2x the P-cores). Otherwise, it may launch too many threads for the constrained CPU affinity to cope with.

How is the analysis time after the rules you set with Process Lasso?

I was able to get only the P-cores to work in various ways (see image below), but although the bar graph says that the P-cores are working at maximum, HWiNFO64 reports a load of about 45%. I think this value is correct, because the case fans do not spin up to maximum as they do when I run Cinebench.

Graph2.png

The strange thing, besides the fact that the P-cores are not working at maximum, is that the processing time is always very long (the behavior is very different if I disable the E-cores in the BIOS). It seems that Windows still tries to use the E-cores and this slows everything down (I don't know if this is really the case, but it is my impression).

Just for information, I tried running the software on a Ryzen 7950X, which has no E-cores, but again the processing time is far longer than with only the (8+8) P-cores of the 13900K (selected via BIOS)... and with a Ryzen I cannot use the "trick" of disabling the E-cores in the BIOS.

Do you have any other suggestions for me?
If not, I guess I'll have to give up....

Thanks in advance
Manuel


Jeremy Collake

We'd really have to dig into this to give you a certain answer. It is strange that you can't achieve close to the same results that you had when disabling E-cores at the BIOS level, but as you mentioned, Windows thread scheduling on these platforms is a mess.

What power plan (or mode) are you in? If not already, switch to a high performance plan to help dissuade Windows from making use of the E-cores. You can use ParkControl to check the heterogeneous scheduling settings for the power plan.




Manuel

Quote from: Jeremy Collake on May 19, 2024, 06:29:27 AM
We'd really have to dig into this to give you a certain answer. It is strange that you can't achieve close to the same results that you had when disabling E-cores at the BIOS level, but as you mentioned, Windows thread scheduling on these platforms is a mess.

What power plan (or mode) are you in? If not already, switch to a high performance plan to help dissuade Windows from making use of the E-cores. You can use ParkControl to check the heterogeneous scheduling settings for the power plan.


Thank you for your help,
I would be very happy to solve this problem, and I imagine the other users of my calculation software who are struggling right now would be too. The software vendor has published some pointers for solving the problem (link below), but we users have not been able to fix anything with them... maybe they only work in some cases (such as Windows 10), but in my case the only fix is disabling the E-cores in the BIOS.

https://www.scia.net/en/support/faq/scia-engineer/other-topics/performance-scia-engineer-calculations-running-newer-types-processors

To return to your observation, I confirm that disabling the E-cores in the BIOS and restricting the process to the P-cores through Process Lasso or CoreDirector give different results.

A small test I just did this morning returns these results:

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance):
4' 18"    (CPU consumption: 110-120W)

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance + CoreDirector):
4' 07"  (E-cores still work... - CPU consumption: 110-120W)

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance + Process Lasso Pro + Forced Mode + Affinity Only P-cores):
4' 01" (E-cores do not work... but they probably disrupt the analysis - CPU consumption: 110-120W)

13900K 8+8 P-Core (Bitsum Highest Performance + E-cores disabled via BIOS - NO Process Lasso Pro - NO CoreDirector):
0' 26" (CPU consumption: 140-160W)

I think it's clear why I'm in danger of going crazy!!!  :o

Within a few days I will be able to do parallel testing with a Ryzen 7950X (which does not have E-cores), but as I have already written, a colleague and I have done preliminary testing with this CPU, and even there the analysis is still very slow.

Thanks in advance
Manuel

Manuel

As promised, I am updating the performance list with a Ryzen 7950X processor (16+16 cores). Absurdly, its results are the worst in the test set, even though the Ryzen has no E-cores.

Below is the summary of processing times; the Ryzen is on the last line.

I welcome advice, although in the end, I guess I'll have to wait for the developers to rewrite the software...

Thanks
Manuel

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance):
4' 18"    (CPU consumption: 110-120W)

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance + CoreDirector):
4' 07"  (E-cores still work... - CPU consumption: 110-120W)

13900K 8+8 P-Core + 16 E-Core (Bitsum Highest Performance + Process Lasso Pro + Forced Mode + Affinity Only P-cores):
4' 01" (E-cores do not work... but they probably disrupt the analysis - CPU consumption: 110-120W)

13900K 8+8 P-Core (Bitsum Highest Performance + E-cores disabled via BIOS - NO Process Lasso Pro - NO CoreDirector):
0' 26" (CPU consumption: 140-160W)

7950X 16+16 Cores (Highest Performance - NO Process Lasso Pro - NO CoreDirector):
5' 15" (CPU consumption: 110-115W)
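
To put the timings listed above on a common scale, here is a small Python sketch that converts them to seconds and expresses each run as a multiple of the fastest one (E-cores disabled in the BIOS). The short configuration labels are paraphrased for readability, and the helper name `to_seconds` is illustrative.

```python
def to_seconds(stamp: str) -> int:
    """Convert a time written as  M' SS"  (minutes, seconds) to seconds."""
    minutes, seconds = stamp.replace('"', "").split("'")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the list above; labels are shortened paraphrases.
runs = {
    "13900K, all cores": "4' 18\"",
    "13900K, all cores + CoreDirector": "4' 07\"",
    "13900K, all cores + Process Lasso affinity": "4' 01\"",
    "13900K, E-cores disabled in BIOS": "0' 26\"",
    "7950X, all cores": "5' 15\"",
}

baseline = to_seconds(runs["13900K, E-cores disabled in BIOS"])
slowdowns = {}
for label, stamp in runs.items():
    secs = to_seconds(stamp)
    slowdowns[label] = secs / baseline
    print(f"{label:45s} {secs:4d} s  {slowdowns[label]:5.1f}x baseline")
```

On these numbers, every all-core configuration is roughly 9 to 12 times slower than the run with the E-cores disabled in the BIOS.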


Jeremy Collake

Wow! The difference in performance is remarkable, and enough to pique my curiosity.

Can you advise steps to reproduce this with the SCIA Engineer software? I've got a trial of it and will begin experimentation.


Manuel

Quote from: Jeremy Collake on May 29, 2024, 02:03:22 PM
Wow! The difference in performance is remarkable, and enough to pique my curiosity.

Can you advise steps to reproduce this with the SCIA Engineer software? I've got a trial of it and will begin experimentation.


Thank you for your response!

The version of SCIA Engineer is v24.0.

At the link below you can download the model I used as a test; it is already calculated, just run the reinforcement analysis as per the instructions below.

- Open the file in the software;
- look for "Concrete 2D reinforcement design" in the top bar and click on it (image below);
- on the sidebar that will appear press "Refresh" to start the analysis (image below);
- the analysis will be completed when colored areas appear on the surface of the object.

Link file [P-E cores - Reduced - v2.esa]:
https://drive.google.com/file/d/1fPhtfgYoO_CmjMCsZhACDFobhzqjEfx6/view?usp=sharing

Write to me with any problems!

Thanks in advance
Manuel

STEP 1
Step 1.png

STEP 2

Step 2.png

Matteo

Hi everyone,

I have the same problem on my device with this calculation software, SCIA Engineer v24.0. I really hope there can be positive news soon to solve it.
I am looking forward to your valuable support.

Thank you in advance.

Matteo

Jeremy Collake

#9
After some experimentation, I found that the problem is the core count, rather than E-cores.

This effect is best seen on the 7950X3D. Reducing the number of cores by 1/2 decreases the calculation time by over 90%.

7950 with all 16 cores enabled, SCIA launches 111 threads and I stopped it at 360 seconds.
7950 with 8 cores enabled, SCIA launches 66 threads and it completed in about 35 seconds.
Intel i5-12600k with all cores enabled was able to complete the calculation in 35 seconds.
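
The "over 90%" figure can be checked from the two 7950 timings above. A minimal Python sketch follows; because the 16-core run was stopped at 360 seconds rather than completed, the result is a lower bound on the real reduction. The variable names are illustrative.

```python
stopped_after_s = 360   # 16 cores: run was aborted, so the true time is longer
completed_in_s = 35     # 8 cores: approximate completion time

# Fraction of the (truncated) 16-core time saved by halving the core count.
reduction = (stopped_after_s - completed_in_s) / stopped_after_s
print(f"time reduced by at least {reduction:.1%}")  # prints: time reduced by at least 90.3%
```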

It seems likely that SCIA is launching an unnecessarily high number of threads, resulting in synchronization overhead, contention and/or an excess of context switching, thereby substantially reducing performance.

This is why no CPU affinity (or other) rules in Process Lasso have much effect. What *does* have an effect is reducing the number of launched threads, but afaik SCIA is using a hard-coded formula to compute the number of threads to launch based on the available core count.

I'll have to chew on this further and consider ways to resolve it. Ideally, they adjust the application or expose a setting to control the thread count, if one isn't available already.
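
To make the idea of an exposed thread-count setting concrete, here is a minimal Python sketch of how an application could choose its worker count. It is not SCIA's code and says nothing about how SCIA actually sizes its thread pool; the names `choose_worker_count`, `run_capped`, and `max_workers`, as well as the squaring workload, are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def choose_worker_count(logical_cpus: int, max_workers: Optional[int] = None) -> int:
    """Return the number of worker threads to launch.

    By default use one worker per logical CPU; if the user supplies
    max_workers, never exceed it. Always return at least 1.
    """
    workers = max(1, logical_cpus)
    if max_workers is not None:
        workers = min(workers, max(1, max_workers))
    return workers


def run_capped(tasks, max_workers: Optional[int] = None):
    """Run a stand-in workload (squaring numbers) on a capped thread pool."""
    workers = choose_worker_count(os.cpu_count() or 1, max_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda x: x * x, tasks))


print(choose_worker_count(32))                  # no cap: one worker per logical CPU
print(choose_worker_count(32, max_workers=16))  # user cap applied
print(run_capped(range(5), max_workers=4))
```

With a setting like this, a user on a 32-thread machine could cap the pool at 16 workers without touching the BIOS, which is the kind of control the post above wishes for.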

Manuel

#10
(EDIT FROM JEREMY: I accidentally edited this post while replying and lost much of its original content)

Do you think this is a problem only with SCIA (and therefore I was very unlucky) or have you observed this behavior before in some other software?

You write that you will have to think about the solution... do you think it can be found without rewriting the software code of SCIA?

Jeremy Collake

#11
Quote from: Manuel
Do you think this is a problem only with SCIA (and therefore I was very unlucky) or have you observed this behavior before in some other software?

It is very rare, but similar problems are not unheard of. Some legacy games, for instance, don't cope well with too many CPU cores.

Quote from: Manuel
You write that you will have to think about the solution... do you think it can be found without rewriting the software code of SCIA?

Not elegantly. It would require API hooks and/or code patches to alter the behavior of the application.

I imagine the SCIA developers would be responsive to this issue. Is there a reason to believe they won't be?

Manuel

Quote from: Jeremy Collake on May 31, 2024, 02:52:35 PM
I imagine the SCIA developers would be responsive to this issue. Is there a reason to believe they won't be?

I'm pretty sure the developers are working on the problem, but it's already been 9 months since I reported this behavior to tech support and nothing has changed yet... they say it's going to be a long job.

I decided a couple of months ago that I would make do somehow, and that's why I bought the Process Lasso license; I was sure I would be able to improve something.

I'll stay tuned to this discussion in case anything new comes along!

Thanks again Jeremy

Manuel