Process Lasso and Cubase

Started by DaveB, December 01, 2023, 03:04:09 PM


DaveB

I have to start by saying that I wish I had discovered this program MUCH sooner!

I have done a fair amount of testing to try to get later versions of Windows (specifically 10 and 11) to work reliably with later versions of Cubase (11, 12, and 13). I've done countless web searches and a lot of reading. I had gotten discouraged enough with my general-purpose machine having issues that I made the jump to a newer machine with mostly Cubase loaded.

While the new machine was performing pretty well, I was still getting random audio dropouts or glitches and to say it was getting frustrating would be an understatement.

I was honestly losing track of all the "try this" tweaks!

Ultimately, just installing and running Process Lasso vastly improved, but did not totally solve, the issues. With a few additional changes to the configuration, I now have a usable Cubase in all the scenarios I'm using.

First the machines:
General-purpose: Asus Prime Z490-A with Intel i9-10900K, 64GB Corsair DDR4, Windows 11, AMI BIOS v2701
Media: MSI MAG Z790 Tomahawk WIFI (MS-7D91) with Intel i9-14900K, 64GB G.Skill, dual-boot (Windows 10/11), AMI BIOS vH.90

I've reverted all the BIOS (try this!) tweaks back to defaults.

Here are the settings I'm currently using:



Pic 1.png
Pic 2.png
Pic 3.png
Pic 4.png
Pic 5.png
Pic 6.png


Jeremy Collake

Nice! Thank you for sharing! I have no doubt this will be useful to other Cubase users.
Software Engineer. Bitsum LLC.

DaveB

At one point I was having noticeable audio glitches when audiodg.exe was switching context to another CPU. It appears that the CPU Affinity for audiodg.exe is not required. The jury is still out, but preliminary tests suggest it's not needed.

DaveB

Two changes from above:

1. Changed the high performance mode to "Bitsum Highest Performance". I think I had High Performance configured the same way and was having some issues with the Bitsum mode, but it appears to be working now.
2. Added Park Control and disabled CPU parking in Bitsum Highest Performance.

Note: I am still seeing Cubase issues when running Cubase and LatencyMon simultaneously. However, LatencyMon run alone showed that parking the CPUs improved the latency results.

Note: I was stress testing Cubase and it did not seem to benefit from parked cores because the cores were already operating near maximum. Disabling parking did not seem to hamper a fully loaded Cubase, but could potentially impact projects using less CPU resources. Since it seems to have no impact on a loaded Cubase, I'm disabling parking in Bitsum Highest Performance and leaving parking enabled in Balanced.

DaveB

I made a few other changes that seemed to improve the latency reported by LatencyMon, but I did not see notable changes in Cubase responsiveness or process load. I made these changes because some of the latency that gets reported seems to be caused by other tasks/threads, rather than by the reported tasks themselves.

It appears that some OS tasks reporting latency tend to run on CPU 0/1 (hyperthreading enabled, so core 0). This appears consistent with the already proven improvement from moving audiodg.exe off of CPU 0/1. Initially I configured CPU affinity for audiodg.exe to CPU 2. After noting additional issues, I have changed the affinity of multiple audio tasks to use ONLY P-cores and NOT CPU 0/1. This improved some of the latency reported by LatencyMon while seemingly having no impact on Cubase.

Essentially, I changed the affinity of Audiosrv and AudioEndpointBuilder to CPU 2-19 (for 10900K).
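
For anyone who wants to see what "CPU 2-19" means as an affinity bitmask, here is a minimal Python sketch. It only does the arithmetic; it does not call Process Lasso or Windows, and the function name `affinity_mask` is made up for illustration. It assumes 20 logical CPUs (0-19) with hyperthreading enabled, so logical CPUs 0 and 1 are physical core 0.

```python
def affinity_mask(cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# i9-10900K with hyperthreading: logical CPUs 0-19.
# Leave out CPUs 0 and 1 (physical core 0) and keep 2-19.
audio_cpus = range(2, 20)
mask = affinity_mask(audio_cpus)

print(hex(mask))  # 0xffffc
```

The two lowest bits are clear, which is what keeps the audio services off core 0.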

I'm not looking to invest enough time in this to accumulate a Master's thesis or PhD dissertation. I've already spent a lot of time NOT doing Cubase tasks to get a decent Cubase operation! That said, I would like to reiterate that I now have a dedicated machine with dual-boot (Windows 10/Windows 11) with multiple versions of Cubase (11, 12, 13) installed, along with Wavelab and SpectraLayers and a few other things like LatencyMon, Process Lasso, etc.

Note that this provides a test ground with fresh installs of Windows 10 and 11 that run on literally the same hardware! I can quite literally see a significant difference in DPC latency and execution times between the same Cubase versions on Windows 10 and on Windows 11. There are clear indications that either Windows itself or the Windows 11 drivers are causing increased latency.

I'm willing to try a few things if there are some suggestions to assist in identifying issues. I just don't want to get buried doing things that regression testers should be getting paid to do.

I am NOT an expert in the Windows scheduler! That said, I am a retired software engineer with significant experience in embedded programming, primarily in cooperative multitasking (VxWorks, OpenRTOS) and am well aware of the impacts of interrupts, process/task priorities, etc. Trying to do near real-time tasks on a preemptive OS is not trivial. That said, people whose job is development on these platforms SHOULD have a basic understanding of what the critical processes are, and should be able to provide SOME guidance on isolating problematic tasks to dedicated resources and/or establishing appropriate priorities such that these critical OS tasks get the chance to do their job, while simultaneously allowing enough resources for audio processing.

Sorry this is getting a bit long, but I also wanted to reiterate that attempting to run LatencyMon and Cubase at the same time is CAUSING problems with Cubase (most likely buffer underruns). My compromise so far has been to attempt to optimize what LatencyMon is reporting, then close LatencyMon and test with Cubase. I have not had the desire to start down another path of trying to get these two programs to "play nice" together.

I am thankful that Process Lasso has enabled me to get a usable Cubase, but I have underlying concerns about what these "fixes" might be causing elsewhere in the system.

Question:

How likely are the changes I've made so far to cause issues with the Windows scheduler apart from Cubase?



Jeremy Collake

It seems unlikely to me that any of the adjustments you've made have appreciable impacts to the rest of the system.

It is interesting that LatencyMon's active presence was causing performance issues with Cubase. We've not heard of that before, though it's certainly possible, especially if your system is on the brink of inadequate real-time responsiveness.

Good luck! As you alluded to, it is a hard battle you're fighting, and we don't have all the answers.

DaveB

Quote from: Jeremy Collake on December 08, 2023, 08:30:13 PM
It seems unlikely to me that any of the adjustments you've made have appreciable impacts to the rest of the system.

It is interesting that LatencyMon's active presence was causing performance issues with Cubase. We've not heard of that before, though it's certainly possible, especially if your system is on the brink of inadequate real-time responsiveness.

Good luck! As you alluded to, it is a hard battle you're fighting, and we don't have all the answers.

Thanks!

Something I would find incredibly interesting would be to isolate portions of the Windows OS to a subset of cores, and subsequently, dedicating cores to the more demanding application code.

Beyond that, I don't have nearly enough insight into how Windows manages memory and caches to fully understand how to get into that level of system tuning.

My BIOS provides the capability to disable hyperthreading on individual physical cores. It is not clear to me that the overhead of managing hyperthreading causes more issues than disabling a feature that should allow more threads to be processed; however, it does seem possible.

Another question(s):

Is there a simple way to set up some generic rules that (for example) I can isolate things I determine are Cubase/audio related and an inverse rule assigning the remaining cores to everything else?

Would the Windows OS honor such an arrangement or constantly be trying to manage the resources?

It would seem that this could assure that tweaking priorities will not starve the OS of needed resources as they are isolated on their own core(s).
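
As an illustration of the arrangement being asked about, the following Python sketch splits the logical CPUs into an audio set and its inverse for everything else. The core numbers are hypothetical (20 logical CPUs, audio on 8-19), `split_cpus` is a made-up helper, and nothing here talks to Windows or Process Lasso; it only shows that the two rules should be disjoint and together cover every CPU.

```python
def split_cpus(total_cpus, audio_cpus):
    """Split logical CPUs into an audio set and its complement."""
    audio = sorted(set(audio_cpus))
    if any(cpu < 0 or cpu >= total_cpus for cpu in audio):
        raise ValueError("audio CPU index out of range")
    others = [cpu for cpu in range(total_cpus) if cpu not in audio]
    if not others:
        raise ValueError("no CPUs left for everything else")
    return audio, others


# Hypothetical layout: 20 logical CPUs, audio/Cubase on 8-19.
audio, others = split_cpus(20, range(8, 20))

print("audio rule:", audio)
print("inverse rule:", others)
```

The check that `others` is not empty reflects the concern above: the rest of the system always needs at least some cores of its own.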

I get that in DAW processing, there have always been limits. That's why things like "track freezing" existed, and those features I will still use as needed. It's just odd that a CPU that's got more cores and more memory would have more issues getting the work done. I mean, my old 4 core machines were able to run with hyperthreading until the cores just didn't have enough cycles to do more.

Sorry if I'm getting a bit off topic!



Jeremy Collake

Quote from: DaveB on December 08, 2023, 09:55:23 PM
Something I would find incredibly interesting would be to isolate portions of the Windows OS to a subset of cores, and subsequently, dedicating cores to the more demanding application code.

As of Process Lasso v12.5.0.7 BETA, you can experiment with the new System Reserved CPU Sets tool, in the 'Options / Tools' submenu.

This will do exactly what you want: keep system threads off specific CPU cores, leaving them always available for applications. After defining the system reserved CPU sets, you can then create the inverse CPU affinity rule(s) for your real-time apps.

In this build, it only works for Windows 11, but Windows 10 support will come before the final release.

You can enter the beta channel by checking menu item 'Updates / Include Beta'.

Certainly, you can alternatively create CPU affinity rules that accomplish the same thing, and Windows will honor them. So long as they aren't excessively restrictive, there won't be any problems.

If you try it, let us know how it goes!

DaveB

Cool! I may give it a try on one of my machines.

DaveB

I loaded the BETA to give this a try. I found the setting under Options/Tools/System Reserved CPU Sets. I first tried setting CPU cores 0-3 and was asked to reboot.

After the reboot, I could see no indication that tasks listed as user SYSTEM were restricted in any way.

I see no new rule in the Rules column and no change to the CPU Affinity for the same tasks. I DO see a rule to omit from ProBalance, just as before.

Interestingly, the % CPU bars show no activity on cores 0-3. If I look in Resource Monitor, all tasks appear to be on CPU 0, though I do see activity on the CPU graphs, primarily on cores 4-7.

I'll experiment a bit more.



DaveB

I'm still running the BETA, but removed the System Reserved CPU Sets setting.

Resource Monitor still lists most everything running on CPU 0 in the table data, but the graphs now indicate activity on cores 0-3, which they did not with the System Reserved CPU Sets setting in place.

Process Lasso shows activity on cores 0-3 with the setting removed, where it did not with the setting in place. It seems almost as if assigning cores to System Reserved CPU Sets is reserving those cores but failing to assign the proper affinity range to the tasks that should run on them.

Any suggestions?

DaveB

Note: On rereading your post about this, I apparently misunderstood the implications. I thought I was restricting SYSTEM to specific cores but based on your description, I am actually keeping SYSTEM off those cores.

However, given that, I do not see anything running on those reserved cores even though processes are marked with affinity 0-19. I'm not sure I'm understanding how this actually works.

In order to actually USE the reserved cores, do I then create CPU set(s) to assign the process to the reserved cores?

Is this feature overriding the CPU Affinity settings? If so, I don't see the evidence in the GUI.


Jeremy Collake

Correct. Since System Reserved CPU Sets is a Windows registry setting, it is not reflected by any rules in the Process Lasso GUI. Your observations indicate that the feature is working. We'll be considering how to better indicate in the GUI that a System Reserved CPU Sets configuration is in place.

I think you are misinterpreting the CPU column of the Resource Monitor list. That is an integer representing the % CPU load, not the CPU index assigned to the processes.

I found that you are correct that applications will not make use of the reserved CPU cores, even with a CPU affinity or CPU sets rule. It may be this feature is only useful when combined with interrupt affinity policies for device interrupts, which is not yet supported by Process Lasso.

Therefore, the feature may not be useful to you. Apologies for the head fake! We had this addition staged, and it seemed like a good time to experiment with it.

I recommend you return to your previous plan of inverse affinity rules, one for your real-time apps, and one for everything else.
 

DaveB

Thanks again!

Makes sense.

A rule that could assign CPU Affinity based on the "User" column would make my life easier. :)

Added:

I actually did try matching against user = "SYSTEM" and it did attempt to change affinity but was getting ERROR 62 Unable to set affinity (or similar wording).

Jeremy Collake

Quote
I actually did try matching against user = "SYSTEM" and it did attempt to change affinity but was getting ERROR 62 Unable to set affinity (or similar wording).

That will occur on a few of the protected system processes. You can ignore the error, if it isn't flooding the log, which it may be if you have 'Options / Forced Mode' enabled. In that case, I recommend more selective CPU affinity rules.

We're certainly always working to make the product easier to use. Thanks for the feedback!

DaveB

The jury is still out, but I think I've finally tamed the beast! ;)

I had gotten things running pretty well but was still seeing periodic issues with LatencyMon.

The final changes that appear to have corrected these peaks were:

1. For the Z490-A 10900K based machine, turn off AI overclocking (set to normal).
2. For the Z790 14900K based machine, turn off XMP.

On both machines, I was able to run LatencyMon for over two hours without the spikes.

Now I want to review some of the Process Lasso settings to see whether they are still necessary or were just band-aids improving otherwise bad behavior.

In both cases, I still see Highest DPC execution times higher than I expect, but this does not trigger LatencyMon to report problems processing audio, as they report just below 1ms.

Also note that firing up MS Edge can still trigger LatencyMon to report problems. Cubase has been behaving pretty well, and I will repeat some of the stress testing to confirm how well it behaves. Running Edge while also doing Cubase work is not something I need nor necessarily expect. However, I do want to understand how combinations I might actually want to run (e.g., Cubase with OBS) might impact Cubase.

Some of the more esoteric uses of Process Lasso I was envisioning may ultimately not be needed. At some point, I hope to get satisfactory behavior and stop experimenting! I may still try to isolate Cubase to a subset of core(s) or force less important tasks to E-cores.






DaveB

I'm now introducing myself to Windows Performance Recorder and Analyzer. I'm fumbling around without an understanding at the moment, but it appears that I have identified a DPC/ISR duration spike occurring approximately every 15 minutes.

I stopped the event recorder shortly after a spike that caused a warning from LatencyMon.

I'm not sure what to make of this just yet. Currently, I'm trying to identify what is happening every 15 minutes and go from there.

The spike is occurring in ntoskrnl.exe.

DaveB

After a few hours with WPA, I have confirmed that there are actually TWO DPC execution time spikes, each approximately 1 ms long, falling within a window of about 2.2 ms and occurring approximately every 15 minutes. These two spikes are pegging ALL CPU cores when they occur.

This is on my MSI MAG Z790 with i9-14900K running Windows 11.

Tomorrow I want to check for this on Windows 10 on this dual-boot machine. Then I also want to check my Z490 i9-10900K for similar spikes.

The question I first want to answer is whether the same spikes are potentially from BIOS settings causing HARDWARE issues (or associated drivers) or if there is something occurring in Windows 11 every 15 minutes.

Then I hope to discover if it also occurs (but to a lesser extent) in Windows 10 on the same machine.

When I think about a 15-minute interval, I'm thinking this is possibly something in power management or some OS housekeeping task(s). This is purely an educated guess at this point.

In a one-hour capture, the first two spikes were exactly 15 minutes apart, the next interval was around 30 seconds longer than 900 seconds, and the next two were again exactly 15 minutes apart. The final DPC pair is the one that triggered LatencyMon, combining to about 2.2 ms.

Again, this is occurring on ALL CPU Cores!
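
A small Python sketch of the kind of check described above: given spike timestamps exported from a capture, measure the gaps and flag whether each one fits a 15-minute (900 s) schedule within a tolerance. The timestamps below are invented to mirror the pattern described, NOT measured data, and `classify_intervals` and the 30-second tolerance are assumptions for illustration.

```python
def classify_intervals(spike_times_s, period_s=900.0, tolerance_s=30.0):
    """Return (interval, is_periodic) for each gap between consecutive spikes."""
    results = []
    for earlier, later in zip(spike_times_s, spike_times_s[1:]):
        interval = later - earlier
        results.append((interval, abs(interval - period_s) <= tolerance_s))
    return results


# Hypothetical spike timestamps in seconds from the start of a capture.
spikes = [120.0, 1020.0, 1950.0, 2850.0]

for interval, periodic in classify_intervals(spikes):
    label = "on schedule" if periodic else "off schedule"
    print(f"{interval:.0f} s since previous spike: {label}")
```

If every gap comes back on schedule, that supports a timer-driven cause rather than something random.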



DaveB

I repeated the same test on the Z490 i9-10900K and saw the same issue but with spike durations short enough not to trigger LatencyMon audio issues error.

So, there is definitely something happening every 15 minutes to cause a long DPC duration and affecting ALL cores!

Here we have a different motherboard and processor. The i9-10900K has fewer cores, so my hunch at this point is some Windows 11 "feature" that is affecting all CPU cores, but with fewer cores to manage, less time is spent servicing them.

Next steps:
  • Check for similar behavior in Windows 10 on the Z790 motherboard. (Dual-boot)
  • Disable cores on the i9-14900K to match available cores on the i9-10900K and check for duration change.

p.s. Isn't it interesting that with more cores and more processing power, the effect is that the DPC duration issue gets WORSE? I think I'm going to find that THIS is why people are saying that things like disabling hyperthreading or disabling E-cores is "solving" their DPC issues, when it is really only mitigating the symptoms.



DaveB

I've now seen that the DPC execution spikes also occur on Windows 10 but are shorter in duration than those seen in Windows 11 on the same hardware.

I don't yet know what is causing these spikes, but I have now seen the spikes on two different motherboards, one motherboard with both Windows 10 & 11. The spikes are occurring every 15 minutes +/-30 seconds or so. The "random" nature of the spikes is simply in their duration. Sometimes the duration is longer, particularly on Windows 11.

I don't yet know the cause nor the source of variance in duration.

Within a window around these spikes, the only interrupts I see occurring are from dxgkrnl.sys on CPU 0. They do not seem to be the cause, but I can't say for sure yet. No smoking gun yet!

DaveB

I posted this on Steinberg Developer Forum: (My apologies, but I'm trying to get some help!)

I'm at a bit of a loss where I should post information I'm discovering about the DPC issues causing problems with Cubase!

I want to be clear that my intention is not to bash ANYONE. That said, I feel I have gone well beyond the call of duty attempting to analyze what is going on here with Windows/NVidia/Steinberg/whoever and I, frankly, am uncertain about where or how to raise it and get some help from people who should be focused on a solution.

I have been collecting information about what's going on here and I am at a loss how I make all the "players" in this scenario aware, and perhaps get a little help in figuring out the details, or even if all said players are already aware and researching solutions.

I have definitive evidence that SOMETHING is occurring every 15 minutes in both Windows 10 and Windows 11 that is causing DPC execution time spikes. There are TWO spikes every 15 minutes and the only thing RANDOM about them is that their duration varies a bit. The other thing is that these spikes are consistently longer in Windows 11.

So, my question is: where and how should I present the data such that the "experts" can offer guidance?

These spikes are ALMOST like clockwork (NOT random), and I believe that the apparent randomness depends entirely on what happens to be going on when these spikes occur. Also, these spikes affect ALL CPU cores!

I have collected data to support my findings and have some theories why some of the things on people's "try this" lists might mitigate the symptoms, but do not address the underlying problem(s).

My question is this: Where is the appropriate forum to post my findings in order to get some technical help further analyzing the ROOT CAUSE, rather than a list of "try this" to mitigate the symptoms?

Some expert help would be greatly appreciated. I can post DATA illustrating the issue(s). Help me help you!


DaveB

Here's an update that for now I would take with some grains of salt:

Based on what analysis and limited knowledge I have, it appears that some of the factors affecting DPC latency and DPC execution time are at least tangentially related to Windows power management. I don't yet have enough information on how all this works to start mucking around with all the power tuning properties.

I have made two changes that have shown some improvement in BOTH DPC latency and execution time:

1. Increase the processor performance time check interval (i.e., reduce the polling rate). This significantly reduces overall DPC latency by making the performance management algorithms sample less often. This makes some sense, since the default for high-performance mode is 15ms. I found reports of some user(s) changing this to 5000ms. Without more information, I would not suggest such a radical change. I don't have enough information to suggest how this might impact an overclocked CPU, other than to say that whatever power management algorithms are involved will act more slowly, so transitions between power states, clock frequencies, core parking, etc. would be slower, but they would also consume less CPU time calculating changes. I have changed this from 15ms to 200ms, which matches the default polling interval of the Balanced power plan. This produced a significant reduction in overall DPC and ISR latency, while still monitoring power properties 5 times per second.

2. Disable C-States in the BIOS. This is a common suggestion as you search the web concerning latency. As I was analyzing the WPA data, it did appear that power management services and processors in C-State C2 may have been contributing to the long DPC duration events. This also kind of makes sense, though I'm not sure how the duration is being measured. Clearly if a DPC is assigned to a core in the C2 state, it makes some sense that there would be a processing delay that could contribute to the duration.
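
The arithmetic behind the polling interval in item 1 can be sketched in a few lines of Python. This only converts the three intervals mentioned (15ms, 200ms, 5000ms) into checks per second; it does not read or change any Windows power setting, and `checks_per_second` is a made-up helper name.

```python
def checks_per_second(interval_ms):
    """Performance checks per second for a given check interval in milliseconds."""
    return 1000.0 / interval_ms


for interval_ms in (15, 200, 5000):
    rate = checks_per_second(interval_ms)
    print(f"{interval_ms} ms interval: {rate:.1f} checks per second")
```

Going from 15ms to 200ms cuts the check rate from roughly 67 per second to 5 per second, while 5000ms drops it to one check every five seconds, which is why that value looks like a radical change.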

I am not yet comfortable enough to recommend these changes, as I don't have enough information on how it might otherwise affect the system, especially in overclocked systems where slowing the power management may potentially impact thermal performance.

The spikes I mentioned previously are still occurring, but their duration is now low enough that LatencyMon is not triggering the unsuitable for real-time audio warning. I do not know if a longer analysis will show that the duration is still sometimes exceeded, but so far 90 minutes has not caused the warning from LatencyMon on my Z790 i9-14900K system. Given these durations are already shorter on the Z490 i9-10900K, I'm speculating that there would be similar improvement there.

I still need to experiment and attempt to confirm, but this looks like potential progress and may offer further clues.

I think ultimately if power management can maintain enough cores in a C0 state AND determine what processes should be isolated to these "protected" cores, this may be a viable solution. I'm not certain what metrics would verify this, but it seems a decent compromise to get some advantage of both performance and DPC latency while also managing lower power consumption.

Ultimately, controlling C-states per processor along with affinity seems a viable solution in the OS rather than disabling C-States entirely in the BIOS. Much too early to state definitively.

DaveB

Update:

I ran LatencyMon with the above changes for 11 hours with no "not suitable for audio" warning.

I still am not comfortable with the duration of the 15-minute DPC execution duration spike but it is now reduced to the 850us range (peak), as opposed to as much as 2-3ms.

This was accomplished without disabling startup programs that might reduce it a bit more.

The change to the 200ms power polling made a significant difference in DPC latency, and combined with disabling C-States has also reduced the maximum DPC execution duration.

At this point, I am suspecting that power management is a contributor, if not cause, of DPC issues.

I am not ready to recommend these changes, but you may want to experiment to see if it helps your situation.

I am curious if imposing some constraints on power monitoring might further improve the results. My preference would be to only affect things like C-States and power monitoring when running affected apps and NOT by BIOS changes.

When I'm more confident of a solution, I want to repeat the Cubase stress test, and also determine if some of the priority tweaks in Process Lasso are needed AFTER these changes.

If the Steinberg/Microsoft team(s) discover a recommended solution, I hope they will give a reasonable summary of what was changed and why.