NumaPref launches a child process with a preferred NUMA node extended attribute (PROC_THREAD_ATTRIBUTE_PREFERRED_NODE).
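As a rough illustration of what such a launch involves, here is a minimal sketch using Python's ctypes. It is not NumaPref's source. The function name launch_on_preferred_node is hypothetical, and passing the node number as a 16-bit USHORT is an assumption (Microsoft's documentation only says the value is a pointer to the node number). The constants are the ones defined in the Windows SDK header winbase.h. The function itself only works on Windows.

```python
import ctypes
import subprocess

# Plain ctypes aliases for the Win32 types used below.
DWORD, WORD, HANDLE, LPWSTR = ctypes.c_uint32, ctypes.c_uint16, ctypes.c_void_p, ctypes.c_wchar_p

# Values from the Windows SDK header winbase.h.
PROC_THREAD_ATTRIBUTE_INPUT = 0x00020000
PROC_THREAD_ATTRIBUTE_PREFERRED_NODE = 4 | PROC_THREAD_ATTRIBUTE_INPUT
EXTENDED_STARTUPINFO_PRESENT = 0x00080000


class STARTUPINFOW(ctypes.Structure):
    _fields_ = (
        [("cb", DWORD)]
        + [(n, LPWSTR) for n in ("lpReserved", "lpDesktop", "lpTitle")]
        + [(n, DWORD) for n in ("dwX", "dwY", "dwXSize", "dwYSize", "dwXCountChars",
                                "dwYCountChars", "dwFillAttribute", "dwFlags")]
        + [("wShowWindow", WORD), ("cbReserved2", WORD), ("lpReserved2", ctypes.c_void_p)]
        + [(n, HANDLE) for n in ("hStdInput", "hStdOutput", "hStdError")]
    )


class STARTUPINFOEXW(ctypes.Structure):
    _fields_ = [("StartupInfo", STARTUPINFOW), ("lpAttributeList", ctypes.c_void_p)]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [("hProcess", HANDLE), ("hThread", HANDLE),
                ("dwProcessId", DWORD), ("dwThreadId", DWORD)]


def launch_on_preferred_node(node, argv):
    """Windows only: start argv with `node` as its preferred NUMA node; return the PID."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    vp = ctypes.c_void_p
    k32.InitializeProcThreadAttributeList.argtypes = [vp, DWORD, DWORD,
                                                      ctypes.POINTER(ctypes.c_size_t)]
    k32.UpdateProcThreadAttribute.argtypes = [vp, DWORD, ctypes.c_size_t, vp,
                                              ctypes.c_size_t, vp, vp]
    k32.DeleteProcThreadAttributeList.argtypes = [vp]
    k32.DeleteProcThreadAttributeList.restype = None
    k32.CreateProcessW.argtypes = [vp, vp, vp, vp, ctypes.c_int, DWORD, vp, vp, vp, vp]
    k32.CloseHandle.argtypes = [HANDLE]

    def check(ok):
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())

    # The first call fails by design and only reports the buffer size needed
    # for an attribute list holding one attribute.
    size = ctypes.c_size_t(0)
    k32.InitializeProcThreadAttributeList(None, 1, 0, ctypes.byref(size))
    attr_list = ctypes.create_string_buffer(size.value)
    check(k32.InitializeProcThreadAttributeList(attr_list, 1, 0, ctypes.byref(size)))
    try:
        node_value = ctypes.c_uint16(node)  # ASSUMPTION: node number passed as USHORT
        check(k32.UpdateProcThreadAttribute(
            attr_list, 0, PROC_THREAD_ATTRIBUTE_PREFERRED_NODE,
            ctypes.byref(node_value), ctypes.sizeof(node_value), None, None))

        si = STARTUPINFOEXW()
        si.StartupInfo.cb = ctypes.sizeof(si)
        si.lpAttributeList = ctypes.addressof(attr_list)
        pi = PROCESS_INFORMATION()
        # CreateProcessW may modify the command line, so it needs a writable buffer.
        cmdline = ctypes.create_unicode_buffer(subprocess.list2cmdline(argv))
        check(k32.CreateProcessW(None, cmdline, None, None, False,
                                 EXTENDED_STARTUPINFO_PRESENT, None, None,
                                 ctypes.byref(si), ctypes.byref(pi)))
    finally:
        k32.DeleteProcThreadAttributeList(attr_list)
    k32.CloseHandle(pi.hThread)
    k32.CloseHandle(pi.hProcess)
    return pi.dwProcessId
```

On a Windows machine, launch_on_preferred_node(1, ["7zg.exe", "b"]) would be the rough equivalent of running numapref.exe 1 7zg.exe b.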
Note that the same functionality is provided by cmd.exe's Start /NODE. It is exposed here for further experimentation, so that we have more control.
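For comparison, the cmd.exe route can be sketched from Python as below. The helper name start_node_command is hypothetical. The empty string fills START's window-title slot, because START treats a first quoted argument as the title.

```python
import subprocess


def start_node_command(node, target, params=()):
    """Build an argv list asking cmd.exe's START to use NUMA node `node`."""
    if node < 0:
        raise ValueError("NUMA node number must be non-negative")
    # "" is the window title; without it a quoted target path would be
    # taken as the title instead of the program to run.
    return ["cmd.exe", "/c", "start", "", "/NODE", str(node), target, *params]


# On Windows this could then be run with:
#   subprocess.run(start_node_command(1, "7zg.exe", ["b"]), check=True)
```

subprocess.list2cmdline shows the command line Windows would receive, which is handy for checking the quoting of targets that contain spaces.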
When combined with Coreprio's NUMA Dissociater (https://bitsum.com/portfolio/coreprio/), this achieves better consistency in the massive (2X) improvements seen in the 7-Zip and Indigo benchmarks on the AMD Threadripper 2990wx and 2970wx and the EPYC 7551.
Consistency is still not 100%, though, at least for the 2990wx (EPYC is more consistent with the NUMA Dissociater fix). There is, however, consistency per NUMA node: a fast node stays fast, it seems. Fast nodes do not depend on the DMA nodes of the 2990wx. Bizarre.
This is still being digested. The tool is experimental and the results are tentative. I encourage users to share their results.
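For anyone who wants to collect per-node numbers, a small timing harness along these lines may help. It is only a sketch. The function name time_per_node is hypothetical, the node list and the benchmark command are placeholders, and it assumes the launcher blocks until the target exits, which has not been verified for numapref.exe. If the launcher returns immediately, the timings would only measure launch overhead.

```python
import subprocess
import time


def time_per_node(nodes, target_argv, launcher="numapref.exe",
                  run=subprocess.run, clock=time.perf_counter):
    """Run the target once per NUMA node; return {node: elapsed_seconds}."""
    results = {}
    for node in nodes:
        # Same shape as the usage line: launcher NUMA_NODE_NUM target [params...]
        argv = [launcher, str(node), *target_argv]
        start = clock()
        run(argv, check=True)  # ASSUMPTION: returns only after the target exits
        results[node] = clock() - start
    return results


# Example (placeholder command and node list):
#   print(time_per_node([0, 1, 2, 3], ["7z.exe", "b"]))
```

The run and clock parameters exist only so the loop can be exercised without actually launching anything.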
Download: https://bitsum.com/files/numapref.exe
Usage:
USAGE: numapref.exe NUMA_NODE_NUM target.exe [target_params, ...]
EXAMPLES:
numapref.exe 1 7zg.exe b
numapref.exe 1 "indigo benchmark.exe"
7z before: https://bitsum.com/images/2990/7z_before.png
7z after: https://bitsum.com/images/2990/7z_after.png