Satisfactory is mostly a single-threaded process, and since you have an AMD CPU and it's mainly a single process, if you watch carefully you will see that your two best cores always run faster and that the Satisfactory thread is bouncing between those cores.
Perfectly normal.
EDIT:
Multicore for gaming is extremely overrated, as most games still don't use more than 4 cores. Part of this is because companies don't program for more cores, and the other part is the simple fact that anything outside of number crunching and video rendering almost never needs more than 4 cores.
Thanks for the info!
You're welcome, and should you take a look at my screenshots, many of them have an overlay that shows my 5800X behaving the same way.
https://steamcommunity.com/sharedfiles/filedetails/?id=2898373748
All cores are used
If you're using software to force the game to run on all cores, you're undermining AMD's Zen scheduling algorithms along with the Windows task scheduler.
Forcing all your cores to work on something that doesn't require all cores results in your cores running at or near max speed, causing unnecessary heat.
To me - and this is just personal opinion - if the devs have the same understanding of parallelization as Snutt, with his completely wrong "1 baby takes 9 months - but 9 mothers can't do it in 1", it would explain a lot, because the analogy itself is wrong.
The idea behind parallelization is not to speed up one process but to execute several processes at the same time - or, to correct Snutt once again, the correct analogy would be: "1 mother takes 9x 9 months to make 9 babies - but 9 mothers can do it in just 1x 9 months".
Or in terms of the game: although big factories have some spots where lines combine, much of the game does in fact run in parallel, independent of each other - which could be calculated by multiple CPU cores at the same time, leaving only the connected parts as one big tree. But the devs seem to have made the mistake of trying to calculate everything in one big sequential tree.
Since what I've said is obvious to anyone who has written multithreaded code, I'm going to surmise that you aren't a programmer who has written such code. Is my assumption right?
This gave me a headache, and I'm not a programmer, but even I can tell you have no idea what you're talking about.
The idea of parallelization is in fact either to speed up a single task by splitting it across several CPU cores, or to run several tasks at the same time, as you mentioned.
In real life, it doesn't matter how many mothers you have, because it's physically impossible to speed up the process of having a baby - which makes the point make perfect sense.
Programming software has limits and conditions that need to be followed: not all tasks can or should be split, because doing so can actually degrade the performance of other parts of the program, or cause more headaches than the performance gain is worth.
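To make the first point concrete, here is a minimal generic C++ sketch (not from the game - all names are invented) of splitting a single task across several cores:

```cpp
// Minimal sketch: speeding up one task (summing an array) by splitting it
// across threads. All names here are invented for illustration.
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&partial, &data, t, begin, end] {
            // Each thread sums only its own slice and writes only its own
            // result slot, so no locks are needed.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    // Combine the per-thread results in a final single-threaded step.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```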
World simulation games (like Satisfactory) are one of the best targets for parallel processing within the larger game industry. It requires forethought when designing the information set so that multiple processes can act on it simultaneously while maintaining efficiency. It sounds like past decisions for this game have likely made the introduction of multiple threads a much larger lift than it should be.
Most developers struggle with designing, coding, and testing multi-threaded applications because they simply don't understand the underlying mechanisms that make the technique possible. I can't tell you how often I find an integer or a discrete structure of atomic elements wrapped with a global locking object. The proper way to share atomic elements between threads is to leverage the processor's built-in atomic memory operations, but most developers don't even know they exist.
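For illustration, a minimal C++ sketch of the two approaches: a plain integer behind a global lock versus the processor's atomic operations exposed through std::atomic (names invented):

```cpp
// Sketch: a plain int guarded by a global lock (what the post criticizes)
// versus the processor's atomic operations exposed through std::atomic.
#include <atomic>
#include <mutex>

std::mutex g_lock;          // global locking object around one integer
int g_counter_locked = 0;

void increment_locked() {
    std::lock_guard<std::mutex> guard(g_lock);  // every thread serializes here
    ++g_counter_locked;
}

std::atomic<int> g_counter_atomic{0};

void increment_atomic() {
    // Typically compiles to a single atomic CPU instruction (e.g. LOCK XADD
    // on x86), with no mutex and no kernel involvement.
    g_counter_atomic.fetch_add(1, std::memory_order_relaxed);
}
```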
Take the splitter building today. It has a single internal buffer space for items. If each destination conveyor belt were processed on an independent thread, you would need to coordinate access to that buffer across all three threads. However, if you create separate buffers for 1 input and 3 outputs, you can then migrate the buffer-to-buffer movement of objects into its own discrete task. The conveyor tasks could then operate independently of each other, as they each have a dedicated buffer to access, without a single lock.
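A rough sketch of that buffer layout; the types and names are hypothetical, not taken from Satisfactory's actual code:

```cpp
// Hypothetical layout with 1 input buffer and 3 output buffers; the types
// and names are invented for illustration, not taken from the game.
#include <array>
#include <optional>

struct ItemBuffer {
    std::optional<int> slot;  // one item id, or empty
};

struct Splitter {
    ItemBuffer input;                   // filled by the incoming belt's task
    std::array<ItemBuffer, 3> outputs;  // each drained by its own belt's task

    // Runs as its own discrete task, scheduled after the input task and
    // before the output belt tasks, so nothing else touches these buffers
    // while it executes - no lock required.
    void distribute() {
        for (auto& out : outputs) {
            if (input.slot && !out.slot) {
                out.slot = input.slot;
                input.slot.reset();
                break;  // round-robin bookkeeping omitted for brevity
            }
        }
    }
};
```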
If your information set is designed for parallel processing, you can eliminate most of the possible thread contention from ever being required. It's this initial design effort that is often lacking, which causes knock-on effects that increase the complexity and subsequently the overall fragility of multi-threaded systems.
Most games aren't designed devoid of any starting context - consider Satisfactory's choice of Unreal Engine 4. Having your devs reinvent the wheel from scratch means they are writing generic engine code rather than focusing on the game-specific code.
Simple does not mean easy. Hitting a chisel with a hammer is simple.
Carving Michelangelo's David wasn't easy - it required a great deal of skill.
Developers struggle with writing multithreaded code because parallelized code is extremely hard to deal with, and not because they fail to understand the basics of atomic operations. Atomic operations are simple, but not easy to use correctly.
It is not uncommon for global interpreter locks to exist in interpreted languages like Python and Ruby. It is almost universal that higher-level object-oriented languages create a mutex lock per instance of each object.
Your assumption only works on platforms with total store order, and is not sufficient even then because any CPU which implements speculative execution can change the order of operations. Platforms which implement partial store order need membars, fences, or equivalent operations to ensure consistency even for what the processor considers to be an atomic operation.
https://dev.to/kprotty/understanding-atomics-and-memory-ordering-2mom
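For a concrete taste of the ordering problem described above, here is the standard release/acquire pairing in C++; this is a generic textbook pattern, not game code:

```cpp
// Generic release/acquire example: without the explicit ordering, a weakly
// ordered CPU (or the compiler) could make `ready` visible before `data`.
#include <atomic>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 42;
    // Release: all writes above become visible before `ready` reads true.
    ready.store(true, std::memory_order_release);
}

void consumer() {
    // Acquire: pairs with the release store, ordering the read of `data`.
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    int value = data;  // guaranteed to observe 42
    (void)value;
}
```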
Using fine-grained locking often hurts performance rather than helping.
Your proposal would change a single lock for the splitter into 4 locks: one for the input buffer and 3 for the buffers to the output belts. Each simulation tick would always need to acquire two locks, namely the input and the chosen output lock. You are most likely going to see better performance using a single lock rather than trying to create fine-grained locks because of the need to acquire two locks every time.
You can improve performance by using thread local storage for data structures when possible, not by refactoring code into fine-grained locks regardless of the usage patterns.
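A minimal sketch of the thread-local suggestion in C++ (illustrative names only):

```cpp
// Sketch of per-thread scratch storage (illustrative names). Each worker
// accumulates into its own buffer, so the hot path takes no locks at all.
#include <vector>

thread_local std::vector<int> tls_scratch;

void process_chunk(const std::vector<int>& items) {
    tls_scratch.clear();
    for (int item : items) {
        tls_scratch.push_back(item * 2);  // stand-in for real per-item work
    }
    // Per-thread results would be merged in a later single-threaded step.
}
```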
Separate miners or constructors are examples of things which can safely run independently, and could be evaluated using parallel algorithms because they do not interact. By contrast, a splitter needs locking because the distribution of incoming items to the output belts is an interaction which requires thread safety.
As I admitted, I'm not a programmer. I tried it back in my late teens/early 20s as a hobby. What a headache...
Out of curiosity and stupidity: why is a constructor considered not to interact, compared to a splitter, when it consumes and produces materials through its inputs and outputs? Is that not considered interacting?
Again, you could create a pair of locks for the constructors' input and output buffers.
But you'd tend to have to obtain both of them whenever the constructor builds a new item, so using multiple fine-grained locks would very likely not help.
The simulation should be separate and independent of the render model that informs the underlying game engine of model updates. The render model reflects changes made during simulation but is not the same thing (hopefully).
Most of these locks are now lazily created by the VM at runtime, the first time they are requested, to avoid having all those locks just lying around in memory unused. Object locks should be the last resort, once all other options have been explored and failed.
Interesting article. The speculative-execution example used two data items: one that is accessed atomically and the other which is accessed raw from multiple threads. The author is trying to illustrate that some sections of code operate on many different data items to perform their work, and all data access must be taken into account individually and as a group when dealing with multiple threads.
It's my fault for not clearly describing the scenario. The tasks to move items from the input buffers to the output buffers of splitter buildings would execute and complete prior to executing the task set for pulling items from the output buffers onto the destination conveyor belts. The task sets as a group are executed in sequence with each other so that multiple tasks accessing those buffers are never concurrently running. There would be 0 locks in this scenario beyond the task scheduling locks.
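As a sketch of that sequencing, assuming hypothetical task sets and a stand-in run_all_and_wait helper (a real version would hand the tasks to worker threads, as in the pool sketch further down):

```cpp
// Sketch of the two-phase sequencing: the second task set starts only after
// the first has fully completed. run_all_and_wait is a stand-in that runs
// tasks sequentially; a real one would distribute them to worker threads.
#include <functional>
#include <vector>

using Task = std::function<void()>;

void run_all_and_wait(const std::vector<Task>& tasks) {
    for (const auto& t : tasks) t();
}

void splitter_tick(const std::vector<Task>& move_inputs_to_outputs,
                   const std::vector<Task>& pull_outputs_onto_belts) {
    // Phase 1: each splitter moves items from its input buffer to its own
    // output buffers. No two tasks share a buffer, so no locks are needed.
    run_all_and_wait(move_inputs_to_outputs);
    // Phase 2: runs only after phase 1 is complete, so the belt tasks read
    // the output buffers without ever racing the phase-1 writers.
    run_all_and_wait(pull_outputs_onto_belts);
}
```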
This is my point exactly. All buildings (including conveyor segments), not just miners, can be calculated independently of each other with some minor modification to how the internal buffers are allocated and correct sequencing of their related task sets. All locks can be removed from the information set itself, the only locks being those used to manage task scheduling.
The Unreal engine is not just a render system, and Satisfactory uses Unreal's UObjects as the base class of the Satisfactory buildings and so forth.
In point of fact, the Python GIL was designed to facilitate reference counting memory management, rather than per-instance locks and refcounts.
It is a lot more common for enterprise software developers to write heavily threaded code in Java or the like (Scala? Rust?) using per-instance object locks than it is for folks to roll their own fine-grained locks in native C or C++.
How do you enforce this ordering of operations?
Kernel programmers might use continuations, but those are heavy-weight compared to mutexes or RWLocks.
If you implement a global lock over the task of processing items moving on belts, then you've prevented multiple threads from doing work in parallel.
Even for such a simple case, it is not easy to determine how to divide the work up onto many threads without using per-object locks.
Basically you still have a primary game loop thread that is responsible for executing the same task sequencing it already does; it just doesn't end up doing all the actual task work as well. For each section of the game loop that iterates over similar objects - such as moving items on conveyor belts forward one space - the work is divided into discrete tasks. These tasks are then allocated to a number of pre-existing worker threads that were created based on the available system cores.
The main loop thread also keeps a number of tasks to execute itself, so it's not just waiting around for the other tasks to complete. When it completes its work, it checks whether the remaining tasks are also complete, or waits for their completion. Once all work associated with that section of the game loop is finished, it moves on to the next section using the same map-reduce style of processing.
Since each task is discrete and no portion of the underlying information set is shared between tasks, they can be completed without locking. The only locking is the worker threads contending with each other to pick up available tasks off the task queue placed there by the main loop thread. The main loop thread ensures that the entire task set is complete before continuing on to the next. This is how you avoid collisions without locks.
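A rough C++ sketch of such a scheduler: a task queue drained by pre-created worker threads, with the main thread helping with the work and then waiting for the phase to finish. Everything here (PhaseRunner and friends) is invented for illustration, not the game's actual code:

```cpp
// Sketch of a phase-based task scheduler: the only lock is around the task
// queue itself, exactly as described above. All names are invented.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class PhaseRunner {
public:
    explicit PhaseRunner(unsigned cores) {
        for (unsigned i = 0; i < cores; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~PhaseRunner() {
        { std::lock_guard<std::mutex> g(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Publish one task set, help drain it, and return once every task ran.
    void run_all_and_wait(std::vector<std::function<void()>> tasks) {
        {
            std::lock_guard<std::mutex> g(m_);
            pending_ = tasks.size();
            for (auto& t : tasks) queue_.push(std::move(t));
        }
        cv_.notify_all();
        drain();  // the main thread works too instead of idling
        std::unique_lock<std::mutex> lk(m_);
        idle_cv_.wait(lk, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
                if (done_) return;
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();        // runs without holding any lock
            finish_one();
        }
    }
    void drain() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> g(m_);
                if (queue_.empty()) return;
                task = std::move(queue_.front());
                queue_.pop();
            }
            task();
            finish_one();
        }
    }
    void finish_one() {
        std::lock_guard<std::mutex> g(m_);
        if (--pending_ == 0) idle_cv_.notify_all();
    }

    std::mutex m_;
    std::condition_variable cv_, idle_cv_;
    std::queue<std::function<void()>> queue_;
    std::vector<std::thread> workers_;
    std::size_t pending_ = 0;
    bool done_ = false;
};
```

The tasks themselves never take a lock; the only contention is the brief queue access when a thread picks up its next task, matching the description above.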
The main benefit is that you don't have to completely redesign the entire game. You are doing the exact same thing you used to do on a single thread but allowing the work to proceed in parallel where possible. You typically only need to make minor changes to the information set (like dividing buffers into discrete elements) to be compatible with the approach.