Multi-Threading and Multiple/Single Core CPUs

I am currently in the process of building a new application.

For the first time, I'm making extensive use of multithreading.

What I find is when I run the following subroutines without multithreading:

Sub1
Sub2
Sub3

...all sequentially (on my dual-core CPU computer), the program itself grinds to a halt and becomes completely unresponsive until the 3 subroutines have all completed their work.

However, when I assign those same 3 subroutines to 3 separate threads and then run those threads simultaneously, the program stays responsive and, if I hadn't programmed the application myself, I would never know that the program is actually "working" hard in the background. The subroutines also finish sooner.

Now, I am using a Core2Duo. So, when running a multithreaded program, it is logical to expect it to run faster than an identical program that is not written to take advantage of multithreading. In fact, this is what I have found above.

Q1. Would running a multithreaded program on an older CPU, which has no HyperThreading and only a single logical CPU, allow the program to run without becoming completely unresponsive (which is what happened on my multi-core CPU when the subroutines ran sequentially) until the 3 subroutines have completed their work?

Q2. Would running a multithreaded application on a computer with a single logical CPU be a problem?


I ask this because I want the program to run on older computers, PDAs and smartphones, which won't have multi-core CPUs.
 
Q1 and the basic findings don't make sense. Running them sequentially would take longer, but would have less impact on the rest of the computer because you wouldn't be vying for as much of the resources.

Q2 - No, a normal computer and OS has many more total application threads doing their thing than physical cores.

If the given PDA and smartphone programming environments allow you to write multi-threaded applications, it won't be a problem there either.
 
Sorry, I should've been a little clearer.

I've edited the original post.

Basically, when I run the program without multithreading (on my dual-core CPU computer), only the program itself grinds to a halt and becomes completely unresponsive until the 3 subroutines have all completed their work.

When I run those 3 subroutines using threads, the program stays responsive and, if I hadn't programmed the application myself, I would never know that the program is actually "working" in the background.
 
I'm guessing this is running on Windows?
If so, it sounds like you're simply executing the subroutines on the UI thread.
I say that because of how the Windows message loop works; I don't know whether similar things apply on other OSs.

Without getting too far into the details: in Windows applications there is a loop whose job is to continuously process Windows messages. This loop runs on the main thread that the application starts on, which is known as the UI thread.

When your subroutines execute, if they are running on that thread then the message loop can't process messages, as the thread is busy running your subroutines - which is why the UI appears to lock up.
When the subroutines finish, the message loop carries on processing any messages that have been put into its queue and the UI responds again.

So, if you can use another thread to do the processing, the UI thread can happily carry on processing messages: the UI remains responsive and the logic gets processed separately.
Obviously, if you have a multi-core CPU then the subroutines will generally get processed on a separate physical core, but even on a single-core CPU this approach has a massive benefit.

Even if the CPU can't process two things at once, Windows will use context switching so that the single core can process the messages in a timely fashion and then get on with the work in your subroutines, without the UI locking up.
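
To make that concrete, here is a minimal VB.NET sketch of the pattern, assuming a WinForms form with a button (Button1) and a label (Label1) - those control names and the Sleep call are just stand-ins for illustration:

Code:
Imports System.Threading
Imports System.Windows.Forms

Public Class MainForm

    Private Sub Button1_Click(ByVal sender As Object, ByVal e As EventArgs) Handles Button1.Click
        ' Hand the long-running work to a separate thread so the UI thread
        ' stays free to pump the message loop.
        Dim worker As New Thread(AddressOf DoLongWork)
        worker.IsBackground = True
        worker.Start()
    End Sub

    Private Sub DoLongWork()
        ' Stand-in for Sub1/Sub2/Sub3 - anything slow goes here.
        Thread.Sleep(5000)

        ' Never touch controls from a worker thread; marshal back to the UI thread.
        Me.BeginInvoke(New MethodInvoker(AddressOf WorkFinished))
    End Sub

    Private Sub WorkFinished()
        Label1.Text = "Done"
    End Sub

End Class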
 
Multithreading can be used for many things. Some uses are really 'hacks' to work around response-time issues and make the program easier to design, whereas others are about making proper use of the processing power available.

Q1. Would running a multithreaded program on an older CPU, which has no HyperThreading and only a single logical CPU, allow the program to run without becoming completely unresponsive (which is what happened on my multi-core CPU when the subroutines ran sequentially) until the 3 subroutines have completed their work?

Firstly - hyperthreading has nothing to do with threading in terms of OS support for multiple threads. It's Intel's way of trying to get more use out of the hardware and thus be more efficient.

Yes, it's entirely possible to make a single-core or a multi-core CPU unresponsive.

Never assume that a multithreaded application will run in the same sequence unless it is explicitly designed to do so.

Q2. Would running a multithreaded application on a computer with a single logical CPU be a problem?

No. Most operating systems run processes and threads perfectly well without multi-core support. The only difference is that the OS doesn't have any other cores on which to schedule runnable threads, so they have to wait for their time slice on the single core.
 
I am currently in the process of building a new application.

For the first time, I'm making extensive use of multithreading.

What I find is when I run the following subroutines without multithreading:

Sub1
Sub2
Sub3

...all sequentially (on my dual-core CPU computer), the program itself grinds to a halt and becomes completely unresponsive until the 3 subroutines have all completed their work.
That's because your program is single-threaded. It can only do one thing at a time, be that redrawing its user interface (or updating console output) or actually running your subroutines.

Generally, when creating a GUI application you don't want to be executing long-running tasks on the same thread that is running the GUI.

You want to schedule that long-running task on another thread (usually a thread pool if your base class library has one). Then you can create the GUI to show some sort of progress bar dialog.
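
In VB.NET that could look roughly like this (the control names here are made up; the point is that the work runs on a pool thread and only the completion notification touches the UI):

Code:
Imports System.Threading
Imports System.Windows.Forms

Public Class MainForm

    Private Sub StartButton_Click(ByVal sender As Object, ByVal e As EventArgs) Handles StartButton.Click
        ' Show some sort of "busy" indicator, then queue the job on the thread pool.
        ProgressBar1.Style = ProgressBarStyle.Marquee
        ThreadPool.QueueUserWorkItem(AddressOf LoadData)
    End Sub

    Private Sub LoadData(ByVal state As Object)
        ' The long-running task runs here, on a pool thread.
        Thread.Sleep(5000)   ' stand-in for the real work

        ' Marshal the "finished" notification back onto the UI thread.
        Me.BeginInvoke(New MethodInvoker(AddressOf WorkFinished))
    End Sub

    Private Sub WorkFinished()
        ProgressBar1.Style = ProgressBarStyle.Blocks
        ProgressBar1.Value = ProgressBar1.Maximum
    End Sub

End Class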

However, when I assign those same 3 subroutines to 3 separate threads and then run those threads simultaneously, the program stays responsive and, if I hadn't programmed the application myself, I would never know that the program is actually "working" hard in the background. The subroutines also finish sooner.
This is correct. You have to be careful, though: as soon as you start getting into multithreading you open yourself up to a whole world of hurt. There are subtle concurrency edge cases to be considered. And of course there is the golden rule of never doing anything GUI-related from one of these "background" threads.

PS: Never spawn your own threads unless you're absolutely certain you need to. Use a thread pool in 95% of cases.

Now, I am using a Core2Duo. So, when running a multithreaded program, it is logical to expect it to run faster than an identical program that is not written to take advantage of multithreading. In fact, this is what I have found above.
The kernel's thread scheduler will automatically "load balance" runnable threads across all available cores. Since you have a C2D, up to two threads can be running at any given instant - which is of course better than just one (uniprocessor).

Q1. Would running a multithreaded program on an older CPU, which has no HyperThreading and only a single logical CPU, allow the program to run without becoming completely unresponsive (which is what happened on my multi-core CPU when the subroutines ran sequentially) until the 3 subroutines have completed their work?
Yes. Because the kernel's thread scheduler (at least on Windows and most modern general-purpose OSes) performs time slicing, or what is collectively referred to as "pre-emptive multitasking". It splits CPU core time into time slices (a few milliseconds in duration) and shares these out, based on various priority and fairness algorithms, amongst the runnable threads.

So it is a kind of pseudo-concurrency. It happens so fast that you don't really notice any delay in the GUI. But as we all know, things move fast in computing, and nowadays running a single-core PC on something like Windows 7 with full Aero Glass isn't necessarily going to give you the best experience.

Q2. Would running a multithreaded application on a computer with a single logical CPU be a problem?
No. It's just that, as a programmer, you must understand that "true" concurrent multithreading is not happening; it is merely a virtualisation.

It is still possible, for instance, for race conditions to occur. A thread on a 32-bit machine might be incrementing a 64-bit variable, and the kernel's thread scheduler might decide to pre-empt that thread half way through the increment operation. Another thread sharing that same variable might then try to read it, but because the pre-empted thread hadn't finished its increment, the second thread would essentially be reading corrupted data. This is a simple example of an atomicity problem, and of why synchronisation/locking matters.
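
A quick VB.NET illustration of that kind of race (the loop counts and names are arbitrary): with the unsafe increment the final total will often come out short, whereas the SyncLock (or Interlocked.Increment) version always gives the right answer.

Code:
Imports System.Threading

Module CounterDemo

    Private total As Long = 0                 ' 64-bit shared counter
    Private ReadOnly lockObj As New Object()

    Sub Worker()
        For i As Integer = 1 To 100000
            ' Unsafe version: "total += 1" is a read-modify-write, so the scheduler
            ' can pre-empt a thread half way through and updates get lost.
            ' total += 1

            ' Safe version: serialise access to the shared variable.
            SyncLock lockObj
                total += 1
            End SyncLock
            ' (For a plain counter, Interlocked.Increment(total) does the same job.)
        Next
    End Sub

    Sub Main()
        Dim t1 As New Thread(AddressOf Worker)
        Dim t2 As New Thread(AddressOf Worker)
        t1.Start()
        t2.Start()
        t1.Join()
        t2.Join()
        Console.WriteLine("Total = " & total)  ' 200000 only when the update is synchronised
    End Sub

End Module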

I ask this because I want the program to run on older computers, PDAs and smartphones, which won't have multi-core CPUs.
As above, the same concurrency issues that you will come up against in true concurrency will still come up on single processor environments. Sure, they might happen less frequently but that just makes them all the more annoying and hard to debug.
 
I'm guessing this is running on Windows?

Correct

If so, it sounds like you're simply executing the subroutines on the UI thread.

Correct.

Without getting too far into the details: in Windows applications there is a loop whose job is to continuously process Windows messages. This loop runs on the main thread that the application starts on, which is known as the UI thread.

When your subroutines execute, if they are running on that thread then the message loop can't process messages, as the thread is busy running your subroutines - which is why the UI appears to lock up.
When the subroutines finish, the message loop carries on processing any messages that have been put into its queue and the UI responds again.

So, if you can use another thread to do the processing, the UI thread can happily carry on processing messages: the UI remains responsive and the logic gets processed separately.
Obviously, if you have a multi-core CPU then the subroutines will generally get processed on a separate physical core, but even on a single-core CPU this approach has a massive benefit.

Even if the CPU can't process two things at once, Windows will use context switching so that the single core can process the messages in a timely fashion and then get on with the work in your subroutines, without the UI locking up.

Well explained. Thanks.

NathanE: thanks for your explanation. I'm very new to multithreading and only started playing around with this a few days ago, discovering in the process that my program seems to run more smoothly when using threads. Until now, all the threads which I've created have been my own (not threadpool), but I shall experiment with the threadpool later on today. In all fairness, I don't think I will bother using threads for small (2s-3s) tasks. Using the threadpool seems to have problems all of its own, which I want to avoid. I always aim to keep the program as simple as possible, even if there is a "slight" performance hit. It makes it easier when it comes to debugging and developing the program further.
 
Knowing when and how to use threads effectively is a very useful skill to have, especially now that many CPUs are multi-core.
As NathanE says, you can be in for a world of pain if you get it wrong when multi-threading.

The application I'm working on at work at the moment has what I can only describe as thread diarrhoea, because lots of the developers seem to have thought that using lots of threads = good and will make the application faster and more responsive.

Out of interest what language are you coding in?
 
The application I'm working on at work at the moment has what I can only describe as thread diarrhoea, because lots of the developers seem to have thought that using lots of threads = good and will make the application faster and more responsive.

That's exactly what I want to avoid doing - adding threads just for the sake of it. If possible, I would prefer to use as few threads as possible, to avoid other problems further down the line which could result in hours/days lost in debugging.

Out of interest what language are you coding in?

VB.net, but if the application proves successful, then I will have to re-code it in other languages.
 
PS: Never spawn your own threads unless you're absolutely certain you need to. Use a thread pool in 95% of cases.

In total, I manually created 13 of my own threads. It takes roughly 3.5s for those 13 threads to complete their tasks.

I then played with the threadpool and rather than creating my own threads, I used the threadpool. The same operation took 18.5s.

I'm not saying that my testing is definitive, but I can already see that using the threadpool isn't always the best way to get the job done.
 
PS: Never spawn your own threads unless you're absolutely certain you need to. Use a thread pool in 95% of cases.
Can you expand on that point? It's been a while since I last had to dabble in some multi-threading, but last time I settled on creating my own threads (.NET 2.0 was the platform) and I *think* it was because I wanted to pass a method with parameters in to run on the new thread (which wasn't available for thread pool threads, IIRC, at the time).

Is this something that can be done with the thread pool? Plus, I thought that if you were going to spawn a fair few threads (say >10) it was recommended to use your own rather than the shared thread pool.

Have things changed?
 
I was playing around with .net and threads today.

... I wanted to pass a method with parameters in to run on the new thread (wasn't available for thread pool threads iirc? at the time).

Is this something that can be done in the thread pool, ...

There is a way of passing parameters when using threads (both when creating your own threads and when using the threadpool). We do this by:
creating a class (which includes properties and a sub/method),
then creating an object of that class and setting the parameters on it,
then creating the thread - after 'AddressOf' we state the name of the object, followed by the name of the method.
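
Roughly like this, for anyone following along (the class name and file name are just made-up examples):

Code:
Imports System.Collections.Generic
Imports System.Threading

' The worker object carries its own "parameters" as fields/properties,
' so the thread entry point can stay parameterless.
Public Class FileLoader
    Public FilePath As String
    Public Lines As List(Of String)

    Public Sub Load()
        Lines = New List(Of String)(System.IO.File.ReadAllLines(FilePath))
    End Sub
End Class

Module Demo
    Sub Main()
        Dim loader As New FileLoader()
        loader.FilePath = "data1.txt"                  ' hypothetical input file

        ' AddressOf points at a method on the object that already holds the parameters.
        Dim t As New Thread(AddressOf loader.Load)
        t.Start()
        t.Join()

        Console.WriteLine(loader.Lines.Count & " lines read")
    End Sub
End Module

(The alternative is the ParameterizedThreadStart form of the Thread constructor - so you can pass a single Object into Start - or the state argument of ThreadPool.QueueUserWorkItem mentioned further down; both avoid the extra class but only give you one Object parameter.)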

This worked for me; however, things got a little complicated when I needed to create multiple objects of that class and multiple threads using those objects. At that point I realised that, should I ever need to debug, passing parameters into threads using the above technique could prove extremely difficult.

Based on what I experienced today, I won't be passing parameters into threads, unless it is absolutely necessary and there is no other way around it.
 
In total, I manually created 13 of my own threads. It takes roughly 3.5s for those 13 threads to complete their tasks.

I then played with the threadpool and rather than creating my own threads, I used the threadpool. The same operation took 18.5s.

I'm not saying that my testing is definitive, but I can already see that using the threadpool isn't always the best way to get the job done.

Something really doesn't sound right there - the ThreadPool definitely shouldn't take that much longer than creating your own threads.
Do you have a code sample that you can post up?

Can you expand on that point? It's been a while since I last had to dabble in some multi-threading, but last time I settled on creating my own threads (.NET 2.0 was the platform) and I *think* it was because I wanted to pass a method with parameters in to run on the new thread (which wasn't available for thread pool threads, IIRC, at the time).

Is this something that can be done with the thread pool? Plus, I thought that if you were going to spawn a fair few threads (say >10) it was recommended to use your own rather than the shared thread pool.

Have things changed?

The QueueUserWorkItem method on the ThreadPool class has an overload that takes a state object; you would use this to pass parameters to your method:
http://msdn.microsoft.com/en-us/library/4yd16hza.aspx
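
In VB.NET that works out along these lines (the file name is just an example):

Code:
Imports System.Threading

Module PoolDemo

    Sub Main()
        ' The second argument is an arbitrary state object which gets handed
        ' to the callback - that's how you pass parameters to the method.
        ThreadPool.QueueUserWorkItem(AddressOf LoadFile, "data1.txt")

        Console.ReadLine()   ' keep the process alive while the pool thread works
    End Sub

    Sub LoadFile(ByVal state As Object)
        Dim path As String = CStr(state)
        Dim lines() As String = System.IO.File.ReadAllLines(path)
        Console.WriteLine(path & ": " & lines.Length & " lines")
    End Sub

End Module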

As for when to create your own threads: just because you have >10 doesn't necessarily mean that the ThreadPool is a bad idea. In fact, the more tasks you have to process, the more worthwhile the ThreadPool becomes.
If you want to change the attributes of a thread, such as upping its priority, then that is a valid case for spawning your own thread.
Also, if you have something that you want to run for a long time on a separate thread, then that is another instance where you should create your own - but I can't think of many other times when the ThreadPool shouldn't be used.
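
For those cases, a dedicated thread would look something like this (the background loop is just a placeholder):

Code:
Imports System.Threading

Module DedicatedThreadDemo

    Sub Main()
        ' A long-running job, or one that needs its priority changed, is one of
        ' the few cases where spawning your own thread makes sense.
        Dim monitor As New Thread(AddressOf BackgroundLoop)
        monitor.IsBackground = True                    ' don't keep the process alive on exit
        monitor.Priority = ThreadPriority.BelowNormal  ' the kind of attribute change mentioned above
        monitor.Start()

        Console.WriteLine("Main thread carries on...")
        Console.ReadLine()
    End Sub

    Sub BackgroundLoop()
        Do
            ' placeholder for the long-running background work
            Thread.Sleep(1000)
        Loop
    End Sub

End Module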
 
Something really doesn't sound right there - the ThreadPool definitely shouldn't take that much longer than creating your own threads.
Do you have a code sample that you can post up?

I could. The only problem is that the code is very very long and I wouldn't want to bore you guys with the functions/subs being called.

What I will say is that the functions being called relate to loading of data from txt files, line by line.

Creating my own threads seems faster for this. I spent a few hours last night switching between the ThreadPool and spawning my own threads. For smaller operations which take less than 1s there is very little in it, but once the operation gets longer/bigger, spawning my own threads seemed to be the way to go.

In some instances, I've avoided threads altogether, as simply calling the functions directly (without putting them on a thread) worked faster.

I've also taken the advice of keeping the GUI on its own thread and the long-running work off it, which has worked a treat.

In summary, I have a situation where the GUI/Form loads up within 1-2s and is responsive thereafter, while the start-up routines (courtesy of multithreading) are loaded up within the following 3-5s. That'll do me fine.
 
The advantage of ThreadPool is that it is kind of "fire and forget". You enqueue "jobs" for it to do and then, quite often, don't have to worry about it any further from that thread.

The ThreadPool scales better as well because it dynamically changes how many threads are executing concurrently at any given time. It takes into account how many processors are available on the machine, for example.

You have to be careful that you don't go down the "thread per task" route like many of the first multi-threaded games did. This is a blind alley and will result in scaling problems and eventually a lot of rework.

The best multi-threaded programs use the number of processors in the machine as a factor in determining how many threads they execute, coupled with a heavily "job/task packet" oriented architecture.

Spawning a thread is an intensive and costly operation on Windows. A ThreadPool is essentially a queue of pre-spawned threads - so that intensive work has already been done and doesn't affect your program. Of course, sometimes the ThreadPool will need to spawn more threads to cope with bursts in load, but this happens transparently to your application.

The other thing to bear in mind with the ThreadPool is that it defaults to some rather silly limits. I think it uses 1000 threads as the maximum (!). This is ridiculous for an end-user application, but perhaps understandable from a BCL framework point of view. So clearly, if you were to queue a very large number (say a couple of hundred thousand) of long-running jobs to the ThreadPool, you will find it keeps increasing and increasing its number of threads. Considering that each thread on Windows uses a minimum of 1MB of memory in stack space alone, that is quite some cost. I've personally seen cases where the ThreadPool has grown to such a silly size that the system has ground to a halt.

.NET 4 contains a new System.Threading.Tasks namespace that builds upon the ThreadPool to improve it in this regard. Could be worth playing with that. Alternatively you can implement a custom pseudo-wrapper around ThreadPool that acts as a "throttle" by only allowing a certain number of tasks to be queued at any time. There's a good example of this concept here: http://blogs.msdn.com/psheill/archive/2005/06/07/426564.aspx
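
As a rough VB.NET sketch of that throttle idea (the class name and limit are invented for illustration; the linked article does it more thoroughly):

Code:
Imports System.Threading

' Caps how many jobs are handed to the ThreadPool at once, instead of
' letting the backlog (and the pool's thread count) grow unbounded.
Public Class ThrottledQueue

    Private ReadOnly gate As Semaphore

    Public Sub New(ByVal maxConcurrent As Integer)
        gate = New Semaphore(maxConcurrent, maxConcurrent)
    End Sub

    Public Sub Enqueue(ByVal work As WaitCallback, ByVal state As Object)
        gate.WaitOne()                                   ' block until a slot frees up
        ThreadPool.QueueUserWorkItem(AddressOf Run, New Object() {work, state})
    End Sub

    Private Sub Run(ByVal packed As Object)
        Dim args() As Object = CType(packed, Object())
        Dim work As WaitCallback = CType(args(0), WaitCallback)
        Try
            work.Invoke(args(1))                         ' run the real job
        Finally
            gate.Release()                               ' free the slot for the next job
        End Try
    End Sub

    ' Usage - size the throttle from the processor count, as suggested above:
    '   Dim queue As New ThrottledQueue(Environment.ProcessorCount * 2)
    '   queue.Enqueue(AddressOf LoadFile, "data1.txt")

End Class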
 