hyperthreading ? is it reallu useful ??????

FMTopfan · 23 Jan 2011 at 07:45

Another important feature alongside HT is 64-bit. Even video editing software can't use all cores equally to balance the load whilst programmed as x86. As with gaming we've seen that x64 makes little difference in distribution of core loads and was used mainly as a gimmick for AMD in FarCry in 2004. Like Crysis it was more like a tech demo and isn't anywhere as intensive compared to running large databases which is what it's designed for.

HT though is designed for mainstream use as most software can make use better of it. It's also useful if your building a media center PC to include a HT-capable CPU as the power usage will be lower due to extra 20-30% power compared to non-HT CPUs. I'm generalising there as I'm not sure exactly what the figure is. From Sandy Bridge CPUs using only 90w typical, the more cores available will equal even lower power consumptions whilst improving performance - as long as programmers keep improving their code. I've no idea how many cores is possible to fit on a die, there must be a upper limit. And then what, will 50 or 60 cores be enough by then as is 5 blades on a razor?

There is one good example of a certain software getting it right recently that I'm aware of. To render a 15min AVCHD clip; I noted in a magazine that Sony Vegas Movie Studio x86 took 46mins on a i7 860, 8GB RAM but only used 30% of the CPU. Meanwhile, PowerDirect 9 Ultra x64 took 31mins using 96% of the CPU. The previous version of PowerDirect was a regular x86 but CPU usage varied from 60 to 20% - a massive difference to performance.

I'm skimming lightly over the really technical aspects since this is best kept as a light-hearted exchange of opinions.

CmdrTobs · 23 Jan 2011 at 09:00

ATIorNvidia said:
Don't assume ....

Liar. You're back tracking now. Nobody said Bulldozer had HT (SMT). It was clearly stated it has CMT, a technical step up, NOT simply a derivative of SMT which is what you are implying I said because you can't find quotes of me saying any such thing.

I wastefully took the time to lay it out to you in one detailed post showing you where you were going wrong and what your mistakes were and you responded with ignorance. Go back and look at my precise language. Please stop replying too. Skin saving adds nothing to the debate. It was clear what you meant and you were wrong. Glad you engaged in some google to inform yourself (a little bit) and added a veiled retraction. Better you did that before you embarrassed yourself and derailed a thread.

My last words on the matter.

FMTopfan said:
There is one good example of a certain software getting it right recently that I'm aware of. To render a 15min AVCHD clip; I noted in a magazine that Sony Vegas Movie Studio x86 took 46mins on a i7 860, 8GB RAM but only used 30% of the CPU. Meanwhile, PowerDirect 9 Ultra x64 took 31mins using 96% of the CPU. The previous version of PowerDirect was a regular x86 but CPU usage varied from 60 to 20% - a massive difference to performance.

I'm skimming lightly over the really technical aspects since this is best kept as a light-hearted exchange of opinions.

I would like to know those technical aspects as a programmer. If you could explain them please. I would like know why you think the x64bit version had better CPU utilization as I can't think of a reason. At the moment I stay well away from x64bit native programming.

P.S is it something to do with getting results back in one register as opposed to 2(upper and lower)(does x64bit even increase reg size in some mode)? Does this save on mov operations and thus lessen the load on some FPU/ALU in a SMT race condition.... common help.

NathanE · 23 Jan 2011 at 10:33

CmdrTobs said:
I thought 2+ core CPU's ended that bottle neck for intensive threads as you can assigning the hundreds of semi-idle threads to one core. (provided windows scheduler works well - a whole other debate)

No. A core can only know about one thread at a time. That's the whole point. There is no such thing as a "semi idle" thread.

A context switch is expensive because the entire stack, registers etc must be persisted (stored) or restored into the core. It's a bit like hibernation of a whole PC except on a thread-level. This occurs potentially millions of times per second on the average PC. But because a context switch is purely a "cost" (i.e. it does not contribute to processing anything the user wants, it is purely an OS overhead) they are considered very expensive. Many OSes (including Windows) spend an awful lot of time trying to improve the context switching overhead, i.e. reduce it. The last major advance in this area came with AMD64 instruction set. Microsoft took advantage of certain new instructions (such as FXSTOR and FXRSTOR) that allow the loading and unlocking a thread stack to occur much faster. This was fortunate because without this enhancement the x64 kernel would have had much much slower context switching due to the sheer number of new registers that AMD64 has.

The Windows thread scheduler is one of the most advanced. It has intimate knowledge of hyperthreading, NUMA and AMD64 concerns. The NT kernel team put huge amounts of time into researching and devising optimisations in these areas. Really very very good hyperthreading optimisations have existed in the kernel since NT 5.1. But these facts are not surprising really given that the NT kernel has far more advanced threading support than other mainstream OSes.

Yes, I agree, will add that this does not automatically mean improved performance. Most applications don't spontaneous create threads during execution or can progress if a main/control/sync thread is stalled. so it's misleading to characterise this increased utilisation of elements as a flat out performance boost. AMD should be giving us more of a flat-out performance boost with bulldozer and CMT.

Programs that spawn threads at runtime (other than for dynamically scaling a thread pool) are generally considered to be badly written/conceived. But even when a thread pool exceeds a certain factor of the number of available logical processor cores, a program too can be considered badly designed. You will see over the coming years that NT's really rather excellent support for I/O completion ports becomes ever more prominent. It really is the holygrail design pattern for multi-threaded applications. Key server applications use it extensively, such as SQL Server, IIS Web Server. You find that games start using it too (I'm sure some already do). The .NET Framework already supports it at a low level for its implementation of thread pools.

But no, if the CPU can find empty pipelines into which it can inject parts of the other thread. Then this will *always* result in performance gains in total. Of course those gains could be lost if the thread subsequently tries to acquire a lock that another thread in the process is already holding. So then the additional pipeline utilisation would have course been "wasted" because the thread will have to sleep anyway.

CMT is a totally different design to SMT. SMT is very simplistic. It allows 2 threads to be scheduled on the same CPU core - primarily to help reduce the OS overhead of context switching. Then the CPU core decides how to best inject their work into its pipelines.

CMT is more advanced because whilst it presents itself to the OS in the same way as SMT (i.e. 2 logical cores for every 1 real core) it actually backs this up with dedicated pipelines and logic units for both threads. It depends on the design but often more complex logic units like for floating point are not duplicated and are still shared between both threads.

P.S This sort of thinking regarding system resources imo has led Microsoft to create cache heavy OS's under the guise of 'better utilisation of free memory'. When the end result is a OS that need 2GB of ram to feel like XP did with 1/4 that, and still thrashes the HDD to death. /end off topic rant.

Eh? We're talking about CPUs. Discussion about memory ends pretty much at the concepts of virtual memory and L3 cache. Discussion of Superfetch has no place here.

CmdrTobs · 23 Jan 2011 at 14:08

NathanE said:
No. A core can only know about one thread at a time. That's the whole point. There is no such thing as a "semi idle" thread.

Not sure where you are getting that I think an incorrect idea from, I think you should read it again mate. There are such things as "semi-idle" threads. The threads I was referring to are the general cabal of low priority threads that are spawned usually by an OS and spend most time sleeping with low priority.I think you were not reading me in the right context.

NathanE said:
A context switch is expensive because the entire stack, registers etc must be persisted (stored) or restored into the core. It's a bit like hibernation of a whole PC except on a thread-level. T....

I don't disagree with the facts of multitasking.... I do disagree with your view on windows handling on threads. There is plenty of evidence to the contrary. One of the most dramatic experiences I have personally had is in Supreme Commander. Windows insists on executing the various threads all on the same core. See here for

Even in casual observation it's noticed windows will overlove a certain core in a system. That to me is not a sign its advanced and unless you have seen Microsoft's closed source you can't really comment on how advanced it is in the presence of widespread evidence to the opposite.

NathanE said:
Programs that spawn threads at runtime (other than for dynamically scaling a thread pool) are generally considered to be badly written/conceived. But even when a thread pool exceeds a certain factor of the number of available logical processor cores, a program too can be considered badly designed. You will see over the coming years that NT's really rather excellent support for I/O completion ports becomes ever more prominent. It really is the holygrail design pattern for multi-threaded applications. Key server applications use it extensively, such as SQL Server, IIS Web Server. You find that games start using it too (I'm sure some already do). The .NET Framework already supports it at a low level for its implementation of thread pools.

I don't think i made a good/bad judgment. I simply siad most programs can't. That programing rule is loose too as it's highly dependent on the program and how may threads. I don't think we are disagreeing with anything tbh here.

NathanE said:
But no, if the CPU can find empty pipelines into which it can inject parts of the other thread. Then this will *always* result in performance gains in total.

You see, this time you added the key remark. "in total". What a user calls "performance" is rarely ever the raw computations of the system in that moment. It's the performance of the application in focus. When someone runs 3D mark they want that to run it as fast as possible,thats their right measure of 'performance'. If they drop a few points (due to blocks) but windows manages to send out a few network discovery packets and thus over all do more in that period. -People don't call that better performance. Do you understand why I wrote what I did now?

We agree on the technicalities, I just think you and others who say similar are being unintentionally misleading.

NathanE said:
Of course those gains could be lost if the thread subsequently tries to acquire a lock that another thread in the process is already holding. So then the additional pipeline utilisation would have course been "wasted" because the thread will have to sleep anyway.

It's not just that sense of waste. It's waste from the heat in SMT when it's not getting me any faster. Then, what if the 2nd thread was pushed through now your main thread wants the ALU but it has to wait a few ticks?

NathanE said:
CMT is a totally different design to SMT. SMT is very simplistic. It allows 2 threads to be scheduled on the same CPU core - primarily to help reduce the OS overhead of context switching. Then the CPU core decides how to best inject their work into its pipelines.

Again, this makes sense for a P4. I question the saving on a CPU that already has multiple cores. Context switching is only the bottleneck if you have 2 or more hefty threads fighting for time. Assuming a program has say 3 hefty threads (which is a lot for desktop duty) a typical quad-core will save no time from context switching. - With the proviso IF windows schedules properly and puts all the 'semi-idle' processes and thus threads on their own core.(u know what I mean; to have there threads be allocated to that physical CPU core and split time on it)

In short; Multiple cores have killed the context splitting bottleneck. What is your reply to this?

NathanE said:
CMT is more advanced because whilst it presents itself to the OS in the same way as SMT (i.e. 2 logical cores for every 1 real core) it actually backs this up with dedicated pipelines and logic units for both threads. It depends on the design but often more complex logic units like for floating point are not duplicated and are still shared between both threads.

I fully understand, I think you mean that at AMD guy.

NathanE said:
Eh? We're talking about CPUs. Discussion about memory ends pretty much at the concepts of virtual memory and L3 cache. Discussion of Superfetch has no place here.

Hence why I added it post scriptum and added the cheeky "/end off topic rant" :rolleyes:

AceTK · 23 Jan 2011 at 14:32

Really wish AMD would hurry up and bring bulldozer out so I can upgrade my current setup.

NathanE · 23 Jan 2011 at 15:48

CmdrTobs said:
I don't disagree with the facts of multitasking.... I do disagree with your view on windows handling on threads. There is plenty of evidence to the contrary. One of the most dramatic experiences I have personally had is in Supreme Commander. Windows insists on executing the various threads all on the same core. See here for

Even in casual observation it's noticed windows will overlove a certain core in a system. That to me is not a sign its advanced and unless you have seen Microsoft's closed source you can't really comment on how advanced it is in the presence of widespread evidence to the opposite.

You shouldn't form your own opinions based upon content written by the blind leading the blind on Internet forums trying to discuss things of which they have no understanding.

Here's a couple links to get you started:

HT: http://download.microsoft.com/downl...ae-9272-ff260a9c20e2/Hyper-thread_Windows.doc
NUMA: http://msdn.microsoft.com/en-us/library/aa363804(v=vs.85).aspx

I would also recommend you read the book called Windows Internals by a certain Mr Russinovich. Here are some excerpts from the Fifth Edition:

Chapter 2, Page 40:

"Hyperthreading is a technology introduced by Intel that provides many logical processors
on one physical processor. Each logical processor has its CPU state, but the execution engine
and onboard cache are shared. This permits one logical CPU to make progress while the
other logical CPUs are busy (such as performing interrupt processing work, which prevents
threads from running on that logical processor). The scheduling algorithms are enhanced to
make optimal use of multiprocessor hyperthreaded machines, such as by scheduling threads
on an idle physical processor versus choosing an idle logical processor on a physical processor
whose other logical processors are busy."

"In NUMA systems, processors are grouped in smaller units called nodes. Each node has
its own processors and memory and is connected to the larger system through a cachecoherent
interconnect bus. Windows on a NUMA system still runs as an SMP system, in that
all processors have access to all memory—it’s just that node-local memory is faster to reference
than memory attached to other nodes. The system attempts to improve performance
by scheduling threads on processors that are in the same node as the memory being used. It
attempts to satisfy memory-allocation requests from within the node, but will allocate memory
from other nodes if necessary."

Chapter 5, Processes, Threads, and Jobs, Page 443:

"Choosing a Processor for a Thread When There Are Idle Processors
When a thread becomes ready to run, Windows first tries to schedule the thread to run on an
idle processor. If there is a choice of idle processors, preference is given first to the thread’s
ideal processor, then to the thread’s previous processor, and then to the currently executing
processor (that is, the CPU on which the scheduling code is running).
To select the best idle processor, Windows starts with the set of idle processors that the
thread’s affinity mask permits it to run on. If the system is NUMA and there are idle CPUs in
the node containing the thread’s ideal processor, the list of idle processors is reduced to that
set. If this eliminates all idle processors, the reduction is not done. Next, if the system is running
hyperthreaded processors and there is a physical processor with all logical processors
idle, the list of idle processors is reduced to that set. If that results in an empty set of processors,
the reduction is not done.
If the current processor (the processor trying to determine what to do with the thread that
wants to run) is in the remaining idle processor set, the thread is scheduled on it. If the current
processor is not in the remaining set of idle processors, it is a hyperthreaded system,
and there is an idle logical processor on the physical processor containing the ideal processor
for the thread, the idle processors are reduced to that set. If not, the system checks whether
there are any idle logical processors on the physical processor containing the thread’s previous
processor. If that set is nonzero, the idle processors are reduced to that list. Finally, the
lowest numbered CPU in the remaining set is selected as the processor to run the thread on.
Once a processor has been selected for the thread to run on, that thread is put in the
standby state and the idle processor’s PRCB is updated to point to this thread. When the idle
loop on that processor runs, it will see that a thread has been selected to run and will dispatch
that thread."

"Choosing a Processor for a Thread When There Are No Idle Processors
If there are no idle processors when a thread wants to run, Windows compares the priority of
the thread running (or the one in the standby state) on the thread’s ideal processor to determine
whether it should preempt that thread.
If the thread’s ideal processor already has a thread selected to run next (waiting in the
standby state to be scheduled) and that thread’s priority is less than the priority of the thread
being readied for execution, the new thread preempts that first thread out of the standby state and becomes the next thread for that CPU. If there is already a thread running on that
CPU, Windows checks whether the priority of the currently running thread is less than the
thread being readied for execution. If so, the currently running thread is marked to be preempted
and Windows queues an interprocessor interrupt to the target processor to preempt
the currently running thread in favor of this new thread.
If the ready thread cannot be run right away, it is moved into the ready state where it awaits
its turn to run. Note that threads are always put on their ideal processor’s per-processor
ready queues."

The Supreme Commander game *must* be setting its thread affinity when it starts up to one core. A lot of naive programmers do this thinking that it solves all their multi-threading concurrency issues in one fell swoop. How little do they know. The thread that you linked to shows a utility which simply "reverses" the broken thread affinity setting that the game forces upon the kernel thread scheduler. Of course, one must assume the programmers did it for a "good" reason in the first place so I would imagine that stability of the game may be compromised. I.e. it must contain concurrency bugs that they couldn't fix in time for their ship date.

CmdrTobs said:
It's not just that sense of waste. It's waste from the heat in SMT when it's not getting me any faster. Then, what if the 2nd thread was pushed through now your main thread wants the ALU but it has to wait a few ticks?

SMT has come along way since it first appeared on the NetBurst P4 chips. The implementation in Nehalem is far more advanced and actually has many optimisations that take care of certain power saving scenarios. NetBurst was *never* intended to conserve power. It was from a bygone era.

CmdrTobs said:
Again, this makes sense for a P4. I question the saving on a CPU that already has multiple cores. Context switching is only the bottleneck if you have 2 or more hefty threads fighting for time. Assuming a program has say 3 hefty threads (which is a lot for desktop duty) a typical quad-core will save no time from context switching. - With the proviso IF windows schedules properly and puts all the 'semi-idle' processes and thus threads on their own core.(u know what I mean; to have there threads be allocated to that physical CPU core and split time on it)

In short; Multiple cores have killed the context splitting bottleneck. What is your reply to this?

My reply is the same. The ability to schedule two threads to a single real core is still very valuable, even when the whole CPU package itself contains multiple cores. Because it reduces OS overheads of context switching and enables fuller utilisation of available CPU resources for true user-focused work.

A thread is nomadic, it is rarely locked to any specific core or set or cores. (Unless a specific affinity mask has been set but those are for server applications and programmers that are debugging only.) That means a thread can jump between cores, potentially for every time slice it receives. Although the likelihood of this occurring is reduced because Windows like most mainstream OSes will try to keep a thread from switching needlessly between cores because this is expensive in terms of moving its thread stack.

"Semi-idle" is not a technical term. I'm not familiar with it. Can you define what you mean? When I think about thread states I tend to think in terms of the state machine. And a state machine never has any half-way houses like "semi-idle".

drunkenmaster · 23 Jan 2011 at 16:49

Remember that(sorry I'm too tired to read your last post Nathane, you might have covered it) Intel/AMD and by extension MS trying to support them are pushing power saving tech.

Its better for the CPU to load one core highly than 4 cores lightly. Clock gating isn't very efficient but is useful for saving power, if you have to load 4 cores to 25% you'll end up causing them all to use full voltage and full clocks, and turbo speeds will also be limited(at least on Intel). If the workload is enough to fit on a single core thats in turbo mode at its fastest you can both keep it all on a single core, which means other cores can clock down or, finally on the latest CPU's which will include Bulldozer(not sure about Llano) you have power gating which makes loading one core even more preferable over loading more than one core(or one module in Bulldozer's case.

I also personally argue against the idea Bulldozer isn't 2 real cores, largely because its nonsense, the only difference is its not two "as we perceive them" cores.

Firstly Phenom had three decodes, on a three issue wide core, its reduced to a two issue core, but now has four decoders, essentially two for each core. Each core has its own schedualler, and entirely interger core, L2 is shared, but that can have advantages, rather than each core having separately 1mb to add up to 8mb total for a 8mb core, with each individual core only having 1mb cache, you can have the same cache but allow each core to access 2mb each. Its more likely two cores in the same module will require lots of duplicate data in cache, but they'd only need one copy.

Its not about having a not very real core, the modules are about saving die space and efficiency.

As gpu's move up in shaders, and cpu's move up in cores, its simply not feasible to double everything, every time, because its inefficient. Even power gating, which costs transistors, will have half the transistor cost due to modules rather than cores. 8 separate cores, 8 rings of transistors , 4 modules, 4 rings of transistors for power gating.

2mb l2 for every core, or pair them together, and have 4mb l2 per pair of cores, in single threaded situations with one core working hard, you'll end up with more available cache. Again having one lot of cache in one location, than two sets of cache in two locations is simply more efficient.

Also remember that Bulldozer is a massive massive step towards APU's, its got the same raw 2x128bit fpu's as Phenom did per core, this is a purposeful reduction in FPU per interger core, because this architecture will have gpu's on die within a year or so, or a massive FPU beast that is many many times fast than a couple 128bit fpu units coupled together.

You can't just look at a Bulldozer module, then the cores and decide it isn't a real core because some hardware is shared. Cores change, cores evole, efficiency is key and as we move to 8 cores, then 16, then 32, modules will become standard, and eventually have 4 interger cores per module, because it increases efficiency, keeps cpu size manageable and reduces a lot of waste.

Its not marketing speak or gimmicks, its simply where cpu design and the entire industry is going. You can be certain at some stage (maybe Haswell? when they finally more 8 cores to midrange) Intel will introduce efficiency and sharing of resources.

Remember that AMD are moving 8 core to maintrream essentially 18 months before Intel plan to, and AMD are doing it at 32nm while Intel are doing it at 22nm. We WILL see the same changes from Intel.

http://www.anandtech.com/show/3863/amd-discloses-bobcat-bulldozer-architectures-at-hot-chips-2010/5

Fetch and instruction cache might be similar, but that really doesn't mean a smegging thing. Every fetch, every instruction comes from memory through the same memory controller, so every core shares hardware at SOME stage along the line, if you increase the fetch logic so it can draw twice as many instructions through just as efficiently, or have to separate half as efficienty fetch units, what difference could it possibly make, none. You HAVE to intergrate some things because at some stage its just too much independant logic for absolutely no reason.

If they didn't have this two CORES in a module design, each core would cost 100% more die space per core, instead of 5% more die space. Which would most certainly mean AMD couldn't fit 16 cores at 32nm.

Consider the fact that Intel launched sub £200 quad cores on 65nm, and it won't be till 22nm, end of 2012 before they do the same for octo cores and realise that the same pattern would mean 16 cores in the mainstream wouldn't be viable till well, a process no one knows is possible yet

Bulldozer is two REAL cores per module, its as simple as that, just because its not a doubling of every single piece of logic doesn't mean anything, its inefficient to double up logic you don't need to double up and look at Nvidia and what happens when you're unwilling to make any sacrifices or changes for years, and push for the biggest possible size every process can produce, heat, power, delays and loss of money.

zoomee · 23 Jan 2011 at 17:29

Wow! - excellent discussion going on here fellas. Did you hear that sound? woooooosshhh - most of it flew above me head

lol

Let me throw something into the equation (forgive my stupidity!) - If as we all know - most compilers are intel favouring (possibly hence supporting HT better), how will this impact on bulldozer? Bulldozer will still have to run on intel-optimised software so will we actually see any benefit at all, IF there is software input required to use the 'modules' efficiently....Maybe if AMD started building their own compilers we would see better use of current technology (let alone next gen stuff), instead of relying on intel's 'optimised' compilers. Please correct me if I'm barkng up the wrong tree here. - just wondering if theres any loss of performance due to the 'compiler' fiasco mentioned not too long ago.

edscdk · 23 Jan 2011 at 17:49

dave_beast said:
An easy way to explain it whilst remaining accurate, is to imagine a worker at a toll booth taking tolls from vehicles on a single track, he can process more but is reliant on the single track providing the customers, hyperthreading increases the tracks to two, and the worker at the toll booth has the approaching vehicles staggered so he can collect from either side, effectively doubling the rate.

this is wrong, it does not double performance,

imagine if the worker could take the toll from 100 people per hour (this is his maximum workign speed),

due to the drivers not driving past in an orderly way he only manages to get 85 tolls per hour...

now if you open two lanes one either side, the worker can compensate for the drivers lagging behind or bunching up by taking the toll from the drivers on which ever side has someone ready to pay...

He is now able to take 98 tolls per hour, however he will never be able to take 101 tolls per hour.

ATIorNvidia · 23 Jan 2011 at 22:08

CmdrTobs said:
Liar. You're back tracking now. Nobody said Bulldozer had HT (SMT). It was clearly stated it has CMT, a technical step up, NOT simply a derivative of SMT which is what you are implying I said because you can't find quotes of me saying any such thing.

I wastefully took the time to lay it out to you in one detailed post showing you where you were going wrong and what your mistakes were and you responded with ignorance. Go back and look at my precise language. Please stop replying too. Skin saving adds nothing to the debate. It was clear what you meant and you were wrong. Glad you engaged in some google to inform yourself (a little bit) and added a veiled retraction. Better you did that before you embarrassed yourself and derailed a thread.

Are you getting angry? :rolleyes:

Somebody else did say Bulldozer has a form of HT. As I said originally from my post earlier in the thread, Bulldozer has 2 cores and 2 threads in each module In post #58 you clearly disagreed with me. Therefore you were implying I was wrong about a 4 module BD CPU having 8 cores and 8 threads. Get it now?

Saving skin? You completely avoided answering my question because you know deep down you're incapable of answering it.

ATIorNvidia said:
Show me any programs that uses more than 4 cores and doesn't benefit from HT, until then you don't really have a case.

*Whistles*

jakspyder · 23 Jan 2011 at 22:54

Keep discussion fresh ATIorNvidia dont start the flame war

CmdrTobs · 24 Jan 2011 at 04:00

NathanE said:
You shouldn't form your own opinions based upon content written by the blind leading the blind on Internet forums trying to discuss things of which they have no understanding.

I specifically presented that tidbit as personal experience with problems on windows, not as some sort of sourced trump all proof. It's not blind leading the blind it's experience sharing. You share the positive. You share the negative. You should not be providing any guarantees for closed source software you didn't code. These are the same people who pretended Windows was safe for years till MS Blast woke them up, that is literally the blind leading leading the blind.You can take any view on closed source and in the end it will be your opinion and experience, you can't get facts on closed source. You can point to supreme Commander, but Chris Taylor can point right back. And no books like "Windows Internals" don't document bugs.

Ok lets get back on track and answer specifics:

This topic is a Hyperthreading and pros cons on a desktop machine, not explicit points on scheduling and your quotes told us little about what I said. Lets put some context of what has been said back.

CmdrTobs said:
I thought 2+ core CPU's ended that bottle neck for intensive threads as you can assigning the hundreds of semi-idle threads to one core. (provided windows scheduler works well - a whole other debate)

I generally have on prior pages taken a sceptical view of window's sheduler logic to avoid the HT situations I describe that result in slow down.

Horses mouth:

Disable hyper-threading technology

Under heavy computing workloads, hyper-threading technology, which allows a single processor to appear as two processors, may cause poor server performance. Because workload is dynamic and can vary quickly, we recommended disabling hyper-threading on the physical server to prevent this potential problem. - http://msdn.microsoft.com/en-us/library/cc708332(v=ws.10).aspx

Ok, this had no date, I think it's post 2003 so that's still the Vista kernal. Microsoft may have revised this device with the I7. Although for a server, many home applications today today fit this bill. (I remember HT sucking hard on busy IRC servers in the P4 era at university)

Moving to "Windows Platform Design Notes
Design Information for the Microsoft® Windows® Family of Operating Systems
Windows Support for Hyper-Threading Technology"

Brief quotes but you posted a link to this so you've read it through.(I have this in my MSDN folder)

So scheduling a thread onto an HT processor that already has an active logical processor has the following effects:
o Slowing down the performance of that active logical processor
o Limiting the performance of the new scheduled thread on the second logical processor

• The good news is that, for multi-threaded applications,the sum of the performance of these two threads will typically be better than the performance of a similarly equipped non-HT processor.

Windows Server 2003 family and Windows XP has been modified to identify HT processors and to favor dispatching threads onto inactive physical processors wherever possible.[/I]

Now my comments on the highlights Respectively.

nathanE said:
Originally Posted by NathanE
But no, if the CPU can find empty pipelines into which it can inject parts of the other thread. Then this will *always* result in performance gains in total.

I then said

Cmdrtobs said:
You see, this time you added the key remark. "in total". What a user calls "performance" is rarely ever the raw computations of the system in that moment. It's the performance of the application in focus. When someone runs 3D mark they want that to run it as fast as possible,thats their right measure of 'performance'. If they drop a few points (due to blocks) but windows manages to send out a few network discovery packets and thus over all do more in that period. -People don't call that better performance. Do you understand why I wrote what I did now?

Which is a practical example of the scenario Microsoft lays out above and comes to the conclusion that agrees with you, yes overall performance may increase, but STILL agrees with my core point I have drilled for pages now that in the thread you are focusing it can and does cause lower performance.

Show me your area of contention with this?

2nd issue:

You said:

NathanE said:
Thread context switching as a result of time slicing the CPU is expensive. And whilst it occurs no "real" work is being done. Therefore if the CPU exposes 2 virtual cores it allows the kernel to schedule two threads to the CPU. Even if the CPU can't actually handle both truly concurrently the kernel at least has "enqueued" the next thread thus removing a critical performance bottleneck.

Then reiterated a few times, another time was to respond to me in which you said this:

NathanE said:
It allows 2 threads to be scheduled on the same CPU core - primarily to help reduce the OS overhead of context switching. Then the CPU core decides how to best inject their work into its pipelines.

but MS in there HT paper says

To take advantage of this performance opportunity, the scheduler in the Windows Server 2003 family and Windows XP has been modified to identify HT processors and to favour dispatching threads onto inactive physical processors wherever possible.

Which shows MS does not the think there worth while gains in avoiding context via splitting by using a a physical and HT core.

I've said on the matter to druken master:

CmdrTobs said:
The right approach for me is to prioritise giving me more real cores

Which appears to be Microsoft take when scheduling, running up wind to your claims. (your not wrong in theory). The conclusion of the scenario on page 11 (correct me I am wrong is) is it's better to ignore the HT processor and crack open a new physical core where possible.

[Remembering most core pairs share L2 so you can get the boon of threads inheriting a favourable cache sate was dispatched without HT. Your 'windows internals' agree it's better to use the 'peas in a pod core'. ]

Now nobody please say "but then you will run out of Ht or physical cores quicker if you don't have HT!" because my whole argument is that we are dealing with desktops who's bulk program library have 1 - 3 demanding threads. Normally the former. To quote my opening post:

CmdrTobs said:
steer clear of Hyper-threading if you overclock use real-time/user time applications(aka everything you want to feel fast). Get hyper-threading if you like encoding video and unzipping files as fast as possible.

Seems just as true now as did as it did 4 pages ago, disagree, show me why and how?

I was going to wade ino the quagmire of justifying my own prior opinions on windows scheduler being bad, which you disagree - I just can't be arsed. Stringing this together was ball ache enough.Trust, the Linux kernel is better. (and I have no special love for the linux kernel)

In short. Please please, respond to the actual veracity of my statements. Not what you think I know or don't know.

Don't take this as an insult but; Quoting walls of texts from books with no specific point other than the implicit subtext of "Your wrong! heres the truth" makes you look a vitriolic and priggish. Which is sad since closely rereading your text over again I don't think we are saying anything mutually exclusive. The devils in the details. Meet me half way?

NathanE said:
Here's a couple links to get you started:

HT: http://download.microsoft.com/downl...ae-9272-ff260a9c20e2/Hyper-thread_Windows.doc
NUMA: http://msdn.microsoft.com/en-us/library/aa363804(v=vs.85).aspx

"Semi-idle" is not a technical term. I'm not familiar with it. Can you define what you mean? When I think about thread states I tend to think in terms of the state machine. And a state machine never has any half-way houses like "semi-idle".

Perhaps I should have been more clear and say semi-idle processes (that typically contain 1-3 thread that are low priority and that are typically sleeping.

I describe them as semi-idle as such because:
- as mentioned low priority
-contain long sleep states
-whos execution within a window of 10000's of ms make no difference to any perception and user measure of performance.

BUT, they are not ALWAYS idle and so windows will wake them without user intervention (hence the semi part).

I did not want to simply call them 'background tasks' either as you can have a process whose threads will only respond to a user triggered event. (plugging in a USB periphery) Which would have contradicted the scenario I was posing as it by definition of being an user initiated action it would be a subject of performance...

I hoped you would be in the zone and understand in the context of the scenario I was building up.

Since you are interested in the "State machine" could you shed some light on Vanderpool and thus the performance difference between the 2600 and 2600K in VM.

I am going stick my thumb in the air and say 2600K fail. (which is intels point I think, to make enterprise by locked CPU's)

Thanks.

CmdrTobs · 24 Jan 2011 at 04:36

drunkenmaster said:
Remember t....

I am not a Luddite! I am not disagreeing with technical rational. I am just stating what I know would run faster for me and most other desktops everyday. A choice Intel lets you make with a range of chips.

Sure we approach omeic limts, purity limits, deposition process, current density limits all sorts of limits and we can't have the same chips we have today but just at 20GHz in 5years time.

So your preaching to the converted. Some who's trained profession is in semiconductors. (not for logic though)

Lets be clear on one thing, a bulldozer module is still a core!!!!! Maybe 'Module' will replace the current vernacular one day....

When the Core2Duo started the mass regin of the word 'core' to popularly mean 1 complete, thread executing unit. It displaced die, Remember die? when one CPU = 1 die and we used to go on all day long about die's?

For the time being, I think it's confusing to change now because then we start to call a quad core bulldozer an octocore octo thread executing thingy. Then we still would call a new I7 octo cores and thus 16 executable threads a octcore.

Then we will watch how the FPU of the intel in ecoding benches ripps apart the bulldozer and come to the bull**** conclusion that somehow per core the intel is miles better when it may not be.

If you want to call them something else may I propose 786dx or sx perhaps to show 1 less FPU than Integer? This will help keep consistency with the naming convention on current products.

But for now, Bulldozer is not out and we are not changing to module!. Marketing can GTFO

1 core dual continuous multi-thread enabled.

NathanE · 24 Jan 2011 at 10:48

I specifically presented that tidbit as personal experience with problems on windows, not as some sort of sourced trump all proof. It's not blind leading the blind it's experience sharing. You share the positive. You share the negative. You should not be providing any guarantees for closed source software you didn't code. These are the same people who pretended Windows was safe for years till MS Blast woke them up, that is literally the blind leading leading the blind.You can take any view on closed source and in the end it will be your opinion and experience, you can't get facts on closed source. You can point to supreme Commander, but Chris Taylor can point right back. And no books like "Windows Internals" don't document bugs.

And it may well be your personal experience. I'm not denying that that Supreme Commander game is clearly badly written. What I said was that your assumption that because some XYZ game doesn't work well with multi-threading (at least by default) must therefore imply that Windows is crap. This is not the case. Your assumption is false. Read on.

This isn't rocket science. It's computer science. I know precisely how Windows works in these regards. You don't need to view the source code in order to figure it out. A short session with Process Explorer is usually sufficient. It helps to be a programmer familiar with the lowest levels of the NT kernel as well.

Who is Chris Taylor?

There are no thread scheduling bugs in Windows. Everything is by design.

I generally have on prior pages taken a sceptical view of window's sheduler logic to avoid the HT situations I describe that result in slow down.

Yes you have taken many pointless and unwarranted snipes at Windows in general. Repeatedly referring to its closed source nature. And accusing it of having bugs in one of its most critical kernel components: the thread scheduler. Other than a crappy little utility written by some fan on a forum for some crappy game (technically speaking, I don't give a hoot about its gameplay), I'm still yet to see any real evidence from you to backup your outlandish claims.

Disable hyper-threading technology

Under heavy computing workloads, hyper-threading technology, which allows a single processor to appear as two processors, may cause poor server performance. Because workload is dynamic and can vary quickly, we recommended disabling hyper-threading on the physical server to prevent this potential problem. - http://msdn.microsoft.com/en-us/libr...(v=ws.10).aspx

Here again you are showing a supreme level of ignorance. Yes an MSDN article states that Virtual Server (a long obsoleted product) does not play well with Hyperthreading. Is it really any surprise that a Generation 1 virtualisation product didn't like to be run on Hyperthreading? No not really. VMware had the same issues many many years ago. Thankfully such things are of no concern today. Microsoft's Hyper-V (the successor to Virtual Server) supports HT with no problems. In fact the documentation explicitly states that it should be enabled.

http://windowsitpro.com/article/articleid/101631/q-does-hyper-threading-affect-hyper-v.html said:
The new four-core Intel Core i7 processor enables hyper-threading, which splits each processor core into two virtual cores to (potentially) improve performance.
The concern with Hyper-V and hyper-threading is that you assign a number of processor cores to each virtual machine (VM). Imagine that you assign one processor each to two guest VMs from the Hyper-V management console, thinking that each is going to use a separate core. What if the hypervisor assigns each of the VMs to the same physical core, with each getting a virtual core? You'd potentially get lousy performance and three physical cores not doing much, where you'd have liked each VM to get its own physical core.
Fortunately, this isn't the case. Microsoft has done a lot of work around Hyper-Threading and Hyper-V. Essentially, while Hyper-Threading will aid performance sometimes, it will never hurt performance, so Hyper-Threading should be enabled.

There are however other server/enterprise products (BizTalk is one) whose documentation will state to turn off HT. This is done because either the product's thread affinity algorithms were written by a naive intern and the vendor hasn't had a chance yet to fix it. Or because, quite simply, they've not tested the product and therefore don't want to support such environments.

Ok, this had no date, I think it's post 2003 so that's still the Vista kernal. Microsoft may have revised this device with the I7. Although for a server, many home applications today today fit this bill. (I remember HT sucking hard on busy IRC servers in the P4 era at university)

The HT algorithms in the scheduler undergo improvement in every release of Windows NT. NT 5.1 was the first kernel to implement support. 5.2 improved it, particularly for server workloads. 6.0 refined it further. I really doubt an IRC server had any issues with HT.

Which is a practical example of the scenario Microsoft lays out above and comes to the conclusion that agrees with you, yes overall performance may increase, but STILL agrees with my core point I have drilled for pages now that in the thread you are focusing it can and does cause lower performance.

If the CPU package as a whole is near 100% utilisation then yes. Hyperthreading in this scenario will potentially have the effect of slowing down the ultimate throughput for a particular thread. But only because hyperthreading is doing its job of allowing a more equal allocation of CPU resources between multiple threads. This shouldn't be a problem in the real world because we have something called "thread priorities". On Windows, the actively focused window (such as a game) always has a slightly higher priority than a non-focused window. Therefore a game is allowed to, within reason, preempt other threads in order to get CPU resources. But furthermore, very few games nowadays hammer the CPU enough to get it to 100% utilisation levels (especially not an i7); so again this shouldn't really be a problem in the real world.

but MS in there HT paper says

To take advantage of this performance opportunity, the scheduler in the Windows Server 2003 family and Windows XP has been modified to identify HT processors and to favour dispatching threads onto inactive physical processors wherever possible.

Which shows MS does not the think there worth while gains in avoiding context via splitting by using a a physical and HT core.

Which is the same behaviour that Linux uses (and hopefully any operating system that implements HT support in its scheduler). Go read the source code! It would be totally foolish to blindly schedule a thread onto a logically HT core when there is a perfectly good "real" core sitting idle. And just to be perfectly clear: The last Windows OS that actually did this was Windows 2000, NT v5.0. But that's because it was released before Intel came along with NetBurst's hyperthreading.

Which appears to be Microsoft take when scheduling, running up wind to your claims. (your not wrong in theory). The conclusion of the scenario on page 11 (correct me I am wrong is) is it's better to ignore the HT processor and crack open a new physical core where possible.

[Remembering most core pairs share L2 so you can get the boon of threads inheriting a favourable cache sate was dispatched without HT. Your 'windows internals' agree it's better to use the 'peas in a pod core'. ]

Now nobody please say "but then you will run out of Ht or physical cores quicker if you don't have HT!" because my whole argument is that we are dealing with desktops who's bulk program library have 1 - 3 demanding threads. Normally the former. To quote my opening post:

Where did I say otherwise?

Any thread scheduler that has properly implemented HT support will still prioritise the scheduling of threads onto "real cores" rather than scheduling them onto HT cores. This is a fact. Windows does this. Linux does this. Big whoop. It's a few lines of code to do this, no biggie.

Seems just as true now as did as it did 4 pages ago, disagree, show me why and how?

I was going to wade ino the quagmire of justifying my own prior opinions on windows scheduler being bad, which you disagree - I just can't be arsed. Stringing this together was ball ache enough.Trust, the Linux kernel is better. (and I have no special love for the linux kernel)

In short. Please please, respond to the actual veracity of my statements. Not what you think I know or don't know.

Don't take this as an insult but; Quoting walls of texts from books with no specific point other than the implicit subtext of "Your wrong! heres the truth" makes you look a vitriolic and priggish. Which is sad since closely rereading your text over again I don't think we are saying anything mutually exclusive. The devils in the details. Meet me half way?

Steer clear of HT if you overclock or use "real time / user time" applications? Answer: Still no.

Steer clear of HT if you use certain very specialist (and obsolete) enterprise applications such as Virtual Server or BizTalk? Short Answer: Yes. Full Answer: No, upgrade to modern software.

If you're running a benchmark such as 3DMark just set its process to the highest priority. That will totally evaporate all and any concerns you could possibly have with HT. And then, after the benchmark is finished, revel in the fact that the rest of your system is running at full pace again. On an i7 as much as 20 to 30% of its performance comes from HT when given a heavily multi-threaded workload. So it definitely isn't something you want to turn off at a BIOS level.

Quoting a few small excerpts from the very very popular Windows Internals book by Mark Russinovich was required. When somebody is blindly attacking Windows for being closed source and having bugs in its thread scheduler when they clearly don't have all that much of a clue; the only thing one can fall back onto is cold hard facts. The fact that you're attacking this book material as being "vitriolic" is rather telling of your true agenda in this.

If you actually read the Linux source you'll see that it has almost identical behaviour to Windows NT as what Mark describes in his book. As I said before, it really isn't rocket science this. It's computer science. It might seem all magical and mystical to you how this stuff works under the hood. But to me it is clear as day. It is often these "magical and mystical" incomprehensions that breed the corrosive mindset of "Oh Microsoft just CAN'T have possibly got it right and Linux, given its written purely by full time geeks, simply MUST have got it right... therefore I will blindly throw all my weight behind Linux!".

You really shouldn't place such weighting on Linux. It is incredibly fragmented. Hell it doesn't even have "one" thread scheduler. It has many. Many many many. All different peoples interpretations of what is and isn't important when it comes to scheduling threads. There was a time not many years ago when Linux couldn't really play a video whilst extracting a compressed archive. It just simply couldn't do it. Even if you had sufficient hardware. I'm sure this is fixed by now. But there should be no mistake that thread scheduling on Linux is still a very very hot topic and they often make large changes to it in every other release. They just can't make their minds up nor get to a point where everybody is happy with the way it allocates priority and resources to different types of applications. Microsoft solved this years ago by introducing something called the "Multimedia Class Scheduler" which is a special background service in Windows that provides out-of-band hints to the thread scheduler as to what sort of service-level a process is expecting at the current point in time. It is why a Vista / W7 machine can, given the appropriate hardware, play a game of COD:MW whilst on another monitor have a BluRay video playing. Not many operating systems today can claim to have an out-of-the-box-tested-and-stable thread scheduler advanced enough to do that. Hell many operating systems simply don't have a windowing system and/or hardware graphics acceleration that is capable of doing it, let alone a thread scheduler.

The source code for Linux's thread scheduler is available for all to read here: https://github.com/mirrors/linux-2.6/blob/master/kernel/sched.c
It takes about 3 minutes before you realise it functions in an almost identical way to what Mark describes in Windows Internals. Hyperthreading implementations tend to be fairly boilerplate, i.e. there is usually only one way to skin a cat.

Perhaps I should have been more clear and say semi-idle processes (that typically contain 1-3 thread that are low priority and that are typically sleeping.

I describe them as semi-idle as such because:
- as mentioned low priority
-contain long sleep states
-whos execution within a window of 10000's of ms make no difference to any perception and user measure of performance.

BUT, they are not ALWAYS idle and so windows will wake them without user intervention (hence the semi part).

I did not want to simply call them 'background tasks' either as you can have a process whose threads will only respond to a user triggered event. (plugging in a USB periphery) Which would have contradicted the scenario I was posing as it by definition of being an user initiated action it would be a subject of performance...

I hoped you would be in the zone and understand in the context of the scenario I was building up.

Since you are interested in the "State machine" could you shed some light on Vanderpool and thus the performance difference between the 2600 and 2600K in VM.

I don't think a few threads that have an idle or low priority are of any concern. They won't be able to impact upon a higher priority process/thread such as a game (or hell, ANY foreground window).

State machine is a computer science or even just mathematical term for thinking about transitive state concepts. It has nothing to do with Virtualisation.

CmdrTobs · 29 Jan 2011 at 14:04

NathanE said:
What I said was that your assumption that because some XYZ game doesn't work well with multi-threading (at least by default) must therefore imply that Windows is crap.

No actually. The sentiment was HT is 'crap' but I warned about situations how windows could make it worse as part of a balanced view point. Which brings me back to supreme cmdr, windows jumps it's threads around upon alt tabbing. That whole tools use can be circumvented by alt tabing a numbering times which is the inconsistent threading behaviour. It's not a great leap that this could have extra consequences for multi-threading.

NathanE said:
This isn't rocket science. It's computer science. I know precisely how Windows works in these regards. You don't need to view the source code in order to figure it out. A short session with Process Explorer is usually sufficient. It helps to be a programmer familiar with the lowest levels of the NT kernel as well.

Who is Chris Taylor?

There are no thread scheduling bugs in Windows. Everything is by design.

Wrong tool there. To know how windows *precisely* works you need a dissembler not process explorer, but keep digging.

NathanE said:
Yes you have taken many pointless and unwarranted snipes at Windows in general. Repeatedly referring to its closed source nature. And accusing it of having bugs in one of its most critical kernel components: the thread scheduler. Other than a crappy little utility written by some fan on a forum for some crappy game (technically speaking, I don't give a hoot about its gameplay), I'm still yet to see any real evidence from you to backup your outlandish claims.

Jesus Christ "snipes." No specific quotes for me then? Is that because you are blowing things out of proportion to fit this narrative you have been pushing? I think skepticism is healthy in windows' case (as is with all closed source unknowns), and I say that having written proof of concept code for windows.

Please quote some of these 'outlandish' claims I have made? Or is this more of your tin?

NathanE said:
Here again you are showing a supreme level of ignorance. Yes an MSDN article states that Virtual Server (a long obsoleted product) does not play well with Hyperthreading. Is it really any surprise that a Generation 1 virtualisation product didn't like to be run on Hyperthreading? No not really. VMware had the same issues many many years ago. Thankfully such things are of no concern today. Microsoft's Hyper-V (the successor to Virtual Server) supports HT with no problems. In fact the documentation explicitly states that it should be enabled.

No I remember a lot of products did not run well with hyper-threading especially many high end servers. (Google for dozens of articles ) and to be fair I did qualify that statement with it's age and application in relation to the permanences on that kernel then, but don't let that stop you from misrepresenting me.....

NathanE said:
The HT algorithms in the scheduler undergo improvement in every release of Windows NT. NT 5.1 was the first kernel to implement support. 5.2 improved it, particularly for server workloads. 6.0 refined it further. I really doubt an IRC server had any issues with HT.

.... Which was my point you would have understood if you weren't so bellicose. Remember my main point was about hypther-threading and in the context I laid out (in a overclokers desktop) it's not *worth* it. I balanced my negative view on HT at the hardware level by saying windows might not be up to scratch until you interjected and said unequivocally said 'windows is perfect'. Now YOU are telling me it's improving all the time? Which is it mate?

Yes it DID gimped an IRC sever. It tended to gimp a lot of the general multi-threaded asynchronous socket server designs at high load. This is all old hat:
http://news.cnet.com/Does-hyperthreading-hurt-server-performance/2100-1006_3-5965435.html

NathanE said:
If the CPU package as a whole is near 100% utilisation then yes. -- very few games nowadays hammer the CPU enough to get it to 100% utilisation levels (especially not an i7); so again this shouldn't really be a problem in the real world.

Which is the sort of response you should have made in the first place to contribute to the debate. NOT paste a wall of text of little relevance to the under the guise of correction for things nobody disagreed with..

It's a good and valid point about games not utilising 100% CPU time in 'normal' running. I wish you could have said this instead of what you did say. We could have contuned a constructive debate. What is most tragic is that we are not in real disagreement. Your just too bellicose to see it.

NathanE said:
Which is the same behaviour that Linux uses (and hopefully any operating system that implements HT support in its scheduler). Go read the source code! It would be totally foolish to blindly schedule a thread onto a logically HT core when there is a perfectly good "real" core sitting idle. And just to be perfectly clear: The last Windows OS that actually did this was Windows 2000, NT v5.0. But that's because it was released before Intel came along with NetBurst's hyperthreading.

"It would be totally foolish to blindly schedule a thread onto a logically HT core when there is a perfectly good "real" core sitting idle"

Which was my CORE point I have made on EVERY single page. Most home applications are essentially single threaded anyway(please don't do your favriote trick of misrepresenting me by talking about UI threads etc... you know exactly what I mean)

My point was for the 50th time: In the desktop overclocking world we have all the execution cores we want atm satisfied by physical cores. Hyperthreading in general becomes superfluous and negative due to heat cost.

NathanE said:
Where did I say otherwise?

EXACTLY You didn't. You were saying I'M saying otherwise. When I question the importance of context switching on a desktop that can make the same savings multicore.

NathanE said:
Steer clear of HT if you overclock or use "real time / user time" applications? Answer: Still no.

More misrepresentation. If you go back read from the OP and read the back and fourths and evolution of the debate we were ALWAYS talking about I5 vs I7 on this matter: Is buying an I5 and overclocking rather than a I7 and turning hyper-threading off to overclock as hypthreading will not boost your performance more than OCing on day to basis.

but here your phrasing it as if the argument was made in a vacuum of zero considerations evo though I have repeated these conditions from the get go over and over.

READ THE THREAD FROM OP.

NathanE said:
Steer clear of HT if you use certain very specialist (and obsolete) enterprise applications such as Virtual Server or BizTalk? Short Answer: Yes. Full Answer: No, upgrade to modern software.

No, be honest. You agree with me. Steer clear of hyper-threading if you use desktop applications and are an overclocker. In this light HT ain't worth it. Unless you are going to make the claim that Hyper-threading is better for day to day desktop performance than the overclock available on a sandy bridge. This is something we can disagree on vehemently if you do?

NathanE said:
If you're running a benchmark such as 3DMark just set its process to the highest priority. That will totally evaporate all and any concerns you could possibly have with HT. And then, after the benchmark is finished, revel in the fact that the rest of your system is running at full pace again. On an i7 as much as 20 to 30% of its performance comes from HT when given a heavily multi-threaded workload. So it definitely isn't something you want to turn off at a BIOS level.

.... Again unless you have the heat to spare (i.e don't OC). Which I outlined on PAGE 1 of this topic. Again, are willing to say those speculative 30% gains can be reached in even 1% of desktop applications?

NathanE said:
Quoting a few small excerpts from the very very popular Windows Internals book by Mark Russinovich was required. When somebody is blindly attacking Windows for being closed source and having bugs in its thread scheduler when they clearly don't have all that much of a clue; the only thing one can fall back onto is cold hard facts. The fact that you're attacking this book material as being "vitriolic" is rather telling of your true agenda in this.

The material is not vitriolic YOU were being vitriolic . The popularity of the book is irrelevant. You quoted without context, point or contention to in order to run a subtext that was vacuum clear. You got called.

NathanE said:
If you actually read the Linux source you'll see that it has almost identical behaviour to Windows NT as what Mark describes in his book. As I said before, it really isn't rocket science this. It's computer science. It might seem all magical and mystical to you how this stuff works under the hood. But to me it is clear as day. It is often these "magical and mystical" incomprehensions that breed the corrosive mindset of "Oh Microsoft just CAN'T have possibly got it right and Linux, given its written purely by full time geeks, simply MUST have got it right... therefore I will blindly throw all my weight behind Linux!".

*Sigh*, at that point I said linux kernal . The argument expanded by that point to whole kernels (proably due to one of your charges) . It's impossible to blindly follow something that is open.

Linux kernel by OpenSource nature recives far more updates than a closed source kernal this is fact. This automatically makes it more adv. I don't think you actually disagree with me. Again, it's more of your subtext to attempt to discredit ppl as being part of a 'heard' and yourself as some teacher via CTRL-V lectures. That canard is dead.

NathanE said:
You really shouldn't place such weighting on Linux. It is incredibly fragmented. Hell it doesn't even have "one" thread scheduler.....

There we go again. More of your narrative and subtext about what others must think or understand rather than a point.

NathanE said:
I don't think a few threads that have an idle or low priority are of any concern. They won't be able to impact upon a higher priority process/thread such as a game (or hell, ANY foreground window).

They were a concern in the scenario I was outlining about end users performance with Hyperthreading. Maybe be OTT but by then I was well aware of your fastidious argument style that jumps on uncrossed T's and undotted i's.

I know personally windows still will put discovery packets out no matter how stressed. I just wanted to clarify these sorts of things in what I was saying. Again no real contention mate.

NathanE said:
State machine is a computer science or even just mathematical term for thinking about transitive state concepts. It has nothing to do with Virtualisation.

Since we were talking about threads I thought you were commenting on the CPU's state as a 'snapshot'.

I asked you to offer info as I presume Vanderpool provides hardware support to allow the Virtual machine to see the CPU in the 'native machine state' keeping the details to the host OS without software interpretation... but anyway you have made no comment.

In Closing: Nobody has posted in a week. So please this lock this thread as it's been derailed. If you even care to respond at this juncture PM is best.

pswfps · 29 Jan 2011 at 15:37

I'm a software developer and can safely say that hyperthreading is extremely useful for enhancing the performance of heavily threaded apps. If I'm running say 16 threads in parallel, within a single application process, the effect of hyperthreading is enormous.

On the other hand, for gaming and regular home use, you might as well have bought an i5 or a Phenom X4.

NeTTy · 30 Jan 2011 at 11:03

Hold my hand up right away I'm no expert and I also didn't read the whole thread (it seemed to degenrate rather quickly into a bitch fight).

From my limited knowledge the best way I can explain HT is to think of it as having lots of marbles (threads in a que) rather than picking up one at a time and placing them into a box with one hand taking the little extra time to pick up two marbles with one hand and place them into the box, it doesn't quite take as long as individually moving them but is still faster than moving them one by one.

I belive the operating system and how it deals with process schedualing and memory management would also play a role in how effective it is.

Again appologies if you are all thirsting for detailed tech speak and if any information isn't 100% accurate but its just how I understand it

A book I found helpful with regard to schedualing and memory management is:
Operating sytems and concepts by Abraham Silberschatz.

Competitor rules

hyperthreading ? is it reallu useful ??????

More options

FMTopfan

FMTopfan

CmdrTobs

CmdrTobs

NathanE

NathanE

CmdrTobs

CmdrTobs

AceTK

AceTK

NathanE

NathanE

drunkenmaster

drunkenmaster

zoomee

zoomee

edscdk

edscdk

ATIorNvidia

ATIorNvidia

jakspyder

jakspyder

CmdrTobs

CmdrTobs

CmdrTobs

CmdrTobs

NathanE

NathanE

CmdrTobs

CmdrTobs

pswfps

pswfps

NeTTy

NeTTy