Will we see 6GHz CPUs?

Associate
Joined
14 Feb 2010
Posts
135
Problem is that power consumption rises superlinearly with clock frequency: dynamic power scales with frequency times voltage squared, and voltage has to rise with frequency too. E.g. it costs less power to have 2 x 2GHz cores than it does 1 x 4GHz core. Obviously there's also a ceiling on how fast the silicon can switch. So basically it's better to do more per clock than to keep raising the clock rate.
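A rough sketch of the 2 x 2GHz vs 1 x 4GHz argument, using the standard dynamic power relation P = C·V²·f. The capacitance value and the assumption that voltage scales roughly linearly with frequency are illustrative, not real chip figures:

```python
# Dynamic CPU power is roughly P = C * V^2 * f (C = switched capacitance).
# Because higher clocks also need higher voltage, power grows much faster
# than linearly with frequency. Numbers below are illustrative only.

def dynamic_power(capacitance, voltage, freq_ghz):
    """Relative dynamic power for a given voltage and frequency."""
    return capacitance * voltage ** 2 * freq_ghz

def voltage_for(freq_ghz, v_per_ghz=0.3):
    """Assumed (hypothetical) linear voltage/frequency relation."""
    return v_per_ghz * freq_ghz

one_fast = dynamic_power(1.0, voltage_for(4.0), 4.0)      # 1 x 4 GHz core
two_slow = 2 * dynamic_power(1.0, voltage_for(2.0), 2.0)  # 2 x 2 GHz cores

print(one_fast / two_slow)  # the single fast core burns ~4x the power
```

Under these assumptions the single 4GHz core does the same work per second as the pair of 2GHz cores but at four times the power, which is the point the post is making.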
 
Associate
Joined
14 Feb 2010
Posts
135
Yes, you're correct, that's how parallelisation is typically done in applications. The problem here is you still need separate sections of code that you can allocate to threads.

For example:

An email gateway that's reading emails and applying some NLP (Natural Language Processing) logic to the email text. You can have a separate thread to receive/process each email. However, if there are typically only 8 emails at a time to receive/process, then any more than 8 CPU cores would be useless.

The .NET 4 framework does have support for multi-core programming, however it's SIMD (Single Instruction Multiple Data) vector-style processing, not MIMD. MIMD over multiple cores is the next big goal.

I'm certainly not denying the theoretical benefits of MIMD as you describe, but in the meantime it's clear to us that most applications aren't making proper use of the current multi-core CPUs around. I agree that separating the sections of code into logical standalone units suitable for threading is usually difficult and requires development effort. However, it can usually be done with proper planning.

Taking your email example as a case in point: I probably wouldn't want to spawn a thread per email, as it creates the inefficiencies you outline. Instead it would be better to break the overall logic of processing an email down into discrete sub-tasks and assign a thread to each sub-task. Accepting that a lot of these sub-tasks must be done in the correct sequence, each thread puts its finished product onto some sort of queuing mechanism for the next thread to pick up and process, and so on until completion. That way, should there be a glut of emails, the workload is evenly spread, with multiple cores working together to process the overall job. That would be better than a single-threaded process trying to handle everything on its own.
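A minimal sketch of that staged pipeline: each sub-task runs in its own worker thread, connected by queues, so a backlog of emails keeps every stage busy at once. The stage names and the trivial "work" functions are illustrative stand-ins, not real gateway logic:

```python
# Staged pipeline: thread-per-sub-task, queues between stages.
import queue
import threading

SENTINEL = object()  # tells a stage to shut down and pass the signal on

def stage(work, inbox, outbox):
    """Pull items from inbox, process them, push results to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            if outbox is not None:
                outbox.put(SENTINEL)
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)

q1, q2, done = queue.Queue(), queue.Queue(), queue.Queue()
results = []

workers = [
    threading.Thread(target=stage, args=(str.strip, q1, q2)),    # "parse"
    threading.Thread(target=stage, args=(str.upper, q2, done)),  # "NLP"
    threading.Thread(target=stage, args=(results.append, done, None)),
]
for t in workers:
    t.start()

for email in ["  hello  ", "  spam?  ", "  urgent  "]:
    q1.put(email)
q1.put(SENTINEL)
for t in workers:
    t.join()

print(results)  # ['HELLO', 'SPAM?', 'URGENT']
```

Because each stage is a single thread draining a FIFO queue, ordering is preserved and a glut of input spreads naturally across the stages, as the post describes.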
 
Associate
Joined
23 Aug 2005
Posts
1,273
c = 299,792,458 metres per second
divided by 6,000,000,000
light can travel ~5cm per clock cycle @ 6GHz, but it's not light, it's electricity through silicon, which is slightly slower.
Soo, maybe ;)

Edit: Jim Gray, 62m 40s in - smoking hairy golf balls
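The arithmetic in that post checks out; as a quick sanity check in code (constants taken from the post, vacuum speed of light, ignoring the slower propagation through silicon it mentions):

```python
# How far a signal travelling at the speed of light covers in one
# clock cycle at 6 GHz.
c = 299_792_458        # metres per second
f = 6_000_000_000      # 6 GHz, i.e. cycle time of ~0.167 ns

distance_m = c / f
print(f"{distance_m * 100:.2f} cm per cycle")  # ~5.00 cm
```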
 
Soldato
Joined
19 Jun 2009
Posts
3,852
Taking your email example as a case in point: I probably wouldn't want to spawn a thread per email, as it creates the inefficiencies you outline. Instead it would be better to break the overall logic of processing an email down into discrete sub-tasks and assign a thread to each sub-task. Accepting that a lot of these sub-tasks must be done in the correct sequence, each thread puts its finished product onto some sort of queuing mechanism for the next thread to pick up and process, and so on until completion. That way, should there be a glut of emails, the workload is evenly spread, with multiple cores working together to process the overall job. That would be better than a single-threaded process trying to handle everything on its own.

That's exactly how it could be done / how we did it. I was giving a simple example to demonstrate :cool:

We put the separate sections of code ('sub-tasks') into .NET services. Documents (emails) were stored in SQL Server, and they had a status code indicating where in the processing pipeline they were. This was all done before documents were either sent into a workflow environment (for a human agent) or automatically processed if the entities found matched the auto-process rules.

In tests we could auto-process 5,000 documents an hour with all the services and SQL Server running on a single Pentium III Xeon box. We tested on old hardware on the basis that if the application ran fast there, it would have no problems in the real world on the latest tech.
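The status-code scheme described there can be sketched as a simple state machine: each "service" claims documents sitting at its stage and advances the status. The status names and in-memory document list below are illustrative assumptions, not the poster's actual SQL Server schema:

```python
# Each document row carries a status; each service advances documents
# that are at its stage, the way the .NET services polled SQL Server.
PIPELINE = ["RECEIVED", "PARSED", "ENTITIES_EXTRACTED", "DONE"]

def next_status(status):
    return PIPELINE[PIPELINE.index(status) + 1]

documents = [
    {"id": 1, "status": "RECEIVED"},
    {"id": 2, "status": "PARSED"},   # picked up mid-pipeline after a restart
]

def run_service(stage):
    """One polling pass of the service responsible for `stage`."""
    for doc in documents:
        if doc["status"] == stage:
            doc["status"] = next_status(stage)

for stage in PIPELINE[:-1]:
    run_service(stage)

print([d["status"] for d in documents])  # ['DONE', 'DONE']
```

Keeping the stage in the database rather than in memory is what makes this design restartable: any service can pick up where the pipeline left off.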
 
Associate
Joined
24 Oct 2009
Posts
138
As cache is needed to 'soften' the speed difference between CPU and memory, I can see memory being integrated onto the same silicon as the CPU in the distant future. So instead of cache you'd literally have your 8GB of registers on the CPU die.

Now THAT would be fast.....
 
Soldato
Joined
5 Jan 2003
Posts
5,001
Location
West Midlands
Well, if Bulldozer actually can run a single thread across multiple cores, then multi-core setups will be far more efficient, to the point where clock speed will matter less and less.

Most single-threaded apps don't even make good use of the Core 2 Duo's and i7's ability to run 4 instructions from 1 thread in parallel. That's the nature of single-threaded apps: they have too many branches and dependencies, so there's a limit to how many resources you can throw at them and hope to get any improvement.

I'll believe that a CPU can do the almost impossible task of splitting a single-threaded app onto multiple cores when I see it with my own eyes.

Unlike graphics programming, where a task can easily be split into hundreds of nearly identical operations and run in parallel over a vast array of execution units, the average x86_64 application has too many dependencies.
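The dependency argument can be made concrete. In the first loop below every addition needs the previous result, so a 4-wide core cannot overlap them; the second computes the same sum via four independent chains that could issue in parallel. Python only illustrates the data flow here; the actual effect lives in the hardware, not the interpreter:

```python
# Serial dependency chain vs. four independent accumulator chains.
data = list(range(16))

# One long chain: acc -> acc -> acc ... (no instruction-level parallelism)
acc = 0
for x in data:
    acc = acc + x            # each add waits on the previous add's result

# Same sum split into 4 independent partial sums (what a compiler might
# do to expose ILP to a 4-wide core):
partial = [0, 0, 0, 0]
for i, x in enumerate(data):
    partial[i % 4] += x      # the four chains never depend on each other
total = sum(partial)

print(acc, total)  # both 120
```

The transformed version only works because addition is associative; branchy, pointer-chasing code rarely offers such a rewrite, which is the post's point.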

Overall, it's easier to teach programmers how to write good threaded code than to try to make a hardware fix. This may not be the case with all computing platforms, but trying to retain x86 compatibility at the same time... tough challenge.

As for Bulldozer, people have been talking about "reverse hyperthreading" for at least the last 4-5 years now, and it's always going to be "AMD's next thing". Ironically, Intel are attempting the same deal (they call it Anaphase), but I wouldn't hold your breath while waiting for it.
 
Soldato
Joined
19 Jun 2009
Posts
3,852
L1 cache hit rates on a CPU are normally well in excess of 90%, and that's before even reaching the L2 cache. The CPU registers are even faster to access than the L1 cache.

Making main memory the same speed as cache or registers would give little gain, and cost crazy amounts of money.
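The "little gain" claim follows from the standard average memory access time (AMAT) formula: with high L1 hit rates, main-memory latency is only paid on the rare double miss. The cycle counts below are illustrative assumptions, not measurements of any real CPU:

```python
# AMAT = L1 latency + L1 miss rate * (L2 latency + L2 miss rate * memory latency)
def amat(l1_hit, l1_lat, l2_hit, l2_lat, mem_lat):
    """Average memory access time in cycles for a two-level hierarchy."""
    return l1_lat + (1 - l1_hit) * (l2_lat + (1 - l2_hit) * mem_lat)

# Hypothetical figures: 95% L1 hits, 90% L2 hits, slow main memory.
today = amat(l1_hit=0.95, l1_lat=4, l2_hit=0.90, l2_lat=12, mem_lat=200)

# Same hierarchy, but main memory made as fast as L2.
all_fast = amat(l1_hit=0.95, l1_lat=4, l2_hit=0.90, l2_lat=12, mem_lat=12)

print(round(today, 2), round(all_fast, 2))  # 5.6 vs 4.66 cycles
```

Even making main memory ~17x faster only trims the average access time by about a sixth here, because the caches already absorb almost all accesses.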

EDIT: That was in response to the post above the last one.
 
Associate
Joined
30 Sep 2010
Posts
268
Location
London
As someone said further up in this thread, we're going to hit the limit of what silicon can do eventually, and something completely new will replace it, similar to what's happening with SSDs vs HDDs. Alternatives are already being used, just not commercially; I doubt we'll be seeing such things within 2 years though.
 
Soldato
Joined
19 Jun 2009
Posts
3,852
http://www.xbitlabs.com/news/cpu/display/20101017170037_AMD_No_Core_Wars_Incoming.html

Kind of relevant: if there aren't going to be 'core wars', then maybe we will get 6-8 core CPUs and then have the frequency ramp up again.

He's referring to efficiency (not clock ramp up).

Although here we are talking about gains from multiple cores, CPU designs have also become steadily more efficient per core since the 8086.

For example, the 8086 had 4 pipeline stages: IFETCH, IDECODE, IEXECUTE, IADDRESS.

The reason you have pipelines is so you can have parallelisation within the CPU core. So, going back to that 4-stage 8086: the first instruction can be executing (IEXECUTE) while the second is being decoded and the third is in the IFETCH buffer. The pipelined approach keeps the separate parts of the CPU busy, the same as a group of people working on a car production line.
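The production-line overlap can be sketched with a toy timeline, assuming the idealised case of one stage per cycle with no stalls (stage names from the post above):

```python
# Perfectly overlapped 4-stage pipeline: instruction i enters stage s
# at cycle i + s, so in steady state one instruction retires per cycle.
STAGES = ["IFETCH", "IDECODE", "IEXECUTE", "IADDRESS"]

def pipeline_timeline(n_instructions):
    """Return (cycle, stage, instruction) tuples for an ideal pipeline."""
    timeline = []
    for instr in range(n_instructions):
        for s, stage_name in enumerate(STAGES):
            timeline.append((instr + s, stage_name, instr))
    return timeline

# 4 instructions x 4 stages finish in 7 cycles, not 16:
last_cycle = max(cycle for cycle, _, _ in pipeline_timeline(4))
print(last_cycle + 1)  # 7
```

In general n instructions through a k-stage ideal pipeline take n + k - 1 cycles instead of n * k, which is where the efficiency gain comes from.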

Now jump ahead to the Core 2: its pipeline has around 14 stages. So there was much more parallelisation happening inside the CPU. This made it much more efficient than previous designs, and is why Core 2s were so fast relative to their clock speeds.

You begin to see why it's so complex.

1) First, parallelisation inside the CPU core with pipelines.
2) Then, with MMX and later SSE, dedicated SIMD units taking load off the scalar units - more parallelisation (fine-grained).
3) Then you have multiple cores, with the OS scheduling a process onto the next idle CPU core (coarse-grained).

Then there is MIMD over multiple cores (fine-grained), which it looks like Bulldozer is trying to do.

After all this, what is going to schedule it all? Firmware on the CPU, the OS, the compiler, or is it the developer's responsibility to write optimised threaded apps?

BTW, sorry to keep posting about this stuff; my final-year degree project was in this area 13 years ago, and it's still close to my heart.
 