
Multicore Is Bad News For Supercomputers

QUOTED:

[Image: umult01.jpg]

"Trouble Ahead: More cores per chip will slow some programs [red] unless there’s a big boost in memory bandwidth [yellow].


With no other way to improve the performance of processors further, chip makers have staked their future on putting more and more processor cores on the same chip. Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16‑core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores. The performance is especially bad for informatics applications—data-intensive programs that are increasingly crucial to the labs’ national security function.

High-performance computing has historically focused on solving differential equations describing physical systems, such as Earth’s atmosphere or a hydrogen bomb’s fission trigger. These systems lend themselves to being divided up into grids, so the physical system can, to a degree, be mapped to the physical location of processors or processor cores, thus minimizing delays in moving data.

But an increasing number of important science and engineering problems—not to mention national security problems—are of a different sort. These fall under the general category of informatics and include calculating what happens to a transportation network during a natural disaster and searching for patterns that predict terrorist attacks or nuclear proliferation failures. These operations often require sifting through enormous databases of information.

For informatics, more cores doesn’t mean better performance [see red line in “Trouble Ahead”], according to Sandia’s simulation. “After about 8 cores, there’s no improvement,” says James Peery, director of computation, computers, information, and mathematics at Sandia. “At 16 cores, it looks like 2.” Over the past year, the Sandia team has discussed the results widely with chip makers, supercomputer designers, and users of high-performance computers. Unless computer architects find a solution, Peery and others expect that supercomputer programmers will either turn off the extra cores or use them for something ancillary to the main problem.

At the heart of the trouble is the so-called memory wall—the growing disparity between how fast a CPU can operate on data and how fast it can get the data it needs. Although the number of cores per processor is increasing, the number of connections from the chip to the rest of the computer is not. So keeping all the cores fed with data is a problem. In informatics applications, the problem is worse, explains Richard C. Murphy, a senior member of the technical staff at Sandia, because there is no physical relationship between what a processor may be working on and where the next set of data it needs may reside. Instead of being in the cache of the core next door, the data may be on a DRAM chip in a rack 20 meters away and need to leave the chip, pass through one or more routers and optical fibers, and find its way onto the processor.

In an effort to get things back on track, this year the U.S. Department of Energy formed the Institute for Advanced Architectures and Algorithms. Located at Sandia and at Oak Ridge National Laboratory, in Tennessee, the institute’s work will be to figure out what high-performance computer architectures will be needed five to 10 years from now and help steer the industry in that direction.

“The key to solving this bottleneck is tighter, and maybe smarter, integration of memory and processors,” says Peery. For its part, Sandia is exploring the impact of stacking memory chips atop processors to improve memory bandwidth.

The results, in simulation at least, are promising [see yellow line in “Trouble Ahead”]."



[Image: umult02.jpg]

Photo: Intel
The Future: Intel’s experimental chip has 80 cores.
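Not Sandia's simulation, just a back-of-the-envelope sketch of the memory-wall argument in the article, using the textbook roofline idea: attainable throughput is capped by min(peak compute, memory bandwidth × arithmetic intensity). All of the numbers below (per-core peak, off-chip bandwidth, the two arithmetic intensities) are invented for illustration.

```python
# Illustrative roofline-style estimate: throughput is limited either by the
# cores' combined compute peak or by how fast off-chip memory can feed them.
PEAK_GFLOPS_PER_CORE = 10.0   # assumed per-core compute peak
MEM_BANDWIDTH_GBS    = 25.0   # assumed fixed off-chip bandwidth (GB/s)

def attainable_gflops(cores, flops_per_byte):
    """Roofline: min(compute roof, bandwidth x arithmetic intensity)."""
    compute_roof = cores * PEAK_GFLOPS_PER_CORE
    memory_roof  = MEM_BANDWIDTH_GBS * flops_per_byte
    return min(compute_roof, memory_roof)

for cores in (1, 2, 4, 8, 16, 32):
    low  = attainable_gflops(cores, flops_per_byte=0.5)  # data-heavy: little maths per byte fetched
    high = attainable_gflops(cores, flops_per_byte=8.0)  # compute-heavy: lots of maths per byte
    print(f"{cores:2d} cores: {low:6.1f} GFLOPS (data-heavy)  {high:6.1f} GFLOPS (compute-heavy)")
```

With the data-heavy workload the numbers flatten at the memory roof after a core or two, which is the shape of the red line in the figure; boosting bandwidth (what the memory-stacking work targets) lifts that roof, which is what the yellow line shows.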
 
This chip with all the extra cores would produce more heat and require a lot more power to run.

I know I will never get my hands on something like that, but at the price of electricity I don't think I will lose sleep over it.
 

Electricity does not have to be expensive if it is generated by nuclear stations rather than coal/gas stations.
 
rypt said:
Electricity does not have to be expensive if it is generated by nuclear stations rather than coal/gas stations.
Too cheap to meter? :p

IIRC the 80-core chip uses less electricity than Netburst chips of similar speed, but it lacks many of the more powerful functions of modern processors.
 
There must be some kind of solution to all this "it's impossible to code for so many cores" crap.
Maybe something similar to Lucid Logix? A separate chip designed to distribute processing/code over many cores?

That may be impossible in itself; I'm no programmer. But is there nothing anyone can do except learn to multithread more efficiently?
 
The problem is that lots of algorithms need to process the same bit of data in multiple ways: the code has to wait for the first part to be calculated before it can use that result in the next part. This type of code is inherently serial and can't be multithreaded.

This was 'discovered' by a bloke called Gene Amdahl back in the 1970s (if not earlier), and it gave us the term 'Amdahl's Law': basically a formula for the speedup you get from a given number of cores and a given amount of parallelisable code.

So what is happening now was seen about 30 years ago, and it's one of the reasons we went from big old many-core mainframes to individual workstations. Seems we're now back where we started, and I have no idea how Intel/AMD will fix it.
The options I see are to ramp up clock speed and stick to a sensible number of cores, but they already tried that with the P4.
They could try a new CPU architecture (x86 is inefficient in a lot of ways), but moving to a new architecture would require a lot of effort, and support from MS.
Or they could find an innovation in chip making that allows vastly higher-clocked transistors without the power problems...
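For reference, the Amdahl's Law formula mentioned above is simple enough to jot down: if a fraction p of the work can be spread across n cores and the rest stays serial, the speedup is 1 / ((1 - p) + p / n). A minimal sketch (the p values are just examples):

```python
# Amdahl's Law: overall speedup from n cores when only a fraction p of the
# program can be parallelised and the rest must run serially.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 8, 16, 32, 80):
    print(f"{n:3d} cores: p=0.50 -> {amdahl_speedup(0.50, n):5.2f}x   "
          f"p=0.95 -> {amdahl_speedup(0.95, n):5.2f}x")
```

Even with 95% of the program parallel, 80 cores give only about a 16x speedup, and the ceiling as n grows is 1 / (1 - p), which is exactly the point being made above.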
 
I guess it's like saying "we'll fit this massive turbo to this engine, sort out a new manifold and loads of other bits... but we won't improve the cooling".

Seems a bit silly tbh.
 
The problem is that lots of algorithms need to process the same bit of data in multiple ways: the code has to wait for the first part to be calculated before it can use that result in the next part. This type of code is inherently serial and can't be multithreaded.

Yes, but plenty of code that is serial right now can be re-written as parallel given enough skill/time.
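A small sketch of the distinction being argued here, in plain Python (the function f is just a made-up stand-in for "some work"): a loop whose next step needs the previous result cannot be split across cores, while a loop over independent items can be handed to a multiprocessing pool.

```python
# Contrast between an inherently serial dependency chain and an
# "embarrassingly parallel" loop over independent items.
from multiprocessing import Pool

def f(x):
    # Stand-in for some per-item work.
    return x * x + 1

def serial_chain(x0, steps):
    # Each iteration needs the previous result, so the steps cannot be
    # handed out to different cores.
    x = x0
    for _ in range(steps):
        x = f(x) % 1_000_003
    return x

def independent_map(values):
    # Each item is independent, so the work can be spread over however
    # many cores the pool decides to use.
    with Pool() as pool:
        return pool.map(f, values)

if __name__ == "__main__":
    print(serial_chain(2, 10))        # order of operations matters
    print(independent_map(range(8)))  # order doesn't matter
```

Rewriting serial code "as parallel" usually means finding (or creating) the second shape inside the first, for example by batching up independent work items; where every step genuinely depends on the last, the serial fraction in Amdahl's Law stays put.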
 
Some maybe, but not the majority; there are many algorithms that are simply serial in nature. The best we can do for most things is to split the work into as many separate data streams/algorithms as possible and run each of those serial streams on a separate core, but then how many streams/algorithms are there to actually process for any one thing?

80? I doubt that. Even 4 is possibly overkill in terms of threads that can actually fully utilise a modern-day x86 core in anything other than a few select tasks.
 
"Photo: Intel
The Future: Intel’s experi*mental chip has 80 cores"
can it run Crysis?
serious note, distrobuted computing would go from strength to strength. look at the impact the PS3 has had on F@H and that has 8 cores!, would be like owing 10 of them without scary power bills.

also how many programes take advantage of more then 2 cores, let alone 80
 