Kaveri APU Architecture Detailed

http://wccftech.com/amd-kaveri-apu-...neration-apu-featuring-steamroller-gcn-cores/

http://www.tomshardware.com/news/AMD-Kaveri-APU-Gaming,22947.html

  • CPU: Up to 4 Steamroller cores (2 Modules)
  • GPU: Up to 512 stream processors using new next gen cores
  • Socket FM2+ and possibly not compatible with Socket FM2 mobos due to FM2+'s two extra pin sockets
  • Second AMD processor to use 28nm fab process (AMD Kabini was the first)
  • Heterogeneous System Architecture - unified memory architecture allowing system RAM to be shared between the GPU and CPU.
  • DDR3 and GDDR5 (also possibly DDR4) integrated memory controller - though likely will be unable to use both memory types in the same system
  • Shipping to manufacturers in late 2013 (retail release expected in 2014).
  • AMD demos an engineering sample of Kaveri at Computex June 2013 (see video below)
Looking at the sockets pictured in the link, it does indeed look like this will not fit in an FM2 socket.

It has the same number of stream processors as a 7750, and it looks like it's the same GCN architecture too.

GDDR5 memory controller: does that mean the iGPU will come with onboard GDDR5 RAM?
 
Oh dear if this isn't compatible with FM2 >.<. I always say AMD aren't as great with sockets as people seem to believe.

I wonder what they'll do to match the 7750's bandwidth. But I'd be quite happy with a 7750 level of graphics in a CPU for gaming on.

The GDDR5 memory controller, I'd assume, would be for onboard memory, possibly to address the video memory bandwidth?
 
FM2+ was revealed a month or so back; FM2 CPUs will work in FM2+ boards, but FM2+ CPUs won't work in FM2 boards.

GCN is as previously promised. Bandwidth is an issue, since current Trinity/Richland tops out at roughly the same bandwidth as a DDR3 6670 - they must have something up their sleeves to overcome this; on-chip memory, most likely but....

...the GDDR5 memory controller is interesting. The on-chip memory (if it carries any) wouldn't be GDDR5 - most likely it would be SRAM as with the Xbone. Maybe we'll see GDDR5 dimms on the market, compatible with FM2+ boards?
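
For a rough sense of the gap being discussed, peak memory bandwidth is just bus width times transfer rate. The sketch below is a back-of-the-envelope calculation only; the memory speeds plugged in (dual-channel DDR3-1866 for the APU, 1600 MT/s DDR3 and 4500 MT/s GDDR5 on 128-bit buses for the cards) are assumed typical figures, not numbers taken from the links above.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Assumed, illustrative configurations (not from the thread's sources).
# Each entry is (total bus width in bits, transfer rate in MT/s).
configs = {
    "APU, dual-channel DDR3-1866": (128, 1866),
    "Discrete card, 128-bit DDR3": (128, 1600),
    "Discrete card, 128-bit GDDR5": (128, 4500),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

Under these assumptions the DDR3 configurations land in the same ballpark as each other, while the GDDR5 one is well over double, which is the gap the GDDR5 controller (or on-chip memory) would have to close.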
 
Snip* :D

I wonder what they'll do to match the 7750's bandwidth. But I'd be quite happy with a 7750 level of graphics in a CPU for gaming on.

The GDDR5 memory controller, I'd assume would be for onboard memory, possibly to address the video memory bandwidth?

That's what I'm thinking; it seems to me the only explanation for a GDDR5 controller on an APU.

It should be a pretty big jump up from the old VLIW4 iGPU.
 
...the GDDR5 memory controller is interesting. The on-chip memory (if it carries any) wouldn't be GDDR5 - most likely it would be SRAM as with the Xbone. Maybe we'll see GDDR5 dimms on the market, compatible with FM2+ boards?

Why would it be SRAM?
The APU in the PS4 IIRC uses GDDR5?

That's an absolute kick in the balls if Kaveri won't work in FM2.
FM2 was "Future proof" apparently, first they dropped FM1 with it only having Llano, and now FM2 with Kaveri.
 
PS4 doesn't have on-chip memory, it uses super-fast GDDR5 to deliver its bandwidth. Xbone uses slower DDR3 and makes up [some of] the difference with on-chip SRAM.

The new socket really isn't a big deal; AMD's APUs aren't really aimed at enthusiasts - they're aimed at people who tend not to be interested in replacing CPUs. The enthusiast market is, sadly, pretty small - particularly in the low-end space the FM2 socket fills.
 
If I understand it correctly, the PS4 uses stacked GDDR5.

No, it doesn't. It just has a memory controller on die and memory off die, like a GPU card; it's got nothing stacked or on die.

Almost every GPU AMD has made for a few years has a DDR3/GDDR5-compatible memory controller. It's why we've seen low-end cards, and sometimes lower-midrange ones, with cheaper DDR3 versions.

In terms of desktop, I would imagine it's mostly a case of allowing Kaveri to be sold in different packages. One company wants to sell it as a gaming computer in their line-up, so they pair it with GDDR5: higher gaming performance because it isn't bandwidth limited, but a small reduction in CPU performance for CPU-oriented usage. Another computer will be aimed at everyday use, more CPU-heavy, workstation style, and they pair that with DDR3 for slightly higher CPU performance and lower gaming performance.

On sockets: firstly, who cares? Secondly, you have two options: move forward when new things happen, or delay new features and hold the industry back for the sake of backwards compatibility. There is a reason Windows runs like a dog and mobos came with worthless (for 99% of people) PCI slots, serial ports and IDE connectors for donkey's years after they were dead: people couldn't move on. I welcome new sockets and new ideas and couldn't give a damn about old socket/mobo support.

Kaveri is the first widely available HSA chip, GDDR5 compatible with GCN onboard. I neither expect socket compatibility nor want it, and I don't think it would be good for anyone.


In terms of stacked memory, stacking is where you basically have copper connections running through the memory die so another memory chip above it can directly access the PCB below it, saving space and a lot of power (signals moving centimetres on chip and inches off it have hugely different power requirements and speed capabilities).

An interposer is where you create what is essentially a small PCB that you can plug the chip, memory chips and anything else you want onto, but at silicon scale and with on-die speeds of communication. Interposers should be used before stacking, way before, because they're going to be WAY cheaper. What we could see with Kaveri, and GPUs very soon, is an interposer carrying the chip and some dedicated super-high-bandwidth memory.

One of the key problems with bandwidth off a chip is the pinout. Every connection to a memory slot takes up hundreds of pins, and going from dual to quad channel takes up space that simply isn't there on a smaller chip. With interposers you have the traces on a minute scale and no pinout issues. They are also capable of higher speed at MUCH lower power, so you can use low-power memory with a VERY wide bus (512-1024 bit) and get insane bandwidth pretty easily.

Though interposer tech is available (as is stacking) and was basically expected to appear on some chips this year or next, it's more likely that it's still too expensive and/or yields are too low to see it on Kaveri or any GPUs in the next year. But Kaveri with some dedicated on-package super-wide memory, alongside cheaper, slower system memory, could make for a very interesting system.
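
To put the wide-bus point in numbers, here is a small sketch that works out how fast each pin has to run to hit a given bandwidth at different bus widths. The 72 GB/s target is a hypothetical figure picked for illustration, and the function name is made up for this example; nothing here is a spec for Kaveri or any real interposer design.

```python
def required_rate_mts(target_gbs: float, bus_width_bits: int) -> float:
    """Per-pin transfer rate (MT/s) needed to reach target_gbs on a given bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gbs * 1e9 / bytes_per_transfer / 1e6


TARGET_GBS = 72.0  # hypothetical target bandwidth, chosen for illustration

for bits in (128, 256, 512, 1024):
    rate = required_rate_mts(TARGET_GBS, bits)
    print(f"{bits:>4}-bit bus needs {rate:.1f} MT/s per pin")
```

Each doubling of bus width halves the per-pin rate needed for the same bandwidth, which is why a very wide bus can get away with slow, low-power memory.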
 
No, it doesn't. It just has a memory controller on die and memory off die, like a GPU card; it's got nothing stacked or on die.

Sony, AMD and Hynix have worked with Amkor on stacking DRAM, and there are AMD documents detailing some of this relationship.

There is a lot of evidence to show that the PS4 uses stacking of some sort.
 
I don't know about that. Sony are going for VFM and a low price point, evident in the launch price vs Xbox One. It's probable that the memory will be stacked on die later in the product life when it's more economical to do so, and not for performance reasons either.
 
The thing is, though, the PS4 is a very large SoC and you could argue they might have had better yields going with separate dies. However, despite this they went the SoC route, as it was probably better in many respects, including power consumption and programming considerations.

There seem to be quite a few hints indicating the use of an interposer at least:

http://webcache.googleusercontent.c...fm+&cd=1&hl=en&ct=clnk&gl=uk&client=firefox-a

http://www.gsaglobal.org/events/2012/0416/docs/3D_Panel.pdf

http://www.i-micronews.com/upload/Rapports/3D_Silicon_&Glass_Interposers_sample_2012.pdf

Page 10 of the second link:

http://livedoor.blogimg.jp/sag_alt/imgs/9/5/95dd2b6d.jpg

The pricing premium does not appear to be large either:

http://www.electroiq.com/articles/ap/2012/12/lifting-the-veil-on-silicon-interposer-pricing.html

:p

Please don't tell me you believed all that codswallop about FM2 being 'future proof'. Everything is future proof until they invent something new.

The thing is it is the AMD CPUs which seem to be more future proof with regards to sockets than their motherboards.
 
I actually only just read the other Tom's Hardware link and noticed something a bit more interesting.

and the Steamroller cores would feature a 15 to 20 percent improvement in IPC (Instructions per Cycle).
IPC, not a "15% performance increase", which turned out to be 5% IPC and a 10% clock rate increase (Bulldozer to Piledriver).
That would put it 10 to 15% better than Thuban, about 4 years later. Still, better late than never, I guess...
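
As a quick sanity check on that arithmetic: performance scales roughly with IPC times clock speed, so the two gains multiply rather than add. This is a minimal sketch using the 5% and 10% figures quoted above; the function name is just for illustration, and the model ignores everything else that affects real-world performance.

```python
def combined_speedup(ipc_gain: float, clock_gain: float) -> float:
    """Overall speedup from an IPC gain and a clock gain, as fractions (0.05 = 5%)."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1


# Bulldozer -> Piledriver, using the figures from the post above.
piledriver = combined_speedup(ipc_gain=0.05, clock_gain=0.10)
print(f"5% IPC + 10% clock: {piledriver:.1%} overall")

# An IPC-only claim at unchanged clocks, for comparison.
ipc_only = combined_speedup(ipc_gain=0.15, clock_gain=0.0)
print(f"15% IPC at the same clock: {ipc_only:.1%} overall")
```

So a headline "15%" can come out about the same either way, but only the second case says anything about the core getting more done per cycle.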
 
We seem to be at a plateau currently. I suspect Nehalem to Haswell is a similar IPC improvement - and that's nearly 5 years.

Still, 15% jump in one generation is quite an improvement - Bulldozer was a step back, but perhaps the architecture will come good in the long run.
 
IPC, not a "15% performance increase", which turned out to be 5% IPC and a 10% clock rate increase (Bulldozer to Piledriver).
That would put it 10 to 15% better than Thuban, about 4 years later. Still, better late than never, I guess...

"IPC" is used wrongly a lot on this forum at the moment. For a start, since it stands for instructions per clock (or per cycle), it can't be anything to do with "clock rate increase" as you've written. It's due to many factors, like predictor efficiency, pipeline length, and the memory systems.

Also remember that IPC may have little to do with actual computing speed. For example, a CPU with lower IPC might actually be faster than another one if it can achieve in one operation something that takes the other CPU several operations to do.
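
A toy illustration of that last point: run time is instruction count divided by (IPC times clock), so a chip with lower IPC can still finish first if it needs fewer instructions for the same job. Both CPUs and all the numbers below are hypothetical, made up purely to show the effect.

```python
def run_time_s(instructions: float, ipc: float, clock_hz: float) -> float:
    """Seconds to retire `instructions` at a given IPC and clock frequency."""
    return instructions / (ipc * clock_hz)


CLOCK_HZ = 3.0e9  # both hypothetical CPUs run at the same 3 GHz clock

# Hypothetical CPU A: higher IPC, but the task takes 4 billion instructions.
time_a = run_time_s(instructions=4.0e9, ipc=2.0, clock_hz=CLOCK_HZ)

# Hypothetical CPU B: lower IPC, but richer instructions do the task in 2 billion.
time_b = run_time_s(instructions=2.0e9, ipc=1.5, clock_hz=CLOCK_HZ)

print(f"CPU A (IPC 2.0): {time_a:.3f} s")
print(f"CPU B (IPC 1.5): {time_b:.3f} s")
print("B is faster" if time_b < time_a else "A is faster")
```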
He means that PD was touted as X percent faster than BD.
But that figure was due to both IPC and faster clock speeds.

Whereas this is touted as IPC alone being X amount better.
 