Volcanic Islands Hawaii GPU 20nm 4096SP 256TMU 64ROPs 512Bit

Freddie1980 · 9 May 2013 at 09:09

A 512 bit memory controller means this is going to be one very expensive card to produce.

psychas · 9 May 2013 at 09:14

d_brennen said:
Could end up being another 2900XT. 512bit memory bandwidth sounds tasty if they keep prices in check

The 2900XT failed because it was on the wrong process: 80 nm, when it should have been on 65 nm like the rest of the 2000 series.

PS. I had a 2900XT, not the best buy ;/

So that should not be a problem for Volcanic Islands.

chipachap · 9 May 2013 at 09:26

beasty6 said:
hmmmm

I'm not so sure, AMD have just released the 7990 and now they may have a card with 7990 power on one chip?

If it is true I'll get two!

WingZero30 · 9 May 2013 at 09:38

Marine-RX179 said:
Yea...I still remember the whole hype about "6970 gonna beat GTX580 at a much lower price", and then everyone calling it a lemon on the forum when it was eventually released

drunkenmaster · 9 May 2013 at 10:13

So many daft statements in this thread, AMD's 20nm cards aren't and won't ever be an answer to Nvidia's 28nm "refresh" cards. Gpu's can't and won't ever compete across processes at the same die sizes.

Almost everything in the thread is pure speculation, but there are some things worth pointing out.

The ONLY reason people are suggesting x86/ARM cores on die is because Nvidia has HINTED that they will have ARM cores inside Maxwell or the one after. I can't remember exactly what the statements were, but it actually sounded more like they were just saying the units inside Maxwell would be ARM compatible, because they'll be using Maxwell shaders inside future Tegra chips.

Either way, Nvidia isn't a CPU producer and never has been. It has used ARM in its SoCs along with its own GPUs, so it makes more sense for them to cannibalise something they have on hand to stick in their GPUs. With AMD it's different: ARM is very low power, and efficient only at extremely low power. It's looking very much like Jaguar is going to be significantly faster than ARM for not much more power usage; for high workloads ARM isn't the most efficient chip around. If AMD try to split off serial and parallel work on die, I would be surprised if it was Jaguar or ARM based. It's most likely not any kind of "CPU" at all, but a very specific set of compute units just like shaders, designed for a very different goal.

Compute units are CUs, and the number of them has nothing to actually do with compute performance, at all.

It's like having an octo-core chip, then a next-gen octo-core chip where each core is vastly improved, and saying they've cut down compute because there are still only 8 cores.

Chips get subdivided as they get bigger. With old-school 2 pipelines, there isn't any reason to subdivide the chip. When you hit 1600 shaders, you can't have the front end contact each shader individually: it would be a routing nightmare on die with huge latency across the chip. So you separate the 1600 into several big blocks, then you separate each big block into smaller blocks. Then after the 5870, we had a chip that separated the front end into two halves, and each gen we'll get more division. It works like a pyramid, frankly: when you have 4000 shaders on the bottom rung and 1 major control unit at the top, you either have 4000 connections and the most inefficient chip on the planet, or you put in a bunch more levels. At one level you have 16 compute units, which will incorporate X amount of shader blocks, and each block will have X amount of shaders and X amount of cache, etc, etc.
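
A rough sketch of the pyramid argument above, to put numbers on it. The three-level split (4 front-end blocks x 16 compute units x 64 shaders) is purely an assumption chosen to multiply out to 4096; it is not a known layout for this chip, and `total_shaders` and `max_fanout` are hypothetical helper names.

```python
from math import prod

def total_shaders(levels):
    """Number of shaders at the bottom of a hierarchy with the given fan-out per level."""
    return prod(levels)

def max_fanout(levels):
    """Most direct connections any single control block has to drive."""
    return max(levels)

# Flat design: one control unit wired straight to every shader.
flat = [4096]
# Assumed tiered design (illustrative only): 4 blocks -> 16 CUs each -> 64 shaders each.
tiered = [4, 16, 64]

for name, levels in (("flat", flat), ("tiered", tiered)):
    print(name, total_shaders(levels), "shaders, widest fan-out", max_fanout(levels))
```

Same shader count either way, but in the tiered version no block has to talk to more than 64 things directly.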

On to 512-bit: it does increase price and the complexity of the PCB, but we're not talking about a £200 increase in cost, nor a massive increase in die size. This is how chips work: what is not viable on 40nm becomes viable on 28nm, and something you can't fit on 28nm, you can fit on 20nm. There is no point having twice as much shading power as a 7970 on the same bus; you'll hit bandwidth limits, maybe more so on compute. Considering this is likely AMD's first HSA-compatible GPU, to go with their end-of-year HSA-compatible CPUs, bandwidth is going to be a big concern, as they'll obviously be making a push at the professional markets soon enough.
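
To put rough numbers on the bandwidth point: peak memory bandwidth is the bus width in bytes times the effective data rate per pin. The 7970 figures used here (384-bit, 5.5 Gbps GDDR5, 2048 shaders) are its reference specs; the 5.5 Gbps rate for the 512-bit card is an assumption, since the rumour gives no memory clocks, and the function names are made up for this sketch.

```python
def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps

def bandwidth_per_shader(bus_width_bits, data_rate_gbps, shaders):
    """GB/s of peak bandwidth available per shader."""
    return bandwidth_gb_s(bus_width_bits, data_rate_gbps) / shaders

hd7970 = bandwidth_gb_s(384, 5.5)                 # HD 7970 reference spec
baseline = bandwidth_per_shader(384, 5.5, 2048)   # HD 7970 per-shader figure
same_bus = bandwidth_per_shader(384, 5.5, 4096)   # doubled shaders, same bus
wide_bus = bandwidth_per_shader(512, 5.5, 4096)   # doubled shaders, 512-bit (rate assumed)

print(f"7970: {hd7970:.0f} GB/s")
print(f"per-shader bandwidth vs 7970, same bus: {same_bus / baseline:.2f}x")
print(f"per-shader bandwidth vs 7970, 512-bit:  {wide_bus / baseline:.2f}x")
```

Under these assumptions even the 512-bit bus leaves each shader with less bandwidth than on a 7970 unless memory clocks go up too, which is the case against keeping the same bus.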

Either way, as I've said in other threads, 28nm has been like an in-between process; it wasn't quite the right size for AMD or Nvidia. It was too big for Nvidia to stick a 384-bit bus on a GTX 680, and Titan was WAY too big. This gen is likely to work out much better, with 384-bit buses and increased shader counts, i.e. a 7970 with 30-40% more shaders on a 200-220mm2 die being an 8870XT at sub £200, awesome, with an 8970 being a huge increase in performance, 70-80%, on a 325-375mm2 die and just awesomeness.

I don't know why people are banging on about Titan prices because of the specs; every new process brings a massive performance jump at similar prices.

LtMatt · 9 May 2013 at 10:22

Found a HQ picture on another forum. Means nothing to me but have at it gents.

panyan · 9 May 2013 at 11:05

^ wow, thats quite... detailed

Boomstick777 · 9 May 2013 at 11:24

AMD Radeon HD 9970 Hawaii Detailed, Volcanic Islands GPUs Set for Late 2013 [UPDATED]

http://news.softpedia.com/news/AMD-...c-Islands-GPUs-Set-for-Late-2013-351659.shtml

Marine-RX179 · 9 May 2013 at 11:45

Boomstick777 said:
AMD Radeon HD 9970 Hawaii Detailed, Volcanic Islands GPUs Set for Late 2013 [UPDATED]

http://news.softpedia.com/news/AMD-...c-Islands-GPUs-Set-for-Late-2013-351659.shtml

Hm...so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is...could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or would it be a case of "not affordable for mere mortals" (i.e. £500+)?

shankly1985 · 9 May 2013 at 11:47

Looking forward to whatever they bring. New CPU and GPU by the end of the year for me, it looks like

Greebo · 9 May 2013 at 12:03

Marine-RX179 said:
Hm...so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is...could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or would it be a case of "not affordable for mere mortals" (i.e. £500+)?

It wasn't exactly cheap though, was it? 10 years ago the ATI Radeon 9800XT was $499 list price. Accounting for inflation and using today's exchange rates plus VAT, that's over £500. I can remember desperately wanting one but I couldn't afford it and had to buy a lesser £200-ish card. Finally got a 2nd hand one later.

However, the ATI Radeon 9800 was a beast so using that naming might bode well but I honestly don't see their flagship fastest version being under £500.

melmac · 9 May 2013 at 12:24

Marine-RX179 said:
Yea...I still remember the whole hype about "6970 gonna beat GTX580 at a much lower price", and then everyone calling it a lemon on the forum when it was eventually released

He was talking about the 680 and 770, not the new card from AMD. You don't need to go that far back either: what about the whole hype that the Titan was going to be much faster than a 690?

drunkenmaster · 9 May 2013 at 12:31

LtMatt said:
Found a HQ picture on another forum. Means nothing to me but have at it gents.

The interesting things to point out: the ARM Cortex-A5 isn't new, AMD chips have used them for ARM TrustZone for quite some time now. But the point is the ARM chip is highlighted as such on die, while the "serial processing" unit is NOT labelled as ARM, and in fact contains a HyperTransport (HT) link, and the "cores" are two SPU units each sharing an FPU unit, which is what both Jaguar and Bulldozer/Piledriver/Steamroller do. That all points to a similar kind of design, with probably the vast, vast majority of the rest of the core going away.

It's worth noting that Intel have gone for modules rather than individual cores in Atom, and removed Hyper-Threading. AMD made the same move to modules (not the Hyper-Threading part, obviously), just earlier. I said many years ago that AMD are just going for more cores and lower IPC, and over the next several years/processes will expand each core/module to improve performance per core. Intel just went the other route, same core count, wider cores, and they are clearly now moving towards more efficiency, more cores and sharing resources. In a couple of gens AMD/Intel chips will be 8 cores with highly efficient cores, all in modules to save waste and gain efficiency (die size and power more than performance).

Anyway, while people are saying this is more an APU design than a GPU because it has a northbridge listed, it's irrelevant; whenever you bring together different types of chips you need things linking them together.

Also, with HSA the entire point is to be able to chop and change IP blocks and have everything work together. While part of AMD (and the HSA Foundation and most of the industry as well) is looking to offload things to discrete GPUs, and work from the GPUs to CPUs, latency will always be an issue, and having serial work units on die will greatly improve performance over offloading serial stuff to the CPU, for some workloads at least.

In general we'll move forward to a point where you have a C-APU with mostly CPU and a little GPU, and a G-APU with mostly GPU stuff and a little CPU. When a program can do CPU and GPU stuff concurrently, the workloads get split across the devices, but when the stuff being done on the GPU relies on serial work being sped up, it's done on die without the latency hit rather than moved off die.

Basically, depending on the application, there will be times it's faster to go off die and do loads of serial or parallel work on the "other" device, and there will be applications where the particular serial or parallel work is smaller and it's quicker to stay on die with a smaller acceleration unit.

The key at the moment is having code optimised, and eventually being able to handle the decision on the fly, where the CPU can decide where it's best to send any given set of data to be processed quickest.
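
The on-die versus off-die trade-off described above can be sketched as a simple cost comparison. Everything here is hypothetical: the rates, the fixed transfer latency and the `best_device` helper are made-up illustration values, not measurements of any real CPU or GPU.

```python
def best_device(work_units, on_die_rate, off_die_rate, off_die_latency):
    """Pick where a chunk of work finishes sooner.

    on_die_rate / off_die_rate: work units per microsecond on each device.
    off_die_latency: fixed round-trip cost (microseconds) of leaving the die.
    """
    on_die_time = work_units / on_die_rate
    off_die_time = off_die_latency + work_units / off_die_rate
    return "on-die" if on_die_time <= off_die_time else "off-die"

# Assumed numbers: the small on-die unit is 4x slower, but leaving the die costs 30 us.
for work in (10, 40, 100):
    print(work, best_device(work, on_die_rate=1.0, off_die_rate=4.0, off_die_latency=30.0))
```

With these made-up inputs, small jobs stay on die and only the largest one is worth the trip to the faster device; where the break-even sits depends entirely on the latency and rates you plug in.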

It's possible that this is AMD's top-end GPU bolted on to a custom Jaguar CPU for a specific customer, or it could be that AMD has added a "CPU" to the GPU for this generation; it was always likely coming. It's impossible to tell anything like performance or die size from a high-level architecture diagram. Adding CPU-like cores will obviously take die space away from shaders. But if you're moving, say, 30% of the work off the shaders to the CPU cores, and that work gets done much faster AND leaves the GPU side idle much less while waiting for serial loads to be done, then you could potentially have significantly more performance than the same die spent entirely on shaders like a current GPU.

If it has 4000 shaders AND some CPUs on die and they get the balance right, it could be significantly faster than what you'd expect from a 4000-shader, higher-bandwidth 7970.
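
A back-of-envelope version of that argument, using an Amdahl's-law style model (a swapped-in simplification, not anything taken from the diagram). The 30% serial share is the figure from the post; the 4x serial speed-up and the 10% of shader area given up are assumptions picked purely for illustration, and `relative_performance` is a hypothetical name.

```python
def relative_performance(serial_share, serial_speedup, area_given_up):
    """Performance relative to an all-shader die of the same size.

    serial_share:   fraction of run time on the all-shader die spent on serial work.
    serial_speedup: how much faster the on-die CPU-like cores run that serial work.
    area_given_up:  fraction of shader area traded for those cores; the parallel
                    part is assumed to slow down in proportion.
    """
    parallel_time = (1.0 - serial_share) / (1.0 - area_given_up)
    serial_time = serial_share / serial_speedup
    return 1.0 / (parallel_time + serial_time)

serial_heavy = relative_performance(0.30, 4.0, 0.10)
pure_parallel = relative_performance(0.00, 4.0, 0.10)
print(serial_heavy, pure_parallel)
```

With these made-up inputs the mixed die comes out ahead on the serial-heavy workload and behind on the purely parallel one, which is the "get the balance right" point.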

One of the more interesting questions is whether they will move to different clocks like Nvidia used to have (for different reasons), with serial cores working at, say, 2GHz like the upper end of Jaguar, and the GPU working at 1GHz.

Even more interesting would be the question of production. You can already quite easily (though it is time consuming and more expensive to produce) have different parts of the chip using a slightly different process; for instance, the companion core in Tegra 3 uses a lower-power optimised variant of the 40nm process, as the core is aimed at lower power, lower leakage and lower clocks. The other potential option, which we're moving towards but probably aren't quite at yet, is making the CPU and GPU on entirely different silicon, on different processes (20nm HP for the GPU and 20nm LP for the CPU, for instance), then putting them on a transposer, or interposer (essentially a chip that is made purely to connect other bits of silicon, meaning two chips on the same interposer can communicate at bandwidth/latency that you can't get over a bus between two separate pieces of silicon like a CPU with a discrete GPU).

The latter is the "ultimate" way to do it and every foundry is working towards that; Intel might be there already, TSMC probably aren't. The former is a very slow way of optimising different portions of the chip. It will reduce output purely because it takes longer to process the wafers: fewer wafers can be made in a given time, which increases costs and reduces how many chips you can make.

http://www.extremetech.com/computing/119843-the-future-of-computers-3d-chip-stacking

The third technique, which isn’t technically stacking but still counts as “advanced packaging,” uses a silicon transposer (pictured above, below the stacked chips). A transposer is effectively a piece of silicon that acts like a “mini motherboard,” connecting two or more chips together (if you remember breadboard from your days as a budding electronic engineer, it’s the same kind of thing, but on a much smaller scale). The advantage of this technique is that you can reap the benefits of shorter wiring (higher bandwidth, lower power consumption), but the constituent chips don’t have to be changed at all. Transposers are expected to be used in upcoming multi-GPU Nvidia and AMD graphics cards.

Transposers are much closer and much more likely to be done on a mass scale sooner than TSVs. That link gives a good description: transposers are like a mini motherboard done in silicon that you can stick your chips on to, but with very short lengths, incredible bandwidth and incredibly low latency, so the chips can act almost as if they were on the same die.

TSVs stack the chips and, as a crude description, it's like taking a bunch of chips and stabbing a copper rod through them (that isn't how they are made, just how they would look), so there is a copper connection through all the chips. In reality yields take a huge hit, the chips have to be designed specifically for it, and it takes loads more steps and is insanely expensive. It will happen eventually for most things but is much further out from mass-scale production of CPUs/GPUs than transposers.

Freddie1980 · 9 May 2013 at 12:52

Marine-RX179 said:
Hm...so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is...could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or would it be a case of "not affordable for mere mortals" (i.e. £500+)?

But AMD didn't skip the 8000 series, you just can't buy them at retail.

Locky · 9 May 2013 at 13:38

I was waiting for drunkenmaster's massive long post on this

titaniumx3 · 9 May 2013 at 13:42

If the specs are true, I think I need to start saving up my monies for this beast.

Let's just hope that we don't have to spend half the GPUs life cycle waiting for AMD to optimise their drivers this time round.

humbug · 9 May 2013 at 13:57

drunkenmaster said:
So many daft statements in this thread, AMD's 20nm cards aren't and won't ever be an answer to Nvidia's 28nm "refresh" cards. Gpu's can't and won't ever compete across processes at the same die sizes.

drunkenmaster said:
The interesting things to point out: the ARM Cortex-A5 isn't new, AMD chips have used them for ARM TrustZone for quite some time now. But the point is the ARM chip is highlighted as such on die, while the "serial processing" unit is NOT labelled as ARM, and in fact contains a HyperTransport (HT) link, and the "cores" are two SPU units each sharing an FPU unit, which is what both Jaguar and Bulldozer/Piledriver/Steamroller do. That all points to a similar kind of design, with probably the vast, vast majority of the rest of the core going away.

Good info. I guess they could be a pair of 8-core Jaguar chips.

Marine-RX179 said:
Hm...so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is...could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or would it be a case of "not affordable for mere mortals" (i.e. £500+)?

Unless they sneak in some HD 8K's next month

Doubtful, but meh....

drunkenmaster · 9 May 2013 at 15:00

I sincerely doubt it's a "Jaguar" core, though it could be based loosely on it. It could be entirely new. I just really doubt it's ARM, as AMD have been moving the GPU and CPU towards HSA for themselves for ages; the idea of using ARM IP to create a chip to optimise for an AMD GPU is, frankly, ludicrous. HSA makes it possible, but ARM IP isn't optimised for use with AMD GPUs, and there is very little reason for AMD to come up with a fully ground-up ARM instruction set CPU optimised for this kind of usage with their GPUs when they've been moving towards this point for 5 years.

As for skipping 8000 cards, they are out: there have been OEM 8000 GPUs around for a while now, and the 7790 could realistically be regarded as an 8770, a slightly changed, slightly improved, slight test run of some changes for AMD.

andybird123 · 9 May 2013 at 15:01

Marine-RX179 said:
Hm...so they are going to skip the 8000 series label and go straight to the 9000 series? Now the question is...could this AMD Radeon 9000 series once again be mighty bang-for-buck cards like the original ATI Radeon 9700/9800, or would it be a case of "not affordable for mere mortals" (i.e. £500+)?

massive pinch of salt on that one - remember when all the early slides mentioned a GTX780 and everyone assumed they were going to skip GTX6**

we could be seeing the same here - that some info is being previewed on 2 different sets of cards and being lumped together as one piece of info with everything assumed to be the same thing

bru · 9 May 2013 at 18:12

They won't be ARM cores; that would be like AMD putting Nvidia graphics cores in their first APU. AMD make CPUs as well as GPUs, so using someone else's CPUs for their GPUs is ludicrous, as drunkenmaster has said.