AMD Polaris architecture – GCN 4.0

Soldato
Joined
9 Nov 2009
Posts
24,910
Location
Planet Earth
CAT, I consider you one of the most sensible posters on this forum, but you are reading too much into this latest SA article.

In Nov, SA (allegedly) had a source claiming Polaris "VG10" would be released in late Q1/early Q2, or to put it another way, March/April 2016. The latest SA article takes the info from AMD's Capsaicin event, which states "mid 2016/before back to school", as evidence of a slip by at least one quarter.

So the official info from AMD stating mid 2016 has not changed, and any earlier dates were simply conjecture. This is just SA saying Polaris is not being released in March/April as their earlier Nov article claimed.

Maybe, but I'm just hoping AMD does not screw up the launch. This year will be critical for them for both CPUs and GPUs IMHO.
 
Associate
Joined
14 Jun 2008
Posts
2,363
Even with just 8, the memory controllers will act very differently due to the way HBM works; they wouldn't use the same methods for controlling the chips as the GDDR memory controller would. (A bit of trivia is that the HBM memory controllers are smaller and less complex. :p ) And Nvidia's approach is to use a single memory controller for every GDDR channel. The Hawaii architecture also uses 8 memory controllers, with each controller managing 2 GDDR channels. The controllers themselves may act very differently to each other, so the controller count is not exactly a guide to performance.

I'd be interested in reading up on that. Can you link to your sources on how the HBM memory controller works please.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
I'd be interested in reading up on that. Can you link to your sources on how the HBM memory controller works please.

There aren't really any whitepapers on the HBM memory controller. But each memory controller has 8 channels connecting to an HBM stack. Half of a die in each HBM stack has its own channel. So each memory controller on Fiji would probably be connected to half an HBM stack.

But a major simplification to the design is that the memory controller drives the RAM directly: there is no parallel-to-serial-to-parallel conversion between each memory chip and the memory controller. This simplifies the design even if there are more traces to the controller.

http://www.kitguru.net/components/g...ogue-set-to-start-mass-production-in-q1-2015/

has some more info on the HBM itself.

http://www.cs.utah.edu/thememoryforum/mike.pdf some other info there from a nvidia conference.

Some more info here; point 6 talks about memory controller simplification. But it was mentioned a while ago by AMD that the HBM memory controllers on Fiji are smaller and less complex than their GDDR5 controllers.

http://www.hardwareluxx.com/index.p...han-just-an-increase-in-memory-bandwidth.html
 
Associate
Joined
14 Jun 2008
Posts
2,363
Hmmm... yes, I know how HBM itself is designed. The issue here is that the Fiji memory subsystem itself only has 8 channels total, not per stack. If you look at the Fiji block diagram here: http://i.imgur.com/CR18KOb.png you can see that each stack is only connected to two of these controllers, and that's a single 512-bit channel per controller from what I can gather.
 
Associate
Joined
31 Oct 2012
Posts
2,241
Location
Edinburgh
And yet cards with a lot less bandwidth like the TitanX and 980 Ti are faster @1080p.

HBM1 needs MHz to compete.

Which don't have the driver or ROP limitations. Nothing to do with HBM, again, as explained in the post you quoted.

Edit: The 390 / 390x both exhibit better performance at higher resolutions relative to their NVIDIA counterparts, without HBM, suggesting the problem lies elsewhere & backing Mauller's drivers point.
 
Last edited:
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
Hmmm... yes, I know how HBM itself is designed. The issue here is that the Fiji memory subsystem itself only has 8 channels total, not per stack. If you look at the Fiji block diagram here: http://i.imgur.com/CR18KOb.png you can see that each stack is only connected to two of these controllers, and that's a single 512-bit channel per controller from what I can gather.

Not a single 512-bit channel but four 128-bit channels, since a stack of HBM is 8 channels. So essentially the memory system has 32 channels connected to 8 memory controllers. Each HBM channel then controls 8 banks of memory, which can have independent timings, reads and writes.
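
As a quick sanity check on those numbers, here is a small Python sketch of the channel arithmetic. It assumes Fiji's four HBM1 stacks, 8 channels of 128 bits per stack, and 8 memory controllers; the function and key names are made up for illustration and do not come from any AMD document.

```python
# Back-of-envelope HBM1 channel arithmetic for Fiji.
# Assumptions (not from an AMD spec sheet): 4 stacks, 8 channels per stack,
# 128 bits per channel, 8 memory controllers. All names are illustrative.

def hbm_layout(stacks=4, channels_per_stack=8, bits_per_channel=128, controllers=8):
    """Return how the HBM channels divide up across the memory controllers."""
    total_channels = stacks * channels_per_stack
    channels_per_controller = total_channels // controllers
    return {
        "total_channels": total_channels,
        "channels_per_controller": channels_per_controller,
        "bits_per_controller": channels_per_controller * bits_per_channel,
        "total_bus_bits": total_channels * bits_per_channel,
    }

layout = hbm_layout()
for key, value in layout.items():
    print(f"{key}: {value}")
```

Either way each controller fronts 512 bits; the open question in this thread is whether that is one 512-bit channel or four independent 128-bit ones.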
 
Last edited:
Associate
Joined
14 Jun 2008
Posts
2,363
Not a single 512-bit channel but four 128-bit channels, since a stack of HBM is 8 channels. So essentially the memory system has 32 channels connected to 8 memory controllers. Each HBM channel then controls 8 banks of memory, which can have independent timings, reads and writes.

Nothing I have seen or read indicates that the memory controller is as fine grained as you claim. The HBM stacks themselves are, but nothing from AMD indicates that they are making use of this at all. If anything it agrees with my assertion of a single 512bit channel per controller. So no matter what HBM itself has, the Fiji controller seems to bundle these up into relatively simple single 512bit channels.

I'll be happy to be corrected if you have a link.

Edit: Even mad chuck seems to think that there is something amiss in regards to this. http://semiaccurate.com/2015/06/22/amd-talks-fiji-fiji-x-odd-bits-tech/
AMD would not comment on average memory latency and dodged the issue every time we asked. They did say that bandwidth went way way up but would not comment on latency. None of the spec sheets SemiAccurate has access to have that information but you can do a bit of back of the envelope math. The transfer rates goes from ~7Gt/s to 1Gt/s when moving from GDDR5 to HBM. HBM is simpler and wider. Prefetch goes from 8 per I/O to 2 in HBM while access granularity goes up from 32B to 256B. Given all this the first byte likely gets returned slightly slower in HBM but every subsequent byte is much faster. From there things get really complex.
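
For what it's worth, the bandwidth side of that back-of-the-envelope math can be sketched in a few lines of Python. The ~7 GT/s and 1 GT/s transfer rates are from the quote; the bus widths (384-bit for a GDDR5 card such as the 980 Ti, 4096-bit for Fiji's four HBM stacks) are assumptions added here, and the function name is made up for illustration. It says nothing about latency, which is the part AMD would not comment on.

```python
# Peak memory bandwidth = transfers per second * bytes moved per transfer.
# Transfer rates come from the quoted article; bus widths are assumptions.

def peak_bandwidth_gbs(transfer_rate_gts, bus_width_bits):
    """Peak bandwidth in GB/s for a given per-pin rate (GT/s) and bus width."""
    return transfer_rate_gts * bus_width_bits / 8

gddr5 = peak_bandwidth_gbs(7.0, 384)   # assumed 384-bit GDDR5 interface
hbm1 = peak_bandwidth_gbs(1.0, 4096)   # assumed 4 stacks x 1024 bits each

print(f"GDDR5: {gddr5:.0f} GB/s, HBM1: {hbm1:.0f} GB/s")
```

So the per-pin rate drops by about 7x, but the interface is so much wider that peak bandwidth still goes up.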
 
Last edited:
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
If you read back on the last page to my wall of text.

Ellesmere and Baffin will both support per-CU power gating, allowing them to shut down CUs for lower power usage when unneeded. But support is only in Baffin at first, since it is the mobile part.

Both support 8 GDDR dies, but Ellesmere has 8 channels while Baffin has 4, making Ellesmere 256-bit to Baffin's 128-bit.

Both will have a 1152 MHz core clock as default and downclock to 600 then 300 depending on load. No boost values are given in the driver.

RAM has a 6000 MHz default, which will downclock in lower power states. But it could be higher in release parts; this is probably a safe default clock.

And Ellesmere XT looks to have around Fury to Fury X performance, if not better, but with around the Hawaii number of CUs and shaders.

Power usage is also far lower at those specs, somewhere in the range of up to 200 W.
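
To put rough numbers on the memory side, here is a Python sketch. It assumes each GDDR5 channel is 32 bits wide (which is what 8 channels = 256-bit implies) and that the driver's "6000" figure is the effective data rate in MT/s per pin; both are assumptions, as is the function name, and release parts could clock differently.

```python
# Rough GDDR5 bus width and peak bandwidth from the driver-derived figures.
# Assumptions: 32 bits per channel, and "6000" read as an effective
# 6000 MT/s per pin. Names are illustrative, not from the driver.

def gddr5_estimate(channels, effective_mtps=6000, bits_per_channel=32):
    """Return (bus width in bits, peak bandwidth in GB/s)."""
    bus_bits = channels * bits_per_channel
    bandwidth_gbs = bus_bits / 8 * effective_mtps / 1000
    return bus_bits, bandwidth_gbs

ellesmere = gddr5_estimate(channels=8)
baffin = gddr5_estimate(channels=4)

for name, (bits, gbs) in (("Ellesmere", ellesmere), ("Baffin", baffin)):
    print(f"{name}: {bits}-bit, {gbs:.0f} GB/s")
```

Under those assumptions Ellesmere would land well below Hawaii's raw bandwidth, so getting Fury-level performance out of it would lean on the architectural improvements rather than the memory bus.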
 
Soldato
Joined
9 Nov 2009
Posts
24,910
Location
Planet Earth
Haven't kept up with this thread and the last few pages are full of walls of text :p :D

So... can someone summarise how polaris is looking atm?

Polaris 10 and 11 have been demonstrated. The CEO said release is in the middle of the year, in time for the back to school season. Gibbo said a late summer release.

Me and a few others argued about what is defined as the back to school time period. I say it is mostly July and August, and others say mostly August and September.

SA said Polaris was originally meant to be released in late Q1 2016 or early Q2 2016 but was pushed back to summer.

Going from what AMD has said about the uarch, it appears Polaris has an improved command processor (which might mean better DX11 and lower resolution performance), and AMD are finally implementing primitive discard hardware like Nvidia, meaning better tessellation.

Polaris 10 is also using GDDR5 and uses a 256-bit memory controller. Supposedly it has 2304 shaders too.

If you read back on the last page to my wall of text.

Ellesmere and Baffin will both support per-CU power gating, allowing them to shut down CUs for lower power usage when unneeded. But support is only in Baffin at first, since it is the mobile part.

Both support 8 GDDR dies, but Ellesmere has 8 channels while Baffin has 4, making Ellesmere 256-bit to Baffin's 128-bit.

Both will have a 1152 MHz core clock as default and downclock to 600 then 300 depending on load. No boost values are given in the driver.

RAM has a 6000 MHz default, which will downclock in lower power states. But it could be higher in release parts; this is probably a safe default clock.

And Ellesmere XT looks to have around Fury to Fury X performance, if not better, but with around the Hawaii number of CUs and shaders.

Power usage is also far lower at those specs, somewhere in the range of up to 200 W.

This too!

:p
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
Nothing I have seen or read indicates that the memory controller is as fine grained as you claim. The HBM stacks themselves are, but nothing from AMD indicates that they are making use of this at all. If anything it agrees with my assertion of a single 512bit channel per controller. So no matter what HBM itself has, the Fiji controller seems to bundle these up into relatively simple single 512bit channels.

I'll be happy to be corrected if you have a link.

Edit: Even mad chuck seems to think that there is something amiss in regards to this. http://semiaccurate.com/2015/06/22/amd-talks-fiji-fiji-x-odd-bits-tech/

The memory controller would need to be that fine grained due to how the HBM stacks are read and written.
 
Last edited:
Caporegime
Joined
17 Mar 2012
Posts
48,262
Location
ARC-L1, Stanton System
If you read back on the last page to my wall of text.

Ellesmere and Baffin will both support per-CU power gating, allowing them to shut down CUs for lower power usage when unneeded. But support is only in Baffin at first, since it is the mobile part.

Both support 8 GDDR dies, but Ellesmere has 8 channels while Baffin has 4, making Ellesmere 256-bit to Baffin's 128-bit.

Both will have a 1152 MHz core clock as default and downclock to 600 then 300 depending on load. No boost values are given in the driver.

RAM has a 6000 MHz default, which will downclock in lower power states. But it could be higher in release parts; this is probably a safe default clock.

And Ellesmere XT looks to have around Fury to Fury X performance, if not better, but with around the Hawaii number of CUs and shaders.

Power usage is also far lower at those specs, somewhere in the range of up to 200 W.

From the info we have now I stand by my ~30% faster than Fury X/980Ti or at worst 20% faster.

Wow... really?
 
Soldato
Joined
30 Dec 2011
Posts
5,545
Location
Belfast
In current DX11 games I would say slightly faster than 980Ti/Fury X but with newer DX12 games up to 30% faster. Call it 20% overall and we will have a decent upgrade if the prices are ~£500 or less.

Of course this is based on the most recent leaks, which are not confirmed specs.
 
Soldato
Joined
30 Dec 2011
Posts
5,545
Location
Belfast
I still think about R9 390X/Fury level but around £200 to £250 if the 232mm2 die rumour is true. If it is better than a Fury X and uses a 232mm2 die it will be like the new HD4870/HD4890.

232mm2 die size with architectural improvements on a new node could give us 20-30% performance over Fury X IMHO with newer DX12 games.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
Vega is rumoured to have Fiji numbers of shaders, but with the improvements GCN 4.0 brings it will have far better theoretical max performance and be able to reach it the majority of the time. It should also be a compute and DP monster, better than Pascal's DP unless their SP is massively greater, which I doubt.
 
Last edited:
Soldato
Joined
9 Nov 2009
Posts
24,910
Location
Planet Earth
232mm2 die size with architectural improvements on a new node could give us 20-30% performance over Fury X IMHO with newer DX12 games.

It could, but I can see Polaris doing much better with tessellation and at lower resolutions in DX11 games, and things like minimums being improved too.

If they can get faster than Fury X performance from such a small chip, I would be impressed.

Edit!!

The improved command processor and the primitive discard hardware would be what I am looking at.
 