AMD Polaris architecture – GCN 4.0

Associate
Joined
4 Nov 2013
Posts
1,437
Location
Oxfordshire
Vega isn't Polaris exactly, from what they've said. It's an architectural upgrade. They're claiming significantly higher performance per watt than Polaris, and HBM2 won't bring much in the way of performance per watt, if anything.

Something others seem to have missed is that AMD have stated that Polaris 10 & 11 are not the only chips - others will follow.

They also stated that Polaris has been made compatible with GDDR5 and HBM.

If the images of Polaris 10 & 11 chips in the slide were scaled somewhat correctly, then Polaris 10 must be around 350mm2 or slightly under. That's fairly big, and no surprise that it apparently matches or beats Fury X / 980Ti. Koduri hinted that the big Polaris would be shown later (with HBM).

Given the significant reduction in chip sizes with 14nmFF, it shouldn't be difficult for them to have 6 stacks (or maybe even 8 stacks) of HBM1, and therefore have a 6GB or 8GB Polaris HBM1 card. Tbh I don't even think it's needed in terms of capacity ... but I suspect the extra bandwidth is absolutely necessary.
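The stack arithmetic above is easy to sanity-check (assuming HBM1's spec figures: 1GB of capacity and a 1024-bit interface per stack, double data rate at a 500MHz clock):

```python
# Back-of-envelope HBM1 totals. Figures assumed from the HBM1 spec:
# 1 GB capacity and a 1024-bit bus per stack, 500 MHz DDR clock.
def hbm1_totals(stacks, clock_mhz=500):
    capacity_gb = stacks * 1              # 1 GB per HBM1 stack
    # DDR = 2 transfers/clock; 1024 bits = 128 bytes per transfer
    bandwidth_gbs = stacks * 1024 / 8 * clock_mhz * 2 / 1000
    return capacity_gb, bandwidth_gbs

print(hbm1_totals(4))  # Fury X: (4, 512.0) -> 4 GB, 512 GB/s
print(hbm1_totals(6))  # 6 stacks: 6 GB, 768 GB/s
print(hbm1_totals(8))  # 8 stacks: 8 GB, 1024 GB/s
```

So 6 or 8 stacks would indeed give a 6GB or 8GB card, with bandwidth scaling in proportion.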

One other thing ... the 2018 architecture. It'll be very interesting to see what that memory turns out to be. I assume it'll be either Wide IO or their own proprietary standard. If they need more bandwidth that early, then HBM isn't gonna work ... too much power consumption. Also very interesting that NVIDIA seem to have abandoned HMC entirely (no surprise), and will be keeping HBM2 for Volta, probably until 2020. That would be a major divergence.

A 4-8GB GDDR5 card could pull about 30W more than an HBM-equipped one.
 
Soldato
Joined
9 Nov 2009
Posts
24,929
Location
Planet Earth
HBM has way better perf/w than GDDR5(x). So this alone would make a big difference.
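The rough size of that gap can be illustrated with the efficiency figures AMD quoted in its own HBM slides (~10.66 GB/s per watt for GDDR5 vs >35 GB/s per watt for HBM; treat both as vendor marketing numbers, not measurements):

```python
# Vendor-quoted efficiency figures (AMD HBM slides); approximate only.
GDDR5_GBS_PER_WATT = 10.66
HBM_GBS_PER_WATT = 35.0

def mem_power_watts(bandwidth_gbs, gbs_per_watt):
    """Estimated memory subsystem power for a given bandwidth target."""
    return bandwidth_gbs / gbs_per_watt

# 390X-class GDDR5 (384 GB/s) vs Fury X HBM (512 GB/s)
print(round(mem_power_watts(384, GDDR5_GBS_PER_WATT), 1))  # ~36.0 W
print(round(mem_power_watts(512, HBM_GBS_PER_WATT), 1))    # ~14.6 W
```

A saving in the region of 20W, despite the HBM card having a third more bandwidth, which is in the same ballpark as the ~30W figure quoted above.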

If Vega 10 and Vega 11 are the higher-end GPUs with DP compute added, they should be less efficient than Polaris 10 and 11, which are probably compute-light, so if that is the case, HBM2 can't be AMD's only saving grace IMHO.

Remember, AMD Fiji was much slower than AMD Hawaii in DP compute. I suspect Vega will replace Hawaii as the basis for the AMD professional cards, unless OFC AMD has another larger Polaris GPU in the works.
 
Last edited:
Associate
Joined
24 Nov 2010
Posts
2,314
A 4-8GB GDDR5 card could pull about 30W more than an HBM-equipped one.

It's not about the capacity, it's about the bandwidth. Memory bandwidth becomes a bottleneck as soon as the core clock on Fiji is raised above 1000MHz. Imagine what big(ger than Polaris 10 / 11) Polaris is going to be like if it has GDDR5(X)? It needs HBM, and more than 4 stacks ... preferably also at higher than 500MHz clocks. There's apparently a completely new L2 cache design** for Polaris, but I doubt it will work miracles re: bandwidth requirements, or they wouldn't be champing at the bit to get to HBM2, only to want to abandon it for something newer a year later.

** I suspect its main purpose is to reduce latency further - very useful for VR.
 
Last edited:
Man of Honour
Joined
21 May 2012
Posts
31,922
Location
Dalek flagship
It's not about the capacity, it's about the bandwidth. Memory bandwidth becomes a bottleneck as soon as the core clock on Fiji is raised above 1000MHz. Imagine what big(ger than Polaris 10 / 11) Polaris is going to be like if it has GDDR5(X)? It needs HBM, and more than 4 stacks ... preferably also at higher than 500MHz clocks. There's apparently a completely new L2 cache design** for Polaris, but I doubt it will work miracles re: bandwidth requirements, or they wouldn't be champing at the bit to get to HBM2, only to want to abandon it for something newer a year later.

** I suspect its main purpose is to reduce latency further - very useful for VR.

My TXs are faster than FXs and manage fine with crappy old GDDR5 @2160p.

The TXs also have 3x the memory to address compared to a Fury X, which can mean a lot of work @2160p running ROTTR or XCOM 2, which can use 10.5GB when running.

The FX's real problem is HBM1 clock speed.
 
Soldato
Joined
4 Feb 2006
Posts
3,226
Ninja edit I think, I'm sure he said twice as fast originally lol

My last edit was 16:22 while your reply to it was 16:55, so I'm more inclined to believe that you misread. ;)

I didn't say it must be twice as fast but 'at least' as fast as a 980Ti/TitanX and judging by the Hitman benchmark it most probably is faster. If it comes in at around £300 then it would be around 50% faster than the current cards in that range such as the 390X/980.

However, the high-end next-gen cards will almost certainly be twice as fast as the current top end, since they will be developed to do at least 60 fps at 4K, which no current card can manage in the latest games.
 
Associate
Joined
10 Jul 2009
Posts
1,559
Location
London
<...>
The FX's real problem is HBM1 clock speed.

I think the problem is the immature technology. Matt said that clocks are locked at certain intervals. Even if you supposedly clock it to 600MHz it really runs at 570MHz or so. This tells us that the memory controller is still young. Who knows, maybe Polaris will come with a better memory controller and HBM1 will be refined enough to have more flexibility.

FX suffered more from driver overhead than anything related to HBM.
 
Soldato
Joined
7 Aug 2013
Posts
3,510
Agreed. I think that both AMD and Nvidia have been saving up big architectural changes because they've been waiting on the process shrink/interposer/HBM to take full advantage.

They've had longer to get ready for this process shrink than any other shrink in history, and I think they want to make the most of it after being stuck in 28nm limbo for years.
I would have thought the opposite. Being stuck on 28nm for so long, their only hope of providing continuing gains was to push ahead with architectural improvements in order to satisfy consumer demand for more performance.
 
Soldato
Joined
7 Feb 2015
Posts
2,864
Location
South West
If Vega 10 and Vega 11 are the higher-end GPUs with DP compute added, they should be less efficient than Polaris 10 and 11, which are probably compute-light, so if that is the case, HBM2 can't be AMD's only saving grace IMHO.

Remember, AMD Fiji was much slower than AMD Hawaii in DP compute. I suspect Vega will replace Hawaii as the basis for the AMD professional cards, unless OFC AMD has another larger Polaris GPU in the works.

The GCN architecture works by fusing two 32-bit units to perform a 64-bit calculation. GCN has always been capable of 1/2 DP on all parts, as there are no dedicated DP units in the architecture.

Fiji is also capable of 1/2 DP, the same as Hawaii. So Vega will gain improvements from architecture/process tweaks and HBM2.

Only driver locks limit the DP capability of the parts.
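If GCN really can run DP at 1/2 rate by fusing 32-bit units, the peak figures are simple arithmetic from the SP numbers (Fury X shader count and clock assumed below):

```python
# Peak FLOPS from shader count and clock (FMA counts as 2 ops/clock).
def peak_sp_gflops(shaders, clock_mhz):
    return shaders * 2 * clock_mhz / 1000

def peak_dp_gflops(sp_gflops, dp_rate):
    return sp_gflops * dp_rate

fiji_sp = peak_sp_gflops(4096, 1050)     # Fury X: 8601.6 GFLOPS SP
print(peak_dp_gflops(fiji_sp, 1 / 2))    # 4300.8 - hardware-capable rate claimed above
print(peak_dp_gflops(fiji_sp, 1 / 16))   # 537.6 - rate actually exposed on Fiji
```

The gap between those two numbers is exactly what's at stake if it really is just a driver lock.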
 
Man of Honour
Joined
13 Oct 2006
Posts
91,762
I think the problem is the immature technology. Matt said that clocks are locked at certain intervals. Even if you supposedly clock it to 600MHz it really runs at 570MHz or so. This tells us that the memory controller is still young. Who knows, maybe Polaris will come with a better memory controller and HBM1 will be refined enough to have more flexibility.

FX suffered more from driver overhead than anything related to HBM.

It isn't uncommon for memory to run on set dividers going up in steps instead of individual units - it isn't necessarily an indication of any immaturity in the implementation.

Some minor issues aside, HBM doesn't really do anything particularly positive or negative on the Fury cards (aside from limitations on the amount in situations where you really need it). It's not so much that the memory tech is young, but more that with current architectures it largely doesn't offer anything tangible in performance terms that GDDR5 doesn't do just as well.

What really harms Fiji is being on 28nm rather than sub-20nm planar - they just can't fit enough of certain things that are important to performance to pull away from Nvidia's ability to brute-force it on Maxwell.
 
Last edited:
Caporegime
Joined
18 Oct 2002
Posts
32,623
It isn't uncommon for memory to run on set dividers going up in steps instead of individual units - it isn't necessarily an indication of any immaturity in the implementation.

Some minor issues aside, HBM doesn't really do anything particularly positive or negative on the Fury cards (aside from limitations on the amount in situations where you really need it). It's not so much that the memory tech is young, but more that with current architectures it largely doesn't offer anything tangible in performance terms that GDDR5 doesn't do just as well.

What really harms Fiji is being on 28nm rather than sub-20nm planar - they just can't fit enough of certain things that are important to performance to pull away from Nvidia's ability to brute-force it on Maxwell.


Yep, it is a bit of a misnomer that GPUs are starved for bandwidth. A quick overclock of the 980Ti's memory quickly shows this isn't the case, as does the fact that the 980Ti is faster than the Fury X with far less bandwidth. If bandwidth were a critical bottleneck then Maxwell GM200 would have been designed around a 512-bit interface, but 384-bit works very well.

Increasing the memory clocks on the Fury X does increase performance, but it shouldn't, given the already much higher memory bandwidth. Some people have postulated the low clock speed may be hindering HBM at lower resolutions, and there might be some truth in that. It's likely not the increase in bandwidth but a reduction in latency, or perhaps better clock synchronisation or reduced aliasing etc., that helps.
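For reference, the GDDR5 figures being compared here fall straight out of bus width and data rate (card specs assumed from public spec sheets):

```python
# GDDR5 bandwidth from bus width (bits) and effective data rate (Gbps/pin).
def gddr5_bandwidth_gbs(bus_bits, data_rate_gbps):
    return bus_bits * data_rate_gbps / 8  # 8 bits per byte

print(gddr5_bandwidth_gbs(384, 7))  # 980 Ti: 336.0 GB/s
print(gddr5_bandwidth_gbs(512, 6))  # 390X: 384.0 GB/s (Fury X HBM: 512 GB/s)
```

So the 980Ti is giving up roughly a third of the Fury X's bandwidth and still winning, which is the point being made above.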

Fiji's issues are that they kept 4 shader engines but added more compute units to those existing 4 engines. Therefore bottlenecks at the command processors, geometry engines, tessellation engines and rasterizers still exist. At a high level, the workload is getting divided fairly coarsely between 4 shader engines before being further subdivided between the different CUs. This can lead to under-utilisation of the shaders, and thus real-world performance a long way below the theoretical max.

GCN in its current form can only support 4 shader engines, so they didn't have much choice on scalability, especially since, being stuck on 28nm, they didn't have an easy way to produce a new architecture with 6-8 shader engines. 6 shader engines with slightly fewer CUs per engine would alleviate the bottlenecking a lot and would automatically address areas AMD is weak at, such as tessellation. The ROP count is also fixed at 64 for a 4-shader-engine GCN, and that likely impacts the card at 4K, when you would expect the additional CUs and memory bandwidth to really shine. Fury's relative performance does increase at 4K but it never lives up to expectations.


If you look at the theoretical performance numbers for Fiji vs Hawaii then there should be a big performance jump, but that simply isn't the case. DX12 has seemingly reduced the performance difference even further. It's just a limitation of the architecture brought about by being stuck on 28nm. A GCN architecture with 6 or more engines would be much faster for the same number of compute units.
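The coarse-division argument can be sketched with a toy model (this is purely illustrative, not how GCN actually schedules work; the per-engine front-end figure is made up):

```python
# Toy model: each shader engine owns an equal slice of the CUs, and its
# front end can only feed a fixed amount of work per cycle. More engines
# means smaller slices, so each slice is easier to keep fully fed.
def utilization(engines, cus_total, frontend_capacity_per_engine):
    cus_per_engine = cus_total / engines
    fed = min(cus_per_engine, frontend_capacity_per_engine)
    return fed / cus_per_engine

# Same 64 CUs, same hypothetical per-engine front end:
print(utilization(4, 64, 12))  # Fiji-like 4 engines: 0.75
print(utilization(6, 64, 12))  # hypothetical 6-engine GCN: 1.0
```

Under those made-up numbers, the 4-engine layout leaves a quarter of the shaders starved while a 6-engine layout keeps them all busy, which is the shape of the argument above.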
 
Soldato
Joined
9 Nov 2009
Posts
24,929
Location
Planet Earth
AMD going to get smacked again, seeing as they are using GDDR5 while Nvidia's using GDDR5X?

???

If Nvidia were using GDDR5X, so would AMD. I expect they will both still be using GDDR5.

The gcn architecture works by fusing two 32bit units to perform a 64bit calculation. GCN has always been capable of 1/2 DP on all parts as there is no dedicated DP units in the architecture.

Fiji is also capable of 1/2 DP the same as Hawaii. So vega will gain improvements from architecture/ process tweaks and HBM2.

Only driver locks limit the DP capability of the parts.

I didn't know that - it seems quite flexible.

So since Nvidia is touting mixed precision from Pascal, isn't that what AMD is kind of already doing?

I wonder if Pascal, Polaris and Vega might be well matched in the end?

AMD is trying to reduce power consumption of GCN and improve DX11 performance.

Nvidia is adding back more compute and improving DX12 performance, but has already demoed most of the power-saving tips.
 
Last edited:
Soldato
Joined
26 May 2014
Posts
2,962
AMD going to get smacked again, seeing as they are using GDDR5 while Nvidia's using GDDR5X?
GDDR5X isn't even scheduled to enter mass production until "summer" according to Micron. I find it very hard to believe that either side's cards will be using it if they're due on shelves in late May/early June.
 
Caporegime
Joined
18 Oct 2002
Posts
32,623
???

If Nvidia were using GDDR5X, so would AMD. I expect they will both still be using GDDR5.



I didn't know that - it seems quite flexible.

So since Nvidia is touting mixed precision from Pascal, isn't that what AMD is kind of already doing?

I wonder if Pascal, Polaris and Vega might be well matched in the end?

AMD is trying to reduce power consumption of GCN and improve DX11 performance.

Nvidia is adding back more compute and improving DX12 performance, but has already demoed most of the power-saving tips.



AMD also have to add back in a lot of DP compute performance:
Fiji is 1/16 DP rate.
Hawaii is 1/2.
Maxwell is 1/32 (so half of Fiji's rate).
Consumer Kepler is 1/24 (so 1/12 of Hawaii's).
Tesla Kepler is 1/3.
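The relative comparisons in that list check out if you do the fractions exactly:

```python
# Verify the relative DP-rate comparisons with exact fractions.
from fractions import Fraction as F

dp_rate = {
    "Fiji": F(1, 16),
    "Hawaii": F(1, 2),
    "Maxwell": F(1, 32),
    "Consumer Kepler": F(1, 24),
    "Tesla Kepler": F(1, 3),
}

print(dp_rate["Maxwell"] / dp_rate["Fiji"])            # 1/2
print(dp_rate["Consumer Kepler"] / dp_rate["Hawaii"])  # 1/12
```

Note these are rates relative to each card's own SP throughput, so absolute DP numbers still depend on shader count and clocks.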



People also go on about how Nvidia cut DP compute performance for Maxwell, when AMD cut it even more.
 
Caporegime
Joined
18 Oct 2002
Posts
32,623
GDDR5X isn't even scheduled to enter mass production until "summer" according to Micron. I find it very hard to believe that either side's cards will be using it if they're due on shelves in late May/early June.

Depends a lot on the exact definition of mass production and the exact time frame.

Assuming mass production means there will be no stock constraints and Micron can ship crate-loads of chips to Nvidia/AMD by May, then for an end-of-June GPU release there is no issue.

Mass production doesn't mean there was no production beforehand, just highly constrained small-batch production - a few thousand chips, say.

Also, the mass production start date may simply be delayed until a big order is due; e.g. Micron might have been able to mass produce 4 months ago, but AMD/Nvidia only need chips this May-June, so Micron don't produce until then - there is no other customer, so no need to mass produce early.
 