
AMD Polaris architecture – GCN 4.0

You can use all the fancy effects you want but realistic level design plays a part too.

For instance, when you leave the vault in FO4 and come down to Sanctuary, the stream-crossing area is very clearly micromanaged and looks quite real, as opposed to large tracts of the wasteland, which are mostly neglected and look awful.
 

Certainly, low-abstraction APIs should play a massive part in allowing devs to create more realistic environments, especially when it comes to making dense crowds in cities and making an area feel lived in. The new Deus Ex already does this really well.
 
Lol, seriously, you are telling me like I haven't the foggiest. Carrizo, on its current low-power process, is tuned for mobile; IPC, clock speed and headroom are very relevant if you wish to port this design onto a 65-95W desktop part (Bristol Ridge). Also, if benchmarks are going to be based on the performance of a throttling Excavator APU, then that affects the comparison being made for their IPC gains.
They cannot simply use GF28LP for desktop. If they port it to 14nm then awesome, but I imagine they'll use 28SHP, which to be honest is pointless.

The Stilt (a well-known guru) showed that Carrizo loses its scaling and efficiency at higher frequencies.

"Like I said before, these numbers are not perfectly accurate. Still, based on the numbers it seems that Carrizo cannot maintain its power efficiency at higher frequencies. Most likely the reason lies in the differences between the manufacturing process versions used for these designs. At low frequencies Carrizo is definitely more power efficient than Steamroller designs; however, at ~2600MHz the two designs are already even. At frequencies higher than that, Steamroller designs are more power efficient."


There is no magic in Carrizo: the design itself isn't more power efficient at hardware level than the previous generations were. Carrizo simply has much more advanced, effective and, most importantly, finally functional power management. In fact, at hardware level Carrizo is less power efficient than the previous generations due to the changes in the design and the manufacturing process. At higher frequencies (>3200MHz), Kaveri/Godavari can easily outperform Carrizo in power efficiency.

http://www.overclock.net/t/1560230/jagatreview-hands-on-amd-fx-8800p-carrizo/520
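The quoted figures describe efficiency falling away as clocks rise. A common first-order way to see why is the CMOS dynamic-power approximation P ≈ C·V²·f: higher clocks need higher voltage, so performance per watt drops along the voltage/frequency curve whatever the IPC. A minimal Python sketch, assuming a made-up V/f curve (not measured Carrizo or Kaveri data) and ignoring leakage power:

```python
# Illustrative only: the voltage/frequency points below are hypothetical,
# not measured Carrizo or Kaveri values. Leakage power is ignored.

def dynamic_power(c_eff, volts, freq_ghz):
    """Classic CMOS dynamic-power approximation: P = C * V^2 * f."""
    return c_eff * volts ** 2 * freq_ghz


def perf_per_watt(ipc, c_eff, volts, freq_ghz):
    """Performance modelled as IPC * f, divided by dynamic power."""
    return (ipc * freq_ghz) / dynamic_power(c_eff, volts, freq_ghz)


# Hypothetical V/f curve: each step up in clock needs more voltage.
vf_curve = [(1.6, 0.80), (2.6, 0.95), (3.2, 1.10), (3.6, 1.25)]

for freq, volts in vf_curve:
    eff = perf_per_watt(ipc=1.0, c_eff=1.0, volts=volts, freq_ghz=freq)
    print(f"{freq:.1f} GHz @ {volts:.2f} V -> relative perf/W {eff:.2f}")
```

Because performance is modelled as IPC × f, the frequency term cancels and relative efficiency reduces to IPC / (C·V²), which is why the voltage each process needs at a given clock matters so much in this argument.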

Which again does not change the IPC of the physical cores used (which is what AMD and Intel tend to talk about), and that is what we are talking about (not other factors like the uncore, etc.). You are conflating the fact that the current implementation of the design does not scale as well to higher clock speeds at higher voltages with core IPC, which is a totally different thing.

If that were the case, then the 4GHz Core i7 4790K and the 2.5GHz Core i7 4790T would have different IPC at the core level, and the various desktop Core i3s would have different IPC from the special mobile sub-30W TDP Iris Pro versions, which use different dies and are optimised for low power. Plus, at 22nm Intel had multiple versions of the same process optimised for different power envelopes.
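The IPC-versus-clock distinction can be sketched in a few lines: throughput is IPC multiplied by clock speed, and IPC itself is counted per cycle. A minimal Python sketch, using a hypothetical IPC value for one core design shipped at the two clocks quoted above:

```python
# Hypothetical numbers: the same core design (same IPC) shipped at two clocks.

def instructions_per_second(ipc, freq_ghz):
    """Throughput = IPC * clock, in billions of instructions per second."""
    return ipc * freq_ghz


def measured_ipc(instructions, cycles):
    """IPC is defined per cycle, so it does not depend on how long a cycle lasts."""
    return instructions / cycles


core_ipc = 2.0  # hypothetical IPC for the shared core design
fast = instructions_per_second(core_ipc, 4.0)  # 4790K-style clock
slow = instructions_per_second(core_ipc, 2.5)  # 4790T-style clock, as quoted above

# Same work counted in cycles gives the same IPC regardless of clock speed.
ipc_check = measured_ipc(instructions=8_000_000_000, cycles=4_000_000_000)

print(f"throughput ratio: {fast / slow:.2f} (equals the clock ratio 4.0/2.5)")
print(f"IPC from counters: {ipc_check:.1f}")
```

The model treats IPC as fixed. In practice DRAM latency is fixed in nanoseconds rather than cycles, so measured IPC can dip slightly at higher clocks when a workload stalls on memory.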

You are also assuming that Bristol Ridge is simply a port of the mobile Carrizo chips, which nobody knows currently. The mobile implementation is actually focused on reducing die size as well, so the L2 cache is halved compared with Kaveri to make room for the remaining chipset functions.

You are forgetting that Carrizo has a DDR4 memory controller as well, which will be enabled if the chip uses AM4, and all mobile versions use the SoC's chipset functions, which add to the cooling load when active, whereas even if the desktop version uses the same chips, those functions will be mostly deactivated.

So the take-home message is that until Bristol Ridge is released we cannot really make a proper comparison between the two. The comparison you linked has far too many variables to mean anything: the A10 7870K is in a desktop board and the FX-8800P is in a totally different mobile environment. You would need the boards and cooling to be equalised.

We can only do that when Bristol Ridge is released for desktop, including versions released on the FM2+ platform.

Anyway, as others have said, this is the graphics card section and we should probably keep further discussion to the CPU section, and this is the last I will say about CPUs in this thread.
 
I know we are talking about GPUs here, and the charts below also show that 4K gaming is more GPU-dependent than CPU-dependent, even under DX11.
So next-gen grunt is needed. This is also a reply to the ongoing "off-topic" discussion :D

I'd also add that at 1080p my Nano can't shine at the moment. I'm waiting for BenQ to replace the 2730Z before I put any benchmarks out.


Crudely put

I don't believe the CPU bottleneck is that bad at 1080p. It's always been my opinion that Fury/Tonga are too wide on the front end: with the 8 ACE units and global data share, and lacking a hardware scheduler, they rely on the DirectX 11 software for their front-end commands, which doesn't help feed the wider parallel architecture. The solution in Polaris seems to be a hardware scheduler and better shader throughput, plus DirectX 12.

Humbug is right, mate. Look at this...


http://hexus.net/tech/reviews/cpu/85370-intel-core-i5-6600k-14nm-skylake/?page=8


4K:
http://hexus.net/media/uploaded/2015/8/7d2c4f0a-1885-4594-9104-0f2410e554dc.png
http://hexus.net/media/uploaded/2015/8/d52a3cd4-0757-4ab6-900f-1f0711e6d33a.png
http://hexus.net/media/uploaded/2015/8/45f0ff9c-259d-4957-97a8-f01531654d77.png

You see those A10 78xx?

See them at 1080p, same GPU:

http://hexus.net/tech/reviews/cpu/85370-intel-core-i5-6600k-14nm-skylake/?page=7

http://hexus.net/media/uploaded/2015/8/697d4cd9-cbb5-4135-be67-7f1aa4107bcb.png
http://hexus.net/media/uploaded/2015/8/8514bfb0-4915-4d5f-92b8-18a51c11592e.png
http://hexus.net/media/uploaded/2015/8/10d15c7f-f182-490d-888a-fca5f90d4027.png

As you can see, at 4K an A10 7850/7870 performs the same as CPUs at three times its cost.

That's because a 7870 goes for £100 a pop and the 6700K for almost £300, not counting the DDR4 RAM, motherboard and cooler on top.
 
I know we are talking about GPUs here, and the charts below also show that 2560x1440 and 4K gaming are more GPU-dependent than CPU-dependent, even under DX11.


Humbug is right, mate. Look at this...

http://hexus.net/tech/reviews/cpu/85370-intel-core-i5-6600k-14nm-skylake/?page=8



That's because a 7870 goes for £100 a pop and the 6700K for almost £300, not counting the DDR4 RAM, motherboard and cooler on top.

Yes, all that proves is that the AMD CPU is weaker at less intensive resolutions, and that at higher resolutions the restriction is GPU-bound rather than CPU-bound. Once upon a time 1080p used to be GPU-intensive, and 1680x1050 was the favoured resolution for showing up AMD CPUs in gaming.
If you want to talk GPUs, then bring up Fury's 1080p/1440p/4K scaling vs other cards.

Scheduler and shader throughput efficiency is a totally different subject.
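The CPU-bound versus GPU-bound point can be illustrated with a simple frame-time model in which each frame waits for whichever of the CPU or GPU is slower. A minimal Python sketch, assuming perfectly pipelined CPU and GPU work and hypothetical per-frame costs (not figures from the Hexus charts):

```python
# Simple bottleneck model: the slower of CPU and GPU work sets the frame time.
# All millisecond values are hypothetical, chosen only to illustrate the effect.

def fps(cpu_ms, gpu_ms):
    """Frames per second when the slower stage sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)


cpus = {"slower CPU": 14.0, "faster CPU": 9.0}  # CPU ms per frame
gpu_ms = {"1080p": 8.0, "4K": 30.0}            # GPU ms per frame, same GPU

for res, g in gpu_ms.items():
    for name, c in cpus.items():
        print(f"{res:>5} {name:<10} {fps(c, g):5.1f} fps")
```

With these made-up numbers the two CPUs separate at 1080p but produce identical frame rates at 4K, because the GPU's 30 ms per frame dominates both.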
 

I was running the Attila benchmark last night to test the Nano. At 1080p, a mere ~10% overclock on the 4820K (from 3900MHz to 4300MHz) boosted the FPS (at maxed-out settings) by 25%, with no other overclock.

That says it all.
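Working through the quoted numbers: 3900MHz to 4300MHz is roughly a 10% clock increase, so a 25% FPS gain is more than the core clock alone would suggest. One run can't say why (run-to-run variance is one possibility), so this Python sketch is only arithmetic on the figures in the post:

```python
# Arithmetic on the figures quoted in the post; no new measurements.

def pct_gain(old, new):
    """Percentage increase from old to new."""
    return (new / old - 1.0) * 100.0


clock_gain = pct_gain(3900, 4300)  # 4820K overclock quoted above, in MHz
fps_gain = 25.0                    # FPS gain quoted above, in percent

print(f"clock +{clock_gain:.1f}%, FPS +{fps_gain:.0f}%")
print(f"FPS gain per 1% of clock: {fps_gain / clock_gain:.2f}")
```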
 
GCN actually has hardware scheduling, like Fermi did. Nvidia moved to software scheduling from Kepler onwards (as ATI/AMD did up to the HD 5000 series).

Which compounds the power use, and is one of the main reasons the Nvidia cards had a sudden power-efficiency improvement. Data transfer now uses far more power than the actual processing itself, so Fermi and GCN use more power since they perform more on-die data transfer than Kepler onwards.

We will more than likely see hardware scheduling again with Pascal. The power improvements from 14nm and FinFET should offset the scheduling power use.
 
Don't try to weasel out like you always do.

You were talking about additions to the GPU tech itself, not in general. You have been trying to set yourself up as an all-knowing expert lately because you played around with a CryEngine tutorial a bit, tried to waffle and got corrected.

At least own up to it for once.

I'm sure a lot of cover-up blather will follow, trying to dodge around it and get it forgotten, including trying to turn it back on me, when all I did was make a simple correction that several people reading this very thread can confirm.

meow. :eek:

I was talking about how memory handles textures and how Nvidia used a more efficient approach than AMD's brute force.

Which is exactly the same thing the article you linked talks about.

No one needs to be an expert to see that what they are talking about and what I am talking about are the same thing and in agreement. If you're the real expert, what gives? A day off? :p


I want to add to this.

AMD do have, or have had, a bit of an issue with tessellation and texture compression when compared with Nvidia.

But with texture compression, AMD usually overcome it with brute force rather than efficiency. AMD traditionally have a wider memory bus: 384-bit vs 256-bit, 512-bit vs 384-bit, 4096-bit vs 384-bit.
The width of that bus, in combination with memory speed, dictates the memory bandwidth (250GB/s, 320GB/s, 512GB/s), and memory bandwidth is what matters for texture LOD: the more bandwidth you have, the higher the performance.

So while Nvidia have better, more efficient texture compression, AMD have more muscle for it.
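The bus-width point reduces to one formula: peak bandwidth is the bus width in bytes multiplied by the per-pin data rate. A minimal Python sketch; the example configurations are assumptions chosen for illustration, and the 512-bit and 4096-bit cases land on the 320GB/s and 512GB/s figures quoted above (roughly the R9 290X and Fury X memory setups):

```python
# Peak memory bandwidth from bus width and per-pin data rate.
# Example configurations are illustrative assumptions, not a card database.

def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s = (bus width in bits / 8) * per-pin Gbit/s."""
    return bus_width_bits / 8 * data_rate_gbps


examples = {
    "512-bit GDDR5 @ 5 Gbps": (512, 5.0),
    "4096-bit HBM @ 1 Gbps": (4096, 1.0),
    "384-bit GDDR5 @ 7 Gbps": (384, 7.0),
}

for label, (width, rate) in examples.items():
    print(f"{label}: {memory_bandwidth_gbs(width, rate):.0f} GB/s")
```

This is peak raw bandwidth only; better texture compression raises the effective figure without changing the bus, which is the trade-off described above.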

As for tessellation, there isn't a lot AMD can do about that other than having more raw power.

AMD have now, by the looks of it, addressed these things: Tonga and Fiji have far more efficient texture compression, and Polaris looks like it will improve that further and address the tessellation problem.

So what was your reaction here all about really?

Don't be so personal and b####y; good grief, it's not all that important.
 
Which again does not change the IPC of the physical cores used (which is what AMD and Intel tend to talk about), and that is what we are talking about (not other factors like the uncore, etc.). You are conflating the fact that the current implementation of the design does not scale as well to higher clock speeds at higher voltages with core IPC, which is a totally different thing.

You said: "IPC has nothing to do with clockspeed. Plus Bristol Ridge is being released for desktop with Excavator cores. Plus the 40% IPC statement is from an AMD slide. There is a good cumulative 25% to 35% IPC increase from Bulldozer to Excavator, so it probably isn't far off K10."

I'm conflating them because that is the only Excavator we have at the moment, and there are too many unknowns.
What I am asking is: what is this 40% Zen IPC increase based on? I don't mean the effect of the physical core/architecture changes on IPC. If it's the current mobile Excavator then, like I said in my previous post, "if benchmarks are going to be based on the performance of a throttling Excavator APU, then that affects the comparison being made to claim their Zen IPC gains." If the comparison is against a fully customised, tweaked desktop Excavator, then OK, cool, but it's very late in the game to be designing a Bristol Ridge chip with a larger L2 cache and improved internals over mobile Carrizo when Zen was already being designed and is yet to be mass-produced, unless Zen is to be delayed until 2017.

If that were the case, then the 4GHz Core i7 4790K and the 2.5GHz Core i7 4790T would have different IPC at the core level, and the various desktop Core i3s would have different IPC from the special mobile sub-30W TDP Iris Pro versions, which use different dies and are optimised for low power. Plus, at 22nm Intel had multiple versions of the same process optimised for different power envelopes.

You are also assuming that Bristol Ridge is simply a port of the mobile Carrizo chips, which nobody knows currently. The mobile implementation is actually focused on reducing die size as well, so the L2 cache is halved compared with Kaveri to make room for the remaining chipset functions.

It's just clock scaling and leakage between the 4790 and 4790T that affect the IPC of each processor. If clocked the same, they would perform the same.
The same applies to a mobile Carrizo, even if it were ported onto a different process. I'm asking what baseline Excavator AMD bases the Zen improvements on.
Your statement that clock speed doesn't have anything to do with IPC is wrong. Whilst my frequency scaling of Carrizo's IPC could be considered off-topic, it doesn't change the fact that IPC is still relevant to clock scaling.
Also, Kaveri and mobile Carrizo have the same die size, but Carrizo has a higher density. The L2 was halved for die size, but more for power consumption and to allow the L1 to be doubled too.

You are forgetting that Carrizo has a DDR4 memory controller as well, which will be enabled if the chip uses AM4, and all mobile versions use the SoC's chipset functions, which add to the cooling load when active, whereas even if the desktop version uses the same chips, those functions will be mostly deactivated.

So the take-home message is that until Bristol Ridge is released we cannot really make a proper comparison between the two. The comparison you linked has far too many variables to mean anything: the A10 7870K is in a desktop board and the FX-8800P is in a totally different mobile environment. You would need the boards and cooling to be equalised.

We can only do that when Bristol Ridge is released for desktop, including versions released on the FM2+ platform.

Anyway, as others have said, this is the graphics card section and we should probably keep further discussion to the CPU section, and this is the last I will say about CPUs in this thread.

I'm not forgetting the DDR4 controller; I just never mentioned it, just as I could say you didn't mention that Carrizo has double the L1 cache of Kaveri.
Your point about on-chip thermals is valid, I agree, but I'm not the person who brought Zen up in this thread; I was responding to it. And it's not like I haven't made relevant posts in this thread on AMD's GCN.

My understanding of Carrizo is pretty good
http://forums.overclockers.co.uk/showpost.php?p=28138714&postcount=242
 
Like I said in post 272, I was spot on with my point that the process used will determine whether Carrizo is worth porting to high frequency/high TDP, and you have just confirmed it in your own explanation, so bugger off, I was not incorrect!

I'm not sure how I was agreeing with you; your post was all over the place. You quoted 'proof' that Carrizo was inefficient at higher speeds, but the proof actually placed this likely on the process differences, and your explanation of this quote was...

"There is no magic in Carrizo: the design itself isn't more power efficient at hardware level than the previous generations were. Carrizo simply has much more advanced, effective and, most importantly, finally functional power management."

You're using the difference in efficiency between two chips on different processes as proof that there is no power improvement in the chip. This is completely wrong, and we aren't in agreement. On the same process, Carrizo may be 10% lower power at the same high clock speeds. The proof you posted was someone saying that it loses power efficiency at high speed because of the process, i.e. not because of a lack of efficiency gains in the architecture.


I said IPC is relevant to clock speeds and scaling, so glad you were agreeing with my point.

You responded to a post saying IPC isn't related to clock speed with the words "you're telling me like I haven't got the foggiest". I took that as "you're telling me like I don't already know", which I took to mean you agreed that IPC wasn't related to clock speed.


Agreed, which is my whole point. If people are stating Zen will bring 40% more IPC than Excavator, then what version of Excavator are we using for comparison?

We shall see when the time comes, but the only baseline Excavator at the moment is Carrizo, which is limited to a 15W TDP in OEM designs.

Regardless of what happens, AMD is making the claim based on the architecture, independent of any specific implementation; that is true and will always be true. This was the point of my post, so I fail to see where you 'agreed' with me, because I was saying nothing similar to what you again state. The architecture is the architecture, and a specific implementation is a specific implementation. An implementation does not change the way the chip works; it can only alter the power/clock-speed curve that particular implementation follows.
 
I'm not sure how I was agreeing with you; your post was all over the place. You quoted 'proof' that Carrizo was inefficient at higher speeds, but the proof actually placed this likely on the process differences, and your explanation of this quote was...

"There is no magic in Carrizo: the design itself isn't more power efficient at hardware level than the previous generations were. Carrizo simply has much more advanced, effective and, most importantly, finally functional power management."


No, you are taking one paragraph from the post I made and tailoring everything to suit your own agenda, leaving out the info from that post that I stated was relevant to the process node and Carrizo.
Let's put that post here:

I said this:
Lol, seriously, you are telling me like I haven't the foggiest. Carrizo, on its current low-power process, is tuned for mobile; IPC, clock speed and headroom are very relevant if you wish to port this design onto a 65-95W desktop part (Bristol Ridge). Also, if benchmarks are going to be based on the performance of a throttling Excavator APU, then that affects the comparison being made for their IPC gains. They cannot simply use GF28LP for desktop. If they port it to 14nm then awesome, but I imagine they'll use 28SHP, which to be honest is pointless.

The Stilt (a well-known guru) showed that Carrizo loses its scaling and efficiency at higher frequencies.

"Like I said before, these numbers are not perfectly accurate. Still, based on the numbers it seems that Carrizo cannot maintain its power efficiency at higher frequencies. Most likely the reason lies in the differences between the manufacturing process versions used for these designs. At low frequencies Carrizo is definitely more power efficient than Steamroller designs; however, at ~2600MHz the two designs are already even. At frequencies higher than that, Steamroller designs are more power efficient."


There is no magic in Carrizo: the design itself isn't more power efficient at hardware level than the previous generations were. Carrizo simply has much more advanced, effective and, most importantly, finally functional power management. In fact, at hardware level Carrizo is less power efficient than the previous generations due to the changes in the design and the manufacturing process. At higher frequencies (>3200MHz), Kaveri/Godavari can easily outperform Carrizo in power efficiency.

http://www.overclock.net/t/1560230/j...0p-carrizo/520

You're using the difference in efficiency between two chips on different processes as proof that there is no power improvement in the chip. This is completely wrong, and we aren't in agreement. On the same process, Carrizo may be 10% lower power at the same high clock speeds. The proof you posted was someone saying that it loses power efficiency at high speed because of the process, i.e. not because of a lack of efficiency gains in the architecture.

No, I am saying that if the 40% IPC gain is based on mobile Excavator, then Carrizo does not scale as well on the 28LP process, so are the IPC gains based on a real-world throttling Excavator? Carrizo still has an IPC increase over Kaveri when The Stilt tested both APUs clock for clock. What I am saying in the upper part of my post is that AMD cannot simply use 28LP for a desktop Excavator; if they use 14nm, great, but 28SHP would be pointless too.

You responded to a post saying IPC isn't related to clock speed with the words "you're telling me like I haven't got the foggiest". I took that as "you're telling me like I don't already know", which I took to mean you agreed that IPC wasn't related to clock speed.

Don't try to twist and turn; I said IPC and clock speed are very relevant:
"Carrizo, on its current low-power process, is tuned for mobile; IPC, clock speed and headroom are very relevant if you wish to port this design onto a 65-95W desktop part (Bristol Ridge)."

Regardless of what happens, AMD is making the claim based on the architecture, independent of any specific implementation; that is true and will always be true. This was the point of my post, so I fail to see where you 'agreed' with me, because I was saying nothing similar to what you again state. The architecture is the architecture, and a specific implementation is a specific implementation. An implementation does not change the way the chip works; it can only alter the power/clock-speed curve that particular implementation follows.

The only part of my post you took notice of and fixated on was the process and clock scaling of Excavator, and no, the main gains in Excavator's efficiency come from the AVFS power management and refined voltage/clock tables, not purely the process. My main point out of all this is:
"If benchmarks are going to be based on the performance of a throttling Excavator APU, then that affects the comparison being made to claim their Zen IPC gains," and that the process used will determine whether Carrizo is worth porting to high frequency/high TDP. That's no different from you saying AMD is making the claim based on the architecture independent of a specific implementation, since you have no evidence to support that claim until desktop Excavator is taped out.
 
WHY are you talking about CPUs?!?! It has NOTHING to do with the GPUs.

What has that got to do with GPUs? :confused:


Firstly, I did not bring the CPU discussion up; it was Insanties_birth/pcm in posts 244 and 245, which led on to more Zen discussion on page 9 of this thread, so don't single me out like I'm derailing the thread. I've posted on-topic material in this thread, and I help out on the forum giving people advice, in particular AMD CPU advice.

If you feel so strongly about it, then talk to a mod and get it moved into this thread:
http://forums.overclockers.co.uk/showthread.php?t=18633772&page=9

If not, mind your own business and don't get involved in something you don't understand; you could just as easily ignore it and withhold comment.
 
Agreed, which is my whole point. If people are stating Zen will bring 40% more IPC than Excavator, then what version of Excavator are we using for comparison?

You compare with a generational improvement for like products; that's the only way it can work properly.

We just have to assume Zen is expected to have a 40% average higher IPC for any given SKU, there's really no alternative.

A great example of an improper comparison is the IPC between the FX-4300 and a Steamroller APU. You can't do this without first accounting for the difference in supporting infrastructure (namely the lack of L3 cache in all Steamroller implementations). Steamroller averages about a 6.7% IPC improvement if you take that into account (by comparing APU to APU), but has no improvement at all (and sometimes backtracks) if you don't.

For Excavator, we have to compare against the mobile Steamroller chips. The average improvement here is a healthier 9.85%.

If we track all the AMD construction core performance from Bulldozer onward, we have the following:

Code:
Bulldozer:    100% (abysmal IPC)
Piledriver:   109% (~10% lower IPC than K10)
Steamroller:  116% (near-parity with K10)
Excavator:    128% (parity with Penryn (Core 2), finally!)
Zen (est):    179% (near-parity with Haswell)

Average IPC, of course, doesn't tell the whole story: we don't use the same applications or instruction sets as we once did, and for comparison purposes these are not always taken into account (the exact same code is run on generation after generation, with new capabilities ignored).

In the end, Zen will jump past Sandy/Ivy Bridge's IPC while including all the modern niceties... and the joys of owning an AMD platform that will likely mature far more gracefully than Intel platforms.
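The percentages in the table compound multiplicatively rather than adding. A short Python sketch reproducing the table from the per-generation figures in the post; the 9% Piledriver step is read off the table itself, and the 40% Zen step is AMD's pre-launch claim, so the last row is an estimate rather than a measurement:

```python
from itertools import accumulate
import operator

# Per-generation IPC multipliers: 9% (Piledriver, from the table), 6.7% and
# 9.85% (quoted in the post), and 40% for Zen (AMD's claim, not a measurement).
steps = [("Piledriver", 1.09), ("Steamroller", 1.067),
         ("Excavator", 1.0985), ("Zen (est)", 1.40)]

# Running product starting from Bulldozer = 1.0 (requires Python 3.8+ for initial=).
cumulative = list(accumulate((m for _, m in steps), operator.mul, initial=1.0))
names = ["Bulldozer"] + [name for name, _ in steps]

for name, rel in zip(names, cumulative):
    print(f"{name:<12} {rel * 100:.0f}%")
```

Multiplying rather than adding is why four modest-looking steps end up at roughly 1.8x Bulldozer.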
 

Great post on IPC scaling. I hope the mods will move this into the other thread and not delete it with the others.
I'm hoping for Ivy Bridge to Haswell performance; it would be nice to go back to AMD for my main PC.
 
Those 4K benchmarks are unplayable anyway, so they don't mean the AMD CPU is a good buy.

Try looking at benchmarks for Fury X 2-4-way CrossFire, or 980 Ti 2-4-way SLI: the Intel CPUs will once again murder the AMD CPUs.

He wasn't using that graph to claim the AMD CPUs were at parity with Intel. He was using it to demonstrate that at 4K the bottleneck is the GPU, while at 1080p it becomes the CPU.
 