While the basic shaders on Fermi share very similar arithmetic functionality with the G80 cores, everything else is done quite differently: the geometry engine is completely different, threading, scheduling, cache, etc. are all vastly improved over previous designs, and the TMUs are very different too. Granted, in terms of rendering efficiency it doesn't work out much more efficient than a scaled-up GT200, but architecture-wise it only bears a passing similarity.
If you compare the R600 against Cayman, however, there's a vast amount of similarity: same thread dispatcher, similar setup for handling geometry. The only difference is that the design is now somewhat more modular and some things have been moved around for better efficiency, e.g. the Z/stencil cache.
EDIT: As I said before, and people ignored/laughed at me... the exercise with the 6 series has been about increasing the efficiency of the design and making DX11 etc. features more "integrated", so they work better with the whole rendering pipeline, rather than a redesign.
There is smeg all in R600 that bears a similarity to Cayman; top to bottom it's different. The scheduler is VASTLY different, and AMD showed/said this: the simplified shaders (one type vs. two) make things massively easier and save a LOT of core logic. That's from AMD's own mouth (and something I suggested weeks before we saw those slides). The shaders are now four identical units per block, not a 5-way arrangement of four simpler units and one über-complex one.
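A minimal sketch of that one-type-vs-two-types point, using my own slot names and glossing over the fact that real Cayman gangs three of its four lanes together for a transcendental:

[code]
# Toy illustration of "one shader type vs two", not AMD's real issue rules.
# Slot names are mine; real Cayman actually gangs three of its four lanes
# together for a transcendental, which is glossed over here.

VLIW5_SLOTS = {"x": {"mad"}, "y": {"mad"}, "z": {"mad"}, "w": {"mad"},
               "t": {"mad", "transcendental"}}                # R600-style: 4 simple + 1 "uber"
VLIW4_SLOTS = {s: {"mad", "transcendental"} for s in "xyzw"}  # Cayman-style: 4 identical

def slots_that_can_run(slots, op):
    return [name for name, capabilities in slots.items() if op in capabilities]

print("sin/cos on the 5-way layout can only go to:", slots_that_can_run(VLIW5_SLOTS, "transcendental"))
print("sin/cos on the 4-way layout can go to:", slots_that_can_run(VLIW4_SLOTS, "transcendental"))
[/code]

With only one kind of unit, the scheduler and compiler no longer have to special-case where the awkward ops land, which is where the saved control logic comes from.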
The 2900 XT had a 512-bit external memory bus and a 1024-bit internal ring bus. The external bus has since gone from 512-bit down to 256-bit, works with different kinds of memory, and is vastly more efficient (while Nvidia's memory controller seems to struggle badly with GDDR5; some might suggest they haven't come close to updating it to match the latest memory speeds).
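Rough numbers for the bus point, using the commonly quoted reference-card clocks (so treat them as approximate):

[code]
# Peak memory bandwidth = (bus width in bytes) * effective transfer rate.
# Clocks are the commonly quoted reference figures, so approximate.
def bandwidth_gb_s(bus_bits, effective_mts):
    return (bus_bits / 8) * effective_mts * 1e6 / 1e9

print("HD 2900 XT, 512-bit GDDR3 @ ~1656 MT/s: %.0f GB/s" % bandwidth_gb_s(512, 1656))
print("HD 6970,    256-bit GDDR5 @ ~5500 MT/s: %.0f GB/s" % bandwidth_gb_s(256, 5500))
[/code]

Half the bus width, yet considerably more bandwidth, because the controller actually keeps up with GDDR5.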
The internal ring bus, well, that was a HUGE die space expense, absolutely insane (though it should have been a lower percentage on a 65nm die with more shaders). That's gone, and that's one of the biggest architectural changes from either side in, well, a long time; it's a hugely immense change.
The front end has changed dramatically from the 5870 to the 6870, let alone from the 2900 XT to the 6970.
It's fairly simple. Nvidia operates more of a 1:1 design. To put it in terms of people: Nvidia has 100 people and 100 doors for them to pass through, which is efficient and very simple, and this hasn't changed since the 8800. It doesn't really matter what type of people go through those doors; there are 100 available and each door can handle anyone.
AMD has 400 people but only 40 doors, it has to work VERY hard to get them through as quickly as possible, and different people have to go through different doors. The general route from input to output through the Nvidia core is essentially easy because of this: scheduling is pretty easy, everything is predictable and simple, and the only cost of an architecture that's efficient (in terms of code and getting its full power) is size.
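A toy sketch of why the 40-door approach takes more work, assuming a made-up 50/50 mix of dependent and independent ops and ignoring memory latency, wavefront sizes and everything else:

[code]
import random

random.seed(1)
N = 100_000
# True = this op needs the result of the op right before it in the same thread.
depends_on_previous = [random.random() < 0.5 for _ in range(N)]

def scalar_slots(deps):
    # "100 people, 100 doors": each slot issues one op per cycle, and a
    # dependent op from one thread is simply hidden by running other threads,
    # so every issue slot does useful work.
    return len(deps)

def vliw_slots(deps, width=4):
    # "400 people, 40 doors": the compiler must pack up to `width` ops from
    # the SAME thread into one bundle, and a bundle has to close whenever the
    # next op depends on a result produced inside it.
    bundles, filled = 1, 0
    for dep in deps:
        if (dep and filled > 0) or filled == width:
            bundles += 1
            filled = 0
        filled += 1
    return bundles * width

print("scalar issue-slot utilisation: %.0f%%" % (100 * N / scalar_slots(depends_on_previous)))
print("VLIW4 issue-slot utilisation:  %.0f%%" % (100 * N / vliw_slots(depends_on_previous)))
[/code]

The extra raw units are only worth anything if the compiler and scheduler can keep them fed.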
With every generation I'd expect pretty fundamental and large changes right through the AMD design, because when you're trying to get 400 people through 40 doors there's a bunch of work to be done to improve things every generation. For Nvidia there isn't much to improve on; a one-door-for-everyone philosophy pretty much solves itself.
Yes, the shaders change and the functions they can perform change, but that's not "really" architecture. The same goes for AMD's shaders and what they can do.
As for the functions and wasted die space: the only thing "wasted" on the R600 die was the tessellator. Almost every other feature of DX10 as it then stood was about efficiency, and spending die space on efficiency improvements only fails when someone rips out the software that would use them.
Over-using features in a new DX11 benchmark doesn't in any way equate to waste on a die three years ago (or is it four now); suggesting so is just ridiculous, to be perfectly honest.
By your own account and description of DX11 features that would have been a waste several years ago, you'd equally have to say that, because 3DMark over-uses several features right now, the tessellation in the 480/580/6970 is also a waste, since it's simply not fast enough once those features are over-used to a completely ridiculous degree... no, that makes entirely no logical sense either.
Hardware support comes before software implementation, which comes before further hardware support and a continual increase in software use of a feature. It HAS to start somewhere; if it didn't, no one would ever use it.
It will still be YEARS, maybe 3-4 years, before full-scene tessellation at Unigine-and-beyond levels, across almost everything in a game, is completely standard. Does that mean the tessellation in the GTX 480 is a complete waste? Well, no, it won them Unigine...
As tessellation gets used more, the hardware for it will improve, efficiency will improve, knowledge of how to code for it efficiently will improve, and hardware workarounds to reduce overheads and make it usable with lower-performance hardware around it WILL happen. That only happens sooner the earlier it's introduced.
EDIT: Other than the tessellator, can you name another feature of the original DX10 spec that was in the R600, that was wasted and would only have hurt performance if used? Then, just for the heck of it, any idea how much die space it "wasted"? Also, for the record, tessellation is actually an efficiency-improving device being used to offset an increase in quality. I.e. take one fantastically detailed tessellated image and produce the same image without tessellation (not a flat image, but the same quality and detail), and tessellation is massively faster.
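A back-of-the-envelope illustration of that last point, with made-up mesh sizes just to show the ratio:

[code]
# Made-up numbers: a coarse control mesh expanded on-chip by the tessellator
# vs storing and fetching the equivalent fully detailed mesh from memory.
BYTES_PER_VERTEX  = 32                 # say, position + normal + UV
coarse_vertices   = 10_000             # control mesh fed to the tessellator
expansion_factor  = 16                 # extra geometry generated per patch
detailed_vertices = coarse_vertices * expansion_factor

dense_bytes  = detailed_vertices * BYTES_PER_VERTEX
coarse_bytes = coarse_vertices * BYTES_PER_VERTEX

print("pre-built detailed mesh:    %.2f MB of vertex data" % (dense_bytes / 1e6))
print("coarse mesh + tessellation: %.2f MB of vertex data" % (coarse_bytes / 1e6))
print("vertex data that never has to be stored or fetched: %dx less" % (dense_bytes // coarse_bytes))
[/code]

Same final detail either way; the tessellated path just stores and moves a fraction of the geometry.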
DX1 through "until MS dies" is 99% about making it easier and faster to implement new things people come up with; it's rarely about doing something you could never ever do before at the cost of killing performance.