ATI cuts 6950 allocation

555BUK · 12 Dec 2010 at 00:32

TropicLightning said:
Remember, these benches are on Pre-December 1 'Test' drivers.

But Cayman was ready for release in November until AMD postponed it at the last minute, so these drivers should be pretty much complete.

Klo · 12 Dec 2010 at 00:40

555BUK said:
But Cayman was ready for release in November until AMD postponed it at the last minute, so these drivers should be pretty much complete.

Maybe drivers are the reason they delayed it??

555BUK · 12 Dec 2010 at 00:45

Klo said:
Maybe drivers are the reason they delayed it??

They spend billions designing, manufacturing and testing a new GPU. AMD finally begins distributing them and then someone puts his hand up and says "errr, excuse me, did anyone remember the driver?". That sounds too much like the MoD to me.

easyrider · 12 Dec 2010 at 01:29

So the idea was the 6970 was going to be faster than a 580 by 40% was indeed Like I predicted FUD.

Duff-Man · 12 Dec 2010 at 01:58

drunkenmaster said:
Predictable, angry, anti-Nvidia rant

Lets go back over just a few of the changes from GT200 to GF100.

1. Re-arrangement of the "front-end". To quote Anandtech:

Prior to GF100 NVIDIA had a large unified front end that handled all thread scheduling for the chip, setup, rasterization and z-culling. ... In GF100, the majority of that unified front end is chopped up and moved further down the pipeline. With the exception of the thread scheduling engine, everything else decreases in size, increases in quantity and moves down closer to the execution hardware

So, greatly increased modularity in every aspect of the front-end pipeline. Great for scalability (reducing the load on individual processing components), but not neccesarily great for transistor-use efficiency (significantly more components required). Nevertheless, a natural progression for a design which must be reusable for several generations to come.

A byproduct of moving the scheduler down to the SM level is that threads from multiple kernels can be executed in parallel. The GF200 design required that all SMs operate on the same kernel, whereas Fermi allows multiple kernels to be executed at once. This isn't something that is particularly useful in gaming, but is a godsend when it comes to writing efficient GPGPU code (ans is the result of a fairly significant architectural change).

2. Cache structure:

GT200 has a small cache associated with each SM (16kb), which allows the individual "cuda cores" (or "streaming processors" as they were called back then) within an SM to communicate efficiently. Intra-SM communication, where required, must be performed through the GPU VRAM.

In addition to the small L1 cache within each SM (increased in size to 64Kb by the way...), Fermi introduced a global L2 cache to allow rapid inter-SM and inter-GPC communication. This has very little benefit in games, but offers a great deal of extra flexibility in programming GPGPU applications. You can argue that this was not the best move as far as gaming performance is concerned, but it was a major architectural change nonetheless.

3. Increased global modularity:

The GT200 design takes a "dual-heirachy" approach. Cores are arranged into "texture processing clusters" (TPCs), each with three blocks of 8 cores and a single block of texture processing units (8 units per TPC). There are ten of these in total.

The GF100 design adds in another level of heirachy: The "GPC". These divide the GPU into four distinct blocks (...you can see the physical distinction when looking at a photo of the GF100 core). Within each GPC lies four SMs, with 32 cores each (which is yet another change from GT200). The raster engine is associated with each GPC, rather than with the entire set of TPCs.

4. Different shader core arrangement:

The GF100 design did away with the GT200's "emulated 32-bit integer multiplies", including instead Fused Multiply Add capability. You can draw parallels between this change, and the switch to a "4D" shader arrangement in the Cayman architecture (removing redundant processing capabilities that are very rarely used).

5. Texture processing:

In GT200, there was a single texture processing unit associated with every TPC (three SMs). This was located at the backend of the pipeline, after the shader processing. The raster engine itself was a global object, the back-end of the chip consisted of a single raster engine.

In Fermi the raster engine is instead associated with the GPC. So, there are four semi-independent raster engines that communicate (where required) through the global L2 cache. Again, a large architectural change, and nothing to do with "drawing boxes in different places", or "marketing speak".

... I haven't even mentioned the "polymorph engine" yet, which groups vertex fetch, tessellation, and transform into small modular units that communicates with the SMs (one per SM), feeding geometric information through the pipeline at around 8 times the performance of GT200 (leading to Fermi's fabled tessellation performance). Again, a major departure from GT200, where vertex fetch was done globally, instead of within each of the 16 SMs where it can feed back directly to the cuda cores.

There are many more, but you get the point. Whether you like it or not, Fermi was a fairly dramatic architectural change from GT200. You can argue that it didn't take the design in the right direction. You can argue that it was too big, or that there were too many GPGPU-specific features implemented. You can argue that it should never have seen the light of day at 40nm. But to try and argue that it is "essentially the same design" as GT200 is kind of silly

I know you want to believe that Fermi was somehow a "simple" design, or a minor evolution, but that is simply not the case. You really can't compare the architectural changes that were made from GT200 -> GF100 with those that took place from Cypress -> Barts.

Also, it's not about "where you draw the boxes" on the diagram - that would be just ridiculously simplistic. It's about communication; which blocks are discretised, and in what way. Where they are placed in the pipeline. What units can communicate with what others, and where in the pipeline this communication takes place. How many stream processors feed into each other component.

While there has definitely been a strong evolution of the core AMD GPU design since the 2900xt (as well as interfacing features such as memory bus design), the fundamental pipeline process is not so dramatically changed. This is, in itself, testament to the long-term potiential of the pipeline design that went into the 2900xt. We will have to wait a few years to see whether the Fermi-type pipeline design can somehow eek out that kind of longevity (I doubt it...).

... I will discuss the potential impact of GPU design on yields at another time since this post is already far too long. Suffice to say, there is more to it than just die size (and Fermi does itself no favours by having a large global L2 cache

).

Rroff · 12 Dec 2010 at 02:10

Duff-Man said:
snip...

I'm not sure why theres any disagreement on this subject even 2 minutes on google and its very very evident that cayman is fundementally very similiar to R600 and that aside from the basic arithmetic operation of the "CUDA" cores theres little more than token similiarity in how Fermi goes about its job compared to G80.

mack-attack99 · 12 Dec 2010 at 02:20

from what iv been reading on other forums is there might be somthing with the drivers. What i mean is that only the new 10.12 driver will be able to use the new power function. any other driver will put the cards power function into safe mode.
Also, theres the whole thing with the new 4d spus, old drivers are not coded to use the new efficant spus thus the old drivers would run as if the 6970 was using old 5d shaders.

Last, think guys, if the 6970 is going for around $450 and the rumors that it cant even beat a $350 570???? Cmon. that make NO sense. That will not happen.

I still think that the cayman will be either tied or better than 580. When using the appropriate driver to use the power function properly.

Duff-Man · 12 Dec 2010 at 02:22

Rroff said:
I'm not sure why theres any disagreement on this subject even 2 minutes on google and its very very evident that cayman is fundementally very similiar to R600 and that aside from the basic arithmetic operation of the "CUDA" cores theres little more than token similiarity in how Fermi goes about its job compared to G80.

Who knows... I'm sure drunkenmaster will be along shortly to tell us all that GF100 was born when Jen-Hsun Huang took a sharpie and drew a few extra boxes around a GT200 diagram, with some side-notes saying "more megglehertz and biggar chip!"

Duff-Man · 12 Dec 2010 at 02:25

mack-attack99 said:
from what iv been reading on other forums is there might be somthing with the drivers. What i mean is that only the new 10.12 driver will be able to use the new power function. any other driver will put the cards power function into safe mode.

That's a very realistic possibility... I would have expected that such a thing would have been handled at BIOS level, but I suppose it could require compatible drivers in order to enable.

Mr Krugga · 12 Dec 2010 at 02:25

Duff-Man said:
Who knows... I'm sure drunkenmaster will be along shortly to tell us all that GF100 was born when Jen-Hsun Huang took a sharpie and drew a few extra boxes around a GT200 diagram, with some side-notes saying "more megglehertz and biggar chip!"

I'm not trying to be smart or anything but there are better uses for sharpies.

jigger · 12 Dec 2010 at 02:27

Duff-Man said:
Who knows... I'm sure drunkenmaster will be along shortly to tell us all that GF100 was born when Jen-Hsun Huang took his favorite Crayon and drew a few extra boxes around a GT200 diagram, with some side-notes saying "more megglehertz and biggar chip!"

Fixed

mack-attack99 · 12 Dec 2010 at 02:51

READ..........was ment for 15 of dec.
http://benchmarkreviews.com/index.php?option=com_content&task=view&id=13177&Itemid=8

hmmmm.... well, well that sure does put a damper on my hopes....

I just dont want to believe it but it seems like cayman may very well be a big flop...

So then the 6970 will be 300 to 350 dollars?? huh well i guess that would be a good deal.
and the 6950 would be .. what 280 to 300 dollars?????????
what the hell is going on!

I know nothing is offical but my hopes have gone from the highest just a few days ago that the 6970 will be 15-30% faster than 580 (ravenxxx2??) to now barely taking on the 570....

this sucks.

Rroff · 12 Dec 2010 at 02:54

If thats right... ouch!

mack-attack99 · 12 Dec 2010 at 02:58

heres somthing interesting from another forum
"I have seen that performance result also, it is true, but they forgot to set the slider to the middle, this will enable the turbo boost function and will move the power from 190W to 225W and therefore boost performance by another 20% providing a result of 1968 in futurmark11 just ahead of the 580 by 0.01%, When 590 will launch (or 680-780 whatever nv will call it) they have to move the slider to the 3th position which will have another boost towards 250W and again marginally faster then the nv card, but this will remain under nda untill nv releases such card, so don't tell anyone"
http://67.90.82.13/forums/showthread.php?p=4660630#post4660630

Mr Krugga · 12 Dec 2010 at 02:59

What's that article based upon? On December the 15th we will finally see proper reviews, this article is based on nothing but the tests that were posted on the German forum before, I assume.

Nothing to see here, move along.

Rroff · 12 Dec 2010 at 03:06

mack-attack99 said:
heres somthing interesting from another forum
"I have seen that performance result also, it is true, but they forgot to set the slider to the middle, this will enable the turbo boost function and will move the power from 190W to 225W and therefore boost performance by another 20% providing a result of 1968 in futurmark11 just ahead of the 580 by 0.01%, When 590 will launch (or 680-780 whatever nv will call it) they have to move the slider to the 3th position which will have another boost towards 250W and again marginally faster then the nv card, but this will remain under nda untill nv releases such card, so don't tell anyone"
http://67.90.82.13/forums/showthread.php?p=4660630#post4660630

It needs more than a 20% boost for the 6970 to shine tho :S even a 40% boost from the scores we've seen so far wouldn't exactly set the world on fire.

CAT-THE-FIFTH · 12 Dec 2010 at 03:07

mack-attack99 said:
READ..........was ment for 15 of dec.
http://benchmarkreviews.com/index.php?option=com_content&task=view&id=13177&Itemid=8

What a rubbish article.

Quote from the article:

"The AMD Radeon HD 6870 failed to reach top-level performance, and was relegated to fighting off an army of factory overclocked GeForce GTX 460's that sell at a better price point. "

The GTX460 has 334MM2 die size and is competing against the HD6850 and HD6870 which have a 255M2 die size. The HD6870 even competes against the GTX470 which has even a larger die size.

What a victory! :rolleyes:

The HD5770 and HD5750 have much smaller dies than the GTS450.

The HD6970 has a 389MM2 die which is much smaller than the 520MM2 GF110 found in the GTX570.

AMD and ATI has not held the fastest single GPU card slot for yonks and it does not seem to have adversely affected them TBH.

Duff-Man · 12 Dec 2010 at 03:09

From the article that mack-attack linked to:

Cayman GPU can offer up to 24 SIMD engines and 96 Texture Units

So 1536 SP in all likelihood...

Based on Catalyst driver 8.790.6.2000 (8.79.6.2 RC2), the Radeon HD 6970 delivers approximately the same performance as NVIDIA's GeForce GTX 570
...
Recently launched at the $350 price point, the GTX 570 and Radeon HD 6970 go back and forth between tests but at no point does the Radeon HD 6970 ever approach GeForce GTX 580 performance levels. According to AMD this won't occur until Q1 2011, when they unveil the Radeon HD 6990 X2 video card.

Sadpanda.

I suppose I knew it was coming if the card was 1536SP, but I still thought that it might trade blows with the GTX580 in one or two scenarios.

CAT-THE-FIFTH · 12 Dec 2010 at 03:15

Someone posted a criticism of the article in the comments section and they deleted it.

Mr Krugga · 12 Dec 2010 at 03:19

CAT-THE-FIFTH said:
Someone posted a criticism of the article in the comments section and they deleted it.

Seen that too. They're just trying to get some attention whilst the news of Radeon 6970s getting smacked by GTX580/570s are still hot.

It will change in a few days, who cares