Possible delay to Cayman

TheRealDeal · 8 Nov 2010 at 22:18

Broken Hope said:
This is more than likely a load of bull. Come on, AMD have much more experience with 40nm than Nvidia, packing in more transistors per area. They didn't suddenly forget how to make a workable design for Cayman. Plus, the timing of this rumour is very convenient.

This is most likely the truth. The rumour comes out 1-2 days before Nvidia release the GTX 580. How good is that for Nvidia? Yet AMD say nothing. AMD are being pretty silent, which to me means they are confident. Look at Nvidia last year, 6 months before the release of Fermi, or ATI with the 2900 series: all you heard was how great it was gonna be, with both companies giving out as many good signs as possible, even the card with wooden screws, which was a new low.

The point being, if there is trouble they usually try some kind of cover-up. AMD don't seem to be doing this.

rounddodger · 8 Nov 2010 at 22:24

@Duff-Man - Quality post. Interesting stuff.

Cleeecooo · 8 Nov 2010 at 22:45

obviously AMD will have tested this and also learnt from nVidia.

Cleeecooo · 8 Nov 2010 at 22:46

No company would be stupid enough to release something like this when there was a similar problem with their rivals' chip. It doesn't make much sense.

Duff-Man · 8 Nov 2010 at 22:49

rounddodger said:
@Duff-Man - Quality post. Interesting stuff.

Thanks...

At least now I can feel that I did something useful while my pizza was cooking

Duff-Man · 9 Nov 2010 at 14:18

Well, it seems like Tom's Hardware are predicting delays as well... Still nothing concrete, but it's certainly coming from a more reliable source than Fudzilla:

http://www.tomshardware.co.uk/geforce-gtx-580-gf110-geforce-gtx-480,review-32043-16.html

tomshardware said:
Back when the Radeon HD 6870/6850 launched, we were given a date for the Cayman debut. As we inch closer to it and nobody anywhere knows anything about it (board partners haven't seen cards, system builders are still in the dark), it begins to look like there may be delays. From what I've heard, AMD won't even be briefing its partners until after that original embargo date passes.

We can only hope the delays aren't too severe. It would be great to see Cayman before the end of the year.

Baboonanza · 9 Nov 2010 at 14:29

Duff-Man said:
Well, it seems like Tom's Hardware are predicting delays as well... Still nothing concrete, but it's certainly coming from a more reliable source than Fudzilla:

http://www.tomshardware.co.uk/geforce-gtx-580-gf110-geforce-gtx-480,review-32043-16.html

We can only hope the delays aren't too severe. It would be great to see Cayman before the end of the year.

That seems directly contradictory to everyone else saying that AIBs have cards, though. As noted, there have even been pictures.

Duff-Man · 9 Nov 2010 at 14:31

Baboonanza said:
That seems directly contradictory to everyone else saying that AIBs have cards, though. As noted, there have even been pictures.

Well, fingers crossed that's the case

I guess we'll find out soon enough though...

Is it the 22nd of Nov that the release is scheduled?

Skyrocket · 9 Nov 2010 at 14:33

AMD just needs to show us some indication of what the card is capable of.
One benchmark would do it. The silence from their camp is deafening.
Till then I will keep my £379 580 gtx running.

Voltage · 9 Nov 2010 at 14:37

Yes, the 22nd. 2.5 weeks to hold on to my £400 that Nvidia seem so desperate to get off me.

Mr Paul · 9 Nov 2010 at 14:47

According to Micromart, Cayman will have a 384-bit memory bus. This is probably true, as the 5870 was bandwidth limited IMHO.

Skyrocket · 9 Nov 2010 at 14:49

^^^ No it won't, because it will have 2GB of RAM.
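
For anyone wondering why a 2GB card argues against a 384-bit bus, here is a rough sketch of the arithmetic. It assumes each GDDR5 chip has a 32-bit interface, one chip per channel (no clamshell mode), equal-density chips, and 1 Gbit or 2 Gbit parts. None of that is confirmed for Cayman, and the function name is made up for illustration.

```python
# Assumptions (not confirmed for Cayman): 32-bit GDDR5 chips, one chip per
# channel, all chips the same density, 1 Gbit (128 MB) or 2 Gbit (256 MB).
CHIP_BUS_BITS = 32
CHIP_DENSITIES_MB = (128, 256)


def possible_capacities_mb(bus_width_bits):
    """Return the total memory sizes (MB) a given bus width allows."""
    chips = bus_width_bits // CHIP_BUS_BITS
    return [chips * density for density in CHIP_DENSITIES_MB]


for width in (256, 384):
    print(width, "bit ->", possible_capacities_mb(width), "MB")
```

Under those assumptions 2048 MB only falls out of the 256-bit case, while a 384-bit bus lands on 1536 MB or 3072 MB. Mixed-density or clamshell layouts would change the picture.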

drunkenmaster · 9 Nov 2010 at 15:15

Duff-Man said:
Chill out - I wasn't insulting your beloved AMD, I was just explaining why increasing the number of transistors necessarily leads to a larger die size.

A more modular design will always come with a related cost in transistors, since an increased amount of control logic is required to connect the various modular units efficiently (I'll explain this in more depth shortly). The reason for taking a more modular approach is, as always, to improve scalability (e.g. to allow you to increase the number of SPs while maintaining as close to a linear performance increase as you can). This idea of increased modularity improving scalability is a general data-flow efficiency concept that reaches far outside the design of semiconductors.

---------

Anyway, getting back to why a more modular design costs extra transistors: Cypress and Barts both have two "RPEs" (render and processing engines), each of which is a modular block of SIMDs, TMUs, local cache and data-share logic (see pic below). Each of these has its own dispatch processor and modular cache. To link the two RPEs, a "global data share" is required, which, like anything else on the die, costs transistors. My understanding is that Cayman has three RPEs (up from two on Barts and Cypress). To link the three blocks it is necessary to at least increase the amount of "global data share" logic by a factor of two (one data share to link block1 to block2, and one to link block2 to block3). So, you're seeing a 50% increase in the number of RPEs for a 100% increase in the number of transistors that connect them.
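
A minimal sketch of the counting argument above, assuming (as the post does) that the RPEs are linked in a simple chain with one data share per adjacent pair. The chain layout and the function names are illustrative assumptions, not a confirmed detail of the hardware.

```python
def links_in_chain(blocks):
    """Number of pairwise links needed to connect blocks in a chain."""
    return max(blocks - 1, 0)


def pct_increase(old, new):
    """Percentage growth from old to new."""
    return 100.0 * (new - old) / old


old_rpes, new_rpes = 2, 3  # Cypress/Barts vs. the rumoured Cayman layout
rpe_growth = pct_increase(old_rpes, new_rpes)
link_growth = pct_increase(links_in_chain(old_rpes), links_in_chain(new_rpes))
print(f"RPEs: +{rpe_growth:.0f}%, chain links: +{link_growth:.0f}%")
```

This only counts links. It says nothing about how many transistors each link costs relative to the rest of the die.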

Secondly: Consider that, in comparison to Cypress, each SPU now only has 4 SPs instead of 5. Each SPU has associated with it some control logic to link it to the rest of the SIMD core. So, in Cypress you have five SPs for every chunk of control logic, whereas in Cayman you have four. So, for a fixed total number of SPs (say 1600) you have more pieces of local (intra-SIMD) control logic in the Cayman design than in Cypress (400 instead of 320).
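
The SPU count in the paragraph above can be checked with a few lines. This is just the division, assuming one chunk of control logic per SPU as the post describes. The function name is hypothetical.

```python
def control_units(total_sps, sps_per_spu):
    """Number of SPUs (and so chunks of per-SPU control logic) needed."""
    if total_sps % sps_per_spu:
        raise ValueError("total SPs must divide evenly into SPUs")
    return total_sps // sps_per_spu


TOTAL_SPS = 1600  # the fixed SP count used in the post
five_wide = control_units(TOTAL_SPS, 5)  # Cypress-style SPUs
four_wide = control_units(TOTAL_SPS, 4)  # rumoured Cayman-style SPUs
extra_pct = 100.0 * (four_wide - five_wide) / five_wide
print(five_wide, four_wide, f"+{extra_pct:.0f}%")
```

That gives 25% more control units for the same SP count. Whether each unit costs the same number of transistors in both designs is a separate question that this sketch does not answer.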

[Barts core]

I'm not going to argue rumour-by-rumour, they change every day. As I understand it, Cayman has (natively) 1680 SPs, 96TMUs, and 48ROPs, arranged into three RPEs (with 560SP/32TMU/16ROP in each RPE - just like Barts). Perhaps if the rumours of yield issues are correct then we could see some of the SIMD cores disabled to account for manufacturing errors (which could account for the 1536 number), but deactivating these clusters would not reduce the overall die size.

I said that there is a strict limit on the number of TRANSISTORS you can fit in a given area on a given process, not "shaders". But there is no dramatic change in the architecture from Cypress to Cayman, and so no reason that transistor density will have improved dramatically either. It's also very reasonable to assume that the transistor density will continue to be better than Nvidia's GTX580 (since their architecture has not changed dramatically either).

Your opinion of "what should be easy a year later" is utterly irrelevant. These considerations should be made based on the semiconductor physics, and the logic of GPU design, not on your personal perception of how GPUs have improved historically.

The only problem with this post is that none of it is relevant, and that's a real shame, because it was really quite long.

You're ignoring so many key factors as to make everything you said irrelevant.

Let's take your analogy, for example: 100 soldiers, 10 groups of 10, one commanding officer. That's great.

But in this situation we have 1600 soldiers, and not 1600 equal soldiers, but two VERY different types of soldier in each group. Let's say each group is 4 plain old riflemen and one fantastically well-equipped elite forces unit who carries a mortar, an RPG, a gun, some explosives, etc. You need one guy to control the 4 basic guys and tell them what to do, and you need another guy who's able to properly direct the far more advanced unit.

Now, instead of just increasing the number of groups by dividing 1600 by 4 instead of 5, you've also taken out the requirement for the ultra-complex unit, and the complexity of telling that different guy how to perform. What you're supposing is that 1600/5 clusters of shaders plus the core logic to control them would use fewer transistors than 1600/4 clusters of shaders plus the core logic to control them. If that core logic was the same and used the same number of transistors, sure. Unfortunately that's the part you got wrong: the core logic won't be the same. Much of the reason to move to 4 identical shaders rather than 4 + 1 VERY different shader is the simplification at EVERY stage of controlling those shaders. The scheduler, the dispatcher, everything can be made more streamlined with one type of shader to control rather than two completely different shaders. Which means a 4-way shader plus all its core logic probably uses the same or even fewer transistors than a 5-way shader with more complex core logic at every stage of the pipeline, inside and outside the RPE.

Each 4-way shader is smaller than a 5-way shader, and the core logic to control and balance the workflow and schedule the work is FAR simpler, as you're no longer waiting on the much more complex shader to do something far more slowly some of the time.

So you've increased the number of clusters, but reduced the complexity. You said an increasingly modular design WOULD use more transistors. I said it didn't have to, not that it couldn't, but it's incorrect to say it HAS to. That is in fact completely incorrect.

As for the 1680 shader rumour, there is no such rumour. 1536 is the ONLY rumour around, well, that and 1920 shaders.

You simply posted that it was using more transistors, and would be much bigger, etc., when in fact there are no rumours to back that up at all, unless you count Fud.

Increasing the transistor count would almost always mean a larger die size on the same process. Again, you didn't say this; you said it did have more transistors, and that's not a fact. Again, what if they've reduced the die size? The 6870, with its 1.7 billion transistor 255mm2 core, outperforms a 2.15 billion transistor 336mm2 core in several situations. Which, by the way, shows a very slight increase in transistor density in doing so.

Who's to say they won't make a 2.15 billion transistor core at 330mm2 that's 35% faster, or a 2 billion transistor core at 300mm2 that's 28% faster? No one, basically.
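
To put numbers on the density point, here is a quick sketch using only the transistor counts and die areas quoted in the two paragraphs above. The two "hypothetical" entries are the what-if cores from the post, not real parts, and the dictionary labels are made up for illustration.

```python
def density_m_per_mm2(transistors_billion, area_mm2):
    """Transistor density in millions of transistors per square mm."""
    return transistors_billion * 1000.0 / area_mm2


# (transistors in billions, die area in mm2), all figures as quoted in the post
cores = {
    "2.15B / 336mm2 core": (2.15, 336.0),
    "6870 (1.7B / 255mm2)": (1.7, 255.0),
    "hypothetical 2.15B / 330mm2": (2.15, 330.0),
    "hypothetical 2.0B / 300mm2": (2.0, 300.0),
}

densities = {name: density_m_per_mm2(*spec) for name, spec in cores.items()}
for name, value in densities.items():
    print(f"{name}: {value:.2f} M/mm2")
```

On the quoted figures the 6870 comes out at roughly 6.67 million transistors per mm2 against roughly 6.40 for the older core, so about 4% denser. Both hypothetical cores sit in that same range.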

You stated rumours no one's heard as fact, and other things that are often true, but aren't without question fact.

As for RPEs, Cayman being 3 "RPEs" is incredibly unlikely. Likewise, you have nothing at all to suggest that the global data share needs doubling with one more RPE. And the global data share is in NO WAY the only thing connecting the RPEs to the rest of the core. There are MANY more connections besides it, so doubling the global data share would in no way increase the amount of core logic connecting the RPEs by 100%. These are all things that aren't factual: some are guesses, some are possibilities and some are flat-out wrong.

Cayman isn't a big architectural change? Again, utter rubbish. A new front end, a new type of shader... those are the two BIGGEST things in the entire architecture, and they will be completely different. Cayman is set to be all but a completely different architecture.

You also talked about transistor density as if I had said the word shaders anywhere when I talked about transistor density. I didn't. The 6870 shows a higher transistor density, and AMD have a significant lead on Nvidia in that area. The simple fact is you stated, in a matter-of-fact way, that transistor density was basically an unavoidable constant. It's not, hence me stating that. Why you brought shaders up I don't know.

While I agree that there will likely be more core logic in a more modular design, that doesn't mean a more modular but simplified design will use more transistors, and again, fundamentally that's what you claimed. More modular, or the same design with more shaders, = more transistors? Sure, but that's not what you said.

Personally, I think it should be bigger, but not "that" much bigger at all. My bet for transistor count would be around the 2.6 billion mark, with a marginally increased transistor density, but not by much.

But please read what I said, and what you initially said. You made several claims about a "dramatic increase in rops", and various other things, and used that as a reason to claim several other things. We have NO CLUE how many ROPs it has. We have no clue if the ROPs will be in the same place or the same size, or whether they have doubled the performance of each ROP and kept the same number.

3 RPEs makes zero sense. There's a reason GF100 didn't have 15 clusters in the design but 16, and why almost every GPU design I've seen tries to remain symmetrical: a 3 RPE design would be completely and utterly impractical. It's possible, there's no reason it couldn't work, but, not least for pure timing control, cores are generally designed symmetrically to keep everything equidistant from everything else. Otherwise you have one RPE on the opposite side of the core to the ROPs, etc. The smallest Fermi chip is a 96 shader design with TWO of their "RPEs", despite the desperate need for a smaller core, because 48 shaders just doesn't work. They now have a 48 shader 420GT, but it's a 96 shader part with one cluster disabled, and you can be certain they don't want to disable 40% of a core to sell a part in a price bracket. It's economically worthless.

Remember the reason for the thread: Fud claiming yields are in the tank because, loosely implied from having read their other BS articles, it's a HUGE core, which it simply won't be.

As for my opinion on what should be easy a year later, again you're talking out of your behind. Nvidia found a 512 SP core impossible to make a year ago, and now, a year later, they've managed to make one with okay yields (no idea what they are, but it's moved from non-releasable to releasable) on the same process, which has improved over time.

Every single process, ever, in the history of the universe, has had higher yields towards the end of its life than at the beginning, and almost every company that's ever had chips built has had no problem making a bigger core a year later than the one they managed fine a year earlier. It's not my wish or opinion, it's solid fact based on TSMC's results, and Intel's, and AMD's/GloFo's, over the past decade.

drunkenmaster · 9 Nov 2010 at 15:31

Skyrocket said:
AMD just needs to show us some indication of what the card is capable of.
One benchmark would do it. The silence from their camp is deafening.
Till then I will keep my £379 580 gtx running.

The 5970 is rarely far behind the GTX 580, and it's 30-40% ahead in some situations. It costs £440. A 6990 is going to smash a 5970, and shouldn't cost much more. How much more do you need to know?

Serial45 · 9 Nov 2010 at 15:40

Voltage said:
Yes, the 22nd. 2.5 weeks to hold on to my £400 that Nvidia seem so desperate to get off me.

Aye, I'm tempted by Nvidia. But bah, I want the 6970 or, better yet, the 6990.

Skyrocket · 9 Nov 2010 at 15:48

drunkenmaster said:
The 5970 is rarely far behind the GTX 580, and it's 30-40% ahead in some situations. It costs £440. A 6990 is going to smash a 5970, and shouldn't cost much more. How much more do you need to know?

Dude. We're talking about the 6970 here.
AMD should have brought out something like a sneak peek just before the GTX 580 launched. They had 1 year and 2 months ffs, instead of concentrating on the midrange 6870...

RavenXXX2 · 9 Nov 2010 at 15:50

580 is a beast, outdoing the 5970 in an AMD-endorsed DX11 title.

Skyrocket · 9 Nov 2010 at 15:53

RavenXXX2 said:
580 is a beast, outdoing the 5970 in an AMD-endorsed DX11 title.

5970 is the quicker card. But try and find one for £370. Ain't happening.

=VTA=MANFACE · 9 Nov 2010 at 15:56

My 5970 is a beast, and I love it.

Lightnix · 9 Nov 2010 at 16:16

drunkenmaster said:
But in this situation we have 1600 soldiers, and not 1600 equal soldiers, but two VERY different types of soldier in each group. Let's say each group is 4 plain old riflemen and one fantastically well-equipped elite forces unit who carries a mortar, an RPG, a gun, some explosives, etc. You need one guy to control the 4 basic guys and tell them what to do, and you need another guy who's able to properly direct the far more advanced unit.

Also the elite guy doesn't like doing anything whilst the other four are doing something, he thinks it cramps his style. If there were four moderately well equipped guys, who knows what they could accomplish?