Chill out - I wasn't insulting your beloved AMD, I was just explaining why increasing the number of transistors necessarily leads to a larger die size.
drunkenmaster said:
its got a more modular design and that costs transistors, says who, you?
A more modular design will always come with a related cost in transistors, since an increased amount of control logic is required to connect the various modular units efficiently (I'll explain this in more depth shortly). The reason for taking a more modular approach is, as always, to improve scalability (e.g. to allow you to increase the number of SPs while maintaining as close to a linear performance increase as you can). This idea of increased modularity improving scalability is a general data-flow efficiency concept that reaches far outside the design of semiconductors.
-- aside --
To give a simple example of data flow and modularity in action, consider a military hierarchy. If you have only a small "army" of 100 fighting men, you can divide your troops into squads of 10 men, attach a single officer to each squad, and have every officer answer to a single overall commander. However, if you have an army of ten thousand and try the same flat approach, you end up with 100 squads of 100 men, and a commander who must control 100 officers. This introduces a massive inefficiency (one commander cannot control 100 men as efficiently as he can control ten), and the whole army grinds to a halt.

To get around this you introduce additional levels of hierarchy: you keep your squad size of ten, with a single officer per squad, and have each of the 1000 officers report to one of 100 "captains", who in turn report to one of ten "colonels", who report to a single general. You have maintained the efficiency, in that each commander only has to pass orders down through ten men, but you have introduced an overhead in the form of extra men (captains and colonels) who add nothing directly to your fighting force.

Extending the analogy to GPU design, the SPs, ROPs and TMUs are the 'fighting men', while the cache, interconnects and other control logic are the officers of various ranks. When you increase the size of your army (the total processing capacity of the GPU), you need more officers (control logic).
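To put rough numbers on that overhead, here's a quick sketch in Python (the fan-out of ten and the army sizes are just the figures from the analogy; nothing here is GPU-specific):

[code]
# Rough sketch of the command-hierarchy overhead from the analogy above.
# Each commander directly manages at most `fanout` subordinates; everyone
# above squad level is pure overhead (no direct fighting contribution).

def overhead(fighters, fanout=10):
    """Count the non-fighting 'officers' needed for a given army size."""
    total = 0
    level = fighters
    while level > 1:
        level = -(-level // fanout)  # ceiling division: managers one level up
        total += level
    return total

for army in (100, 10_000, 1_000_000):
    o = overhead(army)
    print(f"{army:>9} fighters -> {o} officers ({o / army:.1%} overhead)")
[/code]

Notice the overhead settles at roughly 11% of the army: the extra layers cost you men, but every layer stays manageable, which is the whole point.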
Anyway, this is a general concept of data-flow that is repeated all over the place... Those of you with experience using object-oriented programming languages will be familiar with this concept in a different way: The increased modularity of OO languages (in comparison to sequential languages) generally comes with a slight overhead in terms of runtime efficiency, but allows much larger programs to be written without the code becoming intractable (as it quickly does with sequential languages).
---------
Anyway, getting back to why a more modular design costs extra transistors. Cypress and Barts both have two "RPEs" (render and processing engines), which are modular blocks of SIMDs, TMUs, local cache and data-share logic (see pic below). Each RPE has its own dispatch processor and cache. To link the two RPEs, a "global data share" is required, which, like anything else on the die, costs transistors. My understanding is that Cayman has three RPEs (up from two on Barts and Cypress). To link three blocks it is necessary to at least double the amount of "global data share" logic (one data share to link block1 to block2, and one to link block2 to block3). So you're seeing a 50% increase in the number of RPEs for a 100% increase in the number of transistors that connect them.
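For illustration, here's how the inter-block wiring grows with the number of blocks. The chain case (each data share links one pair of neighbouring blocks) is the minimal assumption I'm making above; a full point-to-point mesh would be even worse:

[code]
# Illustration: how inter-block wiring grows with the number of modular
# blocks (RPEs). A chain needs n-1 links; a full point-to-point mesh
# needs n*(n-1)/2. The chain case is what gives the figure above:
# 50% more blocks (2 -> 3) for 100% more links (1 -> 2).

def chain_links(n):
    return n - 1             # block1<->block2, block2<->block3, ...

def mesh_links(n):
    return n * (n - 1) // 2  # every block linked directly to every other

for n in (2, 3, 4, 6):
    print(f"{n} RPEs: chain = {chain_links(n)} links, mesh = {mesh_links(n)} links")
[/code]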
Secondly, consider that in Cayman each SPU now has only 4 SPs, instead of the 5 it had in Cypress. Each SPU has some control logic associated with it, linking it to the rest of the SIMD core. So in Cypress you have five SPs for every chunk of control logic, whereas in Cayman you have four. For a fixed total number of SPs (say 1600), the Cayman design therefore has more pieces of local (intra-SIMD) control logic than Cypress (400 instead of 320).
[Barts core]
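That SPU arithmetic, spelled out in a quick sketch (the "one chunk of control logic per SPU" is the premise of my argument, not a verified hardware figure):

[code]
# The SPU arithmetic above, spelled out. Assumes one chunk of intra-SIMD
# control logic per SPU, which is the premise of the argument, not a
# verified hardware figure.

def control_chunks(total_sps, sps_per_spu):
    assert total_sps % sps_per_spu == 0
    return total_sps // sps_per_spu

sps = 1600
print("Cypress (5 SPs per SPU):", control_chunks(sps, 5), "chunks")  # 320
print("Cayman  (4 SPs per SPU):", control_chunks(sps, 4), "chunks")  # 400
[/code]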
drunkenmaster said:
Its got more transistors and is bigger than the 5870, because you say so, its got more shaders, again according to you, the biggest running rumour right now is it has 1536 shaders, does that somehow count as more than 1600 in your world?
I'm not going to argue rumour-by-rumour; they change every day. As I understand it, Cayman natively has 1680 SPs, 96 TMUs and 48 ROPs, arranged into three RPEs (560 SPs / 32 TMUs / 16 ROPs in each RPE - just like Barts). Perhaps, if the rumours of yield issues are correct, we could see some of the SIMD cores disabled to account for manufacturing errors (which could explain the 1536 number), but deactivating those clusters would not reduce the overall die size.
drunkenmaster said:
You can only fit a given number of shaders into an area on a process, wow, care to explain how AMD already on the 5870 had more than 10% more transistors in a die size over 10% smaller than Nvidia?
I said that there is a strict limit on the number of TRANSISTORS you can fit into a given area on a given process, not "shaders". This limit is determined largely by the physical size of the transistors, but also by the spacing required between them to prevent current leaking between neighbours. The "shader-powerhouse" design AMD implements allows them to pack transistors slightly closer together (on average) than the "heavy encapsulation" approach Nvidia takes. But there is no dramatic change in architecture from Cypress to Cayman, so there is no reason to expect transistor density to have improved dramatically either. It's also very reasonable to assume the transistor density will remain better than that of Nvidia's GTX580 (since their architecture has not changed dramatically either).
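To make "density" concrete, here's a back-of-envelope comparison using the commonly quoted figures for the existing parts (treat the numbers as ballpark, not gospel):

[code]
# Ballpark transistor-density comparison. Figures are the commonly
# quoted ones for each part and may be off by a little.
parts = {
    "HD 5870 (Cypress)": (2.15e9, 334),  # (transistors, die area in mm^2)
    "GTX 580 (GF110)":   (3.0e9, 520),
}

for name, (transistors, area_mm2) in parts.items():
    density = transistors / area_mm2 / 1e6  # millions of transistors per mm^2
    print(f"{name}: {density:.1f} M transistors/mm^2")
[/code]

That works out to roughly a 10% density advantage for Cypress, which is the sort of gap I'd expect to persist into Cayman.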
drunkenmaster said:
Either way, the simple fact is that at 530mm2 Nvidia eventually managed, with no care for the process, to get parts out with 480 shaders in high enough quantities for people to buy them when required. I think personally the 530mm2 is what I'd call "massive" for the 40nm process, if ANY cores can be made at 530mm2, it should be almost easy a year later to make something 450mm2, and every single rumour suggests a core way under 400mm2, thats by no means "massive".
Your opinion of "what should be easy a year later" is utterly irrelevant. These judgements should be based on semiconductor physics and the logic of GPU design, not on your personal perception of how GPUs have improved historically.
Far longer post than I intended - never mind.