Kepler by the end of the year

opethdisciple · 3 Aug 2011 at 11:28

Article here.

Duff-Man · 3 Aug 2011 at 11:36

Good news

From the double-precision power-efficiency claims, I'm guessing Kepler will have a larger ratio of DP to SP cores. As far as single precision goes, it would surprise me if they have improved by any more than a factor of 2x over the previous generation.

Cleeecooo · 3 Aug 2011 at 11:42

Duff-Man said:
Good news

From the double-precision power-efficiency claims, I'm guessing Kepler will have a larger ratio of DP to SP cores. As far as single precision goes, it would surprise me if they have improved by any more than a factor of 2x over the previous generation.

Would you mind saying the same thing in laymans terms please?

Thanks

Good news that kepler is coming 2012, atleast I get to keep my 580 for some time

Tute · 3 Aug 2011 at 11:46

If we get a GT610 type card on the 31st December, i'm going to laugh at all of you.

Seriously though, this is good news.

Cleeecooo · 3 Aug 2011 at 12:04

What's GT610?

Rroff · 3 Aug 2011 at 12:06

As I said nVidia still under the impression they can make a late November launch. (tho its increasingly looking like they actually are gonna be talking about a paper launch as other sources don't seem to indicate TSMC are gonna be able to produce).

KiiYzOo · 3 Aug 2011 at 12:06

I think he means the lowest end kepler card

drunkenmaster · 3 Aug 2011 at 12:09

Duff-Man said:
Good news

From the double-precision power-efficiency claims, I'm guessing Kepler will have a larger ratio of DP to SP cores. As far as single precision goes, it would surprise me if they have improved by any more than a factor of 2x over the previous generation.

dp/sp ratio is 1/2, its very unlikely they've increased that, its creative accounting and BS from Nvidia, nothing more or less.

Going from 1/2, in their architecture essentially means going to a 1/1 ratio at this stage which isn't likely. Remember 6 months ago they were saying 4x the DP performance per watt, and thats the key, performance per watt, not 3x the actual performance.

Factor in them using the 480gtx as the baseline which due to the clock and earlyness simply had the worst DP/w efficiency of the entire Fermi lineup, add in the fact that you'd expect not far off 2x the DP/w performance on a new process. Then when you consider 40nm was basically the worst process yet and 28nm is vastly improved with much less leakage and no 480gtx scale screwup and 3x is pretty damn easy.

But as with everything Nvidia, 18 months out you get one story, a year out another story, 6 months out another story, launch day.... wood screws and 6 months later you get the real product.

So we'll see exactly when, and what performance we get on the launch day and basically not before(not from Nvidia's mouths anyway). They do have a slim chance of launching this year but, if they need any respins(and even a good chip normally needs a minimum of one) I can't see it being this year.

Duff-Man · 3 Aug 2011 at 12:10

Cleeecooo said:
Would you mind saying the same thing in laymans terms please?

Well from the process change (40nm to 28nm) I would expect less than 2x the overall power efficiency of 40nm Fermi. Design changes and tweaks can account for a little extra improvement, but 3x or more seems a bit unrealistic.

Nvidia have stated that the double precision power efficiency is 3x better than Fermi, not overall power efficiency, so I suspect that the number of double-precision-capable cores will have been increased, at least on the HPC version.

The HPC (Tesla) version of Fermi can perform 1/2 as many double precision computations as it can single precision, while the retail Fermi (GTX480 / 580 etc) can peform only 1/8th as many (double precision is not important for gaming). If I have only a small number of double precision units relative to single precision, then my double-precision power efficiency will be low. The most straightforward way to improve DP power efficiency is to increase the proportion of DP units relative to SP.

Anyway, from these claims I'm expecting a 1:1 ratio between single and double precision, at least in the HTC version of the chip. I don't know whether we will get a cut-down version for the GTX680, I imagine it will depend on how they have adjusted the architecture to handle double precision. Certainly if it allows the retail unit to run at higher clockspeeds and/or use less power, they will go for it.

Cleeecooo · 3 Aug 2011 at 12:13

Thanks all ^^^

drunkenmaster · 3 Aug 2011 at 12:54

Duff-Man said:
Well from the process change (40nm to 28nm) I would expect less than 2x the overall power efficiency of 40nm Fermi. Design changes and tweaks can account for a little extra improvement, but 3x or more seems a bit unrealistic.

Nvidia have stated that the double precision power efficiency is 3x better than Fermi, not overall power efficiency, so I suspect that the number of double-precision-capable cores will have been increased, at least on the HPC version.

The HPC (Tesla) version of Fermi can perform 1/2 as many double precision computations as it can single precision, while the retail Fermi (GTX480 / 580 etc) can peform only 1/8th as many (double precision is not important for gaming). If I have only a small number of double precision units relative to single precision, then my double-precision power efficiency will be low. The most straightforward way to improve DP power efficiency is to increase the proportion of DP units relative to SP.

Anyway, from these claims I'm expecting a 1:1 ratio between single and double precision, at least in the HTC version of the chip. I don't know whether we will get a cut-down version for the GTX680, I imagine it will depend on how they have adjusted the architecture to handle double precision. Certainly if it allows the retail unit to run at higher clockspeeds and/or use less power, they will go for it.

A 1:1 would give a lot more than 3x the dp/w increase, so thats simply not going to happen. Likewise they only talk about dp/w because no one on earth cares about sp/w. Home users don't really care about power, gpgpu buyers don't care about performance, they care about performance/watt, its not some hint about how dp/w has increased more than sp/w, its just the later is an irrelevant number when talking about the professional/gpgpu type markets.

AS said, a bog standard drop from a bog standard process would probably give a little less than 2x the increase, but a truly awful 40nm, with leakage at the higher clockspeeds being insane to a much better quality, less problematic 28nm. Also when taking on board the very worse of Nvidia's 40nm lineup then the increases will be much higher.

Also remember, as I said, they were stating 4x the performance/w 6 months ago, its already down to 3x, by launch, it won't even be surprising if its lowered again.

Random Guy · 3 Aug 2011 at 13:02

[Likely] paper launch does not mean 'Kepler by the end of the year'. OP was a bit optimistic with the thread title methinks.

Freddie1980 · 3 Aug 2011 at 13:03

This goes along with what AMD said at their earnings call the other day as they said their on track to deliver 28nm Radeon cards before the end of year.

The advantage of getting your product to the market first is so important for both companies, being able to take advantage of that window where your margins and profits at their peak is key to turning all the expensive R&D ($2 billion has been spent on Kepler development according to JHH) into profit.

Duff-Man · 3 Aug 2011 at 13:40

drunkenmaster said:
A 1:1 would give a lot more than 3x the dp/w increase

Not neccesarily... Remember, by using the DP rather than SP shaders you are not running the GPU at full capacity. Power draw is not as high at "100% DP load" as it is with 100% SP.

no one on earth cares about sp/w.

Sure they do - Nvidia and AMD

Over the last few generations, power draw (and heat removal which is directly related) have increasingly become the limiting factor in GPU performance. With both AMD and Nvidia investing in driver-level power containment, performance-per-watt is fast becoming the most important metric in GPU design: With a hard limit for power draw and heat-removal, power efficiency determines the potential performance of the card. As you know, AMD still have the advantage here (as of last generation). It will be interesting to see if this continues.

I would love to see a 3x improvement in performance-per-watt for single precision, but I'm a little skeptical as to whether it's possible. Transistor power draw does not tend to scale as well as transistor packing-density, so I am not expecting a full 2x improvement simply from the manufacturing process. Anyway, I guess we'll see.

Also remember, as I said, they were stating 4x the performance/w 6 months ago, its already down to 3x, by launch, it won't even be surprising if its lowered again.

Nvidia has been stating "three to four times" the DP performance per Watt for Kepler since September 2010 (see, for example, here).

Rroff · 3 Aug 2011 at 13:43

Random Guy said:
[Likely] paper launch does not mean 'Kepler by the end of the year'. OP was a bit optimistic with the thread title methinks.

Thread title was taken from the "headline" for the article I don't think the OP was implying he thought it likely.

An Exception · 3 Aug 2011 at 13:46

Hands up for mobile GPU's using the 28nm LP process...

Tute · 3 Aug 2011 at 14:04

Cleeecooo said:
What's GT610?

A GT610 type card - i.e nVidia promise Kepler in 2011 but we end up getting a low-end, HTPC type card based on the architecture like AMD did with the HD4770.

drunkenmaster · 3 Aug 2011 at 17:08

Duff-Man said:
Not neccesarily... Remember, by using the DP rather than SP shaders you are not running the GPU at full capacity. Power draw is not as high at "100% DP load" as it is with 100% SP.

Sure they do - Nvidia and AMD

I would love to see a 3x improvement in performance-per-watt for single precision, but I'm a little skeptical as to whether it's possible. Transistor power draw does not tend to scale as well as transistor packing-density, so I am not expecting a full 2x improvement simply from the manufacturing process. Anyway, I guess we'll see.

The first part is irrelevant, that was as true for a 280gtx as a potential 680gtx, if DP doesn't use the whole core, then it won't on new or old architectures. But thats ignoring the fact that your argument is incorrect, Fermi pairs two 32bit shaders to complete a single 64bit operation, so it still uses ALL the shaders, and most of the rest of the core aswell, same as gaming.

As for if either care about SP power efficiency, no neither do, the slide was in a talk about professional Nvidia gpu's, server builders and the professional market in general buys and specs system based on power budget. Home users don't, they buy a bigger or smaller PSU, it doesn't matter. Data processing farms have to change £10k's of worth of power equipment if they buy 2000 cards that have 30% more power than they have specced for.

Desktop usage doesn't see any SP/w advertising, marketing, anything at all, hence its irrelevant. AMD and Nvidia do not talk about SP/w at all, but you were insinuating that because they ONLY said DP/w that it would increase disproportionately to SP/w thereby showing a likely architecture change, it doesn't. They didn't show sp/w numbers, because they never do, no one they aimed those slides at gives the slightest damn about them.

As for ultimate power usage, there is no real single card limit, theres guidelines but PCI-E guys basically state, if it works, and is safe, we'll approve it. 6990/590gtx, theres a shedload more room for single card power at the moment. The main issue is leakage, and frankly the move away from TSMC is likely to do more than anything else as they are essentially the cheap end of scale in terms of processes used.

100W or so of the top end Fermi's is supposed to be leakage, AMD won't be that far behind. We'll have to see if the HP HKMG will make a difference but, quite likely not, 28nm leakage should be higher than 40nm, so HKMG is a tool like most processes have most new nodes, to fight the increase in leakage. I think it will come down a little because the 40nm sucked frankly so badly but the talk is that Nvidia are still complaining badly about leakage on 28nm.

Duff-Man · 4 Aug 2011 at 14:30

drunkenmaster said:
The first part is irrelevant, that was as true for a 280gtx as a potential 680gtx, if DP doesn't use the whole core, then it won't on new or old architectures.

Are you suggesting it's not possible to adjust how double precision processing is handled? :confused:

Apart from that suggestion being a bit... odd... we've already seen several modifications to the design. As an example, the GTX280 had a single dedicated 64-bit processing unit in each SM (leading to one double precision unit for every 8 SP cores). The GTX480, on the other hand, performs both single- and double-precision computations using the same "CUDA cores", taking two clock cycles to perform each DP computation, and one clock cycle per SP computation.

Fermi pairs two 32bit shaders to complete a single 64bit operation, so it still uses ALL the shaders, and most of the rest of the core aswell, same as gaming.

Be that as it may, power consumption is notably lower when operating in double precision mode - presumably due to only a small subset of the GPU being active during alternate clock cycles (i.e. the "second bite" of the 64-bit cherry).

We did some tests on matrix-matrix multiplication using a Tesla C2050 (we were testing performance rather than power draw but still...). With large matrices, single-precision mode would top out at around 88% fanspeed whereas double precision would top out in the mid-low 70s (using the same fan profile and target temperature).

For what it's worth runtime was pretty much as expected (double precision mode takes very close to twice as long), but there was a difference in heat output (= power draw).

As for if either care about SP power efficiency, no neither do

I'll make it simple for you:

* Power draw = heat output
* More heat output => better cooler required
* Better cooler => extra R+D, and/or a bigger and noisier fan
* Hot space-heater GPUs put customers off, as do noisy fans (hence the importance given to noise and power draw in reviews).

... and apart from all this, the more energy you put through the GPU the more difficulties you will have in maintaining stability.

Maximum power draw is becoming an ever more important metric (hence why nvidia and AMD are investing heavily in driver-level power containment systems). For a given power-draw cap, power efficiency determines performance... And performance is good.

100W or so of the top end Fermi's is supposed to be leakage, AMD won't be that far behind. We'll have to see if the HP HKMG will make a difference but, quite likely not, 28nm leakage should be higher than 40nm, so HKMG is a tool like most processes have most new nodes, to fight the increase in leakage.

What exactly are you trying to say? That 28nm will be MORE leaky than 40nm, but that this will somehow translate into improved per-unit-area power efficiency? :confused:

As a general rule of thumb, the smaller you make the manufacturing process the more difficult it is to prevent current leakage, since you need greater relative precision in manufacturing. I see no reason (yet) to convince me that will change with 28nm.

Cleeecooo · 4 Aug 2011 at 14:42

Duff-Man said:
As a general rule of thumb, the smaller you make the manufacturing process the more difficult it is to prevent current leakage, since you need greater relative precision in manufacturing. I see no reason (yet) to convince me that will change with 28nm.

What happens when we get to 1nm?