Kepler by the end of the year

What happens when we get to 1nm?

Heh - I'm not sure we'll get that far.

The unfortunate reality is that we are already approaching molecular scale in silicon chip manufacturing. As a point of reference, the diameter of a hydrogen atom is just over 0.1nm - about the smallest useful length scale for physical construction.

We will never reach this level (0.1nm), but exactly where we will have to stop is uncertain. Ten years ago some people were predicting that we would hit a limit at ~50nm, but it seems there is still some way to go yet. Whether we will ever reach a 1nm manufacturing process, though, I don't know.

The exact reasons why are extremely complex - you need to study quantum mechanics to understand them fully. But the broad strokes are as follows:

Atomic-scale particles (atoms, protons, electrons, etc.) DO NOT behave as you would expect from everyday life... They are governed by quantum mechanics rather than Newtonian mechanics, and have some very odd behaviour. For example, they act as both a wave and a particle (see wave-particle duality), their position and momentum CANNOT both be precisely determined at once (see the uncertainty principle), and they can jump spontaneously through solid barriers (see quantum tunnelling). The smaller and more energetic the particle, the greater the propensity for "quantum weirdness".

It's this final property (quantum tunnelling) that causes the most problems. The smaller the 'jump distance', the less energy an electron needs in order to 'jump'. So, as you shrink the process, you reduce the distance an electron must travel to jump from one part of the transistor to another (where it isn't supposed to be). To stop this from happening, you need to be ever more precise with the amount of power you supply to each transistor.
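To put some very rough numbers on it, here's a back-of-envelope sketch using the standard WKB estimate for a rectangular barrier. The barrier height and widths below are purely illustrative assumptions, not real transistor parameters:

```python
import math

# WKB estimate of tunnelling probability through a rectangular barrier:
#   T ~ exp(-2 * L * sqrt(2 * m * (V - E)) / hbar)
# Smaller L (a thinner barrier) => exponentially more tunnelling.
HBAR = 1.054571817e-34   # reduced Planck constant, J*s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # joules per electronvolt

def tunnel_probability(width_nm, barrier_ev, energy_ev):
    """Rough transmission probability for an electron hitting a barrier."""
    kappa = math.sqrt(2 * M_E * (barrier_ev - energy_ev) * EV) / HBAR
    return math.exp(-2 * kappa * width_nm * 1e-9)

# Illustrative numbers only: a 1 eV barrier seen by a 0.1 eV electron.
for width in (5.0, 2.0, 1.0):  # barrier width in nm
    print(f"{width:.0f} nm barrier: T ~ {tunnel_probability(width, 1.0, 0.1):.1e}")
```

Shrinking the barrier from 5nm to 1nm increases the tunnelling probability by something like 17 orders of magnitude - which is why leakage gets so much harder to control as the process shrinks.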

Unfortunately, Moore's law can't go on forever. The size and behaviour of silicon atoms are to blame for that!
 
So to get more refined we need to move away from silicon?

I can just imagine the threads when we get to hydrogen :p
"my GTX 1780 just flew away" :D:p
 
So to get more refined we need to move away from silicon?

I can just imagine the threads when we get to hydrogen :p
"my GTX 1780 just flew away" :D:p

Nah, you need certain semiconductor properties in order to make a transistor, and silicon is one of the few elements that has them. Besides, silicon atoms are fairly small anyway - around twice the radius of a hydrogen atom.

On a related note: diamond is a potential alternative to silicon, but it would need to be of incredible purity (i.e. manufactured artificially) in order to be effective. The nice thing about using diamond (apart from it being totally awesome) is that the chip could operate at several hundred degrees without issue. This would obviously make heat removal a lot easier!



edit: If you're interested, a company has already produced prototype diamond-based transistors, which can operate at 81GHz (click me...).
 
I thought it's only electrons that have quantum tunnelling, not protons or neutrons. Or am I being daft?

Well in theory any particle can experience tunnelling, but low mass, small size and high energy make tunnelling "easier". Therefore electrons are far more likely to exhibit it in practice. There aren't too many cases where you see protons / neutrons / other particles moving freely at high velocity (the obvious exception being a particle accelerator), so tunnelling with other particles is a lot less common.
 
We're still thinking INSIDE the box here!

I'm sure back in the day of vacuum tubes, no one could fathom a future of billions of transistors on a 28nm die (they probs couldn't even have dreamt of the transistor)
 
Are you suggesting it's not possible to adjust how double precision processing is handled? :confused: Apart from that suggestion being a bit... odd... we've already seen several modifications to the design. As an example, the GTX280 had a single dedicated 64-bit processing unit in each SM (leading to one double precision unit for every 8 SP cores). The GTX480, on the other hand, performs both single- and double-precision computations using the same "CUDA cores", taking two clock cycles to perform each DP computation, and one clock cycle per SP computation.
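To put rough numbers on those two designs (the shader clocks below are approximate figures I'm assuming for illustration, and I'm taking the 1:2 rate described above at face value):

```python
# Theoretical peak throughput from the DP:SP ratios described above.
# Clocks and ops-per-cycle are illustrative assumptions, not official specs.
def peak_gflops(units, clock_ghz, ops_per_cycle=2):  # 2 = one FMA per cycle
    return units * clock_ghz * ops_per_cycle

# GTX280: 240 SP cores at ~1.3 GHz, one DP unit per 8 SP cores.
sp_280 = peak_gflops(240, 1.3)
dp_280 = peak_gflops(240 // 8, 1.3)

# GTX480: 480 CUDA cores at ~1.4 GHz, with DP taking two cycles per op.
sp_480 = peak_gflops(480, 1.4)
dp_480 = sp_480 / 2

print(f"GTX280: ~{sp_280:.0f} SP / ~{dp_280:.0f} DP GFLOP/s (1:8)")
print(f"GTX480: ~{sp_480:.0f} SP / ~{dp_480:.0f} DP GFLOP/s (1:2)")
```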




Be that as it may, power consumption is notably lower when operating in double precision mode - presumably due to only a small subset of the GPU being active during alternate clock cycles (i.e. the "second bite" of the 64-bit cherry).

We did some tests on matrix-matrix multiplication using a Tesla C2050 (we were testing performance rather than power draw, but still...). With large matrices, single-precision mode would top out at around 88% fan speed, whereas double precision would top out in the low-to-mid 70s (using the same fan profile and target temperature).

For what it's worth, runtime was pretty much as expected (double precision takes very close to twice as long), but there was a clear difference in heat output (and hence power draw).
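For anyone curious, the comparison was shaped roughly like this - not our actual test code (that ran against CUBLAS on the C2050), just a minimal modern sketch using CuPy with made-up matrix sizes:

```python
import time
import cupy as cp  # assumes a CUDA-capable GPU with CuPy installed

def time_gemm(dtype, n=4096, reps=10):
    """Average seconds per n x n matrix multiply at the given precision."""
    a = cp.random.rand(n, n).astype(dtype)
    b = cp.random.rand(n, n).astype(dtype)
    cp.matmul(a, b)                      # warm-up run
    cp.cuda.Stream.null.synchronize()    # wait for the GPU before timing
    start = time.perf_counter()
    for _ in range(reps):
        cp.matmul(a, b)
    cp.cuda.Stream.null.synchronize()
    return (time.perf_counter() - start) / reps

sp = time_gemm(cp.float32)
dp = time_gemm(cp.float64)
print(f"SP: {sp:.4f}s  DP: {dp:.4f}s  ratio: {dp / sp:.2f}x")
```

On a 1:2 part you'd expect the ratio to come out close to the 2x we saw.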




I'll make it simple for you:

* Power draw = heat output
* More heat output => better cooler required
* Better cooler => extra R&D, and/or a bigger, noisier fan
* Hot space-heater GPUs put customers off, as do noisy fans (hence the importance given to noise and power draw in reviews).

... and apart from all this, the more energy you put through the GPU the more difficulties you will have in maintaining stability.

Maximum power draw is becoming an ever more important metric (hence why Nvidia and AMD are investing heavily in driver-level power containment systems). For a given power-draw cap, power efficiency determines performance... And performance is good.
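Or, to put that last point in one line of arithmetic (all numbers made up for illustration):

```python
# Under a fixed power cap, deliverable performance = efficiency * cap.
power_cap_w = 250  # hypothetical board power limit, watts
for chip, gflops_per_watt in [("chip A", 4.0), ("chip B", 5.5)]:
    print(f"{chip}: {gflops_per_watt * power_cap_w:.0f} GFLOP/s at {power_cap_w} W")
```

Same cap, nearly 40% more performance for the more efficient chip.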




What exactly are you trying to say? That 28nm will be MORE leaky than 40nm, but that this will somehow translate into improved per-unit-area power efficiency? :confused:

As a general rule of thumb, the smaller you make the manufacturing process the more difficult it is to prevent current leakage, since you need greater relative precision in manufacturing. I see no reason (yet) to convince me that will change with 28nm.

Do you read at all? You make a point, someone points out where you're wrong, and then you change the argument.

Did I suggest you couldn't change the architecture anywhere? No, there's not a HINT of that in what I said. I should also point out that I told you how the GTX480 works: it takes two SP cores and uses them in PAIRS to do a single DP operation per clock, and you managed to change this into one core over two clocks per DP operation. I can link to the Nvidia whitepaper that tells you this.

Read what you said and what I replied to. You suggested, for no apparent reason, that DP wouldn't necessarily increase by 3x "easily" because DP doesn't use all the SP shaders (even though it DOES) - but this has no relevance to DP/W at all.

If you have a 1:8 ratio of DP to SP, or 1:1, or the current 1:2, in each situation you're comparing the DP/W ratio - SP has NOTHING to do with it. I said what you were saying was irrelevant because increasing the DP ratio by going to a 1:1 architecture...... wouldn't somehow change the old chip's DP/W that you're comparing against.


You then talk about power consumption. Why did I bring it up? Because you said power consumption is lower because DP mode ONLY uses the DP shaders and NOT the SP shaders, which is completely incorrect. I say so, and you bang on about how YOUR CODE (but not all code) utilises the shaders differently. These aren't the same two arguments: you're wrong, I point out you're wrong, and you make the argument something else, where you're still wrong.

You then go on to talk about your code and how that affects power consumption further. Again, your initial statement was that DP basically doesn't use the SP shaders - that's not correct. It's really this simple: DP code uses almost every single part of the core that SP code does. It's slower, utilisation is harder, and power draw might go down, but this has literally nothing to do with your claim that DP uses less power because it doesn't use the SP shaders.

As for making the power usage simple for me - well done, you stated the completely obvious, and nothing even close to relevant.

Here again you're changing the argument - try to stick with me here. You used the simple idea that they ONLY stated DP/W and NOT SP/W as an indication that DP/W was increasing significantly, which in turn indicated a major architectural change. That is what you said; that was your argument, nothing more or less. You didn't state that power draw was important for any reason, you didn't mention why a reduction in power is good - you used the appearance of only one of these figures to say there is a likely architecture change.

Now read my argument, for the third time. I didn't at any stage say power draw was unimportant. I said that AMD and Nvidia do not market SP/W - they never have - so it "missing" is an indication of precisely nothing. It's not a hint at a new architecture or a disproportionate DP/W increase; it means nothing, because neither company markets this number.

That's it, that's the entire argument: you're wrong again, and then you again changed the argument. Your point was never about power, or SP power, but that the lack of them talking about it meant something....... which it simply doesn't. I really can't see how that is hard to understand.


As for your last point, I really don't know what you're arguing, or why.

Say leakage tends to increase with each node shrink: on one node you add relatively nothing to combat it, so it goes up 100%; on the next node you add HKMG, which drops 80% of the leakage, so overall you could end up with less leakage than before. The point is we don't know whether TSMC's HKMG is a big enough tool to bring overall leakage at 28nm below 40nm levels. I don't recall anywhere saying that it being more leaky would lead to better performance/W. You quoted what I said - where in that did I equate more leakage to better "per-unit-area power efficiency"? I didn't. Yet again you seem to be randomly assuming things from things that haven't been said.
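Worked through with those illustrative numbers (nothing below is real data):

```python
# Purely illustrative, using the numbers from the post above.
leak_40nm = 1.0                              # normalise 40nm leakage to 1.0
leak_28nm_raw = leak_40nm * 2.0              # shrink with no mitigation: +100%
leak_28nm_hkmg = leak_28nm_raw * (1 - 0.8)   # HKMG removes 80% of that
print(f"28nm leakage vs 40nm: {leak_28nm_hkmg:.1f}x")  # 0.4x - lower overall
```

So a leakier shrink plus a good enough mitigation can still net out below the previous node - whether TSMC's HKMG actually achieves that is the unknown.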
 
Wall of blather.

Wow. I'm not even going to bother to respond to that. In fact, I don't see a single point in there - all I see is a guy standing on a soapbox, shouting "YOU'RE WRONG" at the top of his voice in the hope that it will somehow make it true :confused:

If you want to discuss something related to GPU design then great, do so. Otherwise, please stop hounding people and being so aggressive. It's tiresome. I post in this forum to discuss new GPU technology, and on occasion, to help others understand the various features, when I can. I don't want to engage in trivial arguments and pass veiled insults around - if I did, then I'd call my ex.

Honestly, sometimes I don't know what your issue is...
 
What are SPs and DPs?

In this context it's single precision and double precision arithmetic units.

Typically single precision is 4 bytes, AKA 32 bits, which means that after ~7 significant decimal digits you run out of precision in your maths (values are rounded). However, single precision is relatively fast to process compared to double precision (8 bytes / 64 bits), which allows for far greater accuracy but comes with a bigger performance hit.

For most gaming (and even gaming physics) usage, single precision is accurate enough for everything to work properly. For GPU compute tasks - especially medical, industrial, or data/environmental modelling uses - you often need a far greater degree of precision, but performance isn't as essential since you're usually not requiring real-time feedback.
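A quick NumPy demonstration of the rounding difference:

```python
import numpy as np

# float32 keeps ~7 significant decimal digits; float64 keeps ~15-16.
x = np.float32(1.0) + np.float32(1e-8)   # the 1e-8 is lost in single precision
y = np.float64(1.0) + np.float64(1e-8)   # but preserved in double precision
print(x == 1.0)                  # True  - SP rounded the increment away
print(y == 1.0)                  # False - DP kept it
print(f"{np.float32(0.1):.10f}") # 0.1000000015 - SP can't store 0.1 exactly
```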
 
I think that maybe Nvidia have a slight advantage this time round. Obviously releasing after your opponent gives you a chance to gauge their performance first, but also, with the relatively small jump from the last generation to this one due to being on the same process, and Nvidia being ahead in the performance stakes anyway, I think they have more wiggle room this time round.

cue a huge wall of text to generally deride and belittle this post.
 
I think it's too early to call atm.

AMD have been smart enough to pre-empt the diminishing returns they are running into with their architecture and move towards making pipelines more efficient. Potentially they could end up similar to nVidia (tho very different in implementation) shader-wise, with a more complex and higher-clocked shader domain on 28nm giving them some pretty impressive theoretical peak performance.

Meanwhile we get to see the revised Fermi architecture on a process that's a better fit. The design itself is very strong despite what some people claim, so we could see another leap like the 8800GTX here.
 
Wow. I'm not even going to bother to respond to that. In fact, I don't see a single point in there - all I see is a guy standing on a soapbox, shouting "YOU'RE WRONG" at the top of his voice in the hope that it will somehow make it true :confused:

If you want to discuss something related to GPU design then great, do so. Otherwise, please stop hounding people and being so aggressive. It's tiresome. I post in this forum to discuss new GPU technology, and on occasion, to help others understand the various features, when I can. I don't want to engage in trivial arguments and pass veiled insults around - if I did, then I'd call my ex.

Honestly, sometimes I don't know what your issue is...

At least you speak with an unbiased opinion, very interesting/informative also :)
 
AMD have been smart enough to pre-empt the diminishing returns they are running into with their architecture and move towards making pipelines more efficient. Potentially they could end up similar to nVidia (tho very different in implementation) shader-wise, with a more complex and higher-clocked shader domain on 28nm giving them some pretty impressive theoretical peak performance.

Meanwhile we get to see the revised Fermi architecture on a process that's a better fit. The design itself is very strong despite what some people claim, so we could see another leap like the 8800GTX here.

Yes, it will be very interesting this time around :)

Some of the developments that AMD are making for their next-gen GPUs are very Fermi-esque, and should make for a highly scalable architecture. You've probably seen it already, but for others who are interested there's a great article about it on AnandTech.

I agree that Nvidia would seem to have the advantage this time around - they're building on an architecture which was designed to last at least three "true" generations (Fermi, Kepler, Maxwell), whereas AMD are making the biggest overhaul of their GPU design since R600.

That being said, AMD has not been rushed into this GPU design the way that Nvidia was with Fermi (...due to market pressure from AMD, of course!), and if the 32nm process at TSMC hadn't been cancelled we could well have seen some of these features in the 6-series. I'm not expecting any of the teething troubles we saw with Fermi, so who knows - maybe AMD will come up with something special.

About the only thing that's certain is that the cards will be fast! :D I'm not sure we'll see such a big jump as 40nm -> 28nm again - smaller increments seem to be on the horizon - so this could be the biggest jump forward for a long time to come...


At least you speak with an unbiased opinion, very interesting/informative also :)

Thanks - I appreciate you saying so :)
 
Hadn't seen that article, tho I'd seen most of the info in dribs and drabs. Seems they've gone even further than I was told with regard to the shaders.
 