
GTX 780 performance revealed (Kepler)

No, GPUs have way more transistors than CPUs. I'm not the man to ask about architecture comparisons, but I would say that = more complex.

Maybe it's hard to compare. A GPU has only one purpose (graphics), while a CPU has to be a better all-rounder. So a CPU might have fewer transistors overall but be more complex in other ways to be a better all-rounder.

I am however guessing :)
 
http://www.techpowerup.com/156709/GeForce-Kepler-104-and-100-GPU-Specifications-Compiled.html



Seems true: GK104 will be anywhere from 25-50% faster than a GTX 580, and GK100 will be 100% faster.

This is why I don't buy high-end cards; the next mid-range is always better and/or much more efficient.

Unless the price on the GK104 is <£200 for the full 384-bit, maximum-shader version, I'll deffo be skipping. 2 x £180 cards is the maximum I'm willing to spend for some lovely GK104 SLI goodness.

I'd also have to wait until April as my earliest upgrade time; I need my ISA interest.

Doubling the specs usually does not double the performance; they would need to have made a lot of optimisations for that to happen. That's why dual-GPU cards from the last generation are usually still slightly faster than the next-gen single GPU. CrossFire 4870s, or the 4870X2, were faster than a 5870 even though the 5870 was double the spec of a 4870. Scaling was around 70-75% on average for CrossFire back then. Also take into account that the 5870 had an extra 100MHz of core speed.
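To put rough numbers on that scaling argument (my own illustrative figures, not benchmark results), here's the back-of-envelope version:

Code:
// Back-of-envelope: dual-GPU scaling vs a "double the spec" single GPU.
#include <cstdio>

int main() {
    double cf_scaling = 0.72;              // assumed ~72% CrossFire efficiency
    double dual_4870  = 1.0 + cf_scaling;  // 4870X2 relative to a single 4870
    std::printf("4870X2 is roughly %.2fx a single 4870\n", dual_4870);
    // A "doubled spec" single GPU would need a full 2.00x to match its spec
    // sheet, which in practice it never quite reaches either.
    return 0;
}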
 
Maybe it's hard to compare. A GPU has only one purpose (graphics), while a CPU has to be a better all-rounder. So a CPU might have fewer transistors overall but be more complex in other ways to be a better all-rounder.

I am however guessing :)

That's why people are using GFX cards for folding etc., making use of that massive computing power, and it will be even more so in the future!
 
This is why I don't buy high-end cards; the next mid-range is always better and/or much more efficient.

I find the mid-range good enough tbh. No need to be too 'extreme', and the way new tech comes out now, your £500 purchase is old-fashioned by the time you've paid it off.
 
Probably a horrendous question, but I've always wondered why GPU designers can't just stick the equivalent of, say, a Q9550 in there. Would it be 400% faster?

Sorry for the random (probably v. stupid) question


:D

Sticking a Q9550 core on a GPU will not make things faster. In fact it will make your games ridiculously slower. For the short answer as to why, compare software renderers with 3D hardware acceleration: the software renderer runs on a general-purpose CPU, while the hardware accelerator has dedicated hardware for compute-intensive problems.

That's the short answer; here is the (grossly simplified) long answer:

It's down to architecture. CPU design engineers have quite different design goals from GPU designers. CPUs are designed for general-purpose processing: the majority of their work deals with I/O, string manipulation, integer arithmetic, logical operations and so on. So much so that the original x86 processors (8086 to 80386) could not even do floating-point arithmetic (on single- and double-precision variables, as defined by IEEE Standard 754). They were only capable of integer arithmetic, and even now the base x86 instruction set only supports integer arithmetic. Back then you needed to buy an expensive co-processor, called the math co-processor or Floating Point Unit (FPU), if you wanted to carry out arithmetic on numbers with decimal points (floating-point variables). These followed the CPU numbering: the 8086, 80286 and 80386 were paired with the 8087, 80287 and 80387 FPUs respectively (I've omitted the 80186 because its primary use was in embedded systems).

On hardware that did not have an FPU, software floating-point emulators were used. These were programs, often written in highly optimised assembly language (both for speed and because they required low-level access to the hardware traps), that used the integer instruction set of the computer to perform FP arithmetic.
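To give a flavour of the idea (a toy sketch of my own, not the actual trap-based x87 emulation): fixed-point arithmetic does fractional maths purely with integer instructions by treating the low bits of an integer as the fraction.

Code:
// Toy illustration only: fractional arithmetic using nothing but integer ops.
// Real FP emulators juggled sign/exponent/mantissa fields and were far more
// involved; this just shows that "decimals without an FPU" is possible.
#include <cstdint>
#include <cstdio>

// 16.16 fixed point: the low 16 bits hold the fractional part.
using fixed = int32_t;
constexpr int FRAC_BITS = 16;

fixed  to_fixed(double x)       { return static_cast<fixed>(x * (1 << FRAC_BITS)); }
double to_double(fixed x)       { return static_cast<double>(x) / (1 << FRAC_BITS); }
fixed  fx_mul(fixed a, fixed b) { return static_cast<fixed>((static_cast<int64_t>(a) * b) >> FRAC_BITS); }

int main() {
    fixed a = to_fixed(3.25), b = to_fixed(0.5);
    std::printf("3.25 * 0.5 = %f\n", to_double(fx_mul(a, b)));  // prints 1.625000
    return 0;
}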

Then when the 80486 DX arrived it came with a built-in FPU. The cheaper 80486 SX was essentially a DX with its FPU disabled.

(As a side note: the FPU is a completely separate architecture from the general-purpose, register-organised machine that the x86 is. It is in fact a stack-organised architecture operating on 80-bit extended-precision values, which allows for some very interesting ways to perform arithmetic easily -- it can directly evaluate arithmetic expressions written in Reverse Polish Notation (RPN).)
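A tiny sketch of the stack idea (plain C++ rather than x87 assembly, so the names are mine): the expression (2 + 3) * (7 - 4) in RPN is "2 3 + 7 4 - *", and each operator simply pops two values and pushes the result back, which is exactly the push/pop style the x87 register stack encourages.

Code:
// Minimal RPN evaluator to illustrate stack-based arithmetic (not x87 code).
#include <cstdio>
#include <stack>
#include <string>
#include <vector>

double eval_rpn(const std::vector<std::string>& tokens) {
    std::stack<double> s;
    for (const auto& t : tokens) {
        if (t == "+" || t == "-" || t == "*" || t == "/") {
            double b = s.top(); s.pop();          // pop the two operands...
            double a = s.top(); s.pop();
            if (t == "+") s.push(a + b);
            else if (t == "-") s.push(a - b);
            else if (t == "*") s.push(a * b);
            else s.push(a / b);                   // ...and push the result back
        } else {
            s.push(std::stod(t));                 // operands go straight onto the stack
        }
    }
    return s.top();
}

int main() {
    // (2 + 3) * (7 - 4) written in RPN:
    std::printf("%g\n", eval_rpn({"2", "3", "+", "7", "4", "-", "*"}));  // prints 15
    return 0;
}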

This should give you some indication of how far removed floating-point arithmetic is from what personal computers conventionally do. The lack of need for fast FP arithmetic in normal computer use is what relegated it, until the 486 onwards, to a separate optional co-processor (much in the way 3D accelerators are nowadays).

The FPU expanded in functionality from the days of yore when the 8087 reigned supreme. Fast forward to the time of the Pentium III: a new way to carry out floating-point arithmetic on the CPU was introduced, originally because multimedia applications needed access to fast hardware-based FP arithmetic. Pentium IIIs and their AMD counterparts therefore introduced a new set of instructions called SSE, which also incorporated a form of data-level parallelism called SIMD (in fact SSE stands for Streaming SIMD Extensions). This is important for understanding where GPUs have the advantage over CPUs. SIMD allows a single instruction to operate on large sets of data. Why is this important? Because much numerical work is expressed and carried out as linear algebra, which uses matrices, and SIMD allows fast calculations on matrices (the reasons become obvious if you delve into linear algebra to a moderate depth). SSE continued to expand in the form of SSE2, SSE3, SSE4 and so on. Though primarily created for multimedia, that is not its only application. In fact the reason such arithmetic is necessary in multimedia has to do with a field of engineering called Digital Signal Processing (DSP). DSP is the core scientific basis for audio and video processing, and it is almost entirely about matrices and Z-transforms. So these advances are also very useful for scientific/engineering computing, which is numerically intensive.
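To make SIMD concrete, here's a small sketch using compiler intrinsics rather than raw assembly (the array contents are just made-up numbers): a single _mm_add_ps adds four pairs of single-precision floats at once, where a scalar loop would need four separate adds.

Code:
// Four single-precision additions with one SSE instruction (vs a scalar loop).
#include <cstdio>
#include <xmmintrin.h>   // SSE intrinsics

int main() {
    alignas(16) float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    alignas(16) float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    alignas(16) float c[4];

    __m128 va = _mm_load_ps(a);        // load 4 floats into a 128-bit register
    __m128 vb = _mm_load_ps(b);
    __m128 vc = _mm_add_ps(va, vb);    // one instruction, four additions
    _mm_store_ps(c, vc);

    std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);   // 11 22 33 44
    return 0;
}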

Fast forward again to the time of Sandy Bridge: a new set of instructions was introduced for the burgeoning requirement of ever-faster floating-point arithmetic, called AVX or Advanced Vector Extensions (a vector being essentially a one-row or one-column matrix). This functions similarly to the FPU and SSE by providing ways of performing FP arithmetic on large data sets rapidly. Now keep in mind that the FPU, SSEx and AVX are all great for both single and double precision. In fact the FPU only ever carries out arithmetic on 80-bit extended doubles, which it then rounds down to 64-bit doubles or 32-bit singles. SSE and AVX do it a bit differently: a 128-bit SSE register holds either two double operands or four single operands per operation, and AVX's 256-bit registers double that to four doubles or eight singles.
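Continuing the sketch above with AVX (again intrinsics and made-up data; it needs a CPU and compiler flag with AVX support, e.g. -mavx with GCC): one instruction now covers four double-precision lanes.

Code:
// Four double-precision additions at once with a 256-bit AVX register.
// Build with AVX enabled (e.g. g++ -mavx) on a CPU that supports it.
#include <cstdio>
#include <immintrin.h>   // AVX intrinsics

int main() {
    alignas(32) double a[4] = {1.0, 2.0, 3.0, 4.0};
    alignas(32) double b[4] = {0.5, 0.5, 0.5, 0.5};
    alignas(32) double c[4];

    __m256d va = _mm256_load_pd(a);      // 4 doubles fill one 256-bit register
    __m256d vb = _mm256_load_pd(b);
    _mm256_store_pd(c, _mm256_add_pd(va, vb));

    std::printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);   // 1.5 2.5 3.5 4.5
    return 0;
}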



Now here's where GPUs come in. GPUs are single-minded number crunchers, designed from the ground up for data-level parallelism and even instruction-level parallelism. Their strength lies in their ability to perform floating-point arithmetic. Along the way the companies (prominently NVIDIA) realised that what makes a GPU great for graphics also makes it a great general-purpose FPU, so they began to move towards a more general, compute-heavy architecture. This change mainly began between the GeForce 7000 and 8000 series.

However, graphics only requires single-precision floats. Sadly for science and engineering this is not enough, as computational problems are often performed exclusively on doubles. This is the cause of the recent trend of NVIDIA and ATI trying to make their graphics cards stronger at double-precision floating-point arithmetic, because NVIDIA in particular is pushing its GPU as a general-purpose processor.

Generally speaking, the GPU is a processor designed for general-purpose linear algebra, while the CPU is a general-purpose processor in the truest sense and therefore not nearly as optimised for compute-heavy problems like graphics, DSP and optimisation-theoretic problems.
 
^^^ cracking post :)

As xistor says, CPUs and GPUs have very different functionality. Within a modern-day computer the CPU is designed to perform complex, serial tasks very quickly. The GPU on the other hand is designed to perform relatively simple arithmetic on a massively parallel scale.

In a given clock, your GPU may perform several hundred different add-multiply operations, whereas your CPU core may perform just one or two. However, only a very limited class of computations can be broken down into the hundreds of simple parallel components you need to feed into a GPU and have it operate efficiently. For the vast majority of "stuff" you come across in computing, a GPU would be either useless or incredibly inefficient. But for that small class of problems that can be discretised for parallel processing, nothing touches the performance of a GPU.
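For a concrete picture (the standard textbook SAXPY example written CUDA-style, not anything specific from this thread): every output element is computed by its own GPU thread, because no element depends on any other -- exactly the "simple arithmetic on a massively parallel scale" being described.

Code:
// Classic SAXPY (y = a*x + y): each GPU thread handles one independent element.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's element
    if (i < n)
        y[i] = a * x[i] + y[i];                      // one simple multiply-add
}

// On the CPU the same work is a serial loop, a few elements per clock at best:
void saxpy_cpu(int n, float a, const float* x, float* y) {
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Hypothetical launch for one million elements, 256 threads per block
// (d_x and d_y assumed to be arrays already copied to GPU memory):
//   saxpy<<<(1000000 + 255) / 256, 256>>>(1000000, 2.0f, d_x, d_y);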

Thankfully, most heavy number-crunching applications (like graphics processing, or scientific / financial simulations) can be discretised to some degree. For some (like graphics), GPUs are perfectly suited, as individual pixels and geometry components can be operated on independently. It is, after all, what they were originally designed for.

For scientific and financial simulations, the effectiveness of the GPU depends mainly on the complexity of the algorithm you're using to solve the problem (specifically on the amount of communication required between individual components). Only a very small percentage are accelerated by GPUs presently, mostly in academia and other research institutions, but this is slowly changing. It's a huge market, but GPUs don't yet have the sophistication to justify porting huge and complex codes.

In short, you need both a CPU and GPU in order to tackle the full range of everyday computing problems effectively. As for integrating the two functionalities into one chip... well, that's coming. All of the big silicon manufacturers are heading in that direction.



However, graphics only requires single-precision floats. Sadly for science and engineering this is not enough, as computational problems are often performed exclusively on doubles.

Yes, without double-precision support GPUs are not much use in the simulation market, but it's entirely useless for games... I'd very much like to see the big GPU players produce entirely separate product lines for gaming GPUs and compute GPUs, rather than taking an "all in one" approach and rebranding. I'm sure that valuable die space could be saved...

On another note, with the numerical method I'm playing with at work right now, even double precision isn't enough to get stable results! I need high-precision solutions to ill-conditioned matrices, so I'm working with quad precision (128-bit) right now. It's really slow... :(
 
^^ Says things my post left unsaid. "Massive parallelism" is a term I had to use all too often when I worked on VLSI massively parallel analogue processor design, and surprisingly it never even popped into my head while writing that. It expresses the GPU advantage succinctly.

Out of curiosity, does your CFD work have increasing precision requirements because of the finer granularity of the Navier-Stokes equations required in the finite element method? Is this for commercial work or academia/research?
 
I'm solving linear and nonlinear elasticity at the moment, rather than Navier-Stokes. The problems I'm solving are fairly basic 2D and 3D benchmark examples to stress test the numerical method I'm developing. Development of the numerical method is the main focus of my research, rather than its application to commercial-scale projects (I'm in academia btw).


For what it's worth, the method is a new meshless method that I developed during my PhD, based on collocation with radial basis functions. In traditional finite difference / element / volume methods you use fairly simple polynomials as shape functions to describe the variation of the solution field over an 'element'. With the method I'm developing we're using radial basis function collocation, where the shape functions are themselves solutions of the underlying PDE (rather than just being polynomials).
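For anyone curious what "collocation with radial basis functions" looks like in its simplest form, here's a toy sketch of plain RBF interpolation (not the PDE-solving variant described above; the centres, data and Gaussian shape parameter are all made-up values of mine): you fill a matrix with basis-function values at the data points and solve it for the weights.

Code:
// Toy 1D Gaussian-RBF interpolation: solve  Phi * w = f  where
// Phi[i][j] = exp(-(eps * |x_i - x_j|)^2), then evaluate the interpolant.
#include <cmath>
#include <cstdio>

constexpr int N = 3;
constexpr double EPS = 1.0;            // shape parameter (arbitrary toy value)

double phi(double r) { return std::exp(-(EPS * r) * (EPS * r)); }

int main() {
    double x[N] = {0.0, 0.5, 1.0};                                // collocation centres
    double f[N] = {std::sin(0.0), std::sin(0.5), std::sin(1.0)};  // data to fit

    // Build the collocation matrix.
    double A[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            A[i][j] = phi(std::fabs(x[i] - x[j]));

    // Naive Gaussian elimination (fine for a 3x3 toy system).
    double w[N];
    for (int i = 0; i < N; ++i) w[i] = f[i];
    for (int k = 0; k < N; ++k) {
        for (int i = k + 1; i < N; ++i) {
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; ++j) A[i][j] -= m * A[k][j];
            w[i] -= m * w[k];
        }
    }
    for (int i = N - 1; i >= 0; --i) {
        for (int j = i + 1; j < N; ++j) w[i] -= A[i][j] * w[j];
        w[i] /= A[i][i];
    }

    // The interpolant s(x) = sum_j w_j * phi(|x - x_j|) reproduces the data
    // at the centres and approximates sin(x) in between.
    double xt = 0.75, s = 0.0;
    for (int j = 0; j < N; ++j) s += w[j] * phi(std::fabs(xt - x[j]));
    std::printf("s(0.75) = %f, sin(0.75) = %f\n", s, std::sin(0.75));
    return 0;
}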

In many cases the method can give fantastic accuracy (several orders of magnitude lower errors than traditional FE/FV/FD methods using equivalent numbers of elements). However, in other cases it requires sophisticated tuning of various numerical parameters in order to access the high convergence rates it's capable of without becoming unstable. It's a very promising numerical technique, but it's still very much in the development stage. We're mainly testing its capabilities at solving a variety of different engineering PDEs (convection-diffusion, Navier-Stokes, elastic/plastic deformation etc), and trying to learn more about how it behaves.

The reason I need such high precision is that the 'elements' can often produce very ill-conditioned collocation matrices. In order to solve them without losing stability, I need quad precision. I'm sure there are more sophisticated matrix solvers that could do the job in double precision, but at this stage in the research process it's better to use a reliable solver and higher-precision arithmetic than to add another layer of complexity in the solver.
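As a side illustration of what ill-conditioning does to finite precision (using the classic Hilbert matrix as a stand-in, nothing to do with the actual collocation matrices above): the same solve, run in single and then double precision, shows how the condition number eats digits -- which is exactly why double eventually stops being enough.

Code:
// Classic ill-conditioning demo (contrived, not my collocation matrices):
// the N x N Hilbert matrix H[i][j] = 1/(i+j+1) has a condition number that
// grows explosively with N, so solving H*x = b loses roughly log10(cond) digits.
#include <cmath>
#include <cstdio>

constexpr int N = 6;

template <typename T>
T solve_hilbert_error() {
    T H[N][N], b[N], x[N];
    // Build H and a right-hand side whose exact solution is x = (1, ..., 1).
    for (int i = 0; i < N; ++i) {
        b[i] = 0;
        for (int j = 0; j < N; ++j) {
            H[i][j] = T(1) / T(i + j + 1);
            b[i] += H[i][j];
        }
    }
    // Gaussian elimination with partial pivoting.
    for (int k = 0; k < N; ++k) {
        int p = k;
        for (int i = k + 1; i < N; ++i)
            if (std::fabs(H[i][k]) > std::fabs(H[p][k])) p = i;
        for (int j = 0; j < N; ++j) { T t = H[k][j]; H[k][j] = H[p][j]; H[p][j] = t; }
        { T t = b[k]; b[k] = b[p]; b[p] = t; }
        for (int i = k + 1; i < N; ++i) {
            T m = H[i][k] / H[k][k];
            for (int j = k; j < N; ++j) H[i][j] -= m * H[k][j];
            b[i] -= m * b[k];
        }
    }
    for (int i = N - 1; i >= 0; --i) {
        x[i] = b[i];
        for (int j = i + 1; j < N; ++j) x[i] -= H[i][j] * x[j];
        x[i] /= H[i][i];
    }
    // Worst deviation from the exact solution x_i = 1.
    T err = 0;
    for (int i = 0; i < N; ++i) err = std::fmax(err, std::fabs(x[i] - T(1)));
    return err;
}

int main() {
    // Same matrix, same algorithm; only the working precision changes,
    // and the single-precision error comes out orders of magnitude larger.
    std::printf("max error, single precision: %g\n", double(solve_hilbert_error<float>()));
    std::printf("max error, double precision: %g\n", double(solve_hilbert_error<double>()));
    return 0;
}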


um... kind of went on longer than I intended there!
 
ummm yup was just going to post what the two above have ^^^ :confused::eek::confused:

Wow I feel so simple now :(

Never mind mate. Just look at it the way I do. GPUs are more powerful than their CPU counterparts.

Put simply, if a 990X can hold back a GPU configuration then the GPU is obviously far more powerful :D
 
The "slower and with shorter words" version :p

CPU: Perform complex calculations one after the other
GPU: Perform hundreds of simple calculations at the same time

Without the CPU you really can't do anything. Without the GPU, heavy number crunching tasks (like graphics processing) will grind to a halt.


'CPU limitation'
When you're gaming, to render each frame some computations are given to the CPU (core game logic, AI, all the data setup), and some to the GPU (geometry processing, rendering, maybe physics). If the CPU finishes its workload first then you wait on the GPU, and you're GPU-limited for that frame. Vice versa and you're CPU-limited for that frame.
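A crude way to picture it (a toy model of my own, assuming the CPU can prepare the next frame while the GPU draws the current one): whichever side takes longer per frame sets the frame rate.

Code:
// Toy model: with CPU and GPU work pipelined, the slower of the two sets the pace.
#include <algorithm>
#include <cstdio>

int main() {
    double cpu_ms = 6.0;   // assumed CPU time per frame (made-up number)
    double gpu_ms = 14.0;  // assumed GPU time per frame (made-up number)
    double frame_ms = std::max(cpu_ms, gpu_ms);   // the bottleneck wins
    std::printf("%s-limited at roughly %.0f fps\n",
                cpu_ms > gpu_ms ? "CPU" : "GPU", 1000.0 / frame_ms);
    return 0;
}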

'adding a CPU into the GPU design'
Combining a CPU and a GPU into one behemoth chip could well improve overall performance, and all the big chip makers are heading in this direction. But combining the two parts adds to the die size, complexity and heat output problems that are already limiting factors, so it's a difficult problem.
 
^^^ That's a lot easier to understand, and it makes sense when I think about it. How long do you reckon it will be before we see a graphics card with a GPU and a CPU then? I'm guessing this would increase costs dramatically.
 
^^^ That's a lot easier to understand, and it makes sense when I think about it. How long do you reckon it will be before we see a graphics card with a GPU and a CPU then? I'm guessing this would increase costs dramatically.

Sandy Bridge has GFX on board and there are others; I suppose it could be a while before the high-end stuff that you're thinking about shows up. I would think AMD would be the first, as it has good GFX tech + CPU tech.
 
^^^ That's a lot easier to understand, and it makes sense when I think about it. How long do you reckon it will be before we see a graphics card with a GPU and a CPU then? I'm guessing this would increase costs dramatically.

Hard to say... Current-gen CPUs already have a small onboard GPU, but they can't rival gaming GPUs for performance.

Personally, I think it will be a long time before high-end gaming GPUs are fully integrated. I imagine we'll reach a point in a couple of CPU generations' time (so, say, 5 yrs or so) where integrated GPUs cover the low-end and mainstream GPU market, and only high-end GPUs are sold as separate add-on cards.

I think it will be a long time before high-end cards disappear though - there will always be a market for those who want the extra performance, and it will be difficult for a fully integrated system to match a dedicated GPU, when heat output and die-size are limiting factors.

I don't have any special insight here though, I'm just speculating...
 
I'm solving linear and nonlinear elasticity at the moment, rather than Navier-Stokes. The problems I'm solving are fairly basic 2D and 3D benchmark examples to stress test the numerical method I'm developing. Development of the numerical method is the main focus of my research, rather than its application to commercial-scale projects (I'm in academia btw).

Academia is where it's at :D ... Industrial research is often too pedestrian and mundane.

I imagine you treat the nonlinear elastic problems and Navier-Stokes equations as constrained optimisation problems. If so, I'd be curious to know whether the particular instances you're solving numerically are potentially chaotic.


For what it's worth, the method is a new meshless method that I developed during my PhD, based on collocation with radial basis functions. In traditional finite difference / element / volume methods you use fairly simple polynomials as shape functions to describe the variation of the solution field over an 'element'. With the method I'm developing we're using radial basis function collocation, where the shape functions are themselves solutions of the underlying PDE (rather than just being polynomials).


PDEs are fascinating. My own core interest was in dynamical systems theory, and I almost ended up in that field; I probably would've ended up working on Navier-Stokes and elliptic PDEs and the like. In fact there's a top man in radial basis functions where I am, and at one point I almost ended up having him as a supervisor. It would've been mostly work on non-smooth dynamical systems.

But right now I'm working on topology and metric spaces and I'm pretty happy with this niche. There are promising results in group homology that I think will have interesting applications in the near future. It's rather abstract for the moment, but the plan is to eventually tie it in with a niche area in microelectronics and nanotechnology -- self-assembly.

In many cases the method can give fantastic accuracy (several orders of magnitude lower errors than traditional FE/FV/FD methods using equivalent numbers of elements). However, in other cases it requires sophisticated tuning of various numerical parameters in order to access the high convergence rates it's capable of without becoming unstable. It's a very promising numerical technique, but it's still very much in the development stage. We're mainly testing its capabilities at solving a variety of different engineering PDEs (convection-diffusion, Navier-Stokes, elastic/plastic deformation etc), and trying to learn more about how it behaves.

The reason I need such high precision is that the 'elements' can often produce very ill-conditioned collocation matrices. In order to solve them without losing stability, I need quad precision. I'm sure there are more sophisticated matrix solvers that could do the job in double precision, but at this stage in the research process it's better to use a reliable solver and higher-precision arithmetic than to add another layer of complexity in the solver.

Yeah, I can see why you'd want higher-than-double (quad) precision if you're working with high condition numbers. You'd end up having to develop specialist numerical libraries for such high-precision work, though. Hardware-based FPGA solutions might work out better than CUDA if that can be achieved within your team. Planning to remain with PDEs and academia, or sell out and take a job paying hundreds of K for work on Stochastic Differential Equations/quant analysis? :P



Edit: Btw, I just found a paper on a meshless RBF and wavelets method for solving PDEs (possibly what you described). Very interesting stuff!
 
Hard to say... Current-gen CPUs already have a small onboard GPU, but they can't rival gaming GPUs for performance.

Personally, I think it will be a long time before high-end gaming GPUs are fully integrated. I imagine we'll reach a point in a couple of CPU generations' time (so, say, 5 yrs or so) where integrated GPUs cover the low-end and mainstream GPU market, and only high-end GPUs are sold as separate add-on cards.

I think it will be a long time before high-end cards disappear though - there will always be a market for those who want the extra performance, and it will be difficult for a fully integrated system to match a dedicated GPU, when heat output and die-size are limiting factors.

I don't have any special insight here though, I'm just speculating...

I think Intel is pushing for this, but they weren't too happy with the progress when they realised their GPU technology is behind NVIDIA's. Still, they are such an enormous company, and given that they have the most cutting-edge process in the world and all the top semiconductor people in industry, they should be able to pull off something extraordinary. I for one would love to see Intel buy NVIDIA.
 
I bet they are comparing a 1.5GB card to a 3GB card and they set the resolution so high that the older card is crippled due to lack of VRAM, and the FPS is probably something stupid like 10 fps vs 15 fps, i.e. no one would play at that res/detail.

There is no way they will release a card 1.5 to 2x faster than the previous gen.
 