Sticking a Q9550 core on a GPU will not make things faster. In fact, it will make your games ridiculously slower. For the short answer as to why, compare a software renderer with 3D hardware acceleration: the software renderer runs on a general-purpose CPU, while the hardware accelerator has dedicated silicon for compute-intensive problems.
That's the short answer; here is the (grossly simplified) long answer:
It's down to architecture. CPU design engineers have quite different design goals from GPU designers. CPUs are designed for general-purpose processing. The majority of their work deals with I/O, string manipulation, integer arithmetic, logical operations, etc. So much so that the original x86 processors (8086 to 80386) could not even do floating-point arithmetic (on single-precision and double-precision variables, as defined by IEEE Standard 754). They were only capable of integer arithmetic, and even now the base 8086 instructions only support integer arithmetic. Back then, if you wanted to carry out arithmetic on numbers with decimal points (floating-point variables), you needed to buy an expensive co-processor called the math co-processor or Floating Point Unit (FPU). These matched their host CPUs: the 8087, 80287 and 80387 FPUs paired with the 8086, 80286 and 80386 CPUs respectively. (I've omitted the 80186 because its primary use was in embedded systems.)
On hardware that did not have an FPU, software floating-point emulators were used. These were programs, often written in highly optimized assembly language (both for speed and because they required low-level access to the hardware traps), that used the integer instruction set of the computer to perform FP arithmetic.
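To make that concrete, here is a minimal sketch of the idea in C rather than assembly: floating-point multiplication built from nothing but integer operations on the IEEE 754 bit patterns. It ignores the special cases (NaNs, infinities, subnormals, exponent overflow) and rounds crudely, and the name `softfloat_mul` is just illustrative -- treat it as a toy, not a real emulator.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy soft-float multiply built purely from integer instructions.
 * Ignores NaNs, infinities, subnormals and exponent overflow, and
 * rounds by truncation -- an illustration, not a real emulator. */
static float softfloat_mul(float a, float b)
{
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);   /* grab the raw IEEE 754 bit patterns */
    memcpy(&ub, &b, sizeof ub);

    uint32_t sign = (ua ^ ub) & 0x80000000u;             /* XOR the signs  */
    int32_t  exp  = (int32_t)((ua >> 23) & 0xFFu)        /* add exponents, */
                  + (int32_t)((ub >> 23) & 0xFFu) - 127; /* drop one bias  */

    /* restore the implicit leading 1 of each 23-bit significand */
    uint64_t ma = (ua & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (ub & 0x007FFFFFu) | 0x00800000u;

    uint64_t m = ma * mb;          /* 24-bit x 24-bit -> 48-bit product */
    if (m & (1ULL << 47)) {        /* product in [2,4): renormalise */
        m >>= 24;
        exp++;
    } else {                       /* product already in [1,2) */
        m >>= 23;
    }

    uint32_t ur = sign | ((uint32_t)exp << 23) | ((uint32_t)m & 0x007FFFFFu);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}

int main(void)
{
    printf("%f\n", softfloat_mul(3.5f, -2.0f));   /* prints -7.000000 */
    return 0;
}
```

Real emulators had to do all of this (and the hard special cases) for every single FP operation, which is why they were so much slower than a hardware FPU.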
Then when the 80486 DX arrived, it came with a built-in FPU. The cheaper 80486 SX was essentially a DX with its FPU disabled.
(As a side note: the FPU is a completely separate architecture from the general-purpose, register-organised machine that the x86 is. It is a stack-organised architecture built around eight 80-bit extended-precision registers, which allows for very interesting ways to perform arithmetic easily -- it can directly evaluate arithmetic expressions written in Reverse Polish Notation (RPN).)
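A toy stack machine shows why RPN evaluation falls out of this design for free (a minimal C sketch; the eight-slot `st` array mimics the x87's ST(0)-ST(7) register stack, and the pop-pop-push pairs behave like the real FADDP/FMULP instructions):

```c
#include <stdio.h>

/* A toy stack machine in the spirit of the x87: operands are pushed,
 * and each arithmetic operation pops its inputs and pushes its result.
 * The expression (2 + 3) * 4 in RPN is simply: 2 3 + 4 *   */
static double st[8];     /* the x87 likewise has 8 stack slots, ST(0)-ST(7) */
static int top = -1;

static void   push(double x) { st[++top] = x; }
static double pop(void)      { return st[top--]; }

int main(void)
{
    push(2.0);
    push(3.0);
    push(pop() + pop());    /* like FADDP: pop two operands, push the sum */
    push(4.0);
    push(pop() * pop());    /* like FMULP */
    printf("%g\n", pop());  /* prints 20 */
    return 0;
}
```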
This should give you some indication of how far removed floating-point arithmetic is from what personal computers conventionally do. The lack of need for fast FP arithmetic in normal computer use relegated it, until the 486, to a separate optional co-processor (much in the manner 3D accelerators are nowadays).
The FPU expanded in functionality from the days of yore when the 8087 reigned supreme. Fast forward to the time of the Pentium III: a new way to carry out floating-point arithmetic on the CPU was introduced, originally because multimedia applications needed access to fast hardware-based FP arithmetic. The Pentium III (and, later, its AMD counterparts) therefore introduced a new set of instructions called SSE, which also incorporated a form of data-level parallelism called SIMD (in fact SSE stands for Streaming SIMD Extensions). This is important for understanding where GPUs have the advantage over CPUs. SIMD allows a single instruction to operate on a whole set of data at once -- why is this important? Because much numerical work is expressed and carried out in linear algebra, which uses matrices and vectors, and operations on those boil down to doing the same arithmetic across long runs of numbers. SIMD makes such calculations fast (the reasons become obvious if you delve into linear algebra to a moderate depth). SSE continued to expand in the form of SSE2, SSE3, SSE4, etc. Though primarily created for multimedia, that is not its only application. In fact the reason such arithmetic is necessary in multimedia has to do with a field of engineering called Digital Signal Processing (DSP). DSP is the core scientific basis for audio and video processing, and it is almost entirely about matrices and Z-transforms. So these advances make the CPU very useful for scientific/engineering computing, which is numerically intensive.
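Here is a minimal sketch of what SIMD buys you, using the SSE intrinsics that compilers expose over these instructions: a single `addps` instruction (reached through the `_mm_add_ps` intrinsic) performs four single-precision additions at once, where scalar code would need four separate adds.

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics */

int main(void)
{
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float r[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats into a 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);  /* 4 additions in a single instruction */
    _mm_storeu_ps(r, vr);

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);  /* 11 22 33 44 */
    return 0;
}
```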
Fast forward again to the time of Sandy Bridge: a new set of instructions called AVX, or Advanced Vector Extensions, was introduced for the burgeoning requirement of ever faster floating-point arithmetic (a vector being, loosely speaking, a one-row or one-column matrix). Like the FPU and SSE, it provides ways of performing FP arithmetic on large data sets rapidly. Now keep in mind that the FPU, SSEx and AVX all handle both single and double precision. In fact the FPU only ever carries out arithmetic on 80-bit extended-precision values, which it then rounds down to 64-bit doubles or 32-bit singles. SSE and AVX do it a bit differently: an SSE instruction on a 128-bit register performs either 2 double-precision or 4 single-precision operations, while an AVX instruction on a 256-bit register performs 4 double or 8 single operations at once.
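The AVX equivalent of the sketch above looks almost identical, just twice as wide; this one adds four doubles with a single instruction (compile with `-mavx` on gcc/clang):

```c
#include <stdio.h>
#include <immintrin.h>  /* AVX intrinsics */

int main(void)
{
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    double b[4] = { 0.5, 0.5, 0.5, 0.5 };
    double r[4];

    __m256d va = _mm256_loadu_pd(a);     /* 4 doubles fill a 256-bit register */
    __m256d vb = _mm256_loadu_pd(b);
    __m256d vr = _mm256_add_pd(va, vb);  /* 4 double-precision adds at once */
    _mm256_storeu_pd(r, vr);

    printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);  /* 1.5 2.5 3.5 4.5 */
    return 0;
}
```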
Now here's where GPUs come in. GPUs are single-minded number crunchers. They are designed from the ground up for data-level parallelism, and even instruction-level parallelism. Their strength lies in their ability to perform floating-point arithmetic on a massive scale. Along the way, the GPU companies (prominently NVIDIA) realised that what makes GPUs great for graphics also makes them a great general-purpose FPU. As such they began to move towards a more general, compute-heavy architecture. This change mainly began between the GeForce 7 and GeForce 8 series.
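To give a flavour of that model: GPU code is written as a small "kernel" that computes one element of the output, and the hardware runs thousands of copies of it in parallel, one per index. Below is a plain-C sketch of SAXPY (y = a*x + y, a staple linear-algebra routine) written in that style; the ordinary loop merely stands in for the GPU's grid of hardware threads, and `saxpy_kernel` is just an illustrative name.

```c
#include <stdio.h>

#define N 8

/* One "thread's" worth of work: compute a single output element given
 * its index. A GPU would run thousands of these concurrently. */
static void saxpy_kernel(int i, float a, const float *x, float *y)
{
    y[i] = a * x[i] + y[i];
}

int main(void)
{
    float x[N] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    float y[N] = { 0 };

    for (int i = 0; i < N; i++)       /* a GPU launches these iterations   */
        saxpy_kernel(i, 2.0f, x, y);  /* as independent parallel threads   */

    for (int i = 0; i < N; i++)
        printf("%g ", y[i]);          /* prints 2 4 6 8 10 12 14 16 */
    printf("\n");
    return 0;
}
```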
However, graphics only requires single-precision floats. Sadly for science and engineering, that is not enough, as serious computational problems are often carried out exclusively in double precision. This is the cause of the recent trend you see of NVIDIA and ATI trying to make their graphics cards stronger at double-precision floating-point arithmetic -- NVIDIA in particular is pushing its GPU as a general-purpose processor.
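A quick C illustration of why that matters: accumulate 0.1 ten million times. The exact answer is 1,000,000; single precision drifts a long way from it, while double precision stays close.

```c
#include <stdio.h>

/* Each float addition rounds to only ~7 significant decimal digits,
 * so the error compounds badly over millions of operations. */
int main(void)
{
    float  fsum = 0.0f;
    double dsum = 0.0;

    for (int i = 0; i < 10000000; i++) {
        fsum += 0.1f;
        dsum += 0.1;
    }

    printf("float : %f\n", fsum);  /* noticeably far from 1000000 */
    printf("double: %f\n", dsum);  /* very close to 1000000       */
    return 0;
}
```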
Generally speaking, the GPU is a processor designed for general-purpose linear algebra, while the CPU is a general-purpose processor in the truest sense, and therefore not nearly as optimised for compute-heavy problems like graphics, DSP and problems in optimization theory.