• Competitor rules

    Please remember that any mention of competitors, hinting at competitors or offering to provide details of competitors will result in an account suspension. The full rules can be found under the 'Terms and Rules' link in the bottom right corner of your screen. Just don't mention competitors in any way, shape or form and you'll be OK.

"AMD - 5870 is better than fermi"

512 FMA ops/clock as per the whitepaper here:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiArchitectureWhitepaper.pdf

FMA = 2 FLOP.
GTX 285 shader clock (a reasonable, and judging by past trends in Nvidia's GPU releases somewhat optimistic, estimate of Fermi's shader clock) = 1476MHz
1476 * 512 * 2 = 1,511,424 MFLOP/s
= 1511.424 GFLOP/s
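For anyone who wants to check the arithmetic, here's a quick sketch of the calculation above. Bear in mind the 1476MHz figure is just the GTX 285's shader clock standing in for Fermi's unannounced clock, so the result is an estimate rather than a confirmed spec.

Code:
# Peak single-precision throughput = ops/clock * clock * FLOPs per op.
# 1476 MHz is the GTX 285 shader clock used as a stand-in for Fermi's.
def peak_gflops(ops_per_clock, clock_mhz, flops_per_op=2):
    return ops_per_clock * clock_mhz * flops_per_op / 1000.0

print(peak_gflops(512, 1476))  # -> 1511.424 GFLOP/s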

Tah dah?

Exactly. I notice Duff-man has gone quiet. I wonder what the starred-out word was that he is going to eat? :p
 
Each company is obviously going to insist that their products are better, but at least they're trying to show some sort of fair comparison.

It's not like nVidia's "OMG, ARE GTS250 IS ARE FASTER THEN ATi 5870 IS in Batman when PhysX is enabled because we've gimped it to run badly on any computer that doesn't have an nVidia card doing the main rendering"
 
Something doesn't make sense here... either I'm missing something or nVidia are playing coy with the specs.

The 285GTX at 240SP is ~1TFlop, and the SPs on the GT300 are approx. 30-33% faster clock for clock compared to the 200 series due to architecture changes... which should put it theoretically at ~3TFlop, bearing the shader clock on Fermi in mind. The numbers I've seen bandied about (bearing in mind I can't account for how much of the performance comes from other improvements, as I'm just reading between the lines on things mentioned by people under NDA) would indicate performance in the 3.5-3.8TFlop range in single-precision-heavy tests...


EDIT: Oh, the shader clock was apparently running at 1600MHz in these tests, but I dunno if that's any indication of what we will actually see on retail cards.

EDIT2: Oh, and they apparently have some bad power leakage around this level, so that might indicate first-generation cards at least won't overclock much.
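For what it's worth, one way to reconstruct the ~3TFlop figure above (this is just my reading of the scaling, not anything official) is to take the GTX 285's ~1TFlop and scale it by both the 512 cores from the whitepaper and the claimed 30-33% per-SP gain:

Code:
# My reading of the scaling above: SP count ratio * per-SP improvement.
gtx285_tflop = 1.0
sp_ratio = 512 / 240                  # GT300 core count vs GTX 285
for per_sp_gain in (1.30, 1.33):
    print(round(gtx285_tflop * sp_ratio * per_sp_gain, 2))
# -> roughly 2.77 to 2.84 TFlop, i.e. the "~3TFlop" ballpark

The 3.5-3.8TFlop figures can't be reproduced from specs alone, since they apparently come from tests done under NDA.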
 
Something doesn't make sense here... either I'm missing something or nVidia are playing coy with the specs.

The 285GTX at 240SP is ~1TFlop, and the SPs on the GT300 are approx. 30-33% faster clock for clock compared to the 200 series due to architecture changes... which should put it theoretically at ~3TFlop, bearing the shader clock on Fermi in mind. The numbers I've seen bandied about (bearing in mind I can't account for how much of the performance comes from other improvements, as I'm just reading between the lines on things mentioned by people under NDA) would indicate performance in the 3.5-3.8TFlop range in single-precision-heavy tests...

Math fail?

A 33% increase wouldn't equate to 3TFlop; you're thinking of a 300% increase.
 
Something doesn't make sense here... either I'm missing something or nVidia are playing coy with the specs.

The 285GTX at 240SP is ~1TFlop, and the SPs on the GT300 are approx. 30-33% faster clock for clock compared to the 200 series due to architecture changes... which should put it theoretically at ~3TFlop, bearing the shader clock on Fermi in mind. The numbers I've seen bandied about (bearing in mind I can't account for how much of the performance comes from other improvements, as I'm just reading between the lines on things mentioned by people under NDA) would indicate performance in the 3.5-3.8TFlop range in single-precision-heavy tests...

I think the phrase you're looking for is 'more efficient', not 'faster'. If the cores do 1 FMA per cycle, that's what they can achieve, no? I mean, GT200 can do a MADD and a MUL (3 FLOPs) per shader per cycle (I think?), but there's no mention of that MUL operator for Fermi as yet (similarly to GT200's DP shader cluster?). Also, I heard that a lot of the time the MUL operator on GT200 isn't used. If that's the case, removing it would potentially make it up to 33% more efficient.
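To put some rough numbers on that (the clocks here are just the GTX 285 shader clock and an assumed Fermi clock, not confirmed specs):

Code:
# Sketch of the MADD+MUL vs FMA-only comparison; clocks are placeholders.
def gflops(shaders, clock_mhz, flops_per_clock):
    return shaders * clock_mhz * flops_per_clock / 1000.0

gt200_with_mul  = gflops(240, 1476, 3)   # MADD + MUL counted: ~1063 GFLOP/s
gt200_madd_only = gflops(240, 1476, 2)   # "phantom MUL" ignored: ~708 GFLOP/s
fermi_fma_only  = gflops(512, 1476, 2)   # FMA only, assumed clock: ~1511 GFLOP/s
print(gt200_with_mul, gt200_madd_only, fermi_fma_only)

So if that MUL rarely gets used in practice, the headline FLOP number overstates GT200, and Fermi's FMA-only figure would be closer to what it can actually sustain.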
 
Could be efficiency - you seem to know more about the hardware level than I do - I'm just going on what I've heard (not just from reading forums/news online :P) and filling in the blanks.
 
I'm glad this thread has actual facts and numbers to justify people's points, rather than "nvidia beats ati" or "ati beats nvidia", tbh.

It's like a breath of fresh air
 
Yeah, numbers like 1600 v 512 :rolleyes: 2.72 v 1.5 :rolleyes: At least it's refreshing to see ATI stooping to nVidia's level.

While it's BS, it's not anywhere near nVidia's level. At least AMD are using "fair" specs; using benchmarks based on proprietary technology which only nVidia cards can run to "demonstrate" that nVidia are the better company is extremely low.

It's like AMD releasing a statement that slams nVidia's cards for multi-screen gaming and says "but on the other hand, our Eyefinity technology gives ATi cards the edge; ATi cards are better at running games across multiple screens without investing in third-party solutions."
 
Yeah, numbers like 1600 v 512 :rolleyes: 2.72 v 1.5 :rolleyes: At least it's refreshing to see ATI stooping to nVidia's level.

I was talking about the people on this forum, not ATI.

The core comparison is rubbish, agreed, but the single-precision figure is true.

However, how much difference single vs double precision makes to game performance is what I don't know.
 
At the moment, not much... when things start using compute shaders, GPU physics, etc. it will make a bigger difference.
 
At the moment, not much... when things start using compute shaders, GPU physics, etc. it will make a bigger difference.

Apparently at least.

What I want is rendering to be moved from the CPU on to stream shaders/compute shaders.

That'd surely speed up 3D rendering a massive amount.

I don't understand why it's not been done yet either.
 
I don't understand why it's not been done yet either.

Too much out-of-order, branching code is required for this sort of thing. GPUs are utter crap at that kind of workload.

I'm expecting a shift from rasterisation to raytracing at some point, but that depends on architectural changes, and on how long performance hacks like tessellation delay it. Nvidia look to have bet the farm on it; whether they have released Fermi too early, at too high a price point, to capitalise on it remains to be seen.
 
I notice Duff-man has gone quiet. I wonder what the starred-out word was that he is going to eat? :p

You know, some of us actually have things to do and can't spend all day browsing forums! At least wait until the next day before you accuse someone of skipping out on a debate...

Anyway, it looks like I factored the GT200 architecture's "phantom MUL" (the potential third floating-point op per clock) into my Fermi calculations. Hence the reason I was getting around 50% higher values for SP and DP floating-point throughput. My bad.

Still, it's worth remembering that the GTX280 architecture, without the useless "phantom MUL" operation, is only capable of 622GF. Yet it is able to outperform the 4890, which has a theoretical capacity of 1360GF, in most cases. Since Fermi (supposedly) brings further efficiency improvements to the architecture, and does away with the "phantom MUL", we could still be looking at good real-world performance even at the quoted floating-point throughput. Still, ~1.5TF is not really what I was expecting.
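For reference, those two figures fall straight out of the cards' public specs (peak = shaders x clock x FLOPs per shader per clock):

Code:
# Where the 622GF and 1360GF figures come from, using the public specs.
gtx280_gf = 240 * 1296 * 2 / 1000   # 240 SPs at 1296MHz, MADD only -> ~622 GF
hd4890_gf = 800 * 850 * 2 / 1000    # 800 SPs at 850MHz, MADD each  -> 1360 GF
print(round(gtx280_gf), round(hd4890_gf))

Which just reinforces the point: theoretical throughput alone doesn't tell you much about real-world game performance.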

As for what it was on that starred out part of my earlier post... It was cake. A nice sweaty fruitcake I made last week. I'm eating it now, so you can all be quiet :p
 
Too much out-of-order, branching code is required for this sort of thing. GPUs are utter crap at that kind of workload.

I'm expecting a shift from rasterisation to raytracing at some point, but that depends on architectural changes, and on how long performance hacks like tessellation delay it. Nvidia look to have bet the farm on it; whether they have released Fermi too early, at too high a price point, to capitalise on it remains to be seen.

I'm not talking realtime though. I know a GPU would utterly suck at doing that in real time, but the way things seem to be, a GPU would offer some benefit over a CPU.
 