"AMD - 5870 is better than fermi"

straxusii · 19 Oct 2009 at 07:05

http://vr-zone.com/forums/496780/amd-ati-hd-5870-is-better-than-nvidia-fermi.html

Fred · 19 Oct 2009 at 08:41

Welcome to the wonderful world of PR :rolleyes:

Fenris · 19 Oct 2009 at 08:56

Holy unreleased card, Batman!

SfnX · 19 Oct 2009 at 09:18

rofl @ GFLOPs, I'd like to see the Fermi released first.

drunkenmaster · 19 Oct 2009 at 10:17

SfnX said:
rofl @ GFLOPs, I'd like to see the Fermi released first.

Nvidia have already stated their numbers, and honestly the only way it will change is if they go down due to releasing at lower clocks than expected.

It is rather hilarious that Nvidia have gone all out to build a GPGPU with massive double precision power, and AMD, with little to no extra effort, have got a card with around two thirds the double precision power and higher single precision power, with no sacrifices being made.

Considering GPGPU is 1-2% of Nvidia's business and GPU the massive majority, it's odd that Nvidia have shifted so heavily so quickly. AMD/Intel can make cards for that market when it's worth more than the $78 in revenue and almost no profit that Nvidia made last year from GPGPUs.

mmj_uk · 19 Oct 2009 at 10:20

Total guesswork.

Duff-Man · 19 Oct 2009 at 10:26

Nvidia "shader cores" aren't comparable with AMD's. The 240-core GTX280 was faster than the 800-core 4800 after all. AMD is well aware of this fact, and apparently chooses to ignore it.

Also, if Fermi has a maximum throughput of 1.5TF I will eat my own ****. Not sure where they get that number from...

I guess this is one of the pitfalls of announcing your product and not providing any performance numbers - it lets your rivals run wild with speculation!

MikeHunt79 · 19 Oct 2009 at 10:27

Fenris said:
Holy unreleased card, Batman!

Batman should be good on the Fermi at least...

If Nvidia go on about an unreleased card so much, no harm in ATI at least comparing it

Greebo · 19 Oct 2009 at 10:40

Duff-Man said:
Nvidia "shader cores" aren't comparable with AMD's. The 240-core GTX280 was faster than the 800-core 4800 after all. AMD is well aware of this fact, and apparently chooses to ignore it.

Also, if Fermi has a maximum throughput of 1.5TF I will eat my own ****. Not sure where they get that number from...

I guess this is one of the pitfalls of announcing your product and not providing any performance numbers - it lets your rivals run wild with speculation!

I thought it was Nvidia who have themselves stated that the Fermi has 8x the single precision power of the GTX280?

In which case that does indeed equate to 1.5TF.

Toastor · 19 Oct 2009 at 11:03

So, AMD have 'estimated' that their card is better. Whoopee-doo.

hominid · 19 Oct 2009 at 11:10

mmj_uk said:
Total guesswork.

Toastor said:
So, AMD have 'estimated' that their card is better. Whoopee-doo.

ATi have based their predictions on a pdf of the specs for Fermi on NVidia's web site. It's true that not all the specs are there, so some of the performance estimates have to be inferred, but mostly it's correct according to NVidia's own released spec.

Duff-Man · 19 Oct 2009 at 11:30

Greebo said:
I thought it was Nvidia who have themselves stated that the Fermi has 8x the single precision power of the GTX280?

In which case that does indeed equate to 1.5TF.

The GTX280 had a single precision throughput of 0.93TF, so no, that doesn't add up.

Greebo · 19 Oct 2009 at 12:17

Duff-Man said:
The GTX280 had a single precision throughput of 0.93TF, so no, that doesn't add up.

Oops, I got it slightly wrong. Teaches me to post from memory.

Nvidia claim in their white paper that Fermi has 8x the double precision performance of the 2xx series, not single precision as I posted.

http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiArchitectureWhitepaper.pdf

256 FMA ops/clock double precision, 512 FMA ops/clock single precision......8x the peak double precision floating point performance over GT200

Taking 96 GFLOP as the GTX285's double precision performance:

8*96 = 768 GFLOP, which is where ATI got their figure for the chart. Then, using the Nvidia white paper, which states that Fermi has twice the performance in single versus double precision, you get to roughly 1.5TFLOPS for single precision.

The difference is that the GTX285 had almost 10 times the single precision performance compared to its double precision (933GFLOPS single vs 96GFLOPS double). Clearly Nvidia have concentrated on boosting the double precision of Fermi, and the ratio is only double now.
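For anyone who wants to check the arithmetic in this post, here is a minimal Python sketch. It only uses the figures quoted above (96 GFLOP double precision and 933 GFLOP single precision for the GTX285, plus the whitepaper's 8x and 512 vs 256 FMA ops/clock claims). The variable names are made up for illustration, and the result is only as good as the assumed 96 GFLOP starting point.

```python
# Figures quoted in the post; names are illustrative only.
GT200_DP_GFLOPS = 96    # GTX285 double precision figure assumed in the post
GT200_SP_GFLOPS = 933   # GTX285 single precision figure quoted in the post
DP_SPEEDUP = 8          # whitepaper claim: 8x peak double precision over GT200
SP_PER_DP = 512 / 256   # whitepaper: 512 SP vs 256 DP FMA ops per clock

# Fermi double precision estimate, then single precision at twice that.
fermi_dp_gflops = DP_SPEEDUP * GT200_DP_GFLOPS
fermi_sp_gflops = fermi_dp_gflops * SP_PER_DP

# How lopsided the GTX285 was between single and double precision.
gt200_sp_dp_ratio = GT200_SP_GFLOPS / GT200_DP_GFLOPS

print(fermi_dp_gflops)               # 768
print(fermi_sp_gflops)               # 1536.0
print(round(gt200_sp_dp_ratio, 1))   # 9.7
```

So under these assumptions the single precision estimate is 1536 GFLOP, which is the "1.5TF" figure being argued about.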

So unless Nvidia are doing a screwball with their white paper, I would indeed agree with ATI's comparison of the single and double precision speeds, although, as said, number of cores is irrelevant.

EDIT: I look forward to seeing pics of you eating your hat?

LeJosh · 19 Oct 2009 at 12:47

mmj_uk said:
Total guesswork.

It says at the bottom of the photo that they based it on publicly available information and, I think, on the shader running at 1.5GHz.

straxusii · 19 Oct 2009 at 13:23

ATI marketing should just keep quiet and let Nvidia marketing continue to dig themselves a deeper hole

Duff-Man · 19 Oct 2009 at 15:26

Greebo said:
taking 96 GFLOP as the GTX285's Double Precision performance:

From where did you get the GTX280 performance being 96GFLOP? It's off by about 20%.

The GTX280 can perform 3 double precision operations per (shader) clock, and has 30 dp processing units. At 1296MHz this equates to 116.64GF [1296*30*3 MF]. Alternatively, you can simply divide the quoted single-precision performance by 8 (since there is one dp unit for every 8 sp units), which gives you 116.625GF. The small difference is accounted for by rounding of the single-precision performance to 3 significant figures.

AMD state in that picture that they assume a 1500MHz shader clock for Fermi. This, along with the information from the whitepaper (8x the performance of GT200 clock-for-clock), implies that the double precision performance is 1080GF [i.e. (1500/1296)*8*116.63]. If we further assume double that for single-precision performance (as is also stated in the whitepaper) then we arrive at around 2.16TF for the Fermi. This is more in the expected ballpark.

Of course, the shader-clock speed is the 'great unknown' in these computations, but AMD explicitly state that they assume a 1500MHz shader speed (which won't be far off the truth). Again, I state that AMD's computation of single- and double-precision performance is inconsistent.
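The calculation in this post can be reproduced with a short Python sketch. The function names and default arguments are hypothetical and simply encode the post's own assumptions: 3 double precision operations per clock on 30 units at 1296MHz for the GTX280, and an 8x clock-for-clock gain at an assumed 1500MHz for Fermi. Whether 3 operations per clock is the right starting point is exactly what is disputed further down the thread.

```python
def gt200_dp_gflops(shader_mhz=1296, dp_units=30, flops_per_clock=3):
    """GTX280 double precision estimate in GFLOPS, per this post's assumptions."""
    return shader_mhz * dp_units * flops_per_clock / 1000


def fermi_estimate(gt200_dp, assumed_mhz=1500, gt200_mhz=1296,
                   speedup=8, sp_per_dp=2):
    """Scale the GT200 figure by clock ratio and the whitepaper's 8x claim.

    Returns (double precision GFLOPS, single precision GFLOPS).
    """
    dp = (assumed_mhz / gt200_mhz) * speedup * gt200_dp
    return dp, dp * sp_per_dp


base = gt200_dp_gflops()
dp, sp = fermi_estimate(base)
print(round(base, 2))  # 116.64
print(round(dp))       # 1080
print(round(sp))       # 2160
```

Changing `flops_per_clock` to 2 shows how sensitive the 2.16TF result is to that one assumption.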

Greebo · 19 Oct 2009 at 15:46

Duff-Man said:
From where did you get the GTX280 performance being 96GFLOP? It's off by about 20%.

The GTX280 can perform 3 double precision operations per (shader) clock, and has 30 dp processing units. At 1296MHz this equates to 116.64GF [1296*30*3 MF]. Alternatively, you can simply divide the quoted single-precision performance by 8 (since there is one dp unit for every 8 sp units), which gives you 116.625GF. The small difference is accounted for by rounding of the single-precision performance to 3 significant figures.

Err that's wrong though isn't it? It can't perform 3 double precision operations per (shader) clock, only 2 plus one single precision.

GTX 280, reference clocked at 1296 MHz. Notice that Port 0 instructions can be multiply-adds (2 flop/cycle) and Port 1 instructions are just multiplies (1 flop/cycle):

Single precision:

1296 MHz * 30 SM * (8 SP/SM * 2 flop/cycle per SP + 2 SFU/SM * 4 FPU/SFU * 1 flop/cycle per FPU)
= Port 0 throughput + Port 1 throughput = 622080 Mflop/s + 311040 Mflop/s = 933 GFlop/s single precision

For double precision:

1296 MHz * 30 SM * 1 double precision FPU/SM * 2 flop/cycle = 78 GFlop/s

The Port 1 units can be co-issued with double precision instructions, so can also process 311GFlop/s of single precision multiplies while doing double precision multiply-adds. [That’s probably not terribly useful without single precision adds though.]

You have wrongly assumed that the dp unit can process as many flops/cycle as the sp units, when in fact it's only two thirds (116.64 x 2/3 = 77.76, so roughly 78 GFLOPS).
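Here is the per-port breakdown above as a quick Python sketch, for anyone who wants to rerun the numbers. The unit counts and the 1296MHz clock are the ones quoted in this post; the variable names are just illustrative.

```python
SHADER_MHZ = 1296   # GTX 280 reference shader clock, as quoted above
SM_COUNT = 30       # streaming multiprocessors

# Single precision: Port 0 does multiply-adds (2 flop/cycle) on 8 SPs per SM,
# Port 1 does multiplies (1 flop/cycle) on 2 SFUs x 4 FPUs per SM.
port0_mflops = SHADER_MHZ * SM_COUNT * 8 * 2
port1_mflops = SHADER_MHZ * SM_COUNT * 2 * 4 * 1
sp_mflops = port0_mflops + port1_mflops

# Double precision: one DP unit per SM doing multiply-adds (2 flop/cycle).
dp_mflops = SHADER_MHZ * SM_COUNT * 1 * 2

print(port0_mflops, port1_mflops)  # 622080 311040
print(sp_mflops / 1000)            # 933.12
print(dp_mflops / 1000)            # 77.76

# Two thirds of the 3 flop/cycle figure (1296*30*3) gives the same number.
print(SHADER_MHZ * SM_COUNT * 3 * 2 // 3 == dp_mflops)  # True
```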

Anyway, I'm right, unfortunately, so unless Nvidia is lying about their 8 times faster dp speed, it will only be 1.5TFLOPS for single precision, assuming a 1500MHz shader speed.

I don't know where you have got your "expected ballpark" for the Fermi. The only way that it can be faster than 1.5TFLOPS is if Nvidia are lying about the 8x performance of the GTX2xx series or the shader speed is a lot more than 1500.

Please show me where I am wrong if you think I am?

Oh and this which seems to confirm that I am not the only one who calculates DP performance this way:

Speaking of double precision, the Fermi has implemented IEEE 754-2008-compliant double-precision floating point operations. As we discussed in our Radeon HD 5870 exposé, gigaFLOPS stands for one billion FLoating point Operations Per Second. A floating point operation is a basic calculation used by the CPU to process code, especially “scientific” ones like computer AI, video encoding and physics. Double-precision FLOPs ensure a high degree of accuracy in these calculations, which translates to more accurate rendering or encoding. We guess Fermi will be north of 700 billion precision FLOPS, while the HD 5870 weighs in at 544 billion. On the other hand, the HD 5870 will deliver an assbeating in the altogether less useful single-precision category with nearly twice the performance.

http://icrontic.com/articles/nvidia_fermi_dissected

Columbo · 19 Oct 2009 at 15:50

News shock "AMD claims its newest technology is better than its closest rival". :rolleyes:

elpedro · 19 Oct 2009 at 16:36

ManCuBuS said:
News shock "AMD claims its newest technology is better than its closest rival".

I refer you to this post, good sir. ATI have only based it on what nVidia themselves have told us. It's a little more than mere fiction.

hominid said:
ATi have based their predictions on a pdf of the specs for Fermi on NVidia's web site. It's true that not all the specs are there, so some of the performance estimates have to be inferred, but mostly it's correct according to NVidia's own released spec.

Lightnix · 19 Oct 2009 at 17:39

512 FMA ops /clock as per the whitepaper here:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiArchitectureWhitepaper.pdf

FMA = 2 FLOP.
GTX 285 clock speed (so a reasonable, and judging by past trends in terms of Nvidia's GPU releases, somewhat optimistic estimate at Fermi's shader clock speed) = 1476MHz
1476*512*2 = 1511424 MFLOP/S
= 1511.424 GFLOP/S

Tah dah?
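The same sum as a small Python sketch, in case anyone wants to try other clock speeds. `peak_gflops` is a made-up helper name; the 512 and 256 FMA ops/clock figures come from the whitepaper linked above, and the 1476MHz and 1500MHz clocks are guesses, since Fermi's shader clock has not been announced.

```python
def peak_gflops(fma_per_clock, shader_mhz, flop_per_fma=2):
    """Theoretical peak in GFLOPS: FMA ops/clock x 2 flop x clock in MHz / 1000."""
    return fma_per_clock * flop_per_fma * shader_mhz / 1000


print(peak_gflops(512, 1476))  # 1511.424  single precision at GTX 285 clocks
print(peak_gflops(256, 1476))  # 755.712   double precision at GTX 285 clocks
print(peak_gflops(512, 1500))  # 1536.0    single precision at AMD's assumed clock
```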