Why is Super Pi on Linux much quicker than on Windows?

fishfishfish · 22 Nov 2008 at 20:15

My 4GHz Q9550 rig can hit 9.26 seconds on Super Pi on Linux, but running the same calculation on Windows it comes in a good few seconds slower. Is the Linux version of Super Pi written more efficiently (assuming that the code is different) or is it just an OS thing?

matt@q9550:~/Documents/Downloads/super_pi$ ./super_pi 20
Version 2.0 of the super_pi for Linux OS
Fortran source program was translated into C program with version 19981204 of
f2c, then generated C source program was optimized manually.
pgcc 3.2-3 with compile option of "-fast -tp px -Mbuiltin -Minline=size:1000 -Mnoframe -Mnobounds -Mcache_align -Mdalign -Mnoreentrant" was used for the
compilation.
------ Started super_pi run : Sat Nov 22 20:09:30 GMT 2008
Start of PI calculation up to 1048576 decimal digits
End of initialization. Time= 0.156 Sec.
I= 1 L= 0 Time= 0.424 Sec.
I= 2 L= 0 Time= 0.480 Sec.
I= 3 L= 1 Time= 0.476 Sec.
I= 4 L= 2 Time= 0.484 Sec.
I= 5 L= 5 Time= 0.480 Sec.
I= 6 L= 10 Time= 0.464 Sec.
I= 7 L= 21 Time= 0.464 Sec.
I= 8 L= 43 Time= 0.468 Sec.
I= 9 L= 87 Time= 0.464 Sec.
I=10 L= 174 Time= 0.468 Sec.
I=11 L= 349 Time= 0.468 Sec.
I=12 L= 698 Time= 0.468 Sec.
I=13 L= 1396 Time= 0.464 Sec.
I=14 L= 2794 Time= 0.464 Sec.
I=15 L= 5588 Time= 0.464 Sec.
I=16 L= 11176 Time= 0.456 Sec.
I=17 L= 22353 Time= 0.456 Sec.
I=18 L= 44707 Time= 0.436 Sec.
I=19 L= 89415 Time= 0.404 Sec.
End of main loop
End of calculation. Time= 9.265 Sec.
End of data output. Time= 0.056 Sec.
Total calculation(I/O) time= 9.321( 0.336) Sec.
------ Ended super_pi run : Sat Nov 22 20:09:40 GMT 2008

PhillyDee · 22 Nov 2008 at 20:51

Ask in the Linux forum and people will say Windows causes hardware to run slower.
Ask in the Windows forum and people will say the code is better optimised on Linux.

BigglesPiP · 22 Nov 2008 at 21:12

What does the CPU usage history look like for each core?

Could be that Linux doesn't swap the thread between cores as much. I can also see Linux running less in the background, despite there being far more processes.

Garp · 25 Nov 2008 at 23:35

It's likely the way it was compiled. Those are rather specific options mentioned:

Code:

-fast -tp px -Mbuiltin -Minline=size:1000 -Mnoframe -Mnobounds -Mcache_align -Mdalign -Mnoreentrant

pgcc is produced by the Portland Group to create highly optimised binary code. It's likely that a combination of the above flags to the Portland compiler works very nicely with your chipset/architecture under Linux, but possibly not so nicely with others.

The Windows one might have been compiled without any of those fancy options.

matja · 26 Nov 2008 at 12:39

I've noticed that the Windows version of SuperPi continually accesses the disk you run the program from, so it returns different results for different HDs/controllers/drivers even with the same CPU (for 1M I get about 17 seconds running from a SATA HD, and over a minute running from a USB HD). Since Linux handles block device buffering/caching better than Windows, this may be one reason for the discrepancy. In any case, it's a pretty outdated 'benchmark' that seems to have stuck for some reason.
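
If you want to check whether a run is really waiting on the disk rather than computing, one rough approach is to compare wall-clock time with CPU time for the same piece of work. Below is a minimal Python sketch of the idea, not anything taken from Super Pi itself. The names `measure`, `cpu_only` and `cpu_plus_disk` are made up for illustration, and the workloads are just stand-ins.

```python
import os
import tempfile
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) for this process."""
    wall_start = time.perf_counter()   # wall clock, includes time spent blocked
    cpu_start = time.process_time()    # user + system CPU time of this process only
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def cpu_only():
    """Stand-in workload that only does arithmetic (hypothetical example)."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def cpu_plus_disk():
    """Same arithmetic, plus writes that are forced out to the disk each time."""
    cpu_only()
    with tempfile.NamedTemporaryFile() as f:
        for _ in range(50):
            f.write(b"x" * 65536)
            f.flush()
            os.fsync(f.fileno())       # ask the OS to push the data to the device


if __name__ == "__main__":
    for name, work in (("cpu only", cpu_only), ("cpu + disk", cpu_plus_disk)):
        wall, cpu = measure(work)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the wall-clock figure comes out far above the CPU figure, the process spent most of its time blocked rather than calculating, which would fit the disk theory.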

dirtydog · 26 Nov 2008 at 13:25

It isn't Windows' fault, because I get a better time in Linux Super Pi, even running in VMware on Windows, than in Windows natively. It must be the way the Windows client was written or compiled that is the problem.

Anyway the actual times are not the reason most people run the benchmark - it is simply a good way of comparing the performance of different CPUs.

(Q6600 @ default, pi to 1M, Ubuntu/VMware = 16.865, Windows native = around 21 secs)

dangerstat · 26 Nov 2008 at 14:44

Yeah, it's got to be due to the compiler. You could probably make SuperPI even faster by using icc with Intel's MPI libraries.

matja · 26 Nov 2008 at 20:12

Run SuperPI inside Wine if you want to do a real comparison. It's not calling Win32 API functions inside the inner loop anyway.

Nimble · 27 Nov 2008 at 21:06

That's not a good comparison. Having no Win32 function calls in the inner loop doesn't mean the calls elsewhere won't still slow it down.

If you want a fair comparison, write the code yourself.
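
As a rough sketch of what that could look like, the Python below computes pi with Machin's formula in plain integer arithmetic and times it. This is not Super Pi's algorithm, just a simple stand-in workload. The function names (`arctan_inv`, `pi_scaled`, `format_pi`, `time_run`) and the guard-digit count are my own choices for illustration. The idea is only that the exact same source, run with the same interpreter version on both OSes, takes the compiler and the program's own code out of the comparison.

```python
import time


def arctan_inv(x, scale):
    """Return arctan(1/x) * scale using the alternating series, integers only."""
    total = term = scale // x          # first term: 1/x
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2                    # next odd power of 1/x
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_scaled(digits, guard=10):
    """Return pi * 10**digits as an integer (truncated), via Machin's formula."""
    scale = 10 ** (digits + guard)     # guard digits absorb floor-division error
    pi_guarded = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    return pi_guarded // 10 ** guard


def format_pi(digits):
    """Return pi as text with the given number of decimals (small counts only)."""
    s = str(pi_scaled(digits))
    return s[0] + "." + s[1:]


def time_run(digits):
    """Time the integer computation only, leaving out string conversion and I/O."""
    start = time.perf_counter()
    pi_scaled(digits)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(format_pi(30))               # quick sanity check of the digits
    digits = 5000
    print(f"{digits} digits: {time_run(digits):.3f} s")
```

Timing only `pi_scaled` keeps printing and file output out of the measurement, so any remaining difference between the two systems is down to the OS, the interpreter build and the hardware rather than the program.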