Running Intel Burn Test/Linpack 'Properly'

Soldato · Joined 1 Jun 2010 · Posts 7,053 · Location London
Hi guys

I just came across this article yesterday on how to run Intel Burn Test/LinPack properly. I don't know if you guys have read it, but it seems very interesting. Of course, re-reading the article several times, including the comments, will hopefully make it simpler to understand:

http://www.overclock.net/intel-cpus/645392-how-run-linpack-stress-test-linx.html

From my understanding, the author suggests the following before running the program:

-Disable C1E and EIST in the BIOS
-Temporarily close any open programs such as web browsers, including antivirus
-In Windows Task Manager, disable as many background programs as possible
-Look at the free physical memory shown in Task Manager, not the available memory
-Choose 'Custom' stress testing in IBT and enter the free physical memory (preferably slightly less) for accurate testing
-Watch for consistent GFlops (speed) values during the stress test, i.e. the speed at which the CPU is working through the equations

E.g. theoretically, regardless of how much physical RAM you use:

Single core processor @ 3GHz: 4x3 = 12 GFlops
Dual core processor @ 3GHz: 8x3 = 24 GFlops
Quad core processor @ 3GHz: 16x3 = 48 GFlops
Six core processor @ 3GHz: 24x3 = 72 GFlops

Those figures above are the theoretical maximum speeds at which the CPUs can perform the stress-test calculations at 3GHz, and GFlops increase linearly as you increase CPU frequency.

So a quad core @ 4GHz: 16x4 = 64 GFlops
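A minimal sketch of that arithmetic in Python. The figure of 4 double-precision FLOP per cycle per core is the SSE2-era assumption the article relies on; newer chips differ:

```python
# SSE2-era Intel CPUs: 2 packed multiplies + 2 packed adds per cycle
# = 4 double-precision FLOP per cycle per core.
FLOP_PER_CYCLE_PER_CORE = 4

def peak_gflops(cores: int, ghz: float) -> float:
    """Theoretical upper limit; real LinPack runs always come in lower."""
    return cores * FLOP_PER_CYCLE_PER_CORE * ghz

print(peak_gflops(4, 3.0))  # quad core @ 3 GHz -> 48.0
print(peak_gflops(4, 2.4))  # Q6600 @ 2.4 GHz -> 38.4
```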

Here is Intel info backing up those GFlops values:

http://www.intel.com/support/processors/sb/cs-023143.htm#3

For example, my Q6600 @ 2.4GHz should give 16x2.4 = 38.4 GFlops in the stress test. When I run IBT with the custom setting and enter my free physical memory, I get roughly 30.2 GFlops, which is still acceptable, though going by the article I should get close to 34 GFlops. This is to be expected, as real test values can never reach the theoretical maximum due to L1/L2 cache, motherboard chipset, RAM, etc.

However, what the author seems to suggest is that if you use the 'Maximum' stress level, you will be using 'available RAM', which consists of both physical and virtual memory. This slows down your CPU and you may get 15 GFlops instead of 30+. With such low GFlops values your temps will be lower and your test will be invalid.

The author suggests running IBT for 30-50 minutes, and no fewer than 10 passes. :)


Originally posted by PERSPOLIS

How to run the LinPack stress test (LinX/IBT) properly - an explanation (maybe a guide)

About LinPack

The LinPack stress test is an amazingly well-optimized app that solves a system of simultaneous linear equations by the Gaussian elimination method. Anyone who has programmed an equation solver, or at least studied the algorithm, knows there is a lot of memory traffic involved. Without optimized DRAM access, the CPU would be twiddling its thumbs while the memory struggles to catch up. In practice the chipset & memory run like hell to keep up with the CPU. Hence this test stresses the CPU, L1 & L2 caches, chipset & memory all at the same time. It is an excellent all-round stability test, if performed properly.

If this test is so good, then why do so many people complain about it? Some people state that they pass this test but fail Prime95 or, even worse, get a BSOD when running normal apps like games. Others report they can pass the test one day & fail another day without any change in their settings.
The short answer: they do not run the test properly!

So...how come I can't run the test properly?

In short: background apps, unneeded OS services, unnecessary paging & using the PC while the test is running steal so many precious CPU cycles from the test as to render it totally useless.
Virtual memory management (paging) is a prime suspect. Windows uses most of the unused RAM to cache files, which is good. The problem starts when our app (LinPack) asks for a certain amount of physical RAM, but the OS only gives us a mixture of physical & virtual RAM, even when there is enough RAM to meet the demand.

Let's say our system has 3200MB of free RAM in win64. Windows uses a large chunk (say 2000MB) for file caching. We run LinX/IBT & start a test that needs 1800MB of RAM. The OS decides to preserve the file cache, so it gives us 1100MB of physical RAM & 700MB of virtual RAM. This means the CPU wastes a considerable amount of time waiting for the OS to read/write data to/from the hard disk, which screws the test big time. The test takes longer to complete each pass, the core temps are lower, we observe wild fluctuations in temps & the system is not stressed enough.

But it could be even worse. Let's say you have an OC that is certified to be stable through numerous different tests. If you use your system for a couple of hours & then try to run LinX/IBT with a problem size that, say, needs 2000MB of RAM, the OS may only give you 500MB of physical RAM & 1500MB of virtual RAM. Now something weird happens: the test fails very fast (sometimes in only one pass) even with settings that are certified to be stable. I'm not sure about the reason, but it most certainly is a software bug. After all, LinPack is not designed to measure hard disk traffic! Now if you force the OS to flush the cache & run the test without a reboot, you can pass the test with flying colors again.

That's why I believe a test run with 2400MB of physical RAM is better than one with 3200MB of physical & virtual RAM combined.

How to run LinX/IBT properly

-temporarily disable all unneeded apps running in the background
-temporarily disable the auto-protect feature of your antivirus
-temporarily close the sidebar in Vista
-temporarily disable unneeded OS services, including superfetch/prefetch, readyboot/readyboost, Windows Defender, screensaver, ...
NOTE: If you are not comfortable with disabling OS services, then reboot & let your computer sit idle for 10 minutes before running the test. Do not use your PC or run any app during this period. You also need to observe all the other steps mentioned here
-run only one temp monitoring app (HWMonitor, RealTemp, CoreTemp, ...), whichever works best for you
-in Task Manager's Performance tab, check the amount of FREE PHYSICAL MEMORY & use a bit less (200-300MB less)
-in win64 with 4 gigs of RAM, use 2400MB or more. In WinXP 32-bit, use IBT & select 2047MB of RAM
-positively do not use the PC while the test is running
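The memory step above can be sketched in a few lines. This assumes LinPack's working set is dominated by the n x n double-precision coefficient matrix (so roughly 8n² bytes), and the 250 MB default margin is just the 200-300 MB suggestion from the list:

```python
import math

def linpack_problem_size(free_mb: float, margin_mb: float = 250) -> int:
    """LinPack problem size n that fits in the given free physical RAM.

    The coefficient matrix holds n*n doubles (8 bytes each), so the
    working set is roughly 8*n^2 bytes; leaving a 200-300 MB margin
    keeps the OS from handing back virtual memory.
    """
    usable_bytes = (free_mb - margin_mb) * 1024 * 1024
    return int(math.sqrt(usable_bytes / 8))

# e.g. 3200 MB free physical RAM -> a problem size of roughly 19-20k
print(linpack_problem_size(3200))
```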

If Windows is using most of your RAM for caching & you want to flush the cache, run LinX/IBT & select a high amount of memory. For example, if you have 4 gigs of RAM, in Vista64 use 3000-3200MB. Start the test & let it run for 30-40 seconds, stop the test, close LinX/IBT & check your free memory in Task Manager.

Another alternative is running the test in safe mode. It seems a good idea, but I have not tried it myself, so I'm not sure about its pros & cons.

...But we still need an index to prove we are running the test properly

Even if we take all the steps mentioned above, we still need an index (call it a criterion or guideline if you like) to make sure we are running the test properly.

Let's face it: we have seen people who claim 10 or even 5 passes is all you need to prove you are stable, but there are others who suggest 100, 200 or even 500+. The only reason these people give is that the test has failed after, say, 70 passes, so you need 100+ iterations. In fact a test may fail for reasons unrelated to your OC settings, but discussing those reasons needs another thread.

Without an index it's like flying in the dark; we may reach our goal, but we can never be sure.
I mean, 10 passes of a good test can catch errors that 100 passes of a test run blindly cannot!

So let's try to find a good index:

-CPU usage: seems promising. We only need to make sure our CPU usage is close to 100%. Right?
Wrong. To verify, run wPrime, Prime95 & LinX (or IBT) separately, while monitoring CPU usage & core temps. The CPU usage should be close to max for all 3 tests, but the temps are higher in Prime95 versus wPrime. LinX (or IBT) temps should be the highest of all 3, which means LinPack stresses the system more than Prime95, which in turn is more stressful than wPrime.
So CPU usage is not acceptable as an index.

-Core temps: very system-dependent. The temps could be vastly different due to case cooling, CPU heatsink, ambient temp, CPU VID, ...
We need an index that is comparable across different rigs.
So temp is not acceptable as an index.

-CPU performance in GIGAFLOPS:
This is the index we have been looking for.
GIGAFLOPS stands for giga (one billion) floating point operations per second.
LinPack displays the GIGAFLOPS at the end of each pass, so we only need an estimate of it before we run the stress test.

LinPack uses 64-bit (double precision) floating point numbers to store the coefficient matrix etc.
It also uses the SSE(2) instruction set & registers to run as fast as possible. Each SSE2 register is 128 bits wide, so we can pack two 64-bit values in a single register. Current processors can perform a multiply-add operation on each 64-bit value in a single cycle. So a core can do 2 multiplies + 2 adds per cycle, which translates into 4 FLOP (floating point operations) per cycle. This is for one core. For a dual core the value is 8 FLOP/cycle. With a quad core it is 16.

Now let's calculate CPU performance in terms of GIGAFLOPS for a few chips:

single core @ 2 GHz: 2x4 = 8 GIGAFLOPS
dual core @ 3 GHz: 3x8 = 24 GIGAFLOPS
quad core @ 4 GHz: 4x16 = 64 GIGAFLOPS
six core @ 4 GHz: 4x24 = 96 GIGAFLOPS

Note that the above values are only upper limits. The actual value we get in LinPack is somewhat lower due to OS overhead, LinPack bookkeeping & because LinPack cannot keep the CPU execution units busy all the time.


Calculating & estimating CPU performance in terms of GIGAFLOPS

NOTE: From now on I use win64 with 4 gigs of RAM in all the following discussion unless noted otherwise.

The GIGAFLOPS value as reported by LinPack is roughly constant for a given CPU at a given core clock. The impact of RAM speed & FSB is very small.
This is very important. It means we can have a fairly accurate estimate of the GIGAFLOPS we should achieve even before running the stress test. This is the index I have been talking about.
For example, the index for an E5200 @ 3 GHz is roughly 20 GIGAFLOPS. The RAM speed & FSB could make this value range from 19 to 20.5. Hence if we run LinX/IBT and only get 15 GIGAFLOPS, then we are obviously performing an improper test that is not very useful. I have seen people running an E5200 or E6300 @ 4+ GHz and only getting 13 GIGAFLOPS!! That's also why you sometimes see people getting ridiculously low temps while running LinX/IBT. The temp difference between a proper & improper run of the test could be more than 20°C.

Furthermore, we only need one GIGAFLOPS estimate per chip. We can calculate other values by proportion. Let's say for an E8400 @ 3GHz the index is 21 GIGAFLOPS. Then for a 4GHz OC we can expect a performance value of 21x4/3 = 28.
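The proportion rule is trivial to encode. A quick sketch; as the article's tables show further down, the result runs 1-3% optimistic when the RAM clock stays put:

```python
def scale_gflops(base_gflops: float, base_ghz: float, target_ghz: float) -> float:
    """Scale a measured baseline GIGAFLOPS linearly with core clock.

    Slightly optimistic (1-3% high) when the RAM speed is not
    raised along with the CPU.
    """
    return base_gflops * target_ghz / base_ghz

print(scale_gflops(21.0, 3.0, 4.0))  # E8400 example: 21 x 4/3 -> 28.0
```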

But all this needs to be verified. To this end, I performed a number of tests on my sig rig in Vista64. First I overclocked my CPU to 3 GHz (333x9) with my RAM clocked at 1066MHz. After several runs of LinX (just to make sure) I found the GIGAFLOPS for this OC. Then I kept FSB & RAM speed constant & raised the multi. In each step, I report the actual GIGAFLOPS as displayed by LinX and a value calculated from the base (3GHz) OC. Here are the results:


CPU clock (GHz) | FSB x Multi (MHz) | RAM speed (MHz) | Actual GIGAFLOPS | Calculated GIGAFLOPS
3.00 | 333x9.0  | 1066 | 20.3 | 20.3
3.16 | 333x9.5  | 1066 | 21.2 | 21.4
3.33 | 333x10   | 1066 | 22.2 | 22.6
3.50 | 333x10.5 | 1066 | 23.1 | 23.7
3.66 | 333x11   | 1066 | 24.0 | 24.8

As you see, the calculated & actual values are very close, which proves our point.
Also note that the calculated values are higher than the actual values & the delta grows as we OC higher. The reason is we OC the CPU (and the L1 & L2 caches) but keep the memory clock constant. To see if this is really the case, I ran another test:

CPU clock (GHz) | FSB x Multi (MHz) | RAM speed (MHz) | Actual GIGAFLOPS
3.00 | 333x9.0 | 1066 with optimized settings | 20.3
3.00 | 333x9.0 | 800 with stock settings | 19.5

Once more our point is proven. In short, the impact of RAM speed is small. The calculated value may overestimate the actual value by roughly 1-3%.

Next, let's try to make the calculated & actual GIGAFLOPS equal! This is done by overclocking CPU/FSB/RAM at the same time and by the same amount. The aim is to verify our assumption!

We start by setting CPU/FSB/RAM @ 2.5/200/800. Then we OC by 20%, which is CPU/FSB/RAM @ 3/240/960. Finally we OC by 33% with CPU/FSB/RAM @ 3.33/266/1064.


CPU clock (GHz) | FSB x Multi (MHz) | RAM speed (MHz) | Actual GIGAFLOPS | Calculated GIGAFLOPS
2.50 | 200x12.5 | 800  | 16.24 | 16.24
3.00 | 240x12.5 | 960  | 19.54 | 19.49
3.33 | 266x12.5 | 1064 | 21.71 | 21.60

The results speak for themselves.

Now let's guess the expected GIGAFLOPS for my CPU @ 4 GHz: 20.3x4/3 = 27.07.
But because I'm not overclocking my RAM, the actual value is a bit less. By consulting the first table I estimate the actual value to be 25.8 GIGAFLOPS.

We don't even need to find our actual base GIGAFLOPS ourselves; we can ask others. Let's say you have an E8400. You ask other (reliable) people for an estimated (or measured) GIGAFLOPS for your chip @ stock. You are given a value of 21 GIGAFLOPS. Now you want to calculate the expected value for an OC of 3.6GHz. So 21x3.6/3 = 25.2. A good estimate for your OC should therefore be around 24.5 GIGAFLOPS.

The following is the estimated GIGAFLOPS for a few chips:

E5200 @ 3 GHz: 19-20 GIGAFLOPS
E8400 @ 3 GHz: 21-22 GIGAFLOPS
Q9550 @ 4 GHz: 54-56 GIGAFLOPS

i5 quad core @ 4 GHz: 59-61 GIGAFLOPS
i7 quad core @ 4 GHz: 60-62 GIGAFLOPS
Gulftown 6-core @ 4 GHz: 90-93 GIGAFLOPS

How many passes?

I can only talk about my choice; yours may be different, and I can understand that.
I suggest running the test for 30 to 50 minutes, but no fewer than 10 passes.

What about Win32?

Here is the good news: almost everything I said about running the test properly applies to WinXP 32-bit as well. Using IBT with 2047MB of RAM & making sure I'm running the test properly, I have always been able to reproduce an error that occurred during a LinX/IBT run in Vista64.
Note that in win32 LinPack uses 32-bit code, but in win64 it uses 64-bit code that runs faster. Also, Vista64 memory management is much better than 32-bit XP's. As a result, CPU performance (in GIGAFLOPS) is lower in win32. For example, the GIGAFLOPS for my CPU @ 3.66GHz is almost 24 in Vista64 & almost 19.5 in XP 32-bit. So none of the GIGAFLOPS values given for win64 are useful for win32. You need to work out the proper numbers yourself.

Acceptable tolerances

It depends on the accuracy of your estimated GIGAFLOPS value. Generally a tolerance of -1% to -3% should be OK. Personally, I try to make my estimate as accurate as possible; then I only allow a tolerance of around -1.5%. Let's say I'm expecting 23.6 GIGAFLOPS. I accept every pass with a value of 23.2+ as acceptable. Now if the total no. of passes with a value of 23.2 GIGAFLOPS or more is less than 10, the whole test is unacceptable to me. You really need to experiment & decide for yourself.
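That acceptance rule can be sketched as a small checker (the -1.5% tolerance and 10-pass minimum are the author's personal numbers, not fixed rules):

```python
def run_is_valid(pass_gflops, expected, tolerance=0.015, min_valid=10):
    """Check a LinX/IBT run against an expected GIGAFLOPS estimate.

    A pass counts only if its GIGAFLOPS comes within `tolerance`
    (here -1.5%) of the expected value; the whole run is accepted
    only if at least `min_valid` passes qualify.
    """
    threshold = expected * (1 - tolerance)
    valid = [g for g in pass_gflops if g >= threshold]
    return len(valid) >= min_valid

# Expecting 23.6: passes at ~23.2+ count, so one slow pass out of ten fails the run.
print(run_is_valid([23.5] * 10, 23.6))          # True
print(run_is_valid([23.5] * 9 + [22.0], 23.6))  # False
```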


The number of calculations performed during each pass

The number of floating point operations (FLOP) performed during each pass is a function of n, where n = the no. of equations (problem size).
I have done the math. The result is a cubic polynomial, an³+bn²+cn+d, where a, b, c & d are known values. It is possible to keep only the highest-order term (an³) and still get very good accuracy.
Here is the result:
The number of math operations in each pass = (2/3)n³ = n³/1.5 FLOP
Divide by 1e9 (one billion) to convert to gigaflop.
Note that this should be equal to the product of Time x GFlops as reported by LinX for each pass.

As an example, let's calculate the no. of math operations with a problem size of 10000:

10000³/1.5e9=666.7 gigaflop

Now we can calculate the time needed for each pass even before starting the test. This is the net time LinPack spends solving the equations; it needs a little extra time to calculate & fill the arrays at the start of each pass.
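Both the operation count and the expected pass time follow directly from the (2/3)n³ formula above; a short sketch:

```python
def flop_per_pass(n: int) -> float:
    """Dominant term of the solver's operation count: (2/3) * n^3 FLOP."""
    return (2.0 / 3.0) * n ** 3

def seconds_per_pass(n: int, gflops: float) -> float:
    """Net solve time per pass; LinPack spends a little extra
    filling the arrays at the start of each pass."""
    return flop_per_pass(n) / (gflops * 1e9)

print(flop_per_pass(10000) / 1e9)     # ~666.7 gigaflop, as in the example above
print(seconds_per_pass(10000, 20.3))  # ~32.8 s per pass at 20.3 GIGAFLOPS
```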


Here I try to clarify a few points:

-The max GIGAFLOPS is measured like this: a few numbers are loaded into CPU registers & math operations (multiplies & adds) are performed on them billions of times. There is no L1, L2 or RAM access involved. This means we get the same number for a similarly clocked E5200, E7300 & E8400. Similarly, we get the same result for a similarly clocked Q8300, Q9400 & Q9550!
As you see, the result (while correct) is useless as a benchmark. It only serves as a performance upper limit.
In real-world apps we can never, ever get the max GIGAFLOPS.
A CPU with more cache and/or a more advanced architecture helps us get closer to the max value.

-The impact of RAM on CPU performance is small, but not trivial. This means if you are using very slow RAM with a powerful CPU, your GIGAFLOPS performance will suffer. The performance hit could be even greater with a quad core.
Also note that if you are using a combination of Nvidia chipset + slow RAM + untweaked BIOS settings, the memory subsystem may become a bottleneck.

-It seems this test is somewhat more optimized for Intel processors & chipsets, so the difference between max & actual GIGAFLOPS could be a tad greater for AMD processors.

-Still, the best method is to measure your baseline GIGAFLOPS as I've explained in the OP, regardless of the max value.
 
WZ30, did you set the threads to suit your CPU? My i7 920 = 8 threads. Do that and run IntelBurnTest again.

I just leave threads on auto. Do you think I should manually set it to 4?

Edit: When I leave threads on auto, I get roughly 30.2 GFlops. When I set threads to 4, I get 27.4 GFlops. So it seems that by leaving threads on auto I get higher GFlops in IBT.
 