S&M 1.7.6 issues

24 Jan 2006
Opteron 165
MSI Neo4-F
2x1GB Gskill HZ PC4000 (Just replaced as an RMA as one stick was faulty)
POV 7800GTX 256MB
Enermax 465W PSU
Thermaltake Big Typhoon
Antec P180

I posted a few weeks ago about random Prime fails both at stock and when OC'd. After some time spent testing, the memory turned out to be to blame, with errors showing up in Memtest 1.65 that strangely didn't show in the OCZ 1.0 version. Anyhow, this has now been replaced.

Memory is now stable for 20+ hours in Memtest @ 250MHz 3-4-4-8 1T.
Prime is stable at stock (1.35v, 1.8GHz) and at 1.40v, 2.5GHz. The latter is currently on 14 hours and counting, so it's looking good. It did fail Prime the other day, but that was at 2.6GHz and I think I was at 1.44v. Probably my own fault, as I had only upped the voltage by 0.02v from the point where it fell over during a quick and dirty OC test, using ClockGen to up the HTT every minute while running Prime to get a rough idea of scaling vs voltage. It actually ran all night and only failed when the heating kicked in the following morning and the room temperature rose quite a bit. With a little more voltage I'm sure 2.6 would be fine. I even ran for an hour at 2.7 with 1.54v, but the CPU was a little too hot, so I backed down to 2.6 and was looking for the minimum stable voltage.

Rather than wait for Prime I decided to use S&M to test the next couple of voltage increments. I only have voltages up to 1.4v in 0.025v increments. I also have percentage increases of 3.3, 6.6, 9.9 etc. up to 19.9, so I combine these with the base voltage to make the steps, e.g. 1.40v -> 1.375v +3.3% (1.42v) -> 1.350v +6.6% (1.44v) etc.
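
A rough Python sketch of that step arithmetic, for anyone wanting to work out the other combinations. It assumes the percentage option is a plain multiplier on the base voltage, which is only inferred from the figures quoted here and not from any board documentation. The function name is made up for illustration.

```python
def effective_vcore(base_v, percent):
    """Base voltage plus a percentage offset (assumed to be a simple multiplier)."""
    return base_v * (1 + percent / 100)

# Base voltage and % increase pairs taken from the steps quoted in the post.
steps = [(1.400, 0.0), (1.375, 3.3), (1.350, 6.6)]

for base_v, percent in steps:
    vcore = effective_vcore(base_v, percent)
    print(f"{base_v:.3f}v +{percent}% -> {vcore:.2f}v")
```

The same function can be fed the higher percentages (9.9 and so on) to see which pairs land closest to a target voltage.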

When I run the full S&M suite it always fails on the core 1 FPU test on the second loop. This happens at both 1.8GHz and 2.6GHz. Core 0 is fine.
All further testing was at stock 1.8GHz.
To rule out issues I looped the S&M memory test - OK after 6 hours
I looped the cache and integer tests - OK after 3 hours (fed up of waiting)
I looped the FPU test - OK after 6 hours
I looped the Power test - OK after 3 hours (bored again)

If I loop the integer and FPU tests together it fails on the core 1 FPU test on the second loop.

If I loop the FPU and Power tests, S&M periodically says it has an error and has to close. The VGA window stops responding but the FPU test keeps happily looping in the background.

So this is driving me nuts, days of testing have passed and I am no closer to knowing if it is my CPU or Mainboard or even if it's just S&M that has a problem.

3dmark2005, Fear, Doom3, Quake IV etc are all fine, I've had no other crashes.

I'd just like to know if other Dual core AMD users have had similar problems.

Unfortunately nothing so simple. Downloaded S&M again and replaced the files. Still have the same issue with core 1 on pass 2.

Also checked the PSU voltages: 12.01v +/- 0.02v, 5.09v +/- 0.01v, 3.39v +/- 0.01v according to my Fluke. Voltages were compared on and off load.
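
As a sanity check on those readings, here is a small Python sketch comparing each rail with its nominal value. It assumes the usual ATX tolerance of +/-5% on the +12v, +5v and +3.3v rails. The names are illustrative only.

```python
# Fluke readings from the post, keyed by nominal rail voltage.
readings = {12.0: 12.01, 5.0: 5.09, 3.3: 3.39}

TOLERANCE = 0.05  # +/-5%, assumed ATX limit for the positive rails


def rail_deviation(nominal, measured):
    """Fractional deviation of the measured voltage from nominal."""
    return (measured - nominal) / nominal


for nominal, measured in readings.items():
    dev = rail_deviation(nominal, measured)
    status = "OK" if abs(dev) <= TOLERANCE else "OUT OF SPEC"
    print(f"{nominal:>4}v rail: {measured:.2f}v ({dev:+.1%}) {status}")
```

All three rails sit well inside that window, and the on/off load variation of 0.01-0.02v is tiny by comparison.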

The reinstall of S&M seems to have fixed the 'Must shut down' errors. Have been looping FPU and VGA @2.5 Ghz for the past couple of hours which is usually more than enough to provoke it.

Just a shame I still get the core 1 error going from integer to FPU whatever clock speed I use.

Figure I'll give it a final run of Prime overnight and if I don't get any errors I'll leave it at 2.5 for a while.

Temps are around 54C in a 23C room at max load, so I figure I have some headroom for summer, when temps are likely to approach 60C, though I doubt I'll be running such a heavy load.
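
The summer estimate can be roughed out in Python by treating the rise over room temperature as constant at a given load. That is an assumption (same cooler, same airflow, same power draw), and the warmer room temperatures below are hypothetical examples, not measurements.

```python
def rise_over_ambient(load_temp_c, room_temp_c):
    """How far the CPU sits above room temperature at full load."""
    return load_temp_c - room_temp_c


def projected_load_temp(room_temp_c, rise_c):
    """Expected load temperature if the rise over ambient stays the same."""
    return room_temp_c + rise_c


rise = rise_over_ambient(54, 23)  # figures from the post

# 26C and 29C are hypothetical summer room temperatures.
for room in (23, 26, 29):
    print(f"room {room}C -> load approx {projected_load_temp(room, rise)}C")
```

On that basis the room would need to reach about 29C before load temps hit 60C.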

This is the first chip I've owned that gets majorly hotter when I increase core speed: approx 10C difference in load temperature from 1.8 to 2.5.

The temperature increase is unusual. I've removed and reseated the HSF a good number of times and each time the thermal paste is a nice thin opaque layer.

The temperatures are consistently approx 4C better at load than I obtained with the original 4 heatpipe HSF, which I'd refitted a number of times.

I think the chip may have an issue with the contact between the heatspreader and the cores, but I'm not yet prepared to risk removing the heatspreader and potentially killing the chip.

The other option is that the core is not great silicon and does produce a lot more heat when the frequency goes up.
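
For a rough idea of how much extra heat the overclock should produce on its own, the usual rule of thumb is that dynamic power scales with frequency times voltage squared. A back-of-envelope Python sketch using the stock and overclocked settings from earlier in the thread (1.8GHz at 1.35v, 2.5GHz at 1.40v) follows. It ignores leakage, so treat it as an estimate only and not a measurement.

```python
def relative_dynamic_power(f_old_ghz, v_old, f_new_ghz, v_new):
    """Ratio of new to old dynamic power, using P proportional to f * V^2."""
    return (f_new_ghz / f_old_ghz) * (v_new / v_old) ** 2


# Stock vs overclocked settings quoted in the thread.
ratio = relative_dynamic_power(1.8, 1.35, 2.5, 1.40)
print(f"Estimated dynamic power at 2.5GHz: {ratio:.2f}x stock")
```

That works out to roughly one and a half times the stock dynamic power, so a noticeable jump in load temperature is not necessarily a sign of bad silicon or poor heatspreader contact.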

I'll have to see how brave I feel at the weekend.

I wiped the disk and reinstalled XP from scratch last night. At 2.5Ghz I've just completed 18hrs of dual prime followed by 3 hours and counting of looped 3dmark2001. Faultless.

The error in S&M always shows '2', so it's consistent. There may even be a fault on the core, just one that nothing else finds.

Thanks for the help


Change of heart last night resulted in the decapitation of my core. The IHS is now removed.

Temps seem to have dropped 2-3C under load so not a dramatic change, and it suggests the IHS was doing a good job.

Just started priming at 2.6GHz, 1.49v (set), 1.45v (monitoring). Temp seems stable at 51-52C vs room temp of 22C.

I still get the same error in S&M so I've decided to take your advice and ignore it.

Also, I've found the cause of my 'Serious Errors'. It seems my USB wifi stick, used to prevent the little monster getting on the internet, causes a crash at very high CPU loads. It would always lose the wifi connection under extreme load, but it also seems to cause a crash when installed under such conditions. This got worse once I set the priority for Prime to 1 below the maximum. With Prime on max the mouse pointer will only move a fraction at a time!

Have to see how this pans out

