Memory Errors?

iviv · 5 Jun 2010 at 16:16

This started off yesterday with a bluescreen while playing WoW, the error being:
MEMORY_MANAGEMENT, code 0x0000001A

Shrugging this off as a one-off, I rebooted, loaded up WoW again, played for about 20 mins before WoW itself crashed, with the error:
The instruction at 0x005595fc referenced memory at 0xffffffff. The memory could not be read.

Downloaded memtest86+ and ran it, only doing one pass, everything was fine. Restarted again, everything was stable.

And today, WoW once again crashed:

Once again, talking about Memory, though this time WowError caught it and sent a crash log off to Blizz.

However, all these constant references to memory errors have me rather worried, even though memtest passed perfectly fine.

I'm using 4x1Gb modules of geil value ram. PC6400 (800Mhz) 5-5-5-18 running at 749Mhz, 4-4-4-12 timing.

The system's been stable for years, and it was much hotter last summer than this, so I don't think temperature's an issue, ram shouldn't get too hot anyway, right?

But... what could cause these errors to come up?

Edit: Apparently bad things come in groups. Was playing TF2, bluescreened, this time an error with dxgmms1.sys. Currently downloading the Feb 2010 DX redist, just to be on the safe side. Still clueless as to the issue :x

iviv · 5 Jun 2010 at 17:49

Broken Hope said:
WoW is actually really good at finding instability, that combined with the TF2 crash means you defo have something playing up.

Thanks for the confirmation

On a serious note, I'm currently giving memtest a proper run, going to let it do a few passes, just to completely rule out memory being an issue. The thing is, it's been running Vantage and Heaven recently, so I'd have thought those would show up any instabilities. Plus, as I mentioned, it doesn't seem to be temp, as it's pretty cool today, especially compared with days in the past and last summer's heat wave.
The only thing I find weird is the memory errors when there's nothing wrong with the memory. Could it be the memory on my GPU that's acting up? Or the paging file, perhaps?

The recent dxgmms1.sys error certainly implies a graphics card issue, but the card's temps are fine when gaming.

iviv · 5 Jun 2010 at 18:55

****.

iviv · 5 Jun 2010 at 19:00

So, where do I go from here? Take all but 1 stick out, and run memtest on each stick to see the faulty one? Or is there any info in there which tells me which stick it is? I'm thinking the former

iviv · 5 Jun 2010 at 19:06

Ah, right.

Which one's going to be the last stick? Furthest away from the CPU, I assume?

iviv · 5 Jun 2010 at 19:17

Thanks. Chatting with a techie friend on MSN, he's saying it could be because the timings are at 4-4-4-12 instead of the ram's usual 5-5-5-18. Could that have anything to do with it? It's not something I've altered myself, it's set to 'auto' in the mobo I think. Could that have caused this?

Either way, testing that one stick now.

iviv · 5 Jun 2010 at 19:46

It's manually set to 2.1V, wasn't stable at anything less.

Also, is the maths right here? The errors occurred at 4263Mb, but 4 lots of 1Gb ram = 4096Mb?

Edit: 3 passes complete on what I think was the 4th stick, nothing. I'll try the first stick after 5 passes.

Edit2: I stuck all the sticks back in, went to bios to find an option to set the timings back to 5s... and I couldn't find one. The only ram options I could find were for the speed, which is 750Mhz (2x fsb), nothing to do with the timings. Weird :x

iviv · 6 Jun 2010 at 11:40

6 1/2 hours of memtesting on all 4 sticks together, 7 passes complete and no errors found.
Could that mean the first one was a false positive? A mistake? I'm just confused now :x

iviv · 6 Jun 2010 at 12:29

Hmm, needing reseating could be a possibility, it's probably been knocked around a fair amount. Also, did some googling and found this helpful post: http://www.thetechrepository.com/showpost.php?p=54&postcount=2
While it's for the P5B-Deluxe, I imagine it would be in the same place for my mobo. The only question I have is... what do I set all the options to?
CAS latency, RAS to CAS delay and RAS Precharge to 5, RAS active to precharge to 18, but then what about the others? Does what he says below the pictures sound about right?

From watching the memtest, I believe I originally singled out the wrong module to test on its own; the third stick was the one throwing up errors. For some reason, memtest was listing the memory pairs as being 0-2048Mb and then 4097-61xx (can't remember the exact values), meaning that the error fell on the third stick. I think I'm going to push the timings up, and then memtest overnight again if I don't get any crashes while playing games today.
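
For anyone following the address maths, here is a rough Python sketch of that reasoning. It assumes the two ranges memtest listed were about 0-2048Mb and 4096-6144Mb (the post recalls "4097" and only "61xx", so both bounds are guesses rounded to whole gigabytes), that each 1Gb stick covers a contiguous block in slot order, and it ignores dual-channel interleaving, which can spread neighbouring addresses across sticks. The names `RANGES_MB`, `ram_offset_mb` and `stick_number` are made up for illustration. It would also explain how an error can sit at 4263Mb with only 4096Mb installed: part of the RAM is mapped above the 4Gb mark to leave room below it for device address space.

```python
# Assumed physical address ranges in MB as (start, end), end exclusive.
# Both bounds of the second range are guesses, not values from memtest.
RANGES_MB = [(0, 2048), (4096, 6144)]
STICK_MB = 1024  # each module is 1Gb

def ram_offset_mb(addr_mb, ranges=RANGES_MB):
    """Translate a physical address (MB) into an offset into installed RAM (MB)."""
    offset = 0
    for start, end in ranges:
        if start <= addr_mb < end:
            return offset + (addr_mb - start)
        offset += end - start  # skip over RAM covered by earlier ranges
    raise ValueError(f"{addr_mb}MB is not backed by RAM in the assumed map")

def stick_number(addr_mb, stick_mb=STICK_MB):
    """1-based stick index, assuming sticks are laid out back to back."""
    return ram_offset_mb(addr_mb) // stick_mb + 1

print(ram_offset_mb(4263))  # 2215
print(stick_number(4263))   # 3
```

Under those assumptions the error at 4263Mb lands 2215Mb into the installed RAM, which is the third stick, not the fourth.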

iviv · 7 Jun 2010 at 08:54

Goddamnit.

This time, the errors seem to be all over the shop, rather than at one particular address like last time. Now, I'm confused by this, as the only thing I've done since last time was change the timings from the 4-4-4-12 it was running at to the 5-5-5-18 they are designed for (though thinking about it, 5-5-5-15 would probably be better). Either way, the timings are slower, so the ram should be less stressed and less prone to errors?

I'm just confused now
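
To put numbers on "slower timings", a timing given in clock cycles can be converted to nanoseconds. This is a minimal sketch, assuming DDR2 transfers data twice per clock (so 750Mhz effective means a 375Mhz memory clock); the function name `latency_ns` is made up for illustration.

```python
def latency_ns(cycles, data_rate_mhz):
    """Convert a timing in clock cycles to nanoseconds for DDR memory."""
    clock_mhz = data_rate_mhz / 2  # DDR: two transfers per clock cycle
    return cycles / clock_mhz * 1000

# CAS latency (the first number in each timing set) for the setups in the thread
for label, cas, rate in [
    ("4-4-4-12 at 750Mhz", 4, 750),
    ("5-5-5-18 at 750Mhz", 5, 750),
    ("5-5-5-18 at 800Mhz (rated)", 5, 800),
]:
    print(f"{label}: CAS = {latency_ns(cas, rate):.2f} ns")
```

On these figures, CAS 4 at 750Mhz works out to roughly 10.67 ns against 12.50 ns at the rated CAS 5 and 800Mhz, so the old auto setting was asking the ram to respond faster than its rating, while CAS 5 at 750Mhz (about 13.33 ns) is more relaxed than the rating.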

iviv · 7 Jun 2010 at 09:26

That's the timings I had it at for that run.

And I agree regarding something being an issue at any timing, but the first errors were on just one stick, while the second lot of errors was spread over a couple.

iviv · 7 Jun 2010 at 11:44

At the moment I've removed the overclock from the system, running everything at default apart from memory and CPU voltage. CPU voltage is at 1.3 because I don't have the foggiest what the stock voltage was, and it's been running at 1.4v fine, so 1.3 seemed like it would be easily enough.
Ram voltage I kept at 2.1V because that's where it was stable. Also, interestingly, now the cpu is back at stock speeds, the ram's defaulted back to 5-5-5-18 timings.
Anyway, that's been going 2 1/2 hours without errors, but I'll run it all day, and then begins the long arduous task of testing all the sticks :x

Funny thing was, a couple of weeks ago I was asking if I should chuck my system out and move up to one of the fancy AMD six core CPUs because mine was getting a little sluggish, and was advised to wait until their new CPUs in the new year, which made sense. But with all this going on, part of me wants to say 'screw it' and splash out now :x

iviv · 7 Jun 2010 at 12:46

Thing is, the cheapest i7 cpu is £55 more than AMD's hex core, both running at 2.8Ghz. Plus, Bulldozer should work in AM3 800 chipset motherboards, so I would have a viable upgrade path, whereas I believe Intel's next range of CPUs will require new everything?

As for the CPU Overclock, I don't think that could be the issue, it's been running those speeds for close to three years now with no issue. But I'm three hours into another memtest with no issues so far.
Then again, it's passed memtests before without showing issues, so I don't know. That's partly why I want to get shot of it, but the main problem with selling it is that I'd have to mention this issue when I sell it, which I would imagine will decrease the value of it.
Unless I ebay it and deny everything!

iviv · 8 Jun 2010 at 12:18

24 hours on the default clocks and everything passed ok. Even so, I'm going to give the stick that was throwing up errors another 24 hours on its own, so that only it gets stressed, which should mean more passes on it.

It's weird, my overclock's been stable for years, weird that it's gone now. Guess I'll have to lower it a bit, even though my CPU temps have always been fine.

Actually, having said that, cpu temps were never the issue when I was originally overclocking it, the northbridge couldn't handle it. Could the fact I changed from a 4850 to a 5770 graphics card have done it, since the card is rather close to the northbridge? Slightly more heat kicked out means it's overheating now it's summer?

iviv · 8 Jun 2010 at 13:00

Ah, good point. How's best to rule out the motherboard? I don't have any spare CPUs or RAM lying around to switch them around, and when I do test it, should I be running at stock speeds or overclocked?

iviv · 8 Jun 2010 at 14:38

I've just been reminded of one thing, though. In fact, it's been happening for a little while but I completely forgot about it. Sometimes, on starting up I get the message: CPU Fan error during POST. Now, the fan's correctly plugged in and spinning, and switching the PC off and on again got rid of the error. But it happened just now after I finished testing the single stick on its own (9 passes all ok). I was planning on restoring the overclock, and then stressing for a few hours with prime95 or something, and keeping an eye on it, but after 4 attempted restarts and getting the 'cpu fan error' it makes me think maybe the mobo is at fault?

Also, if it was the CPU at fault, why would that make it fail during memtest? It's not exactly being stressed by that, is it?

iviv · 8 Jun 2010 at 15:49

True. Any ideas on the CPU fan error?

Anyway, running OCCT to stress it now. Will give that a couple of hours, see if anything breaks. I'm running back at my original overclock, memory being set automatically.

Edit: An hour on OCCT and nothing. Highest it got was 50C. But then again, it's been raining all day, not hot hot sun.
Also, noticed that the cpu multiplier is jumping between 8.5 and 9x. Guess I hit the speedstep button sometime >_>
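
For a sense of what that multiplier jump means in clock speed, here is a small Python sketch. It assumes the FSB is 375Mhz, inferred from the ram speed of 750Mhz being described as 2x fsb earlier in the thread; the name `cpu_clock_mhz` is made up for illustration.

```python
FSB_MHZ = 750 / 2  # assumption: ram speed of 750Mhz was described as 2x fsb

def cpu_clock_mhz(multiplier, fsb_mhz=FSB_MHZ):
    """CPU core clock is the front-side bus clock times the multiplier."""
    return fsb_mhz * multiplier

for mult in (8.5, 9.0):
    print(f"{mult}x -> {cpu_clock_mhz(mult):.1f} MHz")
```

If that FSB figure is right, the cpu is moving between about 3187.5Mhz and 3375Mhz, so a stress run may not have been at the full overclock the whole time.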

iviv · 8 Jun 2010 at 22:25

Large data set should stress both the CPU and RAM. 4 hours, no errors :x

I'll wait for another hot day, run it again, but as it stands I think I'm just going to sell it on and get a new system anyway. SupCom 8 player huge maps aren't really playable anyway XD