[solved] Bad Memory not diagnosed by MemTest86 caused amdkmdap errors and instability

I just want to share my findings here since this community has often helped me with other problems in the past (besides, not sharing would be a waste of good testing data :)). Please forgive the double post as this one explains the problem and how I figured it out, while the next one gives details of the actual testing results.

I've been having instability issues ever since I put together my system 6 months ago, but I suspected my graphics card since the crashes always happened during gameplay or video playback, and the error was always "amdkmdap stopped responding". Over the last few weeks, though, I had random crashes of Firefox, MediaMonkey and so on, and I started to suspect my CPU and motherboard. I no longer suspected my RAM because I had run MemTest86 3.5 for over 8 hours overnight on a couple of occasions and it never found any problems. However, I finally managed to pin the problem down to one of my GEIL 2GB PC3-10660 memory sticks, thanks to having two nearly identical systems, a lot of careful testing with Prime95, and a few games that consistently crashed.

On two different systems the PC becomes consistently unstable when that one stick of RAM is used, regardless of the slot, so I'm 100% confident that the stick is faulty. Prime95 fails quickly for blend or large FFTs but never fails for small FFTs. When running Civ V (DX11), Guardian of Light or Blood Bowl I ALWAYS get amdkmdap errors with the problem stick and NEVER without it. Guardian of Light and Blood Bowl will usually run for a while before failing and may survive the "gpu crash", while Civ V (DX11) tends to crash almost immediately. I now also wonder if my long-running issues with the ATI 5770 were all due to the faulty RAM, or if that was a separate issue finally resolved by the BIOS and driver updates I've been doing.

I still need to RMA the memory pair and explain that the problems do not appear in MemTest86 - hopefully Overclockers have better testing techniques that will confirm the faulty memory, and I can get a new pair with no issues this time.
 
System 1
Win7 64-bit (fully up-to-date)
Gigabyte GA-MA770T-UD3 (latest BIOS F8b and latest chipset drivers)
AMD Phenom II x4 955 3.2 GHz (Titan Fenrir cooler)
2 x 2GB GEIL PC3-10660 RAM CL=9-9-9-24
Sapphire ATI HD 5770 (updated with latest BIOS)

System 2
Same as above, but with an AMD Phenom II x2 550 and an Nvidia GeForce 8800 GT card.

Testing Utilities
MemTest86 3.5 (bootable CD) and Memtest86+ 4.10 (bootable USB)
Prime95 Windows64 v25.11, build 2 (tested with "round off checking" on (mostly) and off)
Core Temp 0.99.7

MemTest Results of Bad GEIL stick
Note: Tested with only the known bad memory stick and on both systems
Memtest86 3.5 (System 1): 17 Passes (~8 hours), Errors: 0
Memtest86 3.5 (System 2): 4 hours, Errors: 0
Memtest86 3.5 (System 2): Set MemTest to probe - messes up display of data, but still shows no errors after 2 passes.
Memtest86+ v4.10 (System 2): 3 Passes (1 hour 20 minutes), Errors: 0
Memtest86+ v4.10 (System 2): Set MemTest to probe - hangs the machine

Prime 95 - System 1 with bad stick only
Note: Max temp 44 C (average 40-42)

Blend (tests some of everything, lots of RAM)
- FATAL ERROR: Rounding was 0.5 expected less than 0.4
- Error reported after 0 or 1 passes on all 4 cores (less than 2 minutes until failure). Confirmed by running many times!
- Occasionally on starting the test the CPU frequency multiplier will not switch from x4 to x16 and Core Temp reports a load of only 2-10% on all cores, yet the machine is very sluggish and unresponsive. Within the first couple of minutes one or more cores switch to x16 and 100% load and fail soon thereafter with the error above.
- One occasion of a blue screen with error code 0x00000050 when dragging the Prime95 window during the test.
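
For anyone wondering what "Rounding was 0.5 expected less than 0.4" means: as far as I understand it, Prime95 multiplies huge numbers using floating-point FFTs, and every value in the result should land very close to a whole number. The distance to the nearest integer is the round-off error, and a value near 0.5 means the result cannot be trusted. Below is a minimal Python sketch of that idea only. It is not Prime95's code: the function names, the tiny recursive FFT and the `corrupt` parameter (which just simulates one bad value coming back from memory) are all my own illustration.

```python
import cmath

def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def multiply_with_roundoff(a_digits, b_digits, corrupt=None):
    """Multiply two digit lists (least significant digit first) via FFT.

    Returns (rounded coefficients, max round-off error).
    corrupt is a hypothetical (index, delta) pair simulating a bad memory value.
    """
    size = 1
    while size < len(a_digits) + len(b_digits):
        size *= 2
    fa = fft([complex(d) for d in a_digits] + [0j] * (size - len(a_digits)))
    fb = fft([complex(d) for d in b_digits] + [0j] * (size - len(b_digits)))
    product = [x * y for x, y in zip(fa, fb)]
    # Inverse transform, then scale by 1/size to get the convolution.
    raw = [v.real / size for v in fft(product, invert=True)]
    if corrupt is not None:
        index, delta = corrupt
        raw[index] += delta
    # Round-off error: how far each value is from the nearest integer.
    max_error = max(abs(v - round(v)) for v in raw)
    return [round(v) for v in raw], max_error

# 123 * 456, digits given least significant first
clean_digits, clean_error = multiply_with_roundoff([3, 2, 1], [6, 5, 4])
bad_digits, bad_error = multiply_with_roundoff([3, 2, 1], [6, 5, 4], corrupt=(2, 0.5))
print("clean run round-off:", clean_error)
print("corrupted run round-off:", bad_error)
```

In the clean run the error is tiny, while the corrupted run comes out at about 0.5, which is above the 0.4 limit quoted in the error message. That would fit a stick that occasionally returns wrong data under load.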

In-place large FFTs (maximum heat, power consumption, some RAM)
- Reports the same error as the blend test, but takes longer: several tests (4+) pass before the error occurs. Tried several times.

Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested much)
- No errors reported even after over an hour of testing
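
My guess as to why small FFTs never fail is simply that the data stays in the CPU cache and hardly touches the bad stick. Here is a rough back-of-the-envelope sketch of that in Python. The numbers are assumptions: I'm taking 8 bytes per FFT element (one double) and ignoring any extra tables Prime95 keeps, and the FFT lengths (8K-64K for small, 128K-1024K for large) are what I remember from the torture test dialog, so treat them as approximate. The 512 KB of L2 per core is the figure for the Phenom II.

```python
BYTES_PER_ELEMENT = 8   # assumption: one double-precision value per FFT element
L2_PER_CORE_KIB = 512   # Phenom II: 512 KB of L2 cache per core

def working_set_kib(fft_length_k):
    """Rough data size in KiB for an FFT of fft_length_k * 1024 elements."""
    return fft_length_k * 1024 * BYTES_PER_ELEMENT // 1024

def fits_in_l2(fft_length_k, l2_kib=L2_PER_CORE_KIB):
    """True if the estimated working set is no larger than one core's L2."""
    return working_set_kib(fft_length_k) <= l2_kib

# Assumed FFT lengths: 8K-64K (small FFTs), 128K-1024K (large FFTs)
for length_k in (8, 64, 128, 1024):
    where = "stays in L2" if fits_in_l2(length_k) else "spills past L2"
    print(f"{length_k}K FFT: ~{working_set_kib(length_k)} KiB, {where}")
```

By this estimate the small FFT range tops out right at the size of one core's L2, while the large FFTs and the blend test have to go out to main memory, which matches which tests fail with the bad stick.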

Prime 95 - System 2 with bad stick only

Blend (tests some of everything, lots of RAM)
- FATAL ERROR: Rounding was 0.5 expected less than 0.4 on core 2 after less than two minutes
- FATAL ERROR: Rounding was 0.5 expected less than 0.4 on core 1 after 57 minutes
- FATAL ERROR: Rounding was 0.5 expected less than 0.4 on core 2 after 56 minutes
- Same sluggish and unresponsive behavior on starting Prime95 as with System 1: CPU usage stays at 2-8% for a minute or two before going to 100%, at which point the machine becomes more responsive.

In-place large FFTs (maximum heat, power consumption, some RAM)
- Skipped testing of this as evidence already quite conclusive and running out of time

Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested much)
- No errors reported after ~10 minutes

Prime 95 - System 2 with good memory stick only
Note: Max temp 49 C

Blend (tests some of everything, lots of RAM)
- No errors reported after 2 hours and 11 minutes

In-place large FFTs (maximum heat, power consumption, some RAM)
- No errors reported after 10 minutes

Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested much)
- Skipped due to lack of time

Prime 95 - System 1 and System 2 with good memory pair
Note: Max temp 50 C (averaged 48-49 C)

Blend (tests some of everything, lots of RAM)
- No errors reported after a couple of hours (both machines)

In-place large FFTs (maximum heat, power consumption, some RAM)
- No errors reported after ~20 minutes (both machines)

Small FFTs (maximum FPU stress, data fits in L2 cache, RAM not tested much)
- No errors reported after ~10 minutes (both machines)

Game Testing - Bad Stick
Civ V (DX11): Crashes during startup, on the load screen, or quickly after loading a mid-game save on huge maps
Blood Bowl: Game freezes, sometimes recovers and other times crashes (both reporting the amdkmdap error in Windows Event Viewer) - usually happens in the first hour of gameplay.
Lara Croft and the Guardian of Light: Froze during the intro video (but recovered), then crashed after maybe 30-40 minutes of play. Reloading the same game in the same room makes it crash every time (amdkmdap error in Windows Event Viewer).

Game Testing - Good Stick & Good Pair
No crashes after 6+ hours of Civ V (DX11), no freezing or crashing after ~2 hours of Blood Bowl, and no crashes or freezing after playing through the first level of Guardian of Light twice!

BIOS Tweaking with Bad Memory
Note: None of these things made any difference to the FATAL ERRORS in the blend test
- Increase voltages to CPU and motherboard slightly
- Decrease the voltage to CPU from motherboard default of 1.4v to 1.3v
- Set cpu frequency multiplier to x6

Summary
With good stick on its own or good memory pair in either system Prime 95 and games never crash.
With the bad stick on its own or together with the good stick, in either system, games crash and Prime95 reports errors within the first hour, usually within the first couple of minutes.
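
To spell out the reasoning behind the summary, here is a small Python sketch that writes the results down as a table and checks which factor the failures follow. The rows are transcribed from the summary above. The helper name `outcome_tracks` and the way I've encoded the configurations as strings are just my own illustration.

```python
# (system, memory configuration, did Prime95/games fail?) as per the summary
RUNS = [
    ("System 1", "bad only", True),
    ("System 2", "bad only", True),
    ("System 1", "bad + good", True),
    ("System 2", "bad + good", True),
    ("System 1", "good only", False),
    ("System 2", "good only", False),
    ("System 1", "good pair", False),
    ("System 2", "good pair", False),
]

def outcome_tracks(runs, feature):
    """True if all runs sharing a feature value also share the same outcome."""
    seen = {}
    for system, config, failed in runs:
        key = feature(system, config)
        if seen.setdefault(key, failed) != failed:
            return False
    return True

# Does the outcome follow the presence of the bad stick, or the system?
follows_stick = outcome_tracks(RUNS, lambda system, config: "bad" in config)
follows_system = outcome_tracks(RUNS, lambda system, config: system)
print("failures follow the bad stick:", follows_stick)
print("failures follow the system:", follows_system)
```

The failures line up perfectly with whether the bad stick is installed and not at all with which system it is in, which is why I'm pointing at the stick rather than the CPU, motherboard or graphics card.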
 