System appears to be down. Which part do I blame?

Hey. Computer was behaving absolutely fine yesterday, very stable at 4.2ghz. It's not doing so well now.

It posts, can get into bios fine. Gets as far as loading grub, then reboots. Shows no inclination to stop doing this. No error beeps or graphical corruption, fans spin up as expected. Nothing is overheating.

So far I've reset to stock, tried optimised and failsafe defaults. Also tried variations on these themes and previous stable settings. Reset cmos repeatedly. Now down to one stick of ram (tried a couple, in different slots). Tried second known good psu, issue remains.

This may be the cold boot issue the gigabyte UD5 is famous for. If so, flashing to a newer bios than F7 may fix things. Alternatively it's unstable, even at stock, and flashing the bios is going to leave me with a brick.

System spec
i7 920
Corsair dominator (two sets)
Gigabyte UD5
PC P&C 860W
8800gt

Ideas much appreciated, I'm a bit bemused by this.

edit: trying to boot from usb, it ignores all keyboard input and freezes after a bit.
 
Second psu has now been tried. Reinstalling OS is unfortunately not possible as I can't get it to boot from any media.

Different symptoms having changed to another stick of ram, now seeing nothing whatsoever on screen. So it might be dead ram, I guess I have a lot of combinations to try.

So, results from ram. Tested with two sticks individually, one from each triple channel set.

Dimm slot 1,3,5 (blue) refuse to post when occupied with dimm from either set.
Dimm slot 2,4,6 (white) post fine but reboot when it should load the OS, when occupied with dimm from either set.
Multiple sticks act in much the same way, as long as one of 2,4,6 has a stick in it'll post and reboot.
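As a rough sketch, the results above reduce to a single rule: the board posts only if at least one white slot (2, 4 or 6) is populated. The function name and outcome labels below are made up for illustration; the rule is only what was observed on this one board, not documented Gigabyte behaviour.

```python
# Observed behaviour only. Slot numbering follows the post above:
# 1, 3, 5 are the blue slots and 2, 4, 6 are the white slots.
# The names used here are illustrative, not from any manual.
WHITE_SLOTS = {2, 4, 6}

def observed_outcome(occupied_slots):
    """Return what the board was seen to do for a set of occupied DIMM slots."""
    occupied = set(occupied_slots)
    if not occupied:
        raise ValueError("at least one slot must be occupied")
    if occupied & WHITE_SLOTS:
        # Any stick in a white slot: posts, then reboots when the OS should load.
        return "posts, then reboots at OS load"
    # Only blue slots populated: no post at all.
    return "no post"

for slots in [(1,), (2,), (1, 3, 5), (1, 2), (2, 4, 6)]:
    print(slots, "->", observed_outcome(slots))
```

Writing it out this way makes it obvious that the stick itself never mattered in these tests, only which slots were filled.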

This may be a feature of gigabyte boards, refusing to post when only blue slots are occupied, but I doubt it. Considering flashing the bios, but think I'll wait to see if any of you fine gentlemen have any ideas.

In the interests of completeness, this is the post I've made on the Gigabyte forums.
Me said:
My UD5 is refusing to boot, in an issue which I believe to be distinct from the normal cold boot issues these boards suffer from. I have some CAD work due in halfway through next week which I do not wish to do on a netbook, so assistance in getting this running would be much appreciated. At least I'd like advice on whether this is board, cpu or ram.

***************************************************************************************************

Symptoms
With dimm slots 1,3 or 5 occupied, the board does not post. With dimm slot 2,4 or 6 occupied, the board posts, will allow access to the bios, but reboots shortly after. Specifically it makes it partway through the bootloader then gives up and reboots, this cycles indefinitely.

With multiple slots occupied, as long as one of the occupied slots is 2,4 or 6, the board posts then reboots much like before. With multiple slots occupied, none of which are 2,4 or 6, it refuses to post.

LCD poster cycles through numbers rather swiftly. It seems to settle on 96, then on FF before rebooting. If no stick is in slots 2,4,6 it sits on 69 for a while, changes to 6F briefly then flickers through a few to come back to 69. I have a suspicion the time between changing from 69 to other numbers is approximately the same as time between reboots when it has ram in slot 2,4,6.

The manual suggests 69 is L2 cache, 96 is loads of things and FF is try to boot. This leads to fears that my processor is dying, which I'd really rather it didn't, as common knowledge is that processors don't die, making rma a challenge. I don't have a spare processor with which to test.

This remains the case on optimised or failsafe defaults, as well as on previous stable settings. Cmos reset between each test, some repeated without cmos reset with no change. It also occurs whichever stick/combination of sticks is used from the two triple channel sets. Finally this occurs whether trying to boot from hard drive, usb or cd rom. OS reinstall is inappropriate given I can't get it to boot from disk or usb stick to attempt this.

***************************************************************************************************

Specification
Gigabyte UD5
Intel 920
Corsair dominator, 12gb
PC P&C 860W (tried backup psu also, both run a different system fine).
8800gt

Strangely it was absolutely fine yesterday. I left it idling overnight.

edit: Well I'll be damned if I understand this, but my backup computer, just assembled, appears to be doing exactly the same thing. Tried on a different power socket, same.

:confused:

On a not unrelated note, how the hell do I test mains AC?
 
Ahh Plec, I admit I'm happy to see you. Eventually you'll run into computing problems and I'll attempt to come to the rescue. Cheers for your reply in the other thread, don't know how I failed to notice anyone responding. An electrical forum is a very good idea. Contacting the landlord seems like a plan too, there's definitely something wrong here.

In the meantime, I've found a socket which my backup computer will boot from. So one socket working, about 5 not. This suggests a **** poor wiring job to me.

The Asus rma experience is very much in mind at present, really hoping nothing like that reoccurs. At least the Asus died in different accommodation, so I can't deduce that my home is killing computers just yet. Scary thought though.

Unfortunately, while the board behaves a bit better in the known "good" socket, as in it gets partway through booting from usb (I believe it would finish booting if the copy of ubuntu was better), it still doesn't post unless there is ram in one of 2/4/6, which I'm taking as a pretty bad sign. To Google, I think.

edit: Google won't tell me what the symptoms of a failing imc are, I'll ask intel when they're next available. It's decided to boot with all 12gb, made it all the way to memtest and is running this now. I don't know what to make of this, but somehow I think it's going to pass.

Memtest is fine. Persuaded it to boot from a usb stick and repaired the mbr from there. I think the area of the disk responsible for booting the machine corrupted, that or grub 2 screwed me. That it won't boot with ram in some of the slots is worrying, will wait to see what Gigabyte say about this. Power circuitry in my flat is clearly crap, it's ridiculous that only a couple of the sockets will let my computer boot.

However at least for now I'm looking at my familiar desktop. That's a very good start.
 
Silence isn't computer related thankfully, I lost the last week or so to debugging matlab code. Damned thing didn't work in the end, and it was only meant to solve differential equations. Some updates and responses to the above.

@westom I know little of the internal workings of a psu. There's a transformer which behaves as they always do followed by voltage regulator circuitry. However I do know my psu is good for 22, 26, 64A on the 3.3, 5, 12V lines, is considered ludicrously reliable, is barely stretched by the load from my system and that for two of these monsters (from different batches, both with certificates verifying they passed diagnostics at the factory) to have the same fault is very unlikely. I'm pretty sure it's not the psu letting me down. I'm also pretty sure a multimeter would only give crude numbers, I'd need to attach it to an oscilloscope while under load to get any meaningful answers.
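For a sense of scale, the quoted current ratings can be turned into per-rail wattage with P = V × I. This is just arithmetic on the figures given in the post; the dictionary and function names are illustrative.

```python
# Rail ratings as quoted in the post: nominal volts and maximum amps.
RAILS = {"3.3V": (3.3, 22), "5V": (5.0, 26), "12V": (12.0, 64)}

def rail_watts(volts, amps):
    """Maximum power a rail can deliver at its rated current (P = V * I)."""
    return volts * amps

per_rail = {name: rail_watts(v, a) for name, (v, a) in RAILS.items()}

for name, watts in per_rail.items():
    print(f"{name}: {watts:.1f} W")

print(f"sum of per-rail ceilings: {sum(per_rail.values()):.1f} W")
```

The per-rail figures are individual ceilings. Their sum comes out higher than the 860W on the label, so they presumably can't all be drawn at once, but the 12V rail alone accounts for most of the unit's capacity.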

That's reassuring Davy, though a bloody strange design on Gigabyte's part. It does at least make the motherboard less suspect. Thanks for the description of the cold boot issue, I agree that this seems distinct from my current issues. I may try the newer bios. Intel tech support are pretty certain it's the motherboard at fault, if the imc is on the way out they'd expect at least equal behaviour on each slot, and probably complete loss of function.

Pressure on the socket is a good point. It's one I'd completely forgotten about. The block hasn't been off in ages though, it's even still mounted backwards. I just haven't had time to strip the system down yet. In a similar theme, the ek needs to be milled and lapped before mounting with the liquid pro, as the cpu also needs to be lapped I'm going to wait until this is diagnosed. If the imc is dying on me, now would be an unwise time to lap it.

Liquid pro is loads of fun. Hot, wet aluminium with a tiny drop actually froths, it's wonderful to watch. It also bonds fairly convincingly with copper, though I have a suspicion the physical strength of the bond is lacking until it's properly set. Buffing the surface for a time reveals a pattern suspiciously like the liquid pro being in the surface defects but abraded from the surface, however leaving it on the surface for several months then buffing it doesn't shift it. A good long time at elevated temperatures (probably folding at 3.6ghz, 70 degrees or so) will set it beautifully, and I'll have a block soldered to my cpu. I've pretty much destroyed my stock heatsink though, so can't actually test my machine without water anymore. Just another thing on the todo list, currently after taking this machine above stock. The whole point to my degree is to eventually have an engineering lab to play with, a reinforced shed will come first as soon as I have a garden :D

Are you on water yet my dear man? I'm sure I remember a potential spec me thread.

Back on topic, I'm awaiting an electrician. I'll probably clock the machine tonight, as my lab partner is "too tired to do mechanics". It'll be more vulnerable to weird supply voltages and I think I'll be able to tell if it's behaving unusually. The electricity here must be dirty as hell though (I know, I know) as the load on it is heavy, very variable and it probably has numerous unsafe devices attached, only some of them mine.
 
Assuming you're trying to help, westom, would you care to tell me the flaw in this reasoning?

One mains socket out of six tried in my flat allows the computer to boot. The other five do not. I'm deducing that the cables in the wall are a bit knackered. Computer behaves much as it should do, as long as it's connected to this specific socket. Changing power cable doesn't change matters.

I'm using a pc p&c psu which is overspecified for my computer, and has a reputation for outstanding quality relative to the atx spec. Further I have tested with a second psu of the same model from a different batch. Symptoms are identical, presumably meaning the odds of identical faults developing in each psu, one of which was in a cupboard until recently, are negligible.

Finally, hard resets of the computer often cause the relevant circuit breaker to go. This doesn't lend itself to great faith in the power system in general.

A power supply is a closed box that will take 110 to 240V ac at 50Hz and output 3.3, 5 and 12V through a variety of cables, conditional on ambient temperature and the input voltage being reasonably close to what it expects. My one in particular has the above amperage ratings before it drops out of atx spec, which is 5% variation on the rails, or 10% on -12V. I haven't connected either of mine to testing circuitry and run them under load, so I'm taking it on faith that it performs as specified. That I'm currently testing an overclock on the machine adds credence to the psu behaving itself.
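To put numbers on those tolerances, here is a minimal sketch that works out the allowed window for each rail, using the 5% and 10% figures stated above. The helper names are illustrative and the sample readings at the end are hypothetical, not measurements from either psu.

```python
# Tolerances as stated in the post: 5% on the main rails, 10% on -12V.
TOLERANCE = {3.3: 0.05, 5.0: 0.05, 12.0: 0.05, -12.0: 0.10}

def window(nominal):
    """Return the (low, high) voltage limits for a rail's nominal voltage."""
    delta = abs(nominal) * TOLERANCE[nominal]
    return nominal - delta, nominal + delta

def in_spec(nominal, measured):
    """True if a measured voltage sits inside the rail's tolerance window."""
    low, high = window(nominal)
    return low <= measured <= high

for nominal in TOLERANCE:
    low, high = window(nominal)
    print(f"{nominal:+.1f} V rail: {low:+.3f} V to {high:+.3f} V")

# Hypothetical readings, purely to show how the check is used.
print(in_spec(12.0, 11.5))   # inside the 12V window
print(in_spec(12.0, 11.3))   # below the 12V window
```

A multimeter reading checked against these windows only says something about the average voltage. It would not show ripple or brief dips under load, which is the limitation already raised earlier in the thread.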

I'm in a poor position to say what is wrong with the external cabling, but circuit breakers shouldn't trip well under load and all sockets should behave identically. They don't, so there's a place to start.
 