System appears to be down. Which part do I blame?

Hey. Computer was behaving absolutely fine yesterday, very stable at 4.2ghz. It's not doing so well now.

It posts, and I can get into the bios fine. It gets as far as loading grub, then reboots, and shows no inclination to stop doing this. No error beeps or graphical corruption, fans spin up as expected. Nothing is overheating.

So far I've reset to stock, tried optimised and failsafe defaults. Also tried variations on these themes and previous stable settings. Reset cmos repeatedly. Now down to one stick of ram (tried a couple, in different slots). Tried second known good psu, issue remains.

This may be the cold boot issue the gigabyte UD5 is famous for. If so, flashing to a newer bios than F7 may fix things. Alternatively it's unstable, even at stock, and flashing the bios is going to leave me with a brick.

System spec
i7 920
Corsair dominator (two sets)
Gigabyte UD5
PC P&C 860W
8800gt

Ideas much appreciated, I'm a bit bemused by this.

edit: trying to boot from usb, it ignores all keyboard input and freezes after a bit.
 
Sounds very much like the cold boot issue the Gigabyte UD5 is renowned for. If not that, then all I can think of is changing the PSU like you said or reinstalling the OS.
 
Second psu has now been tried. Reinstalling OS is unfortunately not possible as I can't get it to boot from any media.

Different symptoms having changed to another stick of ram, now seeing nothing whatsoever on screen. So it might be dead ram, I guess I have a lot of combinations to try.

So, results from ram. Tested with two sticks individually, one from each triple channel set.

Dimm slot 1,3,5 (blue) refuse to post when occupied with dimm from either set.
Dimm slot 2,4,6 (white) post fine but reboot when it should load the OS, when occupied with dimm from either set.
Multiple sticks act in much the same way, as long as one of 2,4,6 has a stick in it'll post and reboot.
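
The pattern above can be written down as a small lookup, which makes it easier to compare against later test runs. This is only a sketch of the observations in this post; the function and constant names are made up for illustration and nothing here comes from Gigabyte.

```python
# Slot colours as described in the post above.
BLUE_SLOTS = {1, 3, 5}
WHITE_SLOTS = {2, 4, 6}


def observed_behaviour(occupied_slots):
    """Return the behaviour reported in the thread for a set of occupied DIMM slots."""
    occupied = set(occupied_slots)
    if not occupied or not occupied <= (BLUE_SLOTS | WHITE_SLOTS):
        raise ValueError("expected one or more slot numbers from 1 to 6")
    if occupied & WHITE_SLOTS:
        # Any stick in a white slot: the board posts, then reboots at OS load.
        return "posts, reboots at OS load"
    # Only blue slots populated: no post at all.
    return "no post"


if __name__ == "__main__":
    for slots in ([1], [2], [1, 3, 5], [1, 2], [2, 4, 6]):
        print(slots, "->", observed_behaviour(slots))
```

The only thing that decides the outcome is whether any white slot is populated; which stick or which set is used makes no difference in the results so far.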

This may be a feature of gigabyte boards, refusing to post when only blue slots are occupied, but I doubt it. Considering flashing the bios, but think I'll wait to see if any of you fine gentlemen have any ideas.

In the interests of completeness, this is the post I've made on the Gigabyte forums.
Me said:
My UD5 is refusing to boot, in an issue which I believe to be distinct from the normal cold boot issues these boards suffer from. I have some CAD work due in halfway through next week which I do not wish to do on a netbook, so assistance in getting this running would be much appreciated. At least I'd like advice on whether this is board, cpu or ram.

***************************************************************************************************

Symptoms
With dimm slots 1,3 or 5 occupied, the board does not post. With dimm slot 2,4 or 6 occupied, the board posts, will allow access to the bios, but reboots shortly after. Specifically it makes it partway through the bootloader then gives up and reboots, this cycles indefinitely.

With multiple slots occupied, as long as one of the occupied slots is 2,4 or 6, the board posts then reboots much like before. With multiple slots occupied, none of which are 2,4 or 6, it refuses to post.

LCD poster cycles through numbers rather swiftly. It seems to settle on 96, then on FF before rebooting. If no stick is in slots 2,4,6 it sits on 69 for a while, changes to 6F briefly then flickers through a few to come back to 69. I have a suspicion the time between changing from 69 to other numbers is approximately the same as time between reboots when it has ram in slot 2,4,6.

The manual suggests 69 is L2 cache, 96 is loads of things and FF is try to boot. This leads to fears that my processor is dying, which I'd really rather it doesn't as common knowledge is processors don't die, making rma a challenge. I don't have a spare processor with which to test.
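
For anyone following the code sequences, here is a rough sketch that annotates them. The meanings are only my paraphrase from the paragraph above, not text copied from the manual, and the helper name is hypothetical.

```python
# Meanings as paraphrased in the post above; not quoted from the manual.
POST_CODES = {
    "69": "L2 cache",
    "96": "many things (the manual lists several)",
    "FF": "try to boot",
}


def annotate(sequence):
    """Pair each code seen on the LCD poster with the meaning given in the post."""
    unknown = "not described in the post"
    return [(code, POST_CODES.get(code.upper(), unknown)) for code in sequence]


if __name__ == "__main__":
    # RAM in slot 2, 4 or 6: settles on 96, then FF, then reboots.
    print(annotate(["96", "FF"]))
    # No RAM in slots 2, 4, 6: sits on 69, briefly 6F, back to 69.
    print(annotate(["69", "6F", "69"]))
```

Code 6F is deliberately left unmapped, since I have not looked up what it means.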

This remains the case on optimised or failsafe defaults, as well as on previous stable settings. Cmos reset between each test, some repeated without cmos reset with no change. It also occurs whichever stick/combination of sticks is used from the two triple channel sets. Finally this occurs whether trying to boot from hard drive, usb or cd rom. OS reinstall is inappropriate given I can't get it to boot from disk or usb stick to attempt this.

***************************************************************************************************

Specification
Gigabyte UD5
Intel 920
Corsair dominator, 12gb
PC P&C 860W (tried backup psu also, both run a different system fine).
8800gt

Strangely it was absolutely fine yesterday. I left it idling overnight.

edit: Well I'll be damned if I understand this, but my backup computer, just assembled, appears to be doing exactly the same thing. Tried on a different power socket, same.

:confused:

On a not unrelated note, how the hell do I test mains AC?
 
Hey Jon

Sorry to read you're having problems. I'm just on my way out and noticed your post, so apologies for the briefness. I'll look in when I get back, but you seem to be covering most avenues...

On a not unrelated note, how the hell do I test mains AC?

From your last thread this indeed may not be that unrelated - link - there 'could' be a chance that it's related and certainly worth investigating if you don't have any luck with the standard troubleshooting methods.

Did you try a post in the 'electricians forum' - it may be worth a try?

Good luck Jon, it sounds a sod of a problem.
 
Ahh Plec, I admit I'm happy to see you. Eventually you'll run into computing problems and I'll attempt to come to the rescue. Cheers for your reply in the other thread, don't know how I failed to notice anyone responding. An electrical forum is a very good idea. Contacting the landlord seems like a plan too, there's definitely something wrong here.

In the meantime, I've found a socket which my backup computer will boot from. So one socket working, about 5 not. This suggests a **** poor wiring job to me.

The Asus rma experience is very much in mind at present, really hoping nothing like that reoccurs. At least the Asus died in different accommodation, so I can't deduce that my home is killing computers just yet. Scary thought though.

Unfortunately while the board behaves a bit better in the known "good" socket, as in it gets partway through booting from usb (I believe it would finish booting if the copy of ubuntu was better), it still doesn't post unless there is ram in one of 2/4/6 which I'm taking as a pretty bad sign. To Google I think

edit: Google won't tell me what the symptoms of a failing imc are, I'll ask intel when they're next available. It's decided to boot with all 12gb, made it all the way to memtest and is running this now. I don't know what to make of this, but somehow I think it's going to pass.

Memtest is fine. Persuaded it to boot from a usb stick and repaired the mbr from there. I think the area of the disk responsible for booting the machine was corrupted, that or grub 2 screwed me. That it won't boot with ram in some of the slots is worrying, will wait to see what Gigabyte say about this. Power circuitry in my flat is clearly crap, it's ridiculous that only a couple of the sockets will let my computer boot.

However at least for now I'm looking at my familiar desktop. That's a very good start.
 
Hey Jon

Sorry for the late reply – I thought I would quickly flood the upstairs of my house before I returned to this. This obviously wasn't intentional, more a by-product of my attempt at fitting a new power shower (which I, thankfully, finally succeeded in doing) – I think next time I shall use my plumber and overlook his eccentricities. Although I probably didn’t make any more mess than he would have – it’s much more fun watching someone else’s futile attempts to stop unwanted water coming from a ‘supposedly’ isolated pipe than experiencing it myself…


Anyway, before I mainline some alcohol as a reward for not allowing the water to make it to the ground floor, let's try and prevent you from projecting your new build into the nearest supporting wall.

Ahh Plec, I admit I'm happy to see you. .


I’m not sure how much I will be able to help as you’ve probably tried or hypothesised most theories. I suspect this thread will be more about bouncing possible theories off each other but the discussion process is always valuable and certainly makes the frustrating troubleshooting process a less solitary and lonely experience. (It can also save valuable bits of hardware being rendered worthless in a matter of seconds through random yet, seemingly, necessary acts of violence. ;))


In the meantime, I've found a socket which my backup computer will boot from. So one socket working, about 5 not. This suggests a **** poor wiring job to me.


That obviously sounds ominous and certainly needs looking into.

You’ve mentioned in other posts that you’re frequently faced with a Gigabyte splash screen for long periods of time/hangs (cold boot issue). Perhaps it’s been getting past this point and loading the GRUB etc and hanging (unbeknown to you) but still displaying the splash screen and over time it’s corrupted the mbr?..


How a poor mains supply/faulty wiring could cause this I don’t know – as I would have thought the PSU would have shut the PC down or just refused to power on. But whether it’s due to the supply or something within the PC itself it’s perhaps worth considering that the 'screen hang' may be hiding other boot activity?.. (Unless it’s not posted of course in which case ignore).

Have you noticed any HDD activity when the splash screen hangs?


The 'flaky mains supply' (forum member 'westom' would have a conniption fit at that description) indirectly causing the boot sector to become corrupt via hangs or inducing component faults seems to fit nicely and on the surface appears logical but I would have thought that the PSU would have exposed a poor mains supply(?) (what are your thoughts?) But it’s certainly worth investigating and it’s definitely worth getting an electrician round to stabilise your supply around the house (I’m guessing you’re probably in the process of this – student funds permitting).


Unfortunately, until you get your electrics checked and repaired and your computer stable you won’t know for sure if it is the underlying cause of all your problems. (A bit of a catch 22 I’m afraid – plus if you were to find a fault within the PC and corrected the electrics – you still wouldn’t be 100% sure that it was the mains supply at fault as it may have just been coincidence.)


I believe it would finish booting if the copy of ubuntu was better


I can post you a couple of copies of Ubuntu on DVD if it’s of any help – just let me know and I’ll insert a disposable e-mail in my next post.



I can’t remember if you bought a new PSU with the i7 build – and, if you did, if it was used in the other house with dodgy electrics?


Google won't tell me what the symptoms of a failing imc are, I'll ask intel when they're next available


Might be a good shout - any luck with intel?
 
This may be the cold boot issue the gigabyte UD5 is famous for. If so, flashing to a newer bios than F7 may fix things. Alternatively it's unstable, even at stock, and flashing the bios is going to leave me with a brick.
This is why the better computers provide a comprehensive hardware diagnostic. To execute specific tests that only test each subsystem or component. This is also why a multimeter is so useful to completely eliminate some suspects. Currently you are still suspecting everything rather than isolating the problem to specific suspects.

For example, did you know of the power supply controller? A power system component that could cause your failures. A suspect quickly eliminated by others who know your component if you provide numbers from six wires to define or exonerate it as a suspect.

Temperature is also a diagnostic tool. 40 degree and 90 degree heat is another powerful tool. Selectively heating or cooling is another way of isolating a problem to specific suspects.

Currently, most everything remains a suspect. What you have done can better suggest which test to do first. But does nothing to identify the failure or the suspect that defines that failure.

What makes everything act intermittent? A power system problem. More than just a supply. Only way to see that failure in but a minute or to completely eliminate the subsystem as a suspect: a $16 multimeter.
 
Damn Jon, this isn't good news at all mate.

Regarding the RAM, afaik the board will not POST at all if you try having the RAM in the blue slots. You only start using the blue slots when you are using 4 or more DIMMs, so that should explain why the system doesn't do anything when you have tried having the RAM in those slots.

It doesn't sound like the cold boot issue to me either, or at least not the cold boot issue that I used to experience, as mine didn't even get close to POSTing before it powered back down. What would happen with me was that I would go to turn my system on for the first time in the morning, the fans would spin up and the mobo would light up, then it powered down and started up again straight away.

Most of the time it would POST after starting the 2nd time, but there were a few occasions that it would shutdown, and restart a few times, then it would give me the "your o/c has failed, going back to default settings" warning message.

The problems that you have described don't sound like this, or at least they don't to me!!

It's really weird how your other system is also doing this, yet it works on a different socket in your house, that sounds like something is wrong with the electrics in your house to me.

I wish that I could offer you some help mate, but I know absolutely nothing about this kinda thing!!

Hope you get the problem sorted out soon though, as I know how frustrating it is when things don't work.
 
I wish that I could offer you some help mate, but I know absolutely nothing about this kinda thing!!

The above is very disconcerting as DavyBoy’s last build embarked on a world tour of nearly every known quirk/fault a new build could muster. It even had his sanity on its 'to do list'... ;)

If DavyBoy hasn’t come across a similar problem with the UD5 platform, after all the extensive research he did into his problems, you may have to consider donating your machine to 'technical phenomena research'.

/back on topic proper

Have you had the heatsink/waterblock off your CPU recently? The reason i ask is that i vaguely remember a post about someone having memory issues (although they were slightly more straight forward) due to the heatsink applying too much pressure to the socket - or similar. (i'll try and find the link).

EDIT: I can't find the thread but i've discussed said thread content in another and you're in the same thread discussing it with me (make sense?) - so i'm guessing you've probably considered this and along with socket/pin damage? (if you've removed the HS/block recently.)
 
JJ, your silence is worrying…

I’m hoping that this isn’t a sign that your electrics have totally died or that you now have 2 ATX sized paper weights?

I’ve just read the Coollaboratory Liquid Pro Thermal Compound thread (link) – did you finish testing it on every available inanimate object in the house and get round to applying it to your i7? And did you let on to your other half that you were sticking toxic compounds in her oven? ;) (Christ, that reads like some grotesque euphemism.)

You are definitely a man in need of a shed/garage with a dedicated power supply. Better still you should make that a shed/garage with dedicated power supply and *smoke detector*…
 
Silence isn't computer related thankfully, I lost the last week or so to debugging matlab code. Damned thing didn't work in the end, and it was only meant to solve differential equations. Some updates and responses to the above.

@westom I know little of the internal workings of a psu. There's a transformer which behaves as they always do followed by voltage regulator circuitry. However I do know my psu is good for 22, 26, 64A on the 3.3, 5, 12V lines, is considered ludicrously reliable, is barely stretched by the load from my system and that for two of these monsters (from different batches, both with certificates verifying they passed diagnostics at the factory) to have the same fault is very unlikely. I'm pretty sure it's not the psu letting me down. I'm also pretty sure a multimeter would only give crude numbers, I'd need to attach it to an oscilloscope while under load to get any meaningful answers.
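
To put those current ratings in context, here is the arithmetic as a quick sketch. The amp figures are the ones I quoted above and 860 W is the label rating from my spec list; the variable names are just for illustration.

```python
# Rail voltage -> rated maximum current in amps, as quoted in the post above.
RAIL_LIMITS_AMPS = {3.3: 22, 5.0: 26, 12.0: 64}
LABEL_WATTS = 860  # label rating of the PSU from the system spec


def rail_watts(limits):
    """Return the maximum power per rail in watts (volts times amps)."""
    return {volts: round(volts * amps, 1) for volts, amps in limits.items()}


watts = rail_watts(RAIL_LIMITS_AMPS)
total = round(sum(watts.values()), 1)

if __name__ == "__main__":
    for volts, w in watts.items():
        print(f"{volts:>4} V rail: {w} W")
    print("sum of per-rail maxima:", total, "W against a label of", LABEL_WATTS, "W")
```

The sum of the per-rail maxima comes out above the label rating, which suggests the rails cannot all be run at their individual limits at once. Most of the capacity sits on the 12V rail.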

That's reassuring Davy, though a bloody strange design on Gigabyte's part. It does at least make the motherboard less suspect. Thanks for the description of the cold boot issue, I agree that this seems distinct from my current issues. I may try the newer bios. Intel tech support are pretty certain it's the motherboard at fault, if the imc is on the way out they'd expect at least equal behaviour on each slot, and probably complete loss of function.

Pressure on the socket is a good point. It's one I'd completely forgotten about. The block hasn't been off in ages though, it's even still mounted backwards. I just haven't had time to strip the system down yet. In a similar theme, the ek needs to be milled and lapped before mounting with the liquid pro, as the cpu also needs to be lapped I'm going to wait until this is diagnosed. If the imc is dying on me, now would be an unwise time to lap it.

Liquid pro is loads of fun. Hot, wet aluminium with a tiny drop actually froths, it's wonderful to watch. It also bonds fairly convincingly with copper, though I have a suspicion the physical strength of the bond is lacking until it's properly set. Buffing the surface for a time reveals a pattern suspiciously like the liquid pro being in the surface defects but abraded from the surface, however leaving it on the surface for several months then buffing it doesn't shift it. A good long time at elevated temperatures (probably folding at 3.6ghz, 70 degrees or so) will set it beautifully, and I'll have a block soldered to my cpu. I've pretty much destroyed my stock heatsink though, so can't actually test my machine without water anymore. Just another thing on the todo list, currently after taking this machine above stock. The whole point to my degree is to eventually have an engineering lab to play with, a reinforced shed will come first as soon as I have a garden :D

Are you on water yet my dear man? I'm sure I remember a potential spec me thread.

Back on topic, I'm awaiting an electrician. I'll probably clock the machine tonight, as my lab partner is "too tired to do mechanics". It'll be more vulnerable to weird supply voltages and I think I'll be able to tell if it's behaving unusually. The electricity here must be dirty as hell though (I know, I know) as the load on it is heavy, very variable and it probably has numerous unsafe devices attached, only some of them mine.
 
@Westom I know little of the internal workings of a psu. There's a transformer which behaves as they always do followed by voltage regulator circuitry. However I do know my psu is good for 22, 26, 64A on the 3.3, 5, 12V lines, ... I'm pretty sure it's not the psu letting me down. I'm also pretty sure a multimeter would only give crude numbers, I'd need to attach it to an oscilloscope while under load to get any meaningful answers.
Power supplies are not designed that way. To provide those currents, a transformer would be maybe 20 or more pounds. Would weigh as much as the rest of that computer. A regulator connects directly to AC mains; is not after a transformer. Apparently other system components such as the power controller are also unknown to you.

Correct - the best way to obtain power supply integrity is an oscilloscope and only when the power supply is fully loaded. But provide numbers from a 3.5 digit multimeter (and those numbers must be accurate to 3 significant digits to be useful) to have a useful answer quickly. Then one who designed supplies can report power system integrity (including other critical components). Currently the power 'system' remains unknown - the third state. Only way to move that system to 'definitively good' or 'definitively bad' is the meter or an oscilloscope. Until you do that, every other diagnostic test for other components remains 'maybe'.

Report meter numbers to get an answer that adds things you apparently did not know. Power supplies do not work as you have described. Know that a supply is good only because it is oversized or because you suspect - and learn nothing. Your choice. That meter will report things you did not even know exist. We used meters to identify a defective product before it went out the door. For example, is a pullup resistor missing? Numbers to three digits will report things not apparent.

Are incandescent bulbs varying 30% and 50% in intensity? Even variations that great are perfectly normal voltages to any computer. Computers are required to be that robust. Anything the electrician might look for can be observed in 30 seconds with a multimeter right at the wall receptacle. You need an electrician if bulb intensity is varying that much. And still a properly constructed computer calls that voltage ideal. I am confused why you are waiting for an electrician. What do you expect to discover?
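
For a rough sense of what those bulb figures mean in volts, a common rule of thumb is that the light output of an incandescent bulb scales with roughly the 3.4th power of the applied voltage. The exponent and the 230 V nominal below are assumptions for illustration, not numbers from this thread, and real bulbs vary.

```python
# Rule-of-thumb exponent for incandescent light output versus voltage (assumed).
EXPONENT = 3.4
NOMINAL_VOLTS = 230.0  # assumed UK nominal mains voltage


def voltage_fraction(light_fraction, exponent=EXPONENT):
    """Estimate the fraction of nominal voltage that gives a fraction of nominal light."""
    if not 0 < light_fraction <= 1:
        raise ValueError("light_fraction must be in the range (0, 1]")
    return light_fraction ** (1.0 / exponent)


if __name__ == "__main__":
    # Light dimmed by 30% (to 0.7) and by 50% (to 0.5) of normal output.
    for light in (0.7, 0.5):
        frac = voltage_fraction(light)
        print(f"light at {light:.0%} -> about {frac:.1%} of nominal, "
              f"roughly {frac * NOMINAL_VOLTS:.0f} V")
```

Under that approximation, a large visible change in brightness corresponds to a much smaller change in voltage.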
 
Assuming you're trying to help westom, would you care to tell me the flaw in this reasoning.

One mains socket out of six tried in my flat allows the computer to boot. The other five do not. I'm deducing that the cables in the wall are a bit knackered. Computer behaves much as it should do, as long as it's connected to this specific socket. Changing power cable doesn't change matters.

I'm using a pc p&c psu which is overspecified for my computer, and has a reputation for outstanding quality relative to the atx spec. Further I have tested with a second psu of the same model from a different batch. Symptoms are identical, presumably meaning the odds of identical faults developing in each psu, one of which was in a cupboard until recently, are negligible.

Finally, hard resets of the computer often cause the relevant circuit breaker to go. This doesn't lend itself to great faith in the power system in general.

A power supply is a closed box that will take 110 to 240V ac at 50 Hz and output 3.3, 5 and 12V through a variety of cables, conditional on ambient temperature and the input voltage being reasonably close to what it expects. Mine in particular has the above amperage ratings before it drops out of atx spec, which is 5% variation on the rails, or 10% on -12V. I haven't connected either of mine to testing circuitry and run them under load, so I'm taking it on faith that it performs as specified. That I'm currently testing an overclock on the machine adds credence to the psu behaving itself.
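
If I do end up taking meter readings, the tolerance check is simple enough to sketch. The percentages are the ones I gave above (5% on the main rails, 10% on -12V); the function names and the sample readings are hypothetical, not measurements from my machine.

```python
# Allowed deviation per rail, using the percentages stated in the post above.
TOLERANCE = {3.3: 0.05, 5.0: 0.05, 12.0: 0.05, -12.0: 0.10}


def allowed_range(nominal):
    """Return (low, high) voltage limits for a rail given its nominal voltage."""
    tol = TOLERANCE[nominal]
    low, high = sorted((nominal * (1 - tol), nominal * (1 + tol)))
    return low, high


def in_spec(nominal, measured):
    """True if a measured DC voltage falls inside the allowed range for that rail."""
    low, high = allowed_range(nominal)
    return low <= measured <= high


if __name__ == "__main__":
    # Made-up example readings, purely to show the check.
    for nominal, measured in ((12.0, 11.5), (12.0, 11.3), (5.0, 5.1), (-12.0, -10.9)):
        verdict = "within" if in_spec(nominal, measured) else "outside"
        print(f"{nominal:+} V rail reading {measured:+} V is {verdict} tolerance")
```

A steady DC reading inside these limits only covers the average voltage; it says nothing about ripple or fast transients, which is why I'd still want a scope under load.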

I'm in a poor position to say what is wrong with the external cabling, but circuit breakers shouldn't trip well under load and all sockets should behave identically. They don't, so there's a place to start.
 
I'm in a poor position to say what is wrong with the external cabling, but circuit breakers shouldn't trip well under load and all sockets should behave identically. They don't, so there's a place to start.
Obviously I am trying to help. But your post is devoid of critical facts and (especially) numbers. Something I had not seen before is this tripping breaker. Which breaker? A power panel breaker? One in the computer? What kind of breaker? An RCD?

Appreciate how much more complex I view your problem due to so many unknowns. For example, what is the total load on that circuit when a breaker trips? For example numbers from each powered appliance. From what was posted, I have a list of suspects far longer than all other posts combined.

Break down a complex problem. Separate the problem into parts. Analyze each part separately. Do not disconnect or change anything. Step one - get facts - especially numbers. Only then can I reduce my long list of potential suspects to be helpful.

From the original post, that means but two minutes with a 3.5 digit multimeter to measure six wires. IOW to move one item - the entire power system - from a third state (unknown) to any other state (definitively good or definitively bad). Both a minimally sized or oversized supply fail equally. You have zero reasons to exonerate any part of that 'system'. A larger wattage supply is not more reliable and is often less reliable to sell at a same price.

What is the function of every power supply? To make even crappy AC power irrelevant. As noted earlier, incandescent bulb intensity can change 30% and the computer must be perfectly happy - assuming a good supply system.

Your original post discussed a Bios that did not even boot - read a disk drive. Few components are related to that failure. All can act defective due to something only the meter can identify. Start there to break the problem down; to determine which of so many directions to go next. A process often called "Follow the evidence".
 
Are you on water yet my dear man? I'm sure I remember a potential spec me thread.

I tagged on to setter’s ‘spec me thread’ as I was buying a new case and he seems to have come from a similar tech era to myself so I was interested to see if he could be persuaded into the change after all this time being on air.

I’m not sure what setter decided on, but I bottled out (again) pretty early on and bought a case that would only partially hide a modest water setup if I changed my mind in the future – I was considering the large Silverstone, Lian Li or Corsair cases if I had gone the whole hog.

I bought the Lian Li PC-7FN (i like simple clean lines - and I couldn't risk a door front, my preferred choice, as my little daughter would probably have mistaken it for a dolls house and promptly ripped it off its hinges):

3894513125_9d34387c42.jpg


62c.jpg


I nearly bought the Fortress but really didn’t want a window. I’ve 7v modded all the case fans and so far things seem to be very cool – but it’s winter so things may radically change in the summer. But for now it’s silent…

Time/other commitments were still the major turn offs for me with regards to water. Not the initial setting up, as I would see that as recreational, nor the subsequent fiddling to get things just so. It was the casual gfx upgrade or the inevitable trouble shooting (with a heavily clocked rig) if things went t*ts up that made me avoid it for another year.

It’s my own fault; I should have experimented with water before I had kids. Everything becomes a major challenge with small kids and time becomes a very precious commodity. And although I still love building, taking apart and tweaking computers – I can’t imagine that I would enjoy the same luxury if my rig were under water, due to the extra time required (for upgrades/fault finding).

Plus, I have quite a healthy HTPC fund at the moment and if I had moved to water that would have depleted significantly with the far superior case and custom water kit with obligatory *beautiful* (:D) copper block.

All poor excuses I know, but valid enough for me. I will probably end up building in the future, as more and more friends/family have enquired about it. Oddly enough, I’m looking into building a passive i7 build for a friend – he didn’t have the funds for custom water so I suggested the h50, but now I’m awaiting some results in another thread where someone is going to try and passively cool his 920 with a ThermoLab Baram. Should be interesting looking at his stats when they appear…

/On topic how are things your end – any luck pin pointing the problem and has an electrician got your house electrics sorted?
 
Which breaker? A power panel breaker? One in the computer?
Ever seen a computer with resettable circuit breaker?
A larger wattage supply is not more reliable and is often less reliable to sell at a same price.
PC P&Cs are some of the highest quality PSUs you can get and there are no cheap crap components in them.
All can act defective due to something only the meter can identify. Start there to break the problem down; to determine which of so many directions to go next. A process often called "Follow the evidence".
Multimeter can't exonerate any part of the "power chain".
Output voltage can look stable and right to the spot while at the same time ripple can be killing components.
Neither does it tell if wall voltage has any fast disturbances.

And the evidence of PC miraculously working in one socket but not in others tells there's definitely something fishy going in them even if there would be some fault "inside the box".
 
Ever seen a computer with resettable circuit breaker?
Yes. That list of possible breakers he could have been referring to is long. Moving on to the point: this is a new symptom not previously posted.

Higher wattage supplies can sell by using cheaper components or missing essential functions that were even required in the 1970s. Many assume higher wattage is better because that is the popular myth. Higher wattage means it must cost more or something gets downgraded or forgotten. Those who claim a higher wattage supply is more robust typically are discovered to have no electrical knowledge.

A computer working in one socket and not another implies numerous possible problems. Every one is only speculation due to insufficient information (such as numbers from a multimeter). Including and not limited to a defective power supply or power controller. What would identify the failure in but minutes - and definitely? A multimeter. Apparently you do not understand how much information is in the numbers from a multimeter. Only those who did this stuff would.

But then I have seen (and designed) computers with resettable circuit breakers. And other things you have never seen. Including a PC supply containing a battery so that the computer's supply was also a UPS. A desktop that operated like a laptop. Have you seen these? We do things that require electrical knowledge not found with computer repairmen and shotgunners.
 
And the evidence of PC miraculously working in one socket but not in others tells there's definitely something fishy going in them even if there would be some fault "inside the box".
Please learn the numbers before making such conclusions. How defective would a socket need be? Well every minimally acceptable computer must work just fine and even start up when the light bulb (connected to the same socket) dims to 50% intensity. Learn the numbers. Every computer is required to be that robust. So robust that a hoover would not even work on a socket that a computer can still operate from. Please learn numbers before posting such speculation.

Oh. And if that receptacle was defective, the meter would have provided numbers so that others with knowledge could make a reply - without speculation. Just another reason why a ten quid meter is so useful. Those with superior experience, training, and knowledge are restricted (cannot provide a complete answer immediately) due to a shortage of posted facts and numbers.
 