OK so here's a bit of actual technical knowledge surrounding the whole shutdown, what should have happened and potentially why it all went wrong.
In an ideal scenario, the system would've indeed automatically swapped over to another supply. The reason this didn't happen could be extremely simple, or it could be extremely complicated. The most likely cause is that the failover kit didn't work correctly. This kit tends to come in a variety of flavours - manual changeover switches, automatic transfer switches (open or closed transition) and static transfer switches - each with its own failure modes.
Further to this, when the failover switches activate, they need smoothing to ensure that the power supplies are in sync, or you risk damaging the kit. This is called synchronisation. It happens by monitoring the sine waves of the supplies and ensuring that they match closely enough before swapping the load over. If they don't match and the switching takes place anyway, you risk crossing the phases of the three-phase supplies, and that's when things go bang. Like properly bang. This process usually takes a second or so but can take longer. In order for the load to not drop off during this very short process, another system is put in place. This will usually be a battery bank or a rotary UPS, which will carry the load using stored energy (chemical in the batteries, rotational inertia in the rotary UPS).
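The sync check described above can be sketched in code. This is a toy model - the tolerances, function name and sample readings are all invented for illustration, and a real sync-check relay compares live waveforms rather than three summary numbers:

```python
# Hypothetical tolerances - real sync-check relays are configurable;
# these numbers are illustrative, not any real site's settings.
MAX_VOLTAGE_DIFF_PCT = 5.0    # % of nominal voltage
MAX_FREQ_DIFF_HZ = 0.1        # Hz
MAX_PHASE_DIFF_DEG = 10.0     # degrees

def in_sync(v1, f1, ph1, v2, f2, ph2, nominal_v=230.0):
    """Return True only when both supplies match closely enough
    for a closed-transition (make-before-break) transfer."""
    volt_ok = abs(v1 - v2) / nominal_v * 100 <= MAX_VOLTAGE_DIFF_PCT
    freq_ok = abs(f1 - f2) <= MAX_FREQ_DIFF_HZ
    # Wrap the phase difference into [-180, 180] before comparing
    phase_diff = (ph1 - ph2 + 180) % 360 - 180
    phase_ok = abs(phase_diff) <= MAX_PHASE_DIFF_DEG
    return volt_ok and freq_ok and phase_ok

# Supplies drifted 90 degrees apart: the transfer must be blocked.
print(in_sync(230, 50.0, 0.0, 229, 50.0, 90.0))   # False
# Supplies matched: the transfer is permitted.
print(in_sync(230, 50.0, 0.0, 229, 50.02, 3.0))   # True
```

If the check never passes (for example because the relay itself has failed), the transfer simply doesn't happen - which is one plausible way a failover ends up doing nothing.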
As you can tell by now, we're already looking at three major potential failure points, each of which has hundreds of failure points of its own, which is why they are subject to extremely strict maintenance.
Now let's assume that something has gone wrong, and the system has failed. The reason could be a fluke, a lack of maintenance, poor or absent system management, inexperienced or inadequately trained staff or contractors, etc. The investigation will reveal all that in due course.
Anyway. The supply has now failed, and the failover systems have failed to switch to another supply for whatever reason. This power supply will likely have many interlocked safety systems connected to it, whose purpose is to turn off other circuits or systems to prevent damage to equipment or injury to staff. Depending on how the system is configured, this could trigger a chain reaction, which may be what led to the whole airport shutting down. For reference, this whole process will likely have taken less than a couple of seconds.
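That chain reaction can be sketched as a simple graph walk: one tripped system takes down everything interlocked to it, and so on down the tree. The system names and topology here are entirely made up for illustration - they're nothing like Heathrow's real network:

```python
from collections import deque

# Toy model of electrical interlocks. When a parent system trips,
# everything interlocked to it is taken down too. All names invented.
interlocks = {
    "incomer_A":      ["HV_switchboard"],
    "HV_switchboard": ["terminal_LV_1", "terminal_LV_2"],
    "terminal_LV_1":  ["baggage_handling", "check_in_desks"],
    "terminal_LV_2":  ["security_lane_power", "lighting"],
}

def cascade(failed):
    """Breadth-first walk of the interlock tree from the initial failure."""
    tripped, queue = set(), deque([failed])
    while queue:
        system = queue.popleft()
        if system in tripped:
            continue
        tripped.add(system)
        queue.extend(interlocks.get(system, []))
    return tripped

# One failed incomer takes out everything downstream within seconds.
print(sorted(cascade("incomer_A")))
```

The point is that the dead area is defined by the interlock topology, not by which loads actually matter - which is how a single supply failure can black out an entire site.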
So now you're sat with a dead airport, nobody knows why, everyone's panicking and you need to get it back online. Your first thought will be to simply swap to another supply and voila, you're singing and dancing. The reality is that you have no idea what caused the power to fail, so you start investigating. First thing you do is call UKPN to see if the issue is on their end. They inform you of the transformer fire, but you know you have more supplies available to you, so you simply switch over to one of those, right? No, because the inrush current on a system the size of Heathrow will be enough to take out an entire substation. If you do that, hundreds or even thousands of safety devices will activate, tripping the power to thousands of circuits across the airport. There will be no rhyme or reason to it either; they will just pop at random, so you'll need to check every single circuit in the airport and manually switch them back on.
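A rough back-of-the-envelope sketch of why that is. Inrush multipliers of roughly 6-12x steady-state current are commonly quoted for transformers and motors, but every number below (loads, multipliers, upstream rating) is invented purely for illustration:

```python
UPSTREAM_LIMIT_A = 4000   # hypothetical upstream breaker rating, amps

circuits = [
    # (name, steady-state amps, inrush multiplier) - all invented
    ("terminal_chillers", 400, 8),
    ("baggage_motors",    300, 10),
    ("lighting_banks",    250, 6),
    ("IT_and_comms",      200, 7),
]

# Energise everything at once: every inrush hits the supply together.
all_at_once = sum(amps * mult for _, amps, mult in circuits)
print(all_at_once, all_at_once > UPSTREAM_LIMIT_A)   # 9100 True -> trips

# Energise one circuit at a time: only one inrush on top of the
# steady-state load already picked up.
worst_staged = 0
steady_so_far = 0
for _, amps, mult in circuits:
    worst_staged = max(worst_staged, steady_so_far + amps * mult)
    steady_so_far += amps
print(worst_staged, worst_staged > UPSTREAM_LIMIT_A)  # 3400 False -> holds
```

Same total load either way - the only difference is the timing, which is the whole argument for a controlled restart.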
This is of course not ideal, so you do a controlled restart. This means physically walking the entire airport and turning off all the distribution boards or LV panels as stipulated in your emergency startup procedures - assuming of course that these are in place, current and applicable, and that your staff are all competent and trained to the level of knowing what to do.
This process, also known as load shedding, will take hours and a lot of staff. Once you've turned everything off, you can swap over to the secondary incoming supply, and then slowly start reinstating all the circuits one at a time so you're not hammering the hell out of a supply that likely hasn't been used in years.
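The reinstatement itself is essentially a priority loop. The circuit names, priorities and settle time below are all invented - a real emergency startup procedure defines the actual order and timings:

```python
import time

# Shortened massively for the example; in reality you'd wait minutes
# per circuit while the supply stabilises.
SETTLE_SECONDS = 0.01

restart_order = [
    # (name, priority) - both invented for illustration
    ("life_safety_and_fire_systems", 1),
    ("air_traffic_and_comms",        2),
    ("security_and_access_control",  3),
    ("baggage_and_check_in",         4),
    ("retail_and_ancillary",         5),
]

def reinstate(circuits):
    """Close circuits back in one at a time, highest priority first,
    pausing after each so the supply never sees one big step load."""
    energised = []
    for name, _prio in sorted(circuits, key=lambda c: c[1]):
        energised.append(name)          # close this circuit in
        time.sleep(SETTLE_SECONDS)      # let the load stabilise
    return energised

print(reinstate(restart_order))   # life safety first, retail last
```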
Once you've reinstated all the power, you then need to check the entire electrical infrastructure, as you'll have loads of switches that need manual resetting, and most likely all the generators will still be running, as these often require a manual shutdown after activation.
This is just the engineering side of things. Now the other teams will need to jump into action. The security teams will need to bring their CCTV systems online, as well as their access control kit (door locks, swipes, main PCs, etc). The cleaning crew will need to do a full tour of the airport to make sure that any leaks or spillages are cleaned up (sensor taps, for example, are spectacular for staying on in the event of a power failure), and the UKBA will need to get all their systems online - body scanners, passport scanners, PCs, the lot. This again will take hours.
All of the above makes a ton of assumptions, most notably that everything starts up perfectly and no further works are required, which for a system the size of Heathrow will be the same odds as winning the lottery. Not to mention how all the ancillary systems respond to power outages. I'd have thought that things like body scanners might not like having their power randomly killed, but then these might have their own local UPS systems, who knows.
The above is a very,
very broad guesstimate of what happened. I don't know their systems, staff, maintenance regimes, etc, so I can only apply what I do know, and fill in the gaps with a lot of assumptions.
Only once the above is all completed can the investigation begin, which is likely what's happening now.
What I will say is that the whole reinstatement might well have gone like clockwork. Given the scale of the shutdown and the work involved in getting all the systems back online, I must admit that turning it all around in that short a time is quite impressive.