Soldato
How can one power supply issue ground the entire fleet of BA? This is gross ineptitude.
I can offer an example. Please note two things. First, this is only an illustration of how one power supply issue could cause this, not a claim that it did. Second, this is an explanation, not an excuse.
So, big companies with critical IT systems run multiple datacentres in case of disaster - including power supply failure. However, companies are usually pretty terrified of actually testing complete datacentre failures. I know - I once had to have a big argument with a director about the necessity of doing this. They are risk averse. I had to show them that the small risk of causing an outage during a test was less than the risk of never testing it and having it fail under circumstances out of our control. In any case, whether regularly tested or not, failover of very large systems such as BA's flight management, handling bookings, boarding passes, cancellations, etc., is a complex task.
There are reports of things going wrong before the complete collapse: people seeing wrong destinations come up, missing flights and similar. What this sounds like to me is a partial failover. For example, the off-site centre or centres were not properly synced. The system failed over and either data was missing because it hadn't propagated yet, or it failed over, the original centre came back online, systems transferred back to it, and THAT one was no longer current. Or possibly it was running from both intermittently or simultaneously. One way of transferring between datacentres is to update firewalls / load-balancers to direct traffic to the other centre, and this can take a few moments to propagate.
I'm not sure what database systems BA use (probably Oracle), but if you enter a scenario where you have two MASTER databases (in practice, BA will have a more complicated setup than just two) and they get conflicting data in them, it can be a nightmare to disentangle. I know. One job I had was to repair a situation where two databases that were supposed to be in sync had diverged. Note that diverged is different from one merely being behind, where you just replay the missing transactions to catch it up.
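To make the "behind" versus "diverged" distinction concrete, here is a minimal sketch in Python. It is purely illustrative and says nothing about how BA's systems or any real database work: each database is modelled as an ordered list of transaction IDs, and the names compare_logs and catch_up are made up for this example.

```python
from typing import List, Tuple


def compare_logs(a: List[str], b: List[str]) -> Tuple[str, int]:
    """Classify two ordered transaction logs (hypothetical model).

    Returns (state, n) where n is the length of the shared prefix and
    state is one of "in_sync", "a_behind", "b_behind" or "diverged".
    """
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1

    if common == len(a) and common == len(b):
        return "in_sync", common
    if common == len(a):
        # a is a strict prefix of b: a has simply missed some transactions.
        return "a_behind", common
    if common == len(b):
        return "b_behind", common
    # Both logs have entries the other lacks after the shared prefix.
    return "diverged", common


def catch_up(behind: List[str], ahead: List[str]) -> List[str]:
    """Replay missing transactions onto the log that is behind.

    Only valid when `behind` is a prefix of `ahead`. Anything else
    needs manual reconciliation, so we refuse rather than guess.
    """
    state, common = compare_logs(behind, ahead)
    if state not in ("in_sync", "a_behind"):
        raise ValueError(f"cannot replay: logs are {state} after {common} shared transactions")
    return behind + ahead[common:]


if __name__ == "__main__":
    primary = ["t1", "t2", "t3"]
    lagging = ["t1", "t2"]
    split = ["t1", "t2", "t3-other-site"]

    print(compare_logs(lagging, primary))
    print(catch_up(lagging, primary))
    print(compare_logs(split, primary))
```

The lagging copy can be fixed mechanically by replaying what it missed. The split copy cannot: both sides accepted a different third transaction, and something (usually a person) has to decide which one wins or how to merge them.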
So basically, in this hypothetical (possibly real, who knows?), they did fail over to a different datacentre but either the data wasn't there or, more likely if the other reported errors are true, systems were bouncing between the two and causing data corruption.
Anyway, that's just an example of how a power supply failure at one place could collapse the whole system. If anyone is inclined to write a post saying how this shouldn't happen or listing ways to prevent it - feel free. Just don't write it as if I'm the person arguing against you!