WhatsApp Down?

Most likely preceded by a discussion with management which went something like this:
"Can you make something to do X,Y and Z?"
"Yes, but it'll cost £xxxxxx, we could make something which will do X and Y but won't work with Z for £xxxx"
"Lets go for the cheap option"
"Ok, here you go"
"WHY DOESNT THIS WORK WITH Z?! THE USERS ARE TRYING TO DO Z AND IT HAS BLOWN UP!!!!"

Probably! Wouldn't surprise me. When you have people who have no idea what effort is required to achieve X, Y and Z, you get crap code. I've had some brilliant technical project managers and some who struggled to handle Outlook. You don't need to be super technical, but you do need to listen to what the experts in the field are saying. Ignoring devs, DevOps, security etc. always ends up hurting them long term. I wonder whose head is on the line for this at Facebook :o
 
Saw something from Cloudflare (I think?) saying that when it started, Facebook's BGP routes had been withdrawn from the internet. If that's the case it should just be a matter of re-announcing the routes and letting BGP do its thing, unless it's something more serious than that.
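
For what it's worth, you can see that kind of failure from the outside fairly easily: once the routes to their authoritative DNS servers are withdrawn, names stop resolving long before any web server is actually "down". A rough Python sketch of that check, assuming nothing about their actual setup (hostnames are just examples):

[CODE]
import socket

def probe(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Distinguish 'name won't resolve' from 'resolves but unreachable'."""
    try:
        # Resolution goes via your recursive resolver; if the authoritative
        # servers are unreachable (e.g. routes withdrawn), this is what fails.
        addrinfo = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"{host}: DNS resolution failed ({exc})"

    _, _, _, _, sockaddr = addrinfo[0]
    try:
        # Name resolves: now see whether the service itself is reachable.
        with socket.create_connection((sockaddr[0], port), timeout=timeout):
            return f"{host}: resolves to {sockaddr[0]}, port {port} reachable"
    except OSError as exc:
        return f"{host}: resolves to {sockaddr[0]} but connect failed ({exc})"

if __name__ == "__main__":
    for host in ("facebook.com", "whatsapp.com", "example.com"):
        print(probe(host))
[/CODE]

If everything comes back "DNS resolution failed", that points at the routes/nameservers rather than the sites themselves.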
 
SOMEBODY GET THIS MAN TO THE AUTHORITIES - HE HAS THE SKILLS WE NEED!

 
Only the ones that work with the "you build it, you own it" principle.

I've worked in places where devs dump anything together, fail to test it and push it to production, only for it to blow up at 3am while they're all asleep and the poor SREs etc. get called out to debug their broken code. I've worked with people who literally refuse to look at their code until you point them at the line that's breaking things. So yeah, it totally depends on the company and their way of working. If they build it and they own it, the quality of work increases massively, as they'll be the ones up at 3am trying to debug their code.
Except that it's the testers' fault for not running a regression test
 
Surely, for a problem of this magnitude, it has to be either a monumental internal screw-up of a regular process or someone internally sabotaging the system?
 
If your domain relies on DNS entries within your own domain records (which, to be honest, it shouldn't, but if it does :D), ours also had a 4G backup thing you could connect to in order to override the system (there's a rough check for that sort of setup sketched below).

Bit of a different scenario when it's all the external records that are needed and have been removed for whatever reason. :cool:
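
The self-inflicted version of that is nameservers whose own hostnames live inside the zone they serve, so when the records (or the routes to the boxes) go, nothing in the domain resolves at all. A rough check for that setup, assuming the standard dig CLI is installed (domains are just examples):

[CODE]
import subprocess

def in_zone_nameservers(domain: str) -> list[str]:
    # `dig +short NS <domain>` prints one nameserver hostname per line.
    out = subprocess.run(
        ["dig", "+short", "NS", domain],
        capture_output=True, text=True, check=True,
    ).stdout
    servers = [line.rstrip(".") for line in out.split() if line]
    # Flag nameservers whose own names sit under the domain they serve.
    return [ns for ns in servers if ns == domain or ns.endswith("." + domain)]

if __name__ == "__main__":
    for d in ("facebook.com", "example.com"):
        risky = in_zone_nameservers(d)
        print(f"{d}: in-zone nameservers -> {risky if risky else 'none'}")
[/CODE]

Anything that shows up in that list is a single point of failure unless there's an out-of-band way in, like the 4G backup mentioned above.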
 

Word on the street is that an automated change was rolled out around 5-6 hours ago which wrecked their external routing. It also wrecked their back-end connectivity, and I'm told they're not actually able to log in to the systems they need to fix it, due to the knock-on effect of all the authentication services being broken.

It's pretty much unheard of for a tech company of that size to have an outage like this last so long... If the above is true, it's really embarrassing...
 

It's Facebook; it wouldn't surprise me, given their size, if all the internal and external records are weirdly interconnected.
 