WhatsApp Down?

Most likely preceded by a discussion with management which went something like this:
"Can you make something to do X,Y and Z?"
"Yes, but it'll cost £xxxxxx, we could make something which will do X and Y but won't work with Z for £xxxx"
"Lets go for the cheap option"
"Ok, here you go"
"WHY DOESNT THIS WORK WITH Z?! THE USERS ARE TRYING TO DO Z AND IT HAS BLOWN UP!!!!"

Probably! Wouldn't surprise me. When you have people who have no idea what effort is required to achieve X, Y and Z, you get crap code. I've had some brilliant technical project managers and some who struggled to handle Outlook. You don't need to be super technical, but you do need to listen to what the experts in the field are saying. Ignoring devs, DevOps, security etc. always ends up hurting them long term. I wonder whose head is on the line for this at Facebook :o
 
Saw something from Cloudflare (I think?) saying that when it started, Facebook's BGP routes had been withdrawn from the internet. If that's the case it should just be a matter of re-announcing the routes and letting BGP do its thing, unless it's something more serious than that.
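
For what it's worth, you can see that kind of failure from the outside fairly easily: once the routes to their authoritative DNS servers are withdrawn, names stop resolving long before any web server is actually "down". A rough Python sketch of that check, assuming nothing about their actual setup (hostnames are just examples):

[CODE]
import socket

def probe(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Distinguish 'name won't resolve' from 'resolves but unreachable'."""
    try:
        # Resolution goes via your recursive resolver; if the authoritative
        # servers are unreachable (e.g. routes withdrawn), this is what fails.
        addrinfo = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return f"{host}: DNS resolution failed ({exc})"

    _, _, _, _, sockaddr = addrinfo[0]
    try:
        # Name resolves: now see whether the service itself is reachable.
        with socket.create_connection((sockaddr[0], port), timeout=timeout):
            return f"{host}: resolves to {sockaddr[0]}, port {port} reachable"
    except OSError as exc:
        return f"{host}: resolves to {sockaddr[0]} but connect failed ({exc})"

if __name__ == "__main__":
    for host in ("facebook.com", "whatsapp.com", "example.com"):
        print(probe(host))
[/CODE]

If everything comes back "DNS resolution failed", that points at the routes/nameservers rather than the sites themselves.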
 
SOMEBODY GET THIS MAN TO THE AUTHORITIES - HE HAS THE SKILLS WE NEED!

 
Only the ones that work with the "you build it, you own it" principle.

I've worked in places where devs dump anything together, fail to test it and push it to production, only for it to blow up at 3am while they're all asleep and the poor SREs etc. get called out to debug their broken code. I've worked with people who literally refuse to look at their code until you point them at the line that's breaking things. So yeah, it totally depends on the company and their way of working. If they build it and they own it, the quality of work increases massively, as they'll be the ones up at 3am trying to debug their code.
Except that it's the testers' fault for not running a regression test
 
Surely, for a problem of this magnitude, it has to be either a monumental internal screw-up of a regular process or someone internally sabotaging the system?
 
If your domain relies on DNS entries within your own domain records (which, to be honest, it shouldn't, but if it does :D), ours also had a 4G backup thing you could connect to in order to override the system (there's a rough check for that sort of setup sketched below).

Bit of a different scenario when it's all the external records that are needed and have been removed for whatever reason. :cool:
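
The self-inflicted version of that is nameservers whose own hostnames live inside the zone they serve, so when the records (or the routes to the boxes) go, nothing in the domain resolves at all. A rough check for that setup, assuming the standard dig CLI is installed (domains are just examples):

[CODE]
import subprocess

def in_zone_nameservers(domain: str) -> list[str]:
    # `dig +short NS <domain>` prints one nameserver hostname per line.
    out = subprocess.run(
        ["dig", "+short", "NS", domain],
        capture_output=True, text=True, check=True,
    ).stdout
    servers = [line.rstrip(".") for line in out.split() if line]
    # Flag nameservers whose own names sit under the domain they serve.
    return [ns for ns in servers if ns == domain or ns.endswith("." + domain)]

if __name__ == "__main__":
    for d in ("facebook.com", "example.com"):
        risky = in_zone_nameservers(d)
        print(f"{d}: in-zone nameservers -> {risky if risky else 'none'}")
[/CODE]

Anything that shows up in that list is a single point of failure unless there's an out-of-band way in, like the 4G backup mentioned above.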
 

Word on the street is that an automated change was rolled out around 5-6 hours ago which wrecked their external routing. It also wrecked their back-end connectivity, and I'm told they're not actually able to log in to the systems they need to fix it, due to the knock-on effect of all the authentication services being broken.

It's pretty much unheard of for a tech company of that size to have an outage like this last so long... If the above is true, it's really embarrassing...
 

It's Facebook; it wouldn't surprise me, given their size, if all the internal and external records are weirdly interconnected.
 