Damn, WhatsApp is back.
Too early to speculate that this is related to the Facebook whistleblower due in front of the Senate today?
Overly confident, lots of assumptions and a jump to conclusions there - you work in 1st line? So not your domain then, and it would have still worked even if yours had gone down. Winner.
What am I meant to do? Speak to the wife?
Loving the Daily Mail tracert image
It is a much more interesting story. I think it's plausible that this might have been caused by the actions of a disgruntled employee...
The reason I think it's plausible is the nature of the change: at 15:40, Facebook made a change to its external routing (BGP) which withdrew a large number of BGP prefixes, and some of those prefixes covered its DNS and internal infrastructure ranges - which is weird. It's weird because a change like that (anything involving important public prefixes) would normally require several levels of approval, and it would also be subject to health checks (pre- and post-checks) and automated rollbacks if any dashboards went red during or after the change.
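To make the "pre-check" idea concrete, here's a minimal sketch, assuming a change can be described simply as the set of prefixes advertised before and after it. It blocks any change that would withdraw a "protected" prefix (think DNS ranges) without enough sign-offs. The prefixes (RFC 5737 / RFC 3849 documentation ranges), the `precheck` and `withdrawn_prefixes` names and the two-approval rule are all made up for illustration; this is not how Facebook's tooling actually works, just the general shape of such a gate.

```python
import ipaddress

# Hypothetical "protected" prefixes, standing in for things like authoritative DNS ranges.
PROTECTED = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:1::/48"),
]


def withdrawn_prefixes(current, proposed):
    """Prefixes advertised now that the proposed config would stop advertising."""
    cur = {ipaddress.ip_network(p) for p in current}
    new = {ipaddress.ip_network(p) for p in proposed}
    return cur - new


def precheck(current, proposed, approvals, required_approvals=2):
    """Hypothetical pre-check: refuse to withdraw a protected prefix without enough sign-offs.

    Returns a list of problems; an empty list means the check passes.
    """
    problems = []
    # Sort by string form, since IPv4 and IPv6 networks can't be compared directly.
    for net in sorted(withdrawn_prefixes(current, proposed), key=str):
        # Flag withdrawals of a protected range, a more-specific inside it,
        # or a covering route that contains it (same address family only).
        hits = [
            p for p in PROTECTED
            if p.version == net.version and (net.subnet_of(p) or net.supernet_of(p))
        ]
        if hits and len(approvals) < required_approvals:
            problems.append(
                f"withdraws protected prefix {net} with only {len(approvals)} approval(s)"
            )
    return problems


if __name__ == "__main__":
    before = ["192.0.2.0/24", "198.51.100.0/24", "2001:db8:1::/48"]
    after = ["198.51.100.0/24"]  # this change would drop both protected ranges
    for problem in precheck(before, after, approvals=["alice"]):
        print("BLOCKED:", problem)
```

A real pipeline would pull the before/after state from the routers and its change-management system rather than from hard-coded lists; the point is only that "this change withdraws our DNS prefixes" is a cheap thing to check mechanically before anything is pushed.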
It just seems fishy that this failed in a way that was essentially non-recoverable, and that it lasted so long. The fact that this whistleblower thing is going on at the same time might just be a complete coincidence, but it's a fact that one of the main causes of cyber/DoS attacks is disgruntled or upset employees.
I'd argue it's more likely that BGP changes are absolutely few and far between, and someone done-goofed. We have all felt that dread when a remote device stops responding to ping. Mess up the config on a router at the far end and anything other than physical access or fully redundant out-of-band access becomes futile.
Well, far be it from me to be a CT nutter, but it does seem a bit coincidental that this happened the day after that kicked off, conveniently burying the story in the media.
Yeah, I mean it's possible that someone just "screwed up".
The problem with that, though, is that Facebook have some of the world's best network automation - a lot (if not all) of their changes are modelled and tested before they're deployed. If anything goes wrong, it's normally rolled back automatically or halted mid-deployment.
What's also weird is that Facebook's network is massive and highly distributed; it's very unusual to make a change that would affect their entire infrastructure across the globe at once, especially one touching public, internet-facing prefixes. Stuff like that would normally have to be approved and checked numerous times, and that's before any automated checks.
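Roughly what "halted mid-deployment" means, as a minimal sketch: push the change one wave of POPs at a time (a canary first), run a post-check after each wave, and revert every site touched so far if anything fails. The `Rollout` class, the POP names and the health check are hypothetical stand-ins, not Facebook's actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Rollout:
    """Hypothetical staged deployer: pushes a change wave by wave instead of everywhere at once."""

    waves: List[List[str]]          # e.g. [["canary"], ["pop-a", "pop-b"], ["pop-c"]]
    apply: Callable[[str], None]    # push the new config to one site
    revert: Callable[[str], None]   # restore the previous config on one site
    healthy: Callable[[str], bool]  # post-check, e.g. "does this POP still answer DNS?"
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        """Return True if every wave passed its post-check, False if the rollout was halted."""
        done: List[str] = []
        for wave in self.waves:
            for site in wave:
                self.apply(site)
                done.append(site)
            bad = [s for s in wave if not self.healthy(s)]
            if bad:
                # Halt mid-deployment and roll back every site touched so far, newest first.
                self.log.append(f"halted: post-check failed on {', '.join(bad)}")
                for site in reversed(done):
                    self.revert(site)
                    self.log.append(f"reverted {site}")
                return False
            self.log.append(f"wave ok: {', '.join(wave)}")
        return True


if __name__ == "__main__":
    config: Dict[str, str] = {pop: "old" for pop in ["canary", "pop-a", "pop-b", "pop-c"]}

    def push(site: str) -> None:
        config[site] = "new"

    def restore(site: str) -> None:
        config[site] = "old"

    rollout = Rollout(
        waves=[["canary"], ["pop-a", "pop-b"], ["pop-c"]],
        apply=push,
        revert=restore,
        healthy=lambda site: site != "pop-a",  # pretend the change breaks pop-a
    )
    print(rollout.run(), config)
    print("\n".join(rollout.log))
```

With this shape, a bad change only ever reaches the canary and the first wave that fails, and pop-c is never touched. The sketch does assume `revert` can still reach each site; if the change itself cuts off remote access (the out-of-band point made earlier in the thread), there's nothing left to roll back through.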
I don't doubt it - I can just imagine a conversation going something like...
"We are going to deploy all this uber cool stuff to automate/security check/double and counter sign any config changes with auto regression and failover backups plus simulation"
"OK where do we start?"
"The devices we mess about with most"
And ergo, BGP peering routers that get updated once in a blue moon drop to the bottom of the list.
I imagine the network administrators dealing with BGP are the Ray-Ban-wearing, rollerblading kind who don't need no config or change approval.
WhatsApp was down? Didn't even know.
lol
I know a bunch of the network engineers at Facebook; a lot of them are straight-talking Russians, but funnily enough, right now none of them are talking.
Facebook's edge network is pretty awesome, to be honest; they have over 160 POPs globally and the DCs to go with it. I'm just having a hard time picturing a normal, scheduled change taking it all down at once, for 6 hours... It's just weird.