Major Websites down due to cloud issue

Soldato
Joined
29 Dec 2014
Posts
5,758
Location
Midlands
On a side note what is the real benefit of 'Cloud' services?
I would have thought the infrastructure would protect against any outages by means of multi location backup servers?

Cloud companies like AWS, Azure and Google are so utterly enormous that parts of them are always "on fire", in the sense that on any given day all of those providers will be having some sort of large-scale outage. The reality is that these companies are so large, and so resilient, that they're often able to shift things around and nobody ever knows. Automation is at such a level that most faults get fixed without any human intervention at all. Occasionally something really bad happens and we see these global outages, but on the whole this isn't anything new - we get large-scale outages from time to time, for one reason or another.

In terms of the benefits of the cloud, I'll give you an example:

Back in 2015 I helped design and build the world's largest low-latency gaming network, designed purely to give players the lowest possible ping. We had hundreds of locations all over the world; in the end we probably spent in the region of $500m, and it worked very well. At the same time, we ran most of our servers on "bare metal" in our own DCs (all of this cost a fortune and required around 150 people to run).

Eventually we started playing with AWS. The benefit is that it's just so easy to turn stuff on. Time goes on, and devs just start doing stuff inside AWS because it's easy and fast and it works well. It got to the point that it was actually easier to log in to the AWS portal, spin up your stuff and get it going than it was to walk 10ft across the office and ask someone to build it on our own infrastructure (which already existed). Couple that with the fact that "because it's Amazon", people just sign it off.

It got to the point where, if I wanted, say, $5m in network hardware and costs, combined with 6 months of resource to build 2 new POPs (points of presence) and everything that goes with it, I'd have to go through the eye of a needle. But if somebody wanted to spend $50m on Amazon - don't even ask, it's approved. The costs just got absorbed as part of "the bill", and I believe we were in the top 10 largest AWS customers at the time.

Then you have performance. At the start AWS was pretty crap for latency because of the locations, so our own global network was way better. But eventually they brought out Global Accelerator, which allowed us to connect players into the closest AWS edge nodes and run them on servers as close as possible to them, whilst using the colossal AWS backbone. It got to the point where, combined with the servers inside AWS and Global Accelerator, we simply couldn't compete with it. If I'd spent $1Bn on our network it would still have fallen way short; it wouldn't have made sense.

Cut to the chase - most of the stuff started moving away from our infrastructure to AWS - I quit the company, and am now a senior design engineer at AWS...
 
Soldato
Joined
21 Jan 2010
Posts
21,947
We are also noting that firms are avoiding investment in bare metal due to talent strategy. You buy AWS, you get the best talent in the world without having to find and nurture it yourself.
 
Soldato
Joined
29 Dec 2014
Posts
5,758
Location
Midlands
Yeah, if I were hiring decent engineers to work for a hypothetical company, I know that the good ones are eventually going to talk to people and word will get around. Then one day they're going to get canvassed on LinkedIn and offered $$$ to do cutting-edge tech, plus a visa to the US (or wherever), which I would never be able to compete with.
 
Associate
Joined
20 Nov 2016
Posts
764
Thank you :)
 
Man of Honour
Joined
31 Jan 2004
Posts
16,335
Location
Plymouth
Was surprised to see this the other day. But as a Fastly shareholder I'm pretty happy for the publicity, it's a great little company!

And not so little, it seems... gov.uk is pointed directly at them!

Would be nice if the share price recovered after the drop due to the TikTok drama... :)


AWS and its myriad services are great (I use Global Accelerator and other fun things), but sometimes there is a company out there which does what you need slightly better.
 