High availability across physical sites without a single point of failure

Redgie · 17 Jan 2017 at 10:17

I'm researching a project that requires a web-hosting solution guaranteeing the highest possible uptime. This means cross-site servers are a must, and due to the nature of the project, an Active-Passive setup is the most likely design: one site hosts everything until a fault occurs or someone intervenes manually, at which point the passive site takes over and the two swap roles.
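To make the Active-Passive behaviour concrete, here is a minimal Python sketch of the role-switch rule described above. `Site`, `Role` and `decide` are hypothetical names. Health is just a boolean here, whereas a real deployment would set it from heartbeats or health checks, and whatever process runs `decide` must itself be redundant or it becomes the new single point of failure.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass
class Site:
    name: str
    role: Role
    healthy: bool = True  # hypothetical: would be fed by heartbeats/health checks


def decide(active: Site, passive: Site, manual_switch: bool = False) -> tuple[Site, Site]:
    """Return (active, passive) after applying the Active-Passive rule.

    The passive site takes over when the active site is unhealthy or an operator
    requests a manual switch, but only if the passive site is itself healthy.
    """
    if (not active.healthy or manual_switch) and passive.healthy:
        # The two sites swap titles: the old passive site now serves traffic.
        active.role, passive.role = Role.PASSIVE, Role.ACTIVE
        return passive, active
    return active, passive


site_a = Site("site-a", Role.ACTIVE)
site_b = Site("site-b", Role.PASSIVE)
site_a.healthy = False  # simulate a fault at the active site
current, standby = decide(site_a, site_b)
print(current.name, current.role.value)  # site-b active
```

The hard part in practice is not this rule but agreeing, across two sites, on who gets to apply it.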

The problem I have is that the load balancer is going to be a single point of failure. DigitalOcean published a great article on this, but unfortunately solved the problem using a "floating IP", which doesn't work for me for two reasons:

It's a product, not a technology, so we'd be tied to a provider of said floating IP
The service responsible for dealing with the floating IP still represents a single point of failure

That said, their diagram perfectly illustrates what I'm trying to achieve (albeit with an internal IP, whereas mine would be public).

Another common solution is to run two (or more) load balancers, with a DNS record for each site attached to domain.com, so that every client can try each load balancer in turn if one goes down. However, this seems to have its own problems:

Some clients will not rotate to the next IP if the first fails
The timeout before a client falls back to a second IP can be long
This approach puts a lot of responsibility on the user
If the records need to change, the DNS TTL determines how long clients keep using the old ones
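The multiple-record approach relies on the client doing something like the following: resolve every address published for the name, then try each one with a short connect timeout. This is a hedged Python sketch using only the standard `socket` module; `resolve_all` and `connect_first_available` are hypothetical helper names, and real browsers and HTTP libraries each implement their own (often slower) fallback logic, which is exactly the inconsistency described above.

```python
from __future__ import annotations

import socket


def resolve_all(hostname: str, port: int) -> list[tuple[str, int]]:
    """Return every distinct address published for hostname (e.g. one record per site)."""
    addresses: list[tuple[str, int]] = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, port, type=socket.SOCK_STREAM
    ):
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_first_available(addresses: list[tuple[str, int]], timeout: float = 2.0) -> socket.socket:
    """Try each load balancer in order; return a socket to the first that accepts."""
    last_error: OSError | None = None
    for host, port in addresses:
        try:
            # timeout bounds how long we wait on a dead address before moving on
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            last_error = exc
    raise ConnectionError(f"no load balancer reachable: {last_error}")
```

With N dead addresses ahead of a live one, a client like this may wait up to N × `timeout` before connecting. Note that `socket.create_connection` already tries each resolved address in turn when given a hostname; the explicit loop just makes the per-address fallback visible.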

Finally, another option is a virtual IP. However, because the redundant server(s) must be located in a different geographical area, the same-subnet limitation of virtual IPs seems to make this approach unsuitable.
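The same-subnet limitation can be checked mechanically. VRRP-style virtual IPs (as used by keepalived, for example) fail over by re-announcing the address on the local network segment, so every node that might hold the VIP has to sit in the VIP's subnet. A small sketch using Python's standard `ipaddress` module; `vip_can_float` is a hypothetical helper, and the addresses come from the RFC 5737 documentation ranges, standing in for two hypothetical sites.

```python
from __future__ import annotations

import ipaddress


def vip_can_float(vip: str, hosts: list[str], subnet: str) -> bool:
    """Return True if the VIP and every host that might hold it share one subnet.

    A VIP that fails over by being re-announced on the local segment (VRRP-style)
    can only move between hosts on that segment.
    """
    network = ipaddress.ip_network(subnet)
    addresses = [ipaddress.ip_address(a) for a in [vip, *hosts]]
    return all(addr in network for addr in addresses)


# Two load balancers in the same data centre: the VIP can move between them.
print(vip_can_float("203.0.113.100", ["203.0.113.10", "203.0.113.11"], "203.0.113.0/24"))  # True
# One load balancer per site, on different provider networks: it cannot.
print(vip_can_float("203.0.113.100", ["203.0.113.10", "198.51.100.10"], "203.0.113.0/24"))  # False
```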

Am I missing a fundamental, commonly known approach to cross-site hosting with no single point of failure? Thanks in advance.

Redgie · 17 Jan 2017 at 14:16

Hmm, so what's the failover procedure in the event of the primary load balancer failing? How does the client know to look for the secondary balancer?

Redgie · 17 Jan 2017 at 14:37

But doesn't that have the same issues as any DNS-based solution (caching of DNS records, long TTLs, client-side oddities, etc.)?
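A rough way to reason about those issues is to add up the worst case a client can experience with a health-checked DNS failover: time to detect the fault, plus time a resolver may keep serving the cached old answer, plus time the client spends timing out on the dead address. The function below is a back-of-the-envelope model, not a measurement. All parameters and example values are assumptions, and `resolver_min_ttl` stands in for resolvers or clients that may cache answers longer than the published TTL.

```python
def worst_case_failover_seconds(
    check_interval: float,
    failures_before_switch: int,
    dns_ttl: float,
    client_connect_timeout: float,
    resolver_min_ttl: float = 0.0,
) -> float:
    """Rough upper bound on how long a client keeps failing after the primary dies."""
    # The health checker needs several consecutive failed checks before it switches DNS.
    detection = check_interval * failures_before_switch
    # A resolver that cached the old record just before the switch can keep serving it
    # for a full TTL (or longer, if it enforces its own minimum).
    stale_cache = max(dns_ttl, resolver_min_ttl)
    # Finally, the client must give up on the dead address before trying again.
    return detection + stale_cache + client_connect_timeout


print(worst_case_failover_seconds(10, 3, 60, 20))  # 110
```

Shortening the TTL only shrinks the middle term, which is why short TTLs help but never make DNS failover instantaneous.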

Redgie · 17 Jan 2017 at 16:16

Damn, was really hoping there would be something I was missing.

Oh well, thanks for the help

Redgie · 18 Jan 2017 at 07:51

Interesting. The background for this project is healthcare, so there are certain scenarios where availability needs to be guaranteed. But I definitely take the point about not over-complicating the system in pursuit of the elusive five nines.

Perhaps the DNS solution is appropriate, then. Now to find out exactly which implementation of it suits us best.

Thanks for all the information, guys, and for the book recommendation, Caged.