VMware Disaster Recovery Kit Debate

Production VMware stack is 5x HP DL380s: 60 cores, 1 TB RAM. Recently upgraded; RAM usage is around 50% and CPU under 20%.

We have a DR site with 2 old HP servers: 24 cores, 128 GB RAM. These won't run enough of our infrastructure to provide a worthwhile DR solution. The requirement is 24 current-generation cores and 512 GB RAM, so we need 2 new servers.
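
For what it's worth, the sums are simple enough to sanity-check in a few lines. A quick back-of-envelope check using only the figures above (a sketch, not a capacity plan):

```python
# Back-of-envelope DR sizing using the production figures quoted above.
prod_cores, prod_ram_gb = 60, 1024        # 5x DL380: 60 cores, 1 TB RAM
cpu_util, ram_util = 0.20, 0.50           # current production utilisation

needed_cores = prod_cores * cpu_util      # ~12 cores of actual CPU load
needed_ram_gb = prod_ram_gb * ram_util    # ~512 GB of in-use RAM

dr_cores, dr_ram_gb = 24, 128             # the two old DR hosts

print(f"CPU: need ~{needed_cores:.0f} cores, DR has {dr_cores} -> "
      f"{'OK' if dr_cores >= needed_cores else 'SHORT'}")
print(f"RAM: need ~{needed_ram_gb:.0f} GB, DR has {dr_ram_gb} GB -> "
      f"{'OK' if dr_ram_gb >= needed_ram_gb else 'SHORT'}")
```

RAM is the constraint: the old hosts have plenty of cores for the real load but only a quarter of the memory needed, hence the 512 GB figure.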

The debate has come about over whether we should have identical servers in our DR site. From a standardisation point of view that's good, but if there's a major firmware bug or hardware issue we might hit the same problem at both sites, rendering the DR site useless.

The other issue is that we're looking at HP DL380 Gen8s vs Dell R720s; Dell are offering a slightly better spec and are £5k cheaper over a pair of servers. I prefer Dell, my colleague prefers HP.

Which way would you jump?
 
Virtualisation has made the hardware almost irrelevant. Yes, you don't want to use some whitebox crap, but the areas where HP would excel over others are things like iLO and the ease of servicing parts while the server is still running, and none of that matters any more since you can put a host into maintenance mode and it won't impact your services at all. For that reason I'd really struggle to buy HP over Dell when there's a price chasm between them, all else being equal (and it pretty much is).
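
For instance, evacuating a host for servicing is a single task against vCenter, whichever vendor's tin it is. A minimal pyVmomi sketch (vCenter address, credentials and host name are placeholders):

```python
# Sketch: put an ESXi host into maintenance mode via pyVmomi.
# DRS migrates running VMs off the host; services see no impact.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()    # lab shortcut; verify certs in production
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
host = next(h for h in view.view if h.name == "esxi01.example.com")
view.Destroy()

host.EnterMaintenanceMode_Task(timeout=0)  # kicks off evacuation; returns a Task to monitor
Disconnect(si)
```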

And for the same reason I don't think the DR site hardware matters as much either, as long as the CPU capabilities are the same and the hardware is beefy enough to do what you need it to.
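
The "CPU capabilities" bit is really about EVC: as long as both sites sit in the same EVC baseline, vMotion and failover don't care whose badge is on the front. A rough sketch to eyeball it, reusing the si connection from the previous snippet (cluster and host names are whatever you have):

```python
# Sketch: list each cluster's EVC baseline and its hosts' CPU models,
# to confirm production and DR sit at a common baseline.
from pyVmomi import vim

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
for cluster in view.view:
    print(cluster.name, "EVC mode:", cluster.summary.currentEVCModeKey)
    for h in cluster.host:
        print("  ", h.name, "-", h.summary.hardware.cpuModel)
view.Destroy()
```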

I take it you're using Site Recovery Manager?
 
What we do for DR is have a basic setup, and in the event of a DR we rent servers for the first week from the DR company and use our NetApp SnapMirror-synced datastore on those rented servers. This is the most cost-effective option. If the DR event looks like lasting beyond the first week, we can use insurance and the HP support contracts to get new servers on site within 24 hours. This is a lot cheaper than having dedicated hardware at the DR site.
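
To make that concrete: once the DR volume is broken off the mirror and made writable, mounting it on a rented host is one API call. A rough pyVmomi sketch, assuming you've already connected to vCenter as si per the earlier snippet; the filer address, export path and host name are placeholders:

```python
# Sketch: after `snapmirror break` makes the DR volume writable, mount it
# as an NFS datastore on one of the rented ESXi hosts.
from pyVmomi import vim

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
rented = next(h for h in view.view if h.name == "rented-esxi01.example.com")
view.Destroy()

spec = vim.host.NasVolume.Specification(
    remoteHost="netapp-dr.example.com",   # DR filer (placeholder)
    remotePath="/vol/vm_datastore_dr",    # SnapMirror destination export (placeholder)
    localPath="vm_datastore_dr",          # datastore name as vSphere will see it
    accessMode="readWrite")
rented.configManager.datastoreSystem.CreateNasDatastore(spec)
```

From there it's just a case of registering the VMs off that datastore and powering them on.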

We rent half a rack at the DR site and have one ESXi box, one switch and one physical server. It needs updating, as they want XenApp gateway failover, so we're looking at adding some NetScalers.
 
2 different brands = twice as complex to manage: 2 different support contracts, 2 different ways of doing everything. The increased overhead of this VASTLY outweighs the infinitesimal risk you mentioned. Having both the same actually reduces your risk, as you can upgrade firmware at your DR site, run it for a few weeks, then upgrade the primary site.
 
Agreed with ROTOR. I would want to keep it the same all round, so if it does cost £5k more, that's nothing in the grand scheme of things compared with the VMs not firing up.
 
That's not strictly speaking true, though, when you consider that the VM layer would be a constant, while the increased complication really depends on the complexity of the environment and the rate of change. If little changes week on week and the core infrastructure remains constant, then that perceived complication isn't necessarily a big deal.

Similarly, treating the DR platform as a test bed for patching erodes some of the point of having DR in the first place. If you're uncertain a patch or change will work in live, then why risk the ability to invoke DR if that patch has crippled it?
 
That's a lot of "ifs" you've got in there. Complexity is bad for reliability. So my question to you is this: why would you? You haven't given a single reason why it's a good idea to have 2 different hardware vendors, beyond a one-off saving of £5,000.
 
It's only your opinion that complexity is bad for reliability; that's simply not true. DR would be considered more complex than a single-site solution, but which is better for overall reliability?

I do not share the opinion that having two hardware vendors is complex, even less so across a handful of servers; if anything that's not complex at all, but perhaps that's where our opinions differ.

The only points that matter are that the DR hosts are capable of sustaining the load in a DR event, whatever the hardware vendor, and that DR is not treated as a pre-prod/testing environment as you recommended, which erodes the purpose of having DR in the first place.
 