VMware vSAN: Controlled shutdown to single host in power failure - does it work?

I'm playing around with VMware vSAN/HA as I might purchase VMUG Advantage. I have a Dell 1920W UPS that powers the 3-host homelab (it smooths out issues with power delivery, and NZ is prone to power cuts).

In the event of a power cut, the UPS can keep all 3 machines plus the UniFi switch up for 30 minutes before cutting out and shutting it all down. I'd like to extend this as far as possible by dropping to a single host plus the switch, so that I still have domain services etc. running for as long as possible.

With HA / DRS turned on, is there any way to ensure that I can do a controlled shutdown of 2 hosts while keeping a single host up? I've configured my VM / Host groups using the 'should' rules to try to keep the key VMs running on a single host and the rest on the other 2 hosts... but I'm not sure whether I can then force the rest not to run when only a single host is left. Would the above work, or would it keep all VMs running until the host is out of memory and then power the rest up when power is restored?

Just thoughts at the moment, and happy to be corrected completely if I'm on the wrong tack... Clearly, before shutting a host down it should really be in maintenance mode with HA disabled, but in the context of a power outage this might not be the best option.

Thanks,

Chris
 
How are you detecting a power failure and passing that to vCenter? You can do virtually anything with PowerCLI, so as long as you can pass it the power cut event, PowerShell will be able to handle the whole process for you: disabling HA, migrating machines, maintenance mode and power down.
 
I guess I'm not! The Dell MUMC application controls the UPS and is installed in a VM that is on the keep-running list, but that is it. MUMC can talk to the ESXi hosts and can set them to shut down, but I don't think it can control HA / DRS at all.
 
Interesting. I'll have a look at that. I'm interested primarily in shutting down the VMs that aren't needed, disabling HA / DRS, migrating the others to the final host, and then shutting down the other two hosts. On power up, all 3 hosts should come back online, probably with only the essential VMs running, as I can start the rest when required. Hopefully this can be achieved using the groups I've defined, as that would be much easier.
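The ordering described above can be sketched in plain Python. This only builds the list of steps; it makes no vSphere or PowerCLI calls, and it says nothing about whether the storage underneath can survive on one host. The names VMHost2, VMHost3 and TestVM are hypothetical, as is the `plan_power_cut` helper itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    host: str
    essential: bool


def plan_power_cut(hosts, vms, survivor):
    """Return ordered (action, target) steps for dropping to one host.

    Order: stop non-essential guests, disable HA/DRS, move essential
    VMs to the surviving host, then shut down the remaining hosts.
    """
    steps = []
    for vm in vms:
        if not vm.essential:
            steps.append(("shutdown_guest", vm.name))
    # HA/DRS is disabled before migrating so it cannot restart or move VMs.
    steps.append(("disable_ha_drs", "cluster"))
    for vm in vms:
        if vm.essential and vm.host != survivor:
            steps.append(("migrate_to_survivor", vm.name))
    for host in hosts:
        if host != survivor:
            steps.append(("shutdown_host", host))
    return steps


# Hypothetical inventory, loosely based on the VMs named in this thread.
hosts = ["VMHost1", "VMHost2", "VMHost3"]
vms = [
    VM("Essentials", "VMHost1", essential=True),
    VM("MBX1", "VMHost2", essential=True),
    VM("vCenter", "VMHost3", essential=True),
    VM("TestVM", "VMHost2", essential=False),
]
steps = plan_power_cut(hosts, vms, survivor="VMHost1")
for action, target in steps:
    print(f"{action}: {target}")
```

Each step would still need to be mapped onto whatever tool actually talks to vCenter; the sketch only pins down the sequence.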
 
Have a read of

ftp://ftp.dell.com/Manuals/all-products/esuprt_ser_stor_net/esuprt_rack_infrastructure/dell-line-interactive-tower-ups-500t_Reference%20Guide5_en-us.pdf
 
Thanks Caged; I presume that if I have defined that I want certain VMs to prefer the last host remaining (VMHost1), then ESXi will prefer these over others that are migrating? E.g. Essentials, MBX1, Zevenet Load Balancer and vCenter are preferred on host 1, and I'm hoping that ESXi would migrate these onto the host first, and then any others if there is space left, before shutting down the rest?
 
So you have a three node vSAN and are planning on "two node" failure situation? How is this possible?
Basically you are looking for trouble. Putting the integrity of vSAN at the mercy of a bunch of home-made PowerCLI scripts would not make me sleep well at night. I have yet to see VM shut-down scripts that work reliably, if at all (a lot of the time they rely on the OS behaving itself 100% correctly); then you're into the realm of "ok, I've waited two minutes for it to shut down cleanly, it's still not off, so I'm going to force power it off", at which point... what have you really gained with the script? And if I know one thing about vSAN, it is that when it works, it works great, but when it breaks, it breaks real good.
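For reference, the "wait two minutes, then force it off" pattern looks roughly like this. It is a generic sketch, not tied to any vSphere API: the three callables (`request_shutdown`, `is_powered_off`, `force_off`) are hypothetical hooks that the caller would have to supply.

```python
import time


def shutdown_with_timeout(request_shutdown, is_powered_off, force_off,
                          timeout_s=120.0, poll_s=5.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Ask a guest to shut down; force power-off if it does not comply.

    Returns "clean" if the guest powered off within timeout_s seconds,
    otherwise calls force_off() and returns "forced".
    """
    request_shutdown()
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_powered_off():
            return "clean"
        sleep(poll_s)
    # One last check before giving up on a clean shutdown.
    if is_powered_off():
        return "clean"
    force_off()
    return "forced"
```

The return value makes the trade-off visible: a "forced" result means the guest was not shut down cleanly, so the script only helps for guests that actually respond in time.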

The only way that vSAN makes sense for me at home (as a way of learning it) is with nested ESXi (on a single host). Play with it, learn it, get rid of it. And don't put anything valuable on it while you're doing it!
 
Basically you are looking for trouble. Putting the integrity of vSAN at the mercy of a bunch of home-made PowerCLI scripts would not make me sleep well at night.

But he's not doing that. He wants to manage the VMs gracefully in a power-off event, not configure vSAN. You obviously have a lot less luck with PowerCLI than I do, seeing as I use it widely in an enterprise and have yet to see a properly written script fail.
 
He wants to manage the automated unattended shutdown of his estate that runs on top of vSAN. Have I got that wrong?

I’d be worried for your enterprise if you are scripting the automated shutdown of ESXi hosts running vSAN!
 
We don't run vSAN except in a test cluster.

So trying to manage host shutdowns is worse than just letting them all turn off when the UPS runs out? I bow to your experience.
 
A) chill
B) you clearly haven’t read what I posted, which is that I don’t think you should use vSAN to store anything important on your home lab
 
And by the way, there is no enterprise that is managing host shutdowns via automated script. That is just looking for accidental shutdowns. In an enterprise, a UPS is a 30 second solution to cover the time it takes the redundant generators to kick in.
 
For the second time: I was talking about scripting VM shutdowns and migrations so that he can concentrate all VMs onto one host, as the OP requested, not host shutdowns. HP MUMC and vSphere itself manage host shutdowns if required; there's no need to script it.
 
Scripts aside, a three node vSAN cluster will not let you take two of those nodes down happily. As it stands, it would only let you put a single node in maintenance mode, so how you can safely take two out of action is beyond me. I've built multiple vSAN clusters for various enterprises over the past few years and have never been able to do this - try putting two of your hosts in maintenance mode... vSphere will not let you.

The consistency of the cluster would be seriously compromised by shutting down two of the nodes, not to mention the fact that he would absolutely need to be certain that one node had enough storage to run all VMs. Assuming you have an FTT of 1, then you're boned trying to run it on a single host.

Three nodes is already one less than I would ever recommend for a vSAN cluster for this reason. I've had to deal with some huge issues in three node clusters after host failures. Three may be good for a home lab, but you still have the same constraints as any other deployment.
 
Hi guys, thanks for all the replies. I am aware, at a simple level, of how vSAN works: a 3 node cluster is running 2 copies of the data and a witness, which is why it is unprotected after a single node loss. I was just hoping that vSAN might have had a bit of stuff built in for this, which it doesn't, and understandably so!
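The arithmetic behind that can be sketched in a few lines of Python. It assumes mirrored (RAID-1) objects with FTT=1 and, as a simplification, one equally weighted vote per component, with an object staying accessible only while more than half of its votes are reachable. The host names are hypothetical placeholders for the three nodes.

```python
def min_hosts_for_ftt(ftt):
    """Minimum hosts for mirrored objects tolerating ftt failures (2n + 1)."""
    return 2 * ftt + 1


def object_accessible(votes_by_host, hosts_up):
    """True while strictly more than half of the object's votes are reachable."""
    total = sum(votes_by_host.values())
    live = sum(v for host, v in votes_by_host.items() if host in hosts_up)
    return live * 2 > total


# FTT=1 layout on three hosts: two data replicas plus one witness component.
votes = {"VMHost1": 1, "VMHost2": 1, "VMHost3": 1}

print(min_hosts_for_ftt(1))                               # 3
print(object_accessible(votes, {"VMHost1", "VMHost2"}))   # True: one host lost
print(object_accessible(votes, {"VMHost1"}))              # False: two hosts lost
```

With two of the three hosts down, the surviving host holds one vote out of three, so under these assumptions the object is inaccessible even though a full copy of the data may be sitting on that host.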

I've been toying with StarWind vSAN for a while now, just running it on the VMware setup to see how it works. This is pure replication over iSCSI, so it can drop to a single node and still keep all the VMs up, and on a 2 node setup it can use heartbeat synchronization to avoid needing a node majority... It seems fairly resilient from what I can test. It's not contained within the hypervisor kernel, so speeds are lower I believe, but people's testing suggests not by a vast amount, and for a home network it should be fine.

StarWind vSAN is an interesting thing, as it comes in free and paid versions; the free one has a 30 day console plus virtual tape library, and then it reverts to PowerShell only, which can do most of what the console can do... As it runs on Windows, it can use Storage Spaces to concatenate drives together, use parity, etc.

My options will be either a 2 node vSAN with the witness on a separate ESXi host, or a 2/3 node StarWind vSAN with VM shutdowns controlled via scripting. The first is fine, as I currently have a host in the vSAN that runs my Xpenology with passed-through HBAs, so that VM can't vMotion anyway; changing that host so that it is the witness and the final host to shut down could be OK.

I'm curious if any of you have looked into StarWind's software?
 
Have a look at EMC ScaleIO... you can download the full version for free (the licensed version is a huge cost, but the free one is identical sans support).

It's a bit more involved to set up, but easy enough if you can follow the instructions. Performance-wise it beats vSAN easily, and it is much more flexible/scalable.
 