vSphere HA problems

Overnight I had a PSU blow and take out the power circuit that it shared with my other ESXi hosts. When I managed to power one of them up this morning and get my vCenter VM started again, vCenter seemed to be having trouble reconnecting to the host. I went through the loop of forcing a reconnect and it seemed to reinstall HA on the host, but it kept timing out while waiting for the election process to complete.

Any ideas on the best way to solve this? I really need to get some of the other VMs running on the host while I fix the other host with the blown PSU (and fix the broken power circuit too).

Cheers
 
Not explicitly, but it looked like vCenter did that as part of the reconnect process. I'm not in front of the system at the moment; is the agent in the list of services under Security Profile?

It's sod's law... only enabled HA yesterday and it all went to pot overnight!
 
Thanks for the link, I'll give that a go later. It's all 5.1, and I have now got what I needed running again. After a bit of Googling it seems like the general fix tends to be 'disable HA then enable it again'. Well, the disabling worked a treat and allowed vCenter to see the host again, but only for a short while before it would disconnect. I didn't have time to diagnose that, but I was able to add the VMs to the inventory on the working server and boot them up OK.

I'll come back to it tonight when I've got time. Something somewhere is clearly not happy...
 
Hmm... the vCenter server needed some Windows updates and since it was pretty much useless at the moment I decided to install them and reboot it. Since the reboot, it seems happy. The working host is connected, the vApps and resource pools are good, VMs all showing as running correctly.

I'm wondering if any of this is related to originally booting vCenter before either of my domain controllers were up? They're also running DNS - could part of this have been a name resolution issue?

I'll not try HA again until I've got another host powered up, just in case.
 
I think I tried this before. A restart from the vSphere client did not help here, so a hard reboot was done instead. But this was only once.
 
I think you're spot on with your thoughts as to why it happened - you normally need DNS available for it to work properly:

All hosts in a VMware HA cluster must have DNS configured so that the short host name (without the domain suffix) of any host in the cluster can be resolved to the appropriate IP address from any other host in the cluster. Otherwise, the Configuring HA task could fail. If you add the host using the IP address, also enable reverse DNS lookup (the IP address should be resolvable to the short host name).

I've seen this catch more than a few of our clients out; you can get around it by adding entries (short name and FQDN) to the hosts files on your vCenter server and ESX hosts.
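If you want to rule name resolution in or out before touching HA again, something along these lines might help. It's only a rough sketch: the host name `esx01`, the domain `example.local` and the address `192.168.1.11` are made up for illustration, and the function names are my own. It checks that a short name resolves to the IP you expect and that the IP resolves back to that short name, and it formats a hosts-file line (IP, FQDN, short name) you could paste in as a workaround.

```python
import socket


def hosts_entry(ip, fqdn):
    """Format one hosts-file line: IP, FQDN, then the short name."""
    short = fqdn.split(".", 1)[0]
    return f"{ip}\t{fqdn}\t{short}"


def check_host(short_name, expected_ip,
               forward=socket.gethostbyname,
               reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means both lookups agree.

    The forward/reverse parameters default to the system resolver and are
    only there so the lookups can be swapped out (an assumption of this
    sketch, not anything vSphere-specific).
    """
    problems = []

    # Forward lookup: short name -> IP address.
    try:
        resolved = forward(short_name)
        if resolved != expected_ip:
            problems.append(
                f"{short_name} resolves to {resolved}, expected {expected_ip}")
    except OSError as exc:  # socket.gaierror / socket.herror are OSErrors
        problems.append(f"forward lookup of {short_name} failed: {exc}")

    # Reverse lookup: IP address -> a name whose short form matches.
    try:
        name, aliases, _ = reverse(expected_ip)
        shorts = {n.split(".", 1)[0].lower() for n in [name, *aliases]}
        if short_name.lower() not in shorts:
            problems.append(
                f"{expected_ip} resolves back to {name}, not {short_name}")
    except OSError as exc:
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")

    return problems
```

Run `check_host("esx01", "192.168.1.11")` from the vCenter server for each host (with your real names and addresses); anything in the returned list is worth fixing, or papering over with a `hosts_entry(...)` line, before re-enabling HA.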
 
Shaz]sigh[;24300660 said:
DNS requirements were removed as part of the 5.0 release, are you running 4.x?

Just googled this and you're absolutely right, although it's still considered best practice to ensure all hosts are fully contactable via DNS; I'd still recommend using the hosts files on small deployments to remove DNS as a potential issue.
 