Strange MS network load balancing problem

Hey guys,

I couldn't decide whether to stick this in the enterprise or network section, so I went for here.

I've a strange issue which I can't seem to get my head around.

We have 2 virtual web servers which host client websites. They're in a load-balanced cluster using the built-in Windows Server 2003 Network Load Balancing (NLB).

I've changed the public IPs for privacy.

WEB1:
Public NIC: 10.10.10.139
Private NIC: 192.168.220.30

WEB2:
Public NIC: 10.10.10.140
Private NIC: 192.168.220.40

CLUSTER IP: 10.10.10.150

NLB settings:
Both servers unicast
Both servers have a different host priority (1, 2)
WEB1 "dedicated IP address": 10.10.10.139
WEB2 "dedicated IP address": 10.10.10.140

This has been in operation for about 2 years, but I've found a pretty major flaw today.

On the surface all looks well; the settings look totally correct in NLB Manager.

WEB2 was drainstopped and restarted (unrelated issue). WEB1 should obviously notice and start serving the connections, but we had instant complaints that the websites were down.

Order of events:
WEB1/2 both servicing web requests, equal load.
WEB2 drainstopped
WEB2 restarted
WEB1 stops servicing web requests
WEB2 comes back up
WEB1 still not servicing web requests
WEB2 added back to the cluster
WEB2 servicing connections
WEB1 still not servicing web requests
WEB1 drainstopped
WEB1 added back to the cluster
WEB1/2 both servicing connections
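
To pin down exactly when each host stops answering during a sequence like the one above, something along these lines could be left running on a machine outside the cluster. This is only a minimal sketch: the plain `http://<ip>/` URLs are an assumption (the sites may only answer on specific host headers or only on the cluster IP), and `is_serving`, `snapshot` and `watch` are made-up helper names.

```python
import http.client
import time
import urllib.error
import urllib.request

# IPs are the ones from the post; the bare "http://<ip>/" URLs are an assumption.
TARGETS = {
    "WEB1": "http://10.10.10.139/",
    "WEB2": "http://10.10.10.140/",
    "CLUSTER": "http://10.10.10.150/",
}

def is_serving(url, timeout=3.0):
    """Return True if the URL produces any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (403, 404, ...),
        # so the host is still servicing web requests.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, unreachable, or a broken response.
        return False

def snapshot(targets, probe=is_serving):
    """Probe every target once and return {name: 'up' or 'down'}."""
    return {name: ("up" if probe(url) else "down") for name, url in targets.items()}

def watch(targets, rounds=12, interval=5.0):
    """Print a timestamped snapshot every `interval` seconds, `rounds` times."""
    for _ in range(rounds):
        print(time.strftime("%H:%M:%S"), snapshot(targets))
        time.sleep(interval)
```

Calling `watch(TARGETS)` while drainstopping WEB2 would show whether the cluster IP and WEB1's dedicated IP drop at the same moment or one after the other.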

What the monkeys!?

Something that may well be to do with it is:
Opening "Network Load Balancing Manager" on either server and connecting to the cluster at 10.10.10.150:
The configuration from whichever machine you're on loads fine, but we get 'Host unreachable, error connecting to "othermachinename.domain.biz"' when it reads the configuration from the other web server.

Googling this seems to point to ICMP being blocked. There is no firewall between the servers. We can ping each NIC from each machine fine.
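
A successful ping only proves ICMP gets through. As far as I understand it, NLB Manager reads the remote host's configuration over WMI/DCOM, which starts with a TCP connection to port 135, so it may be worth testing that path on both the public and private addresses. A rough sketch, assuming port 135 is the relevant one; `tcp_reachable` and `check_peer` are hypothetical helper names:

```python
import socket

RPC_ENDPOINT_MAPPER = 135  # assumed entry point for DCOM/WMI connections

def tcp_reachable(host, port, timeout=3.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, no route, name not resolvable, ...
        return False

def check_peer(addresses, port=RPC_ENDPOINT_MAPPER, probe=tcp_reachable):
    """Try each address of the other host and return {address: reachable}."""
    return {addr: probe(addr, port) for addr in addresses}
```

Run on WEB1, `check_peer(["10.10.10.140", "192.168.220.40"])` would show whether WEB2 is reachable over TCP on each network. It is also worth checking which of those addresses "othermachinename.domain.biz" actually resolves to, since that is the one NLB Manager will presumably try.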

I've been reading everything I can find to do with load balancing but I'm yet to find anything which could be wrong. This was set up before I joined the company.

Any ideas/pointers?
 
You are running in Unicast mode and the issue you have is with ARP tables on the client machines.

You need to enable multicast, or add a second adapter so the NLB hosts can communicate with each other.

Edit -

I don't know how this could ever have worked...
 
Are you using teamed NICs? If so, that could be a cause.
M.

No, not teamed NICs.

You are running in Unicast mode and the issue you have is with ARP tables on the client machines.

You need to enable multicast, or add a second adapter so the NLB hosts can communicate with each other.

Edit -

I don't know how this could ever have worked...

I did initially think this would be the problem. But then I had a read of

http://technet.microsoft.com/en-us/library/cc782694(WS.10).aspx

When you use the unicast method, all cluster hosts share an identical unicast MAC address. Network Load Balancing overwrites the original MAC address of the cluster adapter with the unicast MAC address that is assigned to all the cluster hosts.

When you use the multicast method, each cluster host retains the original MAC address of the adapter. In addition to the original MAC address of the adapter, the adapter is assigned a multicast MAC address, which is shared by all cluster hosts. The incoming client requests are sent to all cluster hosts by using the multicast MAC address.

Select the unicast method for distributing client requests, unless only one network adapter is installed in each cluster host and the cluster hosts must communicate with each other. Because Network Load Balancing modifies the MAC address of all cluster hosts to be identical, cluster hosts cannot communicate directly with one another when using unicast. When peer-to-peer communication is required between cluster hosts, include an additional network adapter or select multicast mode. When the unicast method is inappropriate, select the multicast method.
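
To make the "identical unicast MAC address" part concrete: as far as I know, NLB derives the shared cluster MAC from the cluster IP, using the prefix 02-BF in unicast mode and 03-BF in multicast mode, followed by the four octets of the IP. A small sketch under that assumption (`nlb_cluster_mac` is just an illustrative name, and IGMP multicast mode is not covered):

```python
import ipaddress

# Assumed NLB convention: two-byte prefix + the four octets of the cluster IP.
_PREFIXES = {
    "unicast": (0x02, 0xBF),
    "multicast": (0x03, 0xBF),
}

def nlb_cluster_mac(cluster_ip, mode="unicast"):
    """Return the MAC address NLB would assign for the given cluster IP."""
    prefix = _PREFIXES[mode]
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    return "-".join(f"{b:02X}" for b in (*prefix, *octets))

print(nlb_cluster_mac("10.10.10.150"))               # 02-BF-0A-0A-0A-96
print(nlb_cluster_mac("10.10.10.150", "multicast"))  # 03-BF-0A-0A-0A-96
```

If that holds, running `ipconfig /all` on both web servers should show the same 02-BF-... address on the public NICs, which is why the two hosts cannot talk to each other over those adapters in unicast mode.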

We're running unicast mode on the public NICs, but we have a separate NIC for private admin communication (I think you may have missed that :p). Is there anything special I need to set up in NLB Manager to specify comms between the servers on the private (192.168.220.x) network?

I may well have misunderstood that, so I'd be grateful if you could elaborate a tad.

Thanks :)
 
Another important factor is that you said VM. If these are on VMware, the vSwitches don't support unicast, and VMware recommends multicast.
 
Another important factor is that you said VM. If these are on VMware, the vSwitches don't support unicast, and VMware recommends multicast.

They are indeed VMware VMs, hosted and maintained at a third-party location; we've no idea what setup they have, to be honest. Thanks for the pointer, I'll get digging :)
 