AD Slow Performance with VMware (and some Procurve questions thrown in)

Hi all,

I work for a company that deploys networks for small businesses. Because of the cost involved, we're trying to avoid the expense of an ESX licence, so we've been using VMware Server 1 and 2 to run AD servers on top of PowerEdge 2900s.

However, at the past couple of sites we've installed, the general performance of the workstations throughout the organisations has been dire. I know we should be using ESXi since its general release, but there's a lot of work involved in planning and changing scripts etc., so we're still in testing.

In the meantime we're stuck with Server 1 & 2, and dire performance as a consequence. It seems to be a random problem: at different times during the day the whole network (i.e. the workstations connected to the domain) will grind to a halt, and computers will lock up when trying to save files, etc.

Has anybody any experience with using VMware Server in a production environment? Or indeed has anybody experience with running AD controllers in a virtualised environment?

Also - I don't know if this is related - but generally at most of the sites we deploy two switches, one in the server rack and one in a comms cabinet. They're Procurve 1800/2510/2610/2810s - so not cheap kit (in context, of course) - and we tend to trunk two (gigabit) ports on each to act as the link between them. However, am I right in thinking that a trunk on a Procurve is not the same as a trunk on a Cisco, and that LACP on a Procurve is what's equivalent to a Cisco trunk? So are we setting the switches up wrong?

Any help or ideas (or sympathy :D) is greatly appreciated.
 
Sorry yea it is free - that's what I meant by general release. We've been testing it since it was released but are still a couple of weeks off being able to roll it out.
 
what else are you running on the 2900's natively? and what else are you running on the 2900's in vmware? what cpu, memory, and disk are you running on the 2900's? and what network mode are you running in vmware?

in terms of your switch questions... fairly sure cisco switches support both lacp and etherchannel, the latter of which i believe to be proprietary, for when you want to bond links together. as far as i am aware the procurves support lacp. assuming you're doing procurve <-> procurve then lacp should be fine.

i think you can set them up in two ways though? so you either have an active/standby channel with failover for redundancy, or an active/active channel for performance and redundancy... so, you might want to look at how they are set up?
 

Thanks for the response :)

The 2900s are Quad Core Xeons, one of the 2900s in question has 16GB of RAM, 2 x 400GB SAS and 2 x 750GB SATA. And at the same site we have two Procurve switches, with two trunked (not LACP'd) links. One switch is all Gigabit for the servers, the other has 2x Gbit and 10/100 for the rest.

Staying with the same server, there are three VMs: two Windows 2003 member servers (one running a pretty dormant SQL Server, the other nothing as of yet), and the third is the AD DC in question.

We made some changes to VMware's config yesterday that seem to have made a great improvement to performance on the server itself. We configured it so that VMware no longer uses the on-disk vmem file and instead just uses the allocated memory, with a tmpfs (shm) partition available to it if needed. CPU use went from an average of 80-100% down to 10-50% - but unfortunately the change hasn't been visible to the end-users.
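For anyone following along, the change was roughly this (option names are as documented in the community guides we followed, so treat it as a sketch and double-check against your VMware Server version):

```
# /etc/vmware/config on the host
# Stop VMware Server backing each guest's RAM with an on-disk .vmem file:
mainMem.useNamedFile = "FALSE"

# With the above set, memory backing goes to the temp directory instead,
# so point that at a tmpfs (RAM-backed) mount such as /dev/shm:
tmpDirectory = "/dev/shm"
```

The idea is simply that guest memory pages stay in RAM rather than being constantly flushed to a file on the datastore disk.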

I have been thinking about the fact that all three VMs are on the same SAS RAID 1 array; one of my colleagues has suggested we could try adding a couple more disks and splitting the VMs across the arrays. But surely SAS is good enough to handle one server that's reasonably used and two others that are pretty much lying dormant?
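Before buying disks it's probably worth measuring whether the array is actually the bottleneck. On a Linux host something like the following would show it (iostat is part of the sysstat package; device names will vary):

```
# Extended per-device stats every 5 seconds - watch while users report
# slowness. Sustained %util near 100, or await climbing into the tens of
# milliseconds on the SAS array, would suggest the three VMs really are
# contending for the same spindles.
iostat -x 5
```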

The googling continues :D

EDIT: Sorry, forgot to address your points about the Procurves - we're not using LACP for the Procurve <-> Procurve connection, just a straight trunk. Does LACP offer anything other than redundancy?
 
sounds like your vm boxes are reasonably pokey, wouldn't do any harm to split the vm's out onto their own disks though. can you clarify what you mean by 'just a straight trunk'? have you literally just cabled the switches together twice? if so, are you running spanning tree on the switches, otherwise you will have a loop?!
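If it really is just two cables with no trunk group configured, the quick check on the ProCurve CLI would be something like this (command names from memory, so verify against your firmware):

```
# Is spanning tree running at all?
show spanning-tree

# If not, enable it globally from the config context:
spanning-tree
```

With no trunk group and no spanning tree, two parallel cables between switches will forward broadcasts in a loop.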
 

Sorry - yea it's definitely not creating a loop - the switches haven't died a death :D

With the ProCurves you can make either Trunks or LACP Trunks. Afaik the LACP trunks are as you say: they bond the links (giving you 2Gbps) and provide redundancy, with the aggregation negotiated via 802.3ad. As far as I can tell the plain 'Trunk' option is the same aggregation done statically, with no negotiation protocol on the link.
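i.e. the two flavours on the ProCurve CLI, as far as I can tell (port numbers are just examples):

```
# Static trunk - ports aggregated with no negotiation protocol
trunk 25-26 trk1 trunk

# LACP (802.3ad) trunk - same aggregation, but negotiated; this is what
# maps to a Cisco EtherChannel running LACP. A Cisco 'trunk' is a
# VLAN-tagged link, which is a different thing entirely.
trunk 25-26 trk1 lacp
```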

/helplessly scouring the VMware forums
 
when the cack hits the fan, are the vm's showing high utilisation? are you using bridged networking mode so the vm's have 'real' ip addresses? can you, therefore, connect a client directly to the server via a crossover? have you looked at the stats on the switch ports? any errors? have you checked basics like ensuring the cable is ok? making sure there aren't any issues with auto-neg between the server and switch? and between the switches? has it been ok and then things have suddenly started to give you issues? has anything in the environment changed? also, what av software are you running on the vmware host, and the guests? are there any scheduled jobs that just happen to be running 4x over because of the nature of the environment that could be causing things to bog down?
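on the procurve side, the port stats are worth pulling while things are slow - roughly (port number is an example, check the syntax on your model):

```
# Negotiated speed/duplex for every port - look for anything stuck
# at 10/half against a gigabit server NIC
show interfaces brief

# Per-port counters: drops, collisions, CRC/alignment errors
show interfaces 25
```

a port that has auto-negotiated badly or is racking up errors would explain intermittent network-wide stalls far better than the vm host would.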
 
Wonder if VMware is a red herring here? I'd be LACP'ing the links, although you did mention initial high CPU utilisation.
We, too, use VMware Server for small business customers and utilisation is usually very low, as with normal machines. I do always make a point of putting each VM on its own array, but it shouldn't be a huge problem if the box is only reasonably used.
What is the base OS? 2003/2003 SBS? I wouldn't feel comfortable using SBS as the base OS and haven't done it yet...
 

LACP is the 802.3ad standard; Etherchannel is Cisco's umbrella term for link aggregation and can run either LACP or Cisco's proprietary PAgP - so in LACP mode they're one and the same IIRC.
 
need to know how much ram in the host, and how much given to each guests, is this just one customer or a general problem with a few ?

Is it the same time each day ? What is the OS on the host ? Are you running VMware server here ?
 

This seems to be a general problem with three or four sites/customers, all similar setup. And the slowdowns occur during the day, so under relatively light load with all the users logged on to the domain and carrying out day-to-day operations. There's only 20 users at most at each site so I would have thought the setup we've given them would be more than capable. Of course, I could be wrong :)

I'll give two examples of the sites in question:

Site A:
Host: Quad core Xeon, 16GB RAM, 2x SAS 400GB, 2 x SATA 750GB, VMware Server 1
VMs: 1 vCPU, 3.6GB RAM each
VM1: Windows 2003 SP2, AD DC, Exchange 2007 (15-20 mailboxes)
VM2: Windows 2003 SP2, SQL Server
VM3: Windows 2003 SP2

Site B:
Host: Quad core Xeon, 8GB RAM, 4 x SAS 400GB, VMware Server 2
VMs: 2 vCPUs, 3.6GB RAM each
VM1: Windows 2003 SP2, AD DC, Exchange 2007 (10-15 mailboxes)
VM2: Windows 2003 Terminal Server - not yet powered on

With Site A, I'm soon to be upgrading to VMware Server 2 and giving VM1 8GB (maybe more?) of RAM.

I know some might say it's a lot to be running an AD DC and Exchange on a VM with only 3.6GB of RAM, but we've modded the VMware Server configuration as per this article, so there should be no disk swapping going on - and within the VMs they're only using half of their allocated RAM. I'm confused as to what's causing the problem :(

Thanks everyone for all your ideas and feedback
 