Anyone using HP P4500 SAN + VMware?

Snapshots are taking ages, 12+ minutes for an 8GB VM, and we are losing heartbeats for maybe 10 seconds. This is with only 1 or 2 VMs for testing, and we plan to put 150 on this thing.

I'm looking for a case study or some success stories from people hosting 100+ VMs on them. HP says our config is correct, yet VMware support says there's something wrong with our SAN - we are going around in circles and getting nowhere.

I just want proof that there are people out there using this combo and having decent performance.

cheers
 
BQ888A is the model, and we have lots of them: an 8-node cluster at each site, but they're not clustered across sites - just local Network RAID 10.

It's the ESXi snapshots, not the array snapshots, that are taking so long.

It's connected to a stack of 4 x Cisco 3750s that act as a single switch.

Flow control is enabled, as are jumbo frames (which we might disable, as they seem to add latency).
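For anyone else checking this, the quickest way to prove jumbo frames actually work end to end is a don't-fragment vmkping from the ESXi shell (the IP here is just a placeholder for one of the storage node / VIP addresses; 8972 bytes = 9000 minus the IP/ICMP headers):

  # send a full-size jumbo frame with don't-fragment set
  vmkping -d -s 8972 10.0.0.50
  # compare against a normal-size ping
  vmkping 10.0.0.50

If the big ping fails while the normal one works, something in the path (vSwitch, Flex-10, 3750 stack or storage node) isn't passing jumbo frames, which would line up with the latency we're seeing.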

We are using BL460c G7 blades connected to Flex-10 Virtual Connect modules, which have multiple 10G CX4 connections to the 3750s for iSCSI.

We were sold this as a solution that would handle 100-200 VMs, but I know of no other company using this combo with that many. Most people just have 2 nodes and 10-15 VMs in an SMB-type environment.
 
The stack is dedicated to iSCSI.
They are 3750-Xs, and there are no drops.
Flow control is on, spanning tree is disabled.
It carries dedicated iSCSI traffic only, no VLANs.

The final design will have multiple vmkernel ports for each VMFS datastore / iSCSI gateway (VIP), so the 8-node P4500 cluster will have maybe 7 datastores, leaving one node for failover. It's a 4-node ESXi cluster.
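(For clarity, by multiple vmks I mean binding more than one vmkernel port to the software iSCSI adapter so each one gets its own session/path. On ESXi 5.x that looks roughly like this - the vmhba and vmk names are placeholders, and 4.1 uses the older esxcli swiscsi nic syntax instead:)

  # bind two iSCSI vmkernel ports to the software initiator
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
  # confirm the bindings
  esxcli iscsi networkportal list --adapter=vmhba33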

For testing, though, we are working with just a single datastore and a few VMs.

A simple test that brings it to its knees:

Migrate a VM from datastore1 to datastore2, and do a snapshot of a VM already on datastore2 at the same time.

Pinging the VM that you're snapshotting results in lost packets for maybe 10 seconds, it gets a red alert in vCenter, and the snapshot takes maybe 12 minutes to finish, with terrible ping times the whole way through.

So in a production environment, a simple storage migration or snapshot will end up affecting all the other VMs in that datastore.
 
We took jumbo frames off the ESXi end and it improved; we are now taking jumbo off the storage nodes as well so that nothing is using it.
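For anyone following along, taking it off the ESXi side means both the vSwitch and every iSCSI vmkernel port. On ESXi 5.x the esxcli commands look roughly like this (switch/port names are placeholders; 4.1 uses esxcfg-vswitch and esxcfg-vmknic instead):

  # check current MTU settings
  esxcli network vswitch standard list
  esxcli network ip interface list
  # drop the iSCSI vSwitch and vmkernel ports back to 1500
  esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=1500
  esxcli network ip interface set --interface-name=vmk1 --mtu=1500
  esxcli network ip interface set --interface-name=vmk2 --mtu=1500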

The VM we are using for testing has 8GB of RAM, and the datastores are using 8MB blocks on a 750GB VMFS volume.
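(If anyone wants to compare, the block size and VMFS version can be read straight off the volume - the datastore name is just an example:)

  # show VMFS version, block size and capacity
  vmkfstools -P -h /vmfs/volumes/datastore1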

The blades have NC553i Flex-10 NICs, which have an iSCSI HBA option inside Virtual Connect; we have tried both hardware and software initiators with the same problem. I think we'll end up just using the software initiator in the end, as I've read there are lots of issues with the HBAs.
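If we do settle on the software initiator, enabling it and checking which vmhba it appears as is straightforward on ESXi 5.x:

  # enable the software iSCSI initiator and list the adapters
  esxcli iscsi software set --enabled=true
  esxcli iscsi software get
  esxcli iscsi adapter list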
 
MPIO is the standard Round Robin policy on the VMware side, but we have tested with Fixed paths and it's still the same.
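For reference, this is how the policy gets checked/set per device, along with the per-path IOPS tweak that sometimes gets suggested for iSCSI arrays (the naa ID is a placeholder - and I'd check HP's current recommendation before changing the IOPS value):

  # list devices and their current path selection policy
  esxcli storage nmp device list
  # set one device to Round Robin (placeholder device ID)
  esxcli storage nmp device set --device naa.xxxxxxxxxxxx --psp VMW_PSP_RR
  # switch paths every I/O instead of the default 1000
  esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxx --type iops --iops 1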

VMware support now says we need to contact our storage vendor; we are currently working with HP on a solution but not getting very far.
 
We are still testing; we have upgraded everything to SAN/iQ 9.5 and the problem still exists.

Your ALB should fail over fine though. Flow control and jumbo frames together can cause problems on some cheaper switches; it's generally recommended to turn off jumbo frames as they offer little or no improvement.
 