NetApp NFS benchmark

The NetApp 2240 and ESXi 5.1 hosts are all set up.

The redundancy is fully working. I can turn off a storage switch and do a NetApp controller takeover and everything continues to work OK. :D

There is still one outstanding issue that we can't find any help on. I'm probably going to call NetApp about it and see what they say, but I'm wondering if anyone has any insight.

The old Exchange server on the old infrastructure uses iSCSI through Windows to connect to the old NetApp, so we have to replicate this setup on the new kit. I have set up iSCSI with two NICs on a vSwitch; each NIC goes into its own storage switch. Each NetApp controller has two gigabit NICs dedicated solely to iSCSI, one going into each switch. iSCSI is on its own VLAN and jumbo frames are enabled. I set up MPIO in Windows to work with multiple paths and it works perfectly. As I said, I can turn off switches or do a takeover and the iSCSI storage stays connected without any issues.
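For anyone checking the same thing, mpclaim on the guest shows whether both paths are claimed and which load-balance policy is active (this assumes the guest is 2008 R2 or newer, where mpclaim.exe ships with the MPIO feature, and the disk number is just an example):

mpclaim -s -d
mpclaim -s -d 0

The first lists the MPIO-claimed disks and their load-balance policy; the second lists the individual paths and their state for disk 0.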

The issue is latency. When we run an ICMP ping from the Windows guests to the iSCSI IPs on the NetApp, we get 1ms most of the time but with a lot of high latency spikes (30-70ms). The old NetApp was a solid 1-2ms the whole time. This has us very concerned. I'm not sure what could be causing it, as we have short cables going from the ESX hosts to the storage switches (Cisco 2960s) and from there into the NetApp, and it's completely isolated.
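One quick check that rules out an MTU mismatch along the path is a full-size, don't-fragment ping from both ends. From the ESXi 5.1 shell, over the storage vmkernel port (vmk1 and the target IP are placeholders, not our real values; 8972 is a 9000-byte MTU minus 28 bytes of IP/ICMP headers):

vmkping -I vmk1 -d -s 8972 192.168.203.10

And from the Windows guest with the don't-fragment flag:

ping -f -l 8972 192.168.203.10

If those drop while a 1472-byte ping works, jumbo frames aren't clean end to end.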

Any ideas what we could try?

I did an ATTO benchmark on a disk added through NFS.

[Image: HRezyGG.png — ATTO benchmark on the NFS disk]


We also ran Jetstress on the iSCSI volumes and are doing other tests. Any comments on the ATTO benchmark? It almost looks like it's being limited to 1 gigabit, but it has 2 gigabit for NFS, so I am not sure why it's stopping at 1. Probably because it's multipath rather than a 2-gigabit LACP link?

For the NFS we have 2x gigabit NICs on each controller combined into a vif on its own VLAN. Then we have one port group going to two NICs; each NIC goes to its own switch.
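For anyone following along, the controller-side config for that is roughly the following (interface names, VLAN ID and addresses are just examples, and this assumes Data ONTAP 7-Mode; a multi-mode/LACP ifgrp is the alternative that comes up later in the thread):

ifgrp create single nfs_vif e0a e0b
vlan create nfs_vif 202
ifconfig nfs_vif-202 192.168.202.20 netmask 255.255.255.0 mtusize 9000 partner nfs_vif-202

The vlan create line tags the NFS VLAN on top of the vif, and the partner option tells the other controller which interface to take over on failover.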
 
How are the switch ports set up? Are they trunked? It is being limited to 1Gb, so I would check the switch config.
 
The iSCSI port config on the switch:

switchport trunk allowed vlan 203
switchport mode trunk
spanning-tree portfast trunk

Now, I am not a Cisco expert by any means; I just picked it up as I went along.

It seems to work OK apart from the latency on iSCSI.

The NFS is set up exactly the same but on a different VLAN, and for NFS we had to enable a trunk between the two switches for the heartbeat, which is configured as:

switchport mode trunk

For NFS it should be two gigabit, so I'm not sure why it's limited. Hmm. Maybe it's because it goes through two switches and can only take one route at a time?
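For reference, the sort of per-port and global config these storage ports end up with is roughly this (port number, description and MTU value are examples; on the 2960 the jumbo MTU is a global command and needs a reload to take effect):

system mtu jumbo 9000

interface GigabitEthernet0/1
 description iSCSI to ESXi host
 switchport trunk allowed vlan 203
 switchport mode trunk
 spanning-tree portfast trunk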
 
What is your environment running? Anything disk-intensive?

I would run sysstat -u on the NetApp and look at the output for CPU and CP time.
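Something like this, run during one of the latency spikes (the one-second interval is just a choice):

sysstat -u 1
sysstat -x 1

Watch the CPU and Disk util columns, and the CP ty column: back-to-back consistency points (shown as B/b) suggest the filer can't flush writes fast enough.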
 
NFS via iSCSI?

Have you read the NetApp best practice setup guides? Is the VMware plugin installed?

Erm, is that just one VLAN or multiple?

If it's just one then you shouldn't really be using a trunk. I've also noticed you are missing flow control...
 
VMware plugin installed? On the NetApp? I didn't know there was one. I have not read the best practice guide. We already have a production NetApp in place, so we basically used the knowledge we learned from that setup to create the new setup on the new NetApp.

I am not sure what you mean by NFS via iSCSI; is that even possible?

We have four NICs per controller on the NetApp: two on each controller are dedicated to NFS and the other two to iSCSI. We then have two Cisco 2960s. One of each pair goes into its own switch, i.e. from one controller, one NFS port and one iSCSI port go into one switch, and the same into the other switch. The NFS requires the switches to be connected together; the iSCSI does not.

We then set up NFS on one VLAN and iSCSI on a different VLAN, enabled jumbo frames, and that's about it. I can paste screenshots of the NetApp interfaces and vSwitch config if it will help you understand.

I'll look into flow control on the Cisco switches, thanks for the tip, and also look for the best practice guide.

Edit: it's possible we are not licensed for the VMware plugin; I'm just looking into it now. Finding information on NetApp is by far the most troublesome I have come across so far. Cisco is good, you can find anything regarding Cisco with a Google search; NetApp hides all documentation and communities behind a login/password.
 
Yeah, that made no difference; I've tried all different types of configuration. I think it's the switches; we probably bought the wrong switches :o :( for storage switches. That'll teach me to do more research. Thing is, I was not in charge of the project; I was just asked casually which switches to get, and 2960s are good switches, etc. But I found an article that said there is a bug in the version of Cisco IOS the switches are on, so now I am going to try updating the firmware to a newer version.

Any idea what switches I should have bought instead? Probably what drz was talking about in the other thread. Maybe QLogic; the guy at the other site used QLogic, but I thought Cisco would be better.

I initially suggested FC after advice from someone on here, and now the guy I work with says they should have listened to me and gone FC from the start.

It still works, the latency just occasionally spikes to 31 or 46ms. But after enabling flow control on the ESX-facing iSCSI ports with:

flowcontrol receive desired

it improved a lot, but it's still not 100% perfect.
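If it helps anyone else doing the same, you can see whether flow control actually negotiated and whether pause frames are moving with something like this (interface name is just an example, and I'm going from memory on the exact show commands, so double-check them):

show flowcontrol interface GigabitEthernet0/1
show interfaces GigabitEthernet0/1 | include pause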

That NetApp plugin is nice, but all it can really do is apply recommended settings; nothing else I can see really stands out that I can't do in the OnCommand interface.

Edit: the issue is that the guy I work with is more of an Exchange expert and he is leaving this week, which is why I was trying to move the old Exchange across this way. If he was not leaving, I would have just created Exchange 2010 on the new ESXi and then moved the mailboxes and the VM store across in one maintenance window. That way we could go 2x gigabit LACP to each switch per controller and NFS only. So I am going to scrap iSCSI, go with NFS only, and try to get the guy I work with to build Exchange 2010 before he leaves. I'll test it for a few weeks and then do a full move over.
 
Are you running Exchange inside VMware with the MS iSCSI initiator?

Forget the whole series of bad ideas and mistakes in this thread so far and get that fixed. Not only is it unsupported, but it is a truly, truly terrible idea.
 
I did not set up the infrastructure, I just inherited it, which is usually the case with my posts :D

This is the situation that I have to deal with, and it's all about finding the best solution. Ideal-world solutions are one thing, but solutions in a real-world scenario with **** systems and other issues make it all a bit more complicated.

It turns out that we can't go the full NFS route that I want to because:

a) we have no support on the old NetApp and there is urgency to move all volumes to the new infrastructure
b) we lack gigabit ports on the current production switches to run two ESXi infrastructures at the same time.

This sort of rules out the option of migrating Exchange to 2010 from old to new, forcing me into a situation where I have to duplicate the terrible Windows iSCSI setup from the old infrastructure on the new setup.

But if you say it is unsupported and a terrible idea, then maybe it's not even worth me updating the firmware on these switches. It's often the case with my threads that I have to deal with **** setups and convert them to new ones :/

Trying to convert that old config to one that uses iSCSI through VMware would be problematic, I have been told by my colleague, who is more knowledgeable about Exchange. This is why I am forced to duplicate the config on new hardware. I just don't see any alternative, unless you have one?

[Image: HRezyGG.png — ATTO benchmark, NFS]

[Image: CKKF21Z.png — ATTO benchmark, iSCSI through Windows]

I am going to try to go full NFS and then migrate to Exchange 2010 from old to new; I'll have to deal with the lack of support on the old NetApp and the lack of gigabit ports.
 
I have updated the firmware on the switches to IOS 15, re-set up the switches with an ideal config, and then set up the iSCSI MPIO.

[Image: 1MPbv1S.png — iSCSI MPIO configuration after the firmware update]


The latency issue is still there :(

But I ran another ATTO disk benchmark and by mistake I ran it on the C: drive, which is a standard NFS VMware disk, and this caused all the latency on the iSCSI to drop to 1ms. Very odd.

[Image: 57dLbXz.png — ATTO benchmark on the C: (NFS) drive]


Maybe it is just because the link is so idle that the switches are not allocating any resources to ICMP, or something to that effect?

Here is a shot of the new ATTO benchmark of the iSCSI since the switch update. Also note that multipathing is enabled on this benchmark, which doubles performance. During the ATTO benchmark of the E: drive (the iSCSI drive), the latency does improve a lot, but not as much as it did when I benchmarked the C: (NFS) drive.

[Image: uccrL0i.png — ATTO benchmark on the E: (iSCSI) drive after the switch update]


I am sure that NFS at 2 gigabit will still beat the iSCSI MPIO setup.
 
Have you done the usual Windows TCP tuning to get the best out of iSCSI?
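For example, the things usually suggested for a guest-side iSCSI initiator (treat this as a rough sketch to research rather than a recipe; the interface GUID is a placeholder for the iSCSI NIC, and the registry changes want a reboot):

netsh int tcp show global
netsh int tcp set global chimney=disabled
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<iSCSI-NIC-GUID>" /v TcpAckFrequency /t REG_DWORD /d 1
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<iSCSI-NIC-GUID>" /v TcpNoDelay /t REG_DWORD /d 1

The two registry values disable delayed ACKs and Nagle on that interface, which is what most of the iSCSI tuning guides focus on.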

It might be better for the short term to continue to bodge it but you really ought to step back from this and come at it from the point of view of starting again... if I was the client I'd be worried...
 
The current setup has been in place since 2008. Moving it all to a new NetApp and new ESXi hardware is not that outrageous.

Currently they have been running NFS and iSCSI (through Windows to Exchange) on a single 1-gigabit link. The storage and ESXi all go into the main stacks with the client PCs.

So going to 2x 1 gigabit for iSCSI and 2x gigabit (1 effective) for NFS on isolated storage switches, and going from 1TB to 5TB of usable space, is hardly a bodged job.
 
In this iSCSI setup, is the traffic all Layer 2 (flat subnet) or is there any routing going on?

Also, what is the overall goal here? Throughput or IOPS?

With VMware you will be limited to the initiator's speed. As an example, you can have 2x 1Gb NICs but you will still only get 1Gb in terms of throughput (this is without the NetApp / 3rd-party DSM installed). However, depending on your path policy you can set both paths to be active, or leave one on standby etc., which will help with IOPS.
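If the LUNs were mapped to ESXi itself rather than to the guest, you would check and change that path policy from the host shell with something like this (the naa identifier is a placeholder):

esxcli storage nmp device list
esxcli storage nmp device set --device naa.60a98000xxxxxxxx --psp VMW_PSP_RR

The second command switches the device to round robin so both paths carry I/O instead of one sitting on standby.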

This should all have been specced out in the first draft of the design for the storage/VMware setup.

I don't know if you are also aware, but vSphere doesn't support LACP on a VSS; it only supports it on a VDS.
 
We have narrowed the latency issue down to Windows iSCSI inside VMware itself.

We have done various tests and that is our conclusion. I have finally convinced them to abandon iSCSI in Windows. Now we are going full NFS. I will be posting a 2-gigabit ATTO benchmark later today.

I know that VMware does not support LACP without a VDS. But we are only going to run LACP from the NetApp to the storage switches, and on the VMware side just have two NICs per port group. I think that should still give 2-gigabit throughput on NFS. Well, I'll soon see.

We are still building the "first draft". As you may have noticed, I am not very experienced with enterprise storage; this is my first time setting up a NetApp with ESXi. I've helped set up a P2000 with FC before, but that is about it. So we have a few weeks to test different configurations and find the best ones, and I am currently in that process.

I still think NetApp is overpriced.

a) the management port is not even gigabit.
b) you can't modify the interfaces from the web GUI or the command line; you have to manually edit the rc file. For £30k, some Americans are sitting back having a laugh.

Even Openfiler allows you to modify the interfaces from the web UI, and look at pfSense; there's no excuse for such a cheap interface. Compared to the last Java one it's way better, but still quite rubbish.

They're definitely sitting back on a beach somewhere laughing, "Those fools bought our overpriced SANs lol" http://www.theregister.co.uk/2010/05/27/netapp_fy2010/
 
Not sure if that's a serious post or not. :confused::confused:

1. A management port doesn't need to be gigabit, as you are not passing production data over it. It does exactly what it's designed to do.

2. Openfiler is not enterprise-class storage!

You'll often find that enterprise-level hardware is best managed via the CLI. The same goes for Cisco, Linux/Unix, Juniper, etc.

That article is from 2010, so it's a few years out of date. Yes, NetApp do make good profits, as they sell enterprise-level equipment that works.
 
Well, NetApp uses the management ports for SnapMirror, so it would make sense to make them gigabit, and it would cost next to nothing to do so. There is no reason not to.

Openfiler is not enterprise storage, which is my whole point: on it you can change interfaces with ease, but on the NetApp you have to mess around with editing the rc file, because they are too lazy and cheap to make a proper GUI for their overpriced hardware.

I have no problem managing through the CLI, but it should work when you do that, and it doesn't. You have to use rdfile and Notepad and wrfile, and then you get active and persistent mismatches; it's just stupid.

Well, thanks for your help. I'm just getting a bit annoyed with the NetApp interface changes, so I'm ranting a bit. :/

Edit: OK, I've worked out how it works. The rc file is the persistent config, and the GUI and ifconfig are the active config. You can use the CLI to modify the active config but not the persistent config, and the active and persistent configs have to match or you get complaints. Makes sense.
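So the workflow ends up being roughly: make the change live with ifconfig, then put the matching line into /etc/rc so it survives a reboot. Something like this (names and addresses are the same placeholders as in my earlier examples, and I may be misremembering the -a append flag, so check it before relying on it):

ifconfig nfs_vif-202 192.168.202.20 netmask 255.255.255.0 mtusize 9000
rdfile /etc/rc
wrfile -a /etc/rc "ifconfig nfs_vif-202 192.168.202.20 netmask 255.255.255.0 mtusize 9000"

Note that wrfile without -a replaces the whole file with whatever you type in, so don't use the bare form unless you're pasting the complete rc back in.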
 
OK, I have done my testing and I see what you are saying about LACP. I enabled LACP on the NetApp and the switches and finally got that working. But as VMware does not support LACP on a VSS, the NFS is still limited to 1 gigabit in the benchmark test.
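For the record, the LACP side ended up roughly like this (port numbers, ifgrp name and VLAN are examples, and the NetApp side assumes 7-Mode):

interface range GigabitEthernet0/11 - 12
 channel-group 1 mode active

interface Port-channel1
 switchport trunk allowed vlan 202
 switchport mode trunk

and on the controller:

ifgrp create lacp nfs_vif -b ip e0a e0b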

I think, though, that with the way VMware handles multi-NIC vmkernel ports it will use the additional bandwidth when it is needed, i.e. if we allocate four gigabit NICs to NFS on VMware, benchmarks will always come back at 1 gigabit, but when there is a lot of load on the NFS it will use more than 1 gigabit. Is that correct?

As we are not licensed for a VDS, I'm not sure which option to pick. If we do use LACP between the NetApp and the switches, we lose switch redundancy without a stacking module.

So after all the tests I'm probably just going to go back to basics and do 2x NICs in single mode with NFS, which will be two NFS interfaces per controller. Then, when it comes to setting up the datastores in VMware, I'll set up one datastore per NFS interface on the NetApp. This way we can have 1x gigabit NFS for one Exchange volume, 1x gigabit NFS for the second Exchange volume, 1x gigabit for the VM OS store and 1x gigabit for the DM file store. This way we keep the redundancy and still utilise 4 gigabit of NFS bandwidth in total.

This way there's no LACP, we don't even need VLANs on the storage switches, and we can keep switch redundancy.
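The datastore side of that plan is then just one NFS mount per controller interface on each host, something like this (the IPs, export paths and names are placeholders):

esxcli storage nfs add --host 192.168.202.21 --share /vol/exchange_db1 --volume-name exchange_db1
esxcli storage nfs add --host 192.168.202.22 --share /vol/exchange_db2 --volume-name exchange_db2
esxcli storage nfs add --host 192.168.202.23 --share /vol/vm_os --volume-name vm_os
esxcli storage nfs add --host 192.168.202.24 --share /vol/dm_files --volume-name dm_files

Each datastore then rides its own gigabit interface, which is how the 4 gigabit total gets spread without LACP.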
 