4Gb Fibre Channel vs iSCSI over 10GbE

That's ok, I will try and help out if I can.

I see you say the LSI card is configured in a RAID 0. This isn't going to help, as ZFS by its very nature likes to have raw access to the disks. Does the RAID controller support JBOD? I have an LSI 9266-4i and I know that doesn't support JBOD. Once the disks have been presented directly to Solaris, you can then apply the chosen RAID level to them, i.e. RAID 0, 1, 1+0, Z, etc.
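For what it's worth, once the disks do show up raw, the ZFS layout is just chosen at zpool create time. A rough sketch (pool and device names are placeholders only):

Code:
# striped mirrors (the RAID 1+0 equivalent)
zpool create tank mirror c4t0d0 c4t1d0 mirror c4t2d0 c4t3d0

# or single-parity RAID-Z across five disks
zpool create tank raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0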

Also, about the hard drives themselves: are they 4K-sector or 512-byte-sector drives? There is a performance impact when the wrong ashift value has been configured. Solaris 11 is meant to detect this correctly, but in my experience it doesn't, so you end up with misaligned disks, which also hurts performance. The only way I know of to correct that problem is to edit a few files in Solaris 11, or to run OpenIndiana, where you can force the ashift value when creating the zpool.
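You can check what ashift a pool actually ended up with via zdb; something like this (pool name is just an example), where 9 means 512-byte sectors and 12 means 4K sectors:

Code:
zdb -C tank | grep ashift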

Have you run any benchmarks within Solaris 11 itself, to get an idea of local performance? e.g. bonnie++ (run via pfexec), etc.?
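For a local baseline, bonnie++ against a directory on the pool is usually enough to rule the pool itself in or out before iSCSI/FC is involved. A rough example (path and size are placeholders; the size wants to be well above RAM so the ARC doesn't hide the disks):

Code:
pfexec bonnie++ -d /tank/bench -s 32g -u root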
 
Just to throw some enterprise-class real world numbers into the mix here:

We have a SQL DB on 96x 600GB SAS spindles, and when we're running our backups we're hitting about 6,000 IOPS. Peak throughput is about 300MB/sec, but that's mainly because nothing lasts long enough to generate a spike within the resolution of the logging...

24x 3TB SATA spindles deliver about 1,500 IOPS. Peak throughput is about 180MB/sec, but that's a very, very different traffic profile of course.

I'm going to set up something where I can safely run CrystalDiskMark without impacting anything. I have a box with 24 1TB drives doing nothing at the moment and another 24x 3TB shelf on the way, so I'll have some kit to experiment with :)
 
Following on from what you have posted:

I have installed a number of Dell EqualLogic PS6100XV arrays for a number of customers over the past few months. When the device is fully populated (24x 15k drives) I have been seeing results around 2,500-3,000 IOPS, but as these devices have an extra caching layer I have had results of over 18,000 IOPS from them. However, in terms of raw R/W throughput I have struggled to get over 180MB/s out of them.

One other thing worth mentioning: specifically when running iSCSI over Cisco 3750X switches, we are seeing a number of problems with errors and retransmits because the switch buffers are not large enough. The only workarounds are to re-tune the buffer allocation or move to the larger Cisco switches.
 
Oh yeah, those are actual disk IOPS I've posted. We have FlashCache modules in our controllers, one of which serves 5TB of Windows 7 Virtual Desktops entirely from cache. Seriously speedy and pretty much makes boot storms a thing of the past for us. It just wouldn't be fair to give throughput figures from cache :p

I try and avoid iSCSI wherever possible for those sorts of reasons. I just can't bring myself to trust SCSI over a lossy protocol, especially having seen what happens when a SCSI bus (or FCAL) gets so busy it starts dropping frames...

I've pushed very hard for FCoE where possible for all block storage requirements because it makes the most sense, especially when the alternatives are either iSCSI or running an FC fabric alongside the DC LAN.

Unfortunately, people who understand both networking and storage in enough depth to properly run a converged infrastructure are in short supply.
 
I try and avoid iSCSI wherever possible for those sorts of reasons. I just can't bring myself to trust SCSI over a lossy protocol, especially having seen what happens when a SCSI bus (or FCAL) gets so busy it starts dropping frames...

I agree. I really push for FC, but due to budget constraints from the client it normally gets downgraded to iSCSI. I have also seen cases in the past, and have one currently open with both Dell and MS, where iSCSI is just falling flat on its face because it can't cope. That said, the troubleshooting and engineering time spent on problems like this often costs more than going FC in the first place would have. :confused::(

Unfortunately, people who understand both networking and storage in enough depth to properly run a converged infrastructure are in short supply.

This is an area I am looking at specialising in, and it's also an area of personal interest for me. All being well, I hope to be sitting my NetApp certifications later this year.

We have FlashCache modules in our controllers, one of which serves 5TB of Windows 7 Virtual Desktops entirely from cache. Seriously speedy and pretty much makes boot storms a thing of the past for us. It just wouldn't be fair to give throughput figures from cache

We use FlashCache modules in a number of SANs and the difference they make is night and day. Highly recommended, IMO. :D
 
[RXP]Andy said:
That's ok, I will try and help out if I can.

I see you say the LSI card is configured in a RAID 0. This isn't going to help, as ZFS by its very nature likes to have raw access to the disks. Does the RAID controller support JBOD? I have an LSI 9266-4i and I know that doesn't support JBOD. Once the disks have been presented directly to Solaris, you can then apply the chosen RAID level to them, i.e. RAID 0, 1, 1+0, Z, etc.

So it turns out my RAID controller will not do JBOD, so the best I can do at the moment is set each disk up as its own RAID 0 and do it that way. I will have to try and get a 4/8-port SATA card at some point.

[RXP]Andy said:
Also, about the hard drives themselves: are they 4K-sector or 512-byte-sector drives? There is a performance impact when the wrong ashift value has been configured. Solaris 11 is meant to detect this correctly, but in my experience it doesn't, so you end up with misaligned disks, which also hurts performance. The only way I know of to correct that problem is to edit a few files in Solaris 11, or to run OpenIndiana, where you can force the ashift value when creating the zpool.

I'm pretty sure they are 512-byte-sector drives. I don't really know about ashift values, so I shall do some more reading. I'll come back when I've made a little more progress!
 
I've now moved back to an illumos-based storage system, as iSCSI was becoming somewhat of a pain when moving large amounts of data around. I have chosen OpenIndiana (oi_151a7) to evaluate, as it's based on the illumos kernel, which seems to have a lot of open-source support going forward. The main reason for moving away from Solaris 11.1 / NexentaStor was the poor performance.

So my current setup is as follows:

6 x 2 TB Seagate Drives in RAIDZ

Code:
root@san01:~# zpool status vol_data
  pool: vol_data
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        vol_data                   ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c4t5000C5004E31AC95d0  ONLINE       0     0     0
            c4t5000C50051D8D402d0  ONLINE       0     0     0
            c4t5000C50051E4E37Ad0  ONLINE       0     0     0
            c4t5000C50051E51DD0d0  ONLINE       0     0     0
            c4t5000C5005CB67BACd0  ONLINE       0     0     0

3 x Intel 335 240GB SSDs (I have removed one of the SSDs at the moment), which are presented as raw devices via COMSTAR. I have also created 2 x 400GB LUNs on the RAIDZ volume for general usage.

Code:
root@san01:~# sbdadm list-lu

Found 5 LU(s)

              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f0d46d42000000517ce70f0003  240057344000         /dev/rdsk/c4t50015178F369449Ad0
600144f0d46d42000000517ce71a0004  240057344000         /dev/rdsk/c4t5001517803D75FC1d0
600144f0d46d42000000517ce7240005  240057344000         /dev/rdsk/c4t5001517803DB359Ad0
600144f0d46d42000000517ce8f40006  429496729600         /vol_data/vmware_lun3
600144f0d46d42000000517ce9130007  429496729600         /vol_data/vmware_lun4
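For anyone following along, registering a backing store with COMSTAR is only a couple of commands. Roughly (not necessarily the exact commands I ran, and the GUID placeholder is whatever sbdadm list-lu returns):

Code:
# file-backed example: create a 400GB backing file on the pool
mkfile 400g /vol_data/vmware_lun3

# register backing stores (raw SSD or file) as logical units
sbdadm create-lu /dev/rdsk/c4t50015178F369449Ad0
sbdadm create-lu /vol_data/vmware_lun3

# make a LU visible to initiators (all host/target groups in this case)
stmfadm add-view <GUID from sbdadm list-lu>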

This is presented to VMware over FC using a QLogic 2462 HBA, with the following setup:

VMware_Storage.PNG


VMware sDRS Setup:

VmwaresDRS.PNG


The initial testing has been coming back very favourable and, most importantly, is showing a marked improvement over Solaris 11.1 / NexentaStor. I will post the benchmarks and a complete write-up once I have concluded the testing.

I also have OmniOS and NexentaStor 4 in VMs at the moment, which I am doing load and performance testing on, so I am looking forward to those results.
 
Looking good. Are you doing anything specific for L2ARC and/or ZIL?

Not really, to be honest. I am not using an L2ARC, and the ZIL is on its default settings.
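If I do try them later, cache and log devices can be added to (and removed from) a live pool without rebuilding it; a sketch with placeholder device names:

Code:
# add an SSD as L2ARC (read cache)
zpool add vol_data cache c4t0d0

# add an SSD as a dedicated log (SLOG) - only helps synchronous writes
zpool add vol_data log c4t1d0

# both can be removed again later
zpool remove vol_data c4t0d0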

This is a quick test:

CrystalDiskMark

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   261.490 MB/s
          Sequential Write :   233.017 MB/s
         Random Read 512KB :   231.278 MB/s
        Random Write 512KB :   215.771 MB/s
    Random Read 4KB (QD=1) :    14.421 MB/s [  3520.8 IOPS]
   Random Write 4KB (QD=1) :    20.470 MB/s [  4997.5 IOPS]
   Random Read 4KB (QD=32) :   193.227 MB/s [ 47174.5 IOPS]
  Random Write 4KB (QD=32) :   193.456 MB/s [ 47230.6 IOPS]

  Test : 1000 MB [C: 42.5% (21.1/49.7 GB)] (x5)
  Date : 2013/04/29 8:31:13
    OS : Windows NT 6.2 Server Standard Edition (full installation) [6.2 Build 9200] (x64)

IO Meter

Code:
'Target Type	Target Name	Access Specification Name	# Managers	# Workers	# Disks	IOps
ALL	All	4K; 100% Read; 0% random	1	1	1	60226.54415
MANAGER	VM-DC02	4K; 100% Read; 0% random		1	1	60226.54415
PROCESSOR	CPU 0					
PROCESSOR	CPU 1					
WORKER	Worker 1	4K; 100% Read; 0% random			1	60226.54415
DISK	C:					60226.54415
'Time Stamp						
2013-04-29 09:32:09:876						
'Access specifications						
'Access specification name	default assignment					
16K; 100% Read; 0% random	0					
'size	% of size	% reads	% random	delay	burst	align
16384	100	100	0	0	1	0
'End access specifications						
'Results						
'Target Type	Target Name	Access Specification Name	# Managers	# Workers	# Disks	IOps
ALL	All	16K; 100% Read; 0% random	1	1	1	23382.31685
MANAGER	VM-DC02	16K; 100% Read; 0% random		1	1	23382.31685
PROCESSOR	CPU 0					
PROCESSOR	CPU 1					
WORKER	Worker 1	16K; 100% Read; 0% random			1	23382.31685
DISK	C:					23382.31685
'Time Stamp						
2013-04-29 09:37:24:906						
'Access specifications						
'Access specification name	default assignment					
512B; 100% Read; 0% random	0					
'size	% of size	% reads	% random	delay	burst	align
512	100	100	0	0	1	0
'End access specifications						
'Results						
'Target Type	Target Name	Access Specification Name	# Managers	# Workers	# Disks	IOps
ALL	All	512B; 100% Read; 0% random	1	1	1	71415.36936
MANAGER	VM-DC02	512B; 100% Read; 0% random		1	1	71415.36936
PROCESSOR	CPU 0					
PROCESSOR	CPU 1					
WORKER	Worker 1	512B; 100% Read; 0% random			1	71415.36936
DISK	C:					71415.36936
 
What drives were you using in the figures above?

I just ran a quick test against 24x 3TB SATA disks, presented via iSCSI to a VM using the MS iSCSI initiator and with a gigabit vNIC (e1000).

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.2 x64 (C) 2007-2013 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   104.284 MB/s
          Sequential Write :   133.730 MB/s
         Random Read 512KB :    97.949 MB/s
        Random Write 512KB :    98.922 MB/s
    Random Read 4KB (QD=1) :     1.067 MB/s [   260.5 IOPS]
   Random Write 4KB (QD=1) :     7.751 MB/s [  1892.4 IOPS]
   Random Read 4KB (QD=32) :    30.949 MB/s [  7556.0 IOPS]
  Random Write 4KB (QD=32) :    61.210 MB/s [ 14943.9 IOPS]

  Test : 1000 MB [G: 84.5% (3462.0/4095.9 GB)] (x5)
  Date : 2013/04/29 12:28:37
    OS : Windows Server 2008 R2 Server Standard Edition (full installation) SP1 [6.1 Build 7601] (x64)

I was watching Resource Monitor, so I know the test hit the limits of the NIC during the sequential tests, but that doesn't go the whole way to explaining the disparity in the numbers (although the VMware host and the 24-disk array are doing other things as well).

Interesting...
 
Putting the exact figures into wmarow's calculator, I get pretty much exactly the same numbers as tested for the 4K QD=1 tests, so I'm not concerned about my results as such; I just wasn't expecting your results to be so much higher unless you're doing something special?
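Rough back-of-the-envelope on why spindle count doesn't show up in that particular number, assuming the QD=1 test is purely latency-bound:

Code:
QD=1 IOPS ~= 1000ms / average round-trip latency per 4K read
260 IOPS  =>  roughly 3.8ms per I/O (disk seek + iSCSI round trip)

At queue depth 1 there is only ever one I/O in flight, so extra spindles can't help; it's only the QD=32 tests that let the array spread reads across the disks.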
 
What drives were you using in the figures above?

Those figures are from my Intel SSD array.

These are from the RAID-Z array, which is running on 5 x Seagate Barracuda 2TB 7200RPM SATA 6Gb/s 64MB-cache drives. However, the read figures will be a little high due to the ARC.

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   340.060 MB/s
          Sequential Write :   314.321 MB/s
         Random Read 512KB :   312.904 MB/s
        Random Write 512KB :   291.995 MB/s
    Random Read 4KB (QD=1) :    25.927 MB/s [  6329.8 IOPS]
   Random Write 4KB (QD=1) :    17.780 MB/s [  4340.9 IOPS]
   Random Read 4KB (QD=32) :   247.915 MB/s [ 60526.1 IOPS]
  Random Write 4KB (QD=32) :    19.424 MB/s [  4742.2 IOPS]

  Test : 1000 MB [C: 35.7% (35.6/99.7 GB)] (x5)
  Date : 2013/04/29 15:15:03
    OS : Windows NT 6.2 Enterprise Edition [6.2 Build 9200] (x64)
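As a side note, how much the ARC is helping can be checked directly on the storage box; a quick sketch using kstat:

Code:
# ARC size and hit/miss counters
kstat -p zfs:0:arcstats | egrep 'size|hits|misses'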
 
DRZ, how come your read IOPS are so low given how many spindles you have there?

Here's my numbers from a few minutes ago:

Code:
-----------------------------------------------------------------------
CrystalDiskMark 3.0.1 x64 (C) 2007-2010 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]

           Sequential Read :   369.477 MB/s
          Sequential Write :    68.178 MB/s
         Random Read 512KB :   347.141 MB/s
        Random Write 512KB :    62.862 MB/s
    Random Read 4KB (QD=1) :    30.891 MB/s [  7541.6 IOPS]
   Random Write 4KB (QD=1) :    23.421 MB/s [  5718.1 IOPS]
   Random Read 4KB (QD=32) :   236.995 MB/s [ 57860.1 IOPS]
  Random Write 4KB (QD=32) :    32.337 MB/s [  7894.8 IOPS]

  Test : 1000 MB [C: 30.3% (15.0/49.7 GB)] (x5)
  Date : 2013/04/29 17:54:21
    OS : Windows NT 6.2 Server Standard Edition (full installation) [6.2 Build 9200] (x64)

Andy, I'm using the same 6x 2TB Seagate drives as you, but in RAID10 instead. I actually changed from RAIDZ1 to this config to improve write performance, which has worked for actual file transfers (I've seen 140MB/s sequential write on the SAN server during some file copies, which at the time I classed as a success!), but the benchmark isn't showing that at all now. Out of interest, what spec is your server? Any ideas what's holding me back (if anything)? There are 11 VMs running on the SAN at the moment, but they're not busy enough to account for the difference. I wondered if this could be due to not having a dedicated ZIL drive, but I'm not sure if sequential writes would use it anyway?

My server is a Poweredge 2950, 2x dual core 3.0GHz Xeons, 16GB RAM, SAS 6 controller, QLogic 2460 HBA. OmniOS and Comstar with Napp-It.
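From what I've read, a dedicated log device only ever gets used for synchronous writes, so I guess the first thing to check is whether my workload is actually sync-heavy; something like this (dataset/pool names are examples only):

Code:
# shows whether sync writes are honoured, forced or disabled for the dataset
zfs get sync tank/vmstore

# watch per-vdev activity live while the benchmark runs
zpool iostat -v tank 1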
 
Given the figures stack exactly with the calculations, I guess the fact that there's 3TB per drive has a lot to do with it.
 
what spec is your server? Any ideas what's holding me back (if anything)? There are 11 VMs running on the SAN at the moment, but they're not busy enough to account for the difference. I wondered if this could be due to not having a dedicated ZIL drive, but I'm not sure if sequential writes would use it anyway?

My server is a Poweredge 2950, 2x dual core 3.0GHz Xeons, 16GB RAM, SAS 6 controller, QLogic 2460 HBA. OmniOS and Comstar with Napp-It.

I am running the following spec currently:

Asus P8B-M Mainboard
Intel Xeon E3-1230 CPU
16GB DDR3 ECC RAM
LSI 9207-8i SAS HBA
Intel RES2SV240 SAS Expander
QLogic 2462 4Gb FC HBA
320GB WD drive (OS)
3 x Intel 335 SSD 240GB
5 x Seagate Barracuda 2TB

I am not 100% sure, as this time I have just used the out-of-the-box defaults, which seem to be working really well. :confused:
 
Thought you might find this interesting - the difference between window tuning on and off:

vrCCrrW.png


cNTWsF2.png
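For anyone wanting to repeat the comparison: assuming this is the standard Windows receive-window auto-tuning setting (the usual suspect for "window tuning" on 2008 R2 / 2012), it's toggled globally with netsh:

Code:
:: check the current state
netsh interface tcp show global

:: turn receive window auto-tuning off / back on
netsh interface tcp set global autotuninglevel=disabled
netsh interface tcp set global autotuninglevel=normal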
 
Interesting... it's a global setting though, so it will impact other data connections negatively, (I'm guessing) high-latency connections in particular? Isn't there a way of overriding the window size just for the iSCSI ports?
 
That's a pretty good post. I have often wondered about the above, but I've not had time to test it as of yet.
 
You should be able to tune the TCP window size on the iSCSI device itself, which avoids doing it per host and affecting all of that host's connections.
Tuning the TCP window size can benefit local performance (use a lower window size, ~8-16KB) and also replication over a high-latency WAN (use a higher window size, ~16MB!).
 