Help me organise my storage

Here are the disks available to me:
8 x 15k 600GB SAS
16 x 3TB SATA
12 x 2TB SATA
8 x 500GB SATA - getting a bit old
8 x 240GB SSDs

4 x iSCSI storage units, two of which will be used for virtual machines and two for data storage. They're laid out like this:
2 x 12 bay
1 x JBOD off one of the above
1 x 16 bay which doesn't support the 3TB drives

One of the data storage units will be for replicated backup only; the other will have up to 400-500 users hammering it at once, so as long as usable space after RAID is above 1.5TB, any arrangement is a serious option. And finally, the two boxes used for VMs will be in separate buildings connected by a 4Gb fibre trunk.

There are 25 VMs in total going on them. The key servers involved are the SCCM server, SQL server and Exchange server; the rest are relatively low-to-mid load when it comes to IOPS.

I know in the grand scheme of things that's not a lot to go on, but I wanted some rough suggestions really. I have an idea of how I would organise it, but I've changed that 100 times, so I was after some other suggestions :)
 
Work out the IOPS for each service and then arrange them into fast and slow.
Fast for the production setup, slow for the replicated setup (it'll be functional, just very slow, if the first system goes down).
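
Something like this rough sketch is all it takes - the per-service IOPS figures below are placeholders, so swap in real numbers from perfmon/esxtop on your own kit:

```python
# Rough fast/slow tiering sketch. The per-service IOPS figures are placeholders -
# measure the real thing with perfmon / esxtop before trusting any of this.
estimated_iops = {
    "SQL": 800,        # assumed busiest box
    "Exchange": 400,
    "SCCM": 300,
    "File server": 150,
    "DC (each)": 40,
    "Print": 20,
}

FAST_THRESHOLD = 200   # arbitrary cut-off between the SAS/SSD and SATA tiers

fast = {name: iops for name, iops in estimated_iops.items() if iops >= FAST_THRESHOLD}
slow = {name: iops for name, iops in estimated_iops.items() if iops < FAST_THRESHOLD}

print("Fast tier (SAS/SSD):", sorted(fast, key=fast.get, reverse=True))
print("Slow tier (SATA):   ", sorted(slow, key=slow.get, reverse=True))
print("Fast tier needs roughly", sum(fast.values()), "IOPS")
```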

Are all the 'iscsi storage boxes' identical in terms of connectivity and performance?
When you say 1.5TB of space, how is that arranged? It's easy to get 1.5TB out of a RAID of 3TB disks, but not so easy with a RAID of 240GB SSDs (in RAID10, for instance) - see the quick sums below.
How many drives does the JBOD chassis hold and can it do any sort of RAID? (I know you say JBOD)
Is caching of any sort available?
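
For the capacity question, the back-of-envelope sums look like this (a rough sketch, ignoring hot spares and formatting overhead):

```python
# Usable capacity for the common RAID levels, ignoring hot spares and
# filesystem overhead - close enough for planning purposes.
def usable_tb(disks: int, size_tb: float, level: str) -> float:
    if level == "RAID10":
        return (disks // 2) * size_tb
    if level == "RAID5":
        return (disks - 1) * size_tb
    if level == "RAID6":
        return (disks - 2) * size_tb
    raise ValueError(f"unknown RAID level: {level}")

print(usable_tb(12, 3.0, "RAID6"))    # 30.0 TB - 1.5TB is trivial from the 3TB disks
print(usable_tb(8, 0.24, "RAID10"))   # 0.96 TB - the 240GB SSDs can't reach 1.5TB
print(usable_tb(8, 0.6, "RAID10"))    # 2.4 TB  - the 15k SAS drives can
```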

Work out where your load is going to be and put your fast storage in that direction.

And yes, you have been very vague. :)
 
Are all the 'iscsi storage boxes' identical in terms of connectivity and performance? - All have 4 host ports; the 16-bay is older than the 12-bay units, but performance wouldn't be too far off, I'd imagine.
When you say 1.5TB of space, how is that arranged? It's easy to get 1.5TB out of a RAID of 3TB disks, but not so easy with a RAID of 240GB SSDs (in RAID10, for instance) - At the moment I have 2 x RAID6 arrays on both data units, one using 2TB disks and the other using 3TB disks. I missed this bit out, but I'd be after 2 x 1.5TB arrays (probably RAID10) on the data side that gets hammered.
How many drives does the JBOD chassis hold and can it do any sort of RAID? (I know you say JBOD) - It's a 12-bay unit.
Is caching of any sort available? - All of the units have 4Gb cache modules, assuming that's the information you were after?

Work out where your load is going to be and put your fast storage in that direction.
Well, I know which VMs get the most load, and I know that the data is currently hammered 8 hours a day, but I don't know how to improve it. This is basically trying to optimise a system that's already in use.

For example, I currently have my SQL server holding all the SQL databases on the SSDs, while the OS for the SQL server is on one of the other arrays. While this nicely offloads the SQL load away from the OS datastores, I'm still seeing I/O issues on those stores... mainly due to having so many VMs on SATA disks. Currently the SAS drives aren't in use; they're what we're going to be buying to help things.

I guess it was a bit hopeful for someone to pop in here and say "try arranging it like this" when there are so many variables to contend with :( Keep asking me for more info though, as like I say, I'm only after a rough guide as something to work with... there are just too many variables to remember everything I'd need to put down in one post :)
 
http://www.wmarow.com/storage/strcalc.html

Do you have access to VMWare Capacity Planner (through a reseller or similar)? That should give you a good idea of what your I/O load is like and you can make a judgement from there.

I'd be nervous putting production VMware load on any of those SATA disks, especially the 3TB drives. I have several arrays of 3TB disks between 12 and 24 disks in size (none RAID5 or 6) and all of them are what I would consider "slow", even when performing operations they are good at. They've got to be the replica target; use the smaller drives for the user data (lots of sequential reads/writes).

8 spindles of SAS isn't a great deal either, nor is it a lot of capacity. SSD is going to be where your SQL will need to live but do you have enough capacity there?
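
For a rough feel of the raw grunt on offer (rule-of-thumb per-spindle figures, not measurements from your kit):

```python
# Rule-of-thumb raw IOPS per spindle (generic figures, not measurements):
PER_DISK_IOPS = {"15k SAS": 180, "7.2k SATA": 75, "SATA SSD": 5000}

print("8 x 15k SAS    :", 8 * PER_DISK_IOPS["15k SAS"], "raw IOPS")    # 1440
print("16 x 7.2k SATA :", 16 * PER_DISK_IOPS["7.2k SATA"], "raw IOPS") # 1200
print("8 x SATA SSD   :", 8 * PER_DISK_IOPS["SATA SSD"], "raw IOPS")   # 40000
# RAID write penalties will eat into these raw figures further.
```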
 

I don't have access to capacity planner sadly :(

At the moment for VMware I have 4 arrays connected to all 4 hosts, each with about 0.6-0.7TB used, so I'm not using a great deal of space for VMs, nor should I need much more (in fact 5 of my 32 VMs - I missed a load when I made the OP :p - will be going soon).

I'm hoping the 8 x SAS drives will become 16 x SAS drives in April, as I really don't like using these 3TB and 2TB SATA disks for my VMs. That said, they're good enough for holding my DCs and non-critical/low-load servers.

Just to outline what we're using right now:
30 permanently-on VMs (2 are only used for testing) running on 4 x RAID6 arrays, each of 4 x SATA disks. It's problematic, but the advice I was given when originally setting it up was poor and inaccurate, which is why I'm doing all this now... that said, 2 of the arrays are coping fine, and it's quite rare to see their latency jump above 20ms, but the other two are really struggling and causing problems.
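
For reference, a quick back-of-envelope sum (rule-of-thumb figures, so take it with a pinch of salt) of what one of those 4-disk SATA RAID6 arrays can actually sustain:

```python
# Effective IOPS of one 4-disk SATA RAID6 array under a mixed workload.
# 75 IOPS per spindle and the 70/30 read/write split are assumptions.
spindles, per_disk_iops = 4, 75
raid6_write_penalty = 6              # each logical write costs ~6 disk I/Os in RAID6
read_ratio, write_ratio = 0.7, 0.3

raw = spindles * per_disk_iops                                     # 300 raw IOPS
effective = raw / (read_ratio + write_ratio * raid6_write_penalty)
print(f"~{effective:.0f} effective IOPS per array")                # ~120 - easy to saturate
```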
 
It really sounds like you need to rearchitect your storage from the ground up. What's your budget like for 2013?

It probably won't cost you the earth to migrate that lot to some storage that is more suitable, more manageable and more recoverable.

A pair of NetApp 2240-2s, each half-populated with SAS and a shelf of SATA (again, half-populated with 3TB SATA) would give you all of the above and probably come in somewhere around £60-70k for the lot.

You'd also then be able to take advantage of SnapMirror for your replication management and, once you're buying SnapMirror, CIFS and iSCSI (or NFS, which is what I would do) you'll find it cheaper to buy the complete software bundle - giving you the complete SnapX range (SMVI, SMSQL, SM for Exchange, SnapRestore, SnapProtect etc etc), NDMP for backup speed boosts and so on.

EDIT:

Or just have one with SAS/SATA and the other just SATA as a replication target. Of course, the beauty of having the two the same is you can use SRM to protect your VMWare estate with no performance hit should you fail over.
 
I probably should have mentioned budget :p Oh, how I'd love to spend that much! :D

We're a large secondary school so the budget is tiny (which is partially the reason for the el cheapo solution): 1,200 students, 100-200 staff, 600 PCs and 100-150 laptops.

Two of our storage boxes were new last year; they're EonStor DS S12E-G2140-4 units, as seen here: http://www.infortrend.com/uk/products/models/ESDS S12E-G2140
The JBOD unit is this one: http://www.infortrend.com/uk/products/models/ESDS S12S-J2000-G

And the older 16 bay is an A16E-G2130-4 as seen here: http://www.infortrend.com/us/products/models/ES A16E-G2130-4

All in all, very cheap boxes, but I can't fault them for stability.
 
Looks like that JBOD shelf can be RAIDed by whatever controller it is connected to.

I'm not sure you can really do much to improve your situation beyond what it already looks like you are going to do, ie get your VMs onto SAS disks.

Out of interest, are you running your iSCSI across dedicated switching hardware (or hardware you know is up to the job/unstressed)? iSCSI behaves terribly when times get hard on the network. You might be losing out to dropped SCSI frames if things aren't too hot there.
 

Yep, connected to their own dedicated switches, and I'm also using all 4 host ports.

I've got an iSCSI1 VLAN and an iSCSI2 VLAN, with a dedicated switch per VLAN and 2 host ports trunked to each switch, then 2 ports from each iSCSI switch to the core switch, and then a 4Gb trunk to the other building holding the other storage units, with the same configuration at both ends.
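
For reference, the rough bandwidth ceilings work out like this (assuming 1GbE host ports and ~80% usable after protocol overhead, which is an assumption for these sums):

```python
# Back-of-envelope bandwidth ceilings, assuming 1GbE host ports and roughly
# 80% of line rate usable once iSCSI/TCP overhead is taken off.
GBE_BYTES_PER_SEC = 1_000_000_000 / 8
USABLE = 0.8

per_array = 4 * GBE_BYTES_PER_SEC * USABLE      # all 4 host ports busy at once
trunk = 4 * GBE_BYTES_PER_SEC * USABLE          # the 4Gb inter-building fibre trunk

print(f"Per array, all ports : ~{per_array / 1e6:.0f} MB/s")
print(f"Inter-building trunk : ~{trunk / 1e6:.0f} MB/s, shared by replication and user traffic")
# Both come out around 400 MB/s, so the trunk is the contended bit during the day.
```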

One thing I forgot to mention is that I'm using the JBOD unit for data here and the A16E for data at the other end. Would I see better performance by using the A16E on this side (the end that users connect to) and the JBOD for replicated storage at the other end? I'd be using more drives in the A16E; the only issue is that I can't use the 3TB disks like I am in the JBOD, so I'd have to rejig some things. Or am I clutching at straws there?
 
I probably should have mentioned budget :p Oh, how I'd love to spend that much! :D

We're a large secondary school so the budget is tiny (which is partially the reason for the el cheapo solution): 1,200 students, 100-200 staff, 600 PCs and 100-150 laptops.

Same as us, though slightly larger @ 2,000 students, 250 staff, 900 PCs and 200 laptops.

However, we have quite a few fewer servers; we virtualized all of our kit (10 servers) onto 3 ESXi hosts and 1 EMC VNXe SAN.

VMs now include:

2 x DC
2 x Exchange
1 x WSUS
1 x App server
1 x Print Server
2 x Proxies (1 ISA, 1 Smoothwall for public WiFi)
1 x SQL (SIMS/FMS)
1 x Webserver (LAMP)
1 x Bromcom Webfolder server

All on the 3 hosts, using ~50% of their 48GB RAM each.

VNXe SAN was ~£30k for 12 x 600GB 15k disks (configured in RAID10 for VMs) and 6 x 2TB 7.2k Nearline SAS (RAID 6) for file storage.

Granted, we don't have a DR site; however, rebuilding all the infrastructure shouldn't take us past a day, which the school is happy with. And we have been more than happy with the speed and such.
 
Restoring that in less than a day from tape? Including obtaining replacement hardware?

Best of luck meeting that RTO!
 
8 x 15k 600GB SAS - purely for VMs. 4.2TB in a RAID5 should be fine for all your VMs; split the file server storage and Exchange DB volumes off onto direct-to-VM iSCSI so your file/Exchange servers run off one of the big SATA iSCSI devices.

The rest? Use it for pr0n.
 

9 of our VMs are DCs! We have a very split site, so each building has its own DC; admittedly 2 or 3 of those could probably be consolidated down now, but it's finding the time...

We've also got 3 proxies, 2 mail filters and 1 ISA, which are all being consolidated down in April to two physical boxes (a Palo Alto firewall + whatever filtering solution we go with).

1 x SCCM/WDS
1 x exchange
1 x SQL Server
2 x Web servers
1 x App-V
1 x File Server
1 x Print server
1 x Door access control server
1 x Terminal server
1 x PAT / Asset management server (doesn't need to be on its own VM but one of our techs is a bit of a loose cannon and he's the one that works on it)
1 x Linux gateway server

...I think that's everything. I'd love to be able to spend £30k on something, but sadly that's 50% of our budget :( We had to ask specially and go through weeks of meetings just to get £10k from outside of our budget in order to replace our dying 4108gl core switch with an 8206zl.
 
Restoring that in less than a day from tape? Including obtaining replacement hardware?

Best of luck meeting that RTO!

Spare hardware is already available on site; we just have nowhere to keep it online. Essential services can easily be restored in that time frame, especially now the days of VM snapshots have come about. :D

And it's 4 backup SANs across the site. Fire is the real issue with our setup; however, the school's view is that if there is a fire in what is a central part of the building then it won't be open anyway. We do what we can with our money. It's very easy to waste other people's money if you aren't careful.
 

Wow, 9 DCs! Do the sites have a link, hence the need for local DCs, or do you have a specific domain for each site?

We are lucky with our budget; the school is very understanding of what tech costs. We're on roughly £120k for everything: Internet/desktops/servers/warranty/network/VoIP/CCTV.

You're on EduGeek, aren't you?
 
8 x 15k 600GB SAS - purely for VMs. 4.2TB in a RAID5 should be fine for all your VMs; split the file server storage and Exchange DB volumes off onto direct-to-VM iSCSI so your file/Exchange servers run off one of the big SATA iSCSI devices.

The rest? Use it for pr0n.

RAID 5? You can't be serious? 2.3TB usable in RAID10 is bountiful for his storage needs and won't kill him when his VMs write data. Do the maths on RAID5 and it makes sense in about 0% of use cases once you get beyond 10-user sites.
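
To put rough numbers on "do the maths" (the per-disk IOPS and the 70/30 read/write mix are the usual rule-of-thumb assumptions, not measurements):

```python
# 'Do the maths': RAID write penalty on 8 x 15k SAS spindles.
# 180 IOPS per spindle and a 70/30 read/write mix are rule-of-thumb assumptions.
spindles, per_disk_iops = 8, 180
read_ratio, write_ratio = 0.7, 0.3
write_penalty = {"RAID5": 4, "RAID10": 2}    # disk I/Os per logical write

raw = spindles * per_disk_iops               # 1440 raw IOPS
for level, penalty in write_penalty.items():
    effective = raw / (read_ratio + write_ratio * penalty)
    print(f"{level}: ~{effective:.0f} effective IOPS")
# RAID5 : ~758  - the parity write penalty bites hard
# RAID10: ~1108 - you give up capacity, not write performance
```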

Spare hardware is already available on site; we just have nowhere to keep it online. Essential services can easily be restored in that time frame, especially now the days of VM snapshots have come about. :D

And it's 4 backup SANs across the site. Fire is the real issue with our setup; however, the school's view is that if there is a fire in what is a central part of the building then it won't be open anyway. We do what we can with our money. It's very easy to waste other people's money if you aren't careful.

Ahh, yeah - sounds much more feasible!

I've not got an unlimited budget (although compared to you guys it might feel like it!) so I know where you are coming from on that last point. Convincing people of the need to spend money on data protection is tough wherever you are though.
 

£120k :( We get about £65k, though we don't do the CCTV, just everything else in your list :D Not that the site team here ever spend a penny on our CCTV, which is quite happily falling apart, lol.

The 9 DCs do date back to pre-VM days, when we had a DC for each building which also acted as print server for the PCs on that VLAN and served mandatory profiles. We've just never consolidated them down; that said, they do so little, just serving DNS, DHCP, DFS namespaces and so on, that I guess I've never really thought about it since.

Yeah, I'm on EduGeek, same name :)
 

Agreed, RAID5 for VMs is a terrible idea! RAID10 all the way.

Yes, it is hard telling someone to fork over X amount to burn yet more power on something that may do absolutely nothing for you. :o
 
I guess when it comes to redundancy and DR, it's not so much about what you want from your system but what your employer expects in the event of a disaster. We again went the cheap-ass route to get online storage in two locations... waited until our old storage box was being replaced, kept using it as a backup and used the new one for live data :D
 

Same name too, I thought I'd seen you on there.

Back to your OP:

1. The backup replica: put all of the slowest hardware there. You're really aiming for storage capacity over performance.

2. Pick the best of the remaining hardware for no-compromise performance: 960GB of SSDs in RAID10 ;) The 15k disks for non-critical server stuff (WSUS datastore, applications etc.).

3. The rest goes to file storage: aim for redundancy and capacity.

Although I'm sure you already know all of this :)
 