Virtual (VMware) gurus wanted!

Knubje · 23 Nov 2010 at 16:14

Hi,

I am looking at studying for a VCP soon and am looking to further my career in virtual systems.

I am wondering who currently has a career working with virtualisation, primarily on VMware-based platforms. If possible I would like to take a little of their time to ask a few questions and get their opinions, off the OcUK record of course. Any management experience would be even better.

If anyone fits that bill, please let me know if I can drop you an email in trust (alternatively drop me an email to my trust, I check it frequently).

Cheers guys, happy virtualising

m4cc45 · 23 Nov 2010 at 19:00

Hi,

I'm currently a VCP3 and VCP4 and have dealt predominantly with ESX for the past two to three years. No managerial experience though, and I don't just deal with ESX; I deal with everything that goes on top of it as well! If I can help then please give me a shout - e-mail's in trust!

Edit: After reading the below posts I guess I should say what we have: basically 14 hosts running around 350 servers (quite beefy spec'd servers, 4 x 6 cores / 128GB RAM). Backups are done by Veeam SureBackup and storage is on IBM SANs replicating to the DR site. I'm a one-man team really, but the other guys are good with VMware as well. Currently migrating from ESX to ESXi because VMware is moving over to it (i.e. it is not going to be producing further updates for ESX).

Also looking at VDI but maybe Citrix instead of VMware View.

M.

Terrier_Jimlad · 23 Nov 2010 at 19:03

I was part of a 3-man team that set up our VMware environment. We run 8 very beefy specced ESX servers, all LUNs on VMRecovery on our two SANs. We look after the resources and all the Windows servers that sit on them, which is about 85 at the moment.

As m4cc45 said above, drop me a line if I can be of assistance.

Little_Crow · 23 Nov 2010 at 20:12

I have been using ESX from 3.5 onwards so about 2-3 years.

I helped set up and now admin 12 ESX hosts split into 2 clusters on 2 sites; the back end runs on HP EVA 6400s with Continuous Access replicating them.
We use VMware Site Recovery Manager to fail over between the 2 sites (haven't had to do it for a real disaster recovery situation, but I'm confident) and Update Manager to keep our ESX hosts updated.

Backups are done using vRanger, and put to tape using CommVault.

Like the other guys I have to run the stuff on top of them too, which is a nice mix of Windows and NetWare servers.

Other VMware stuff flying my way is VDI, which my boss is suddenly hugely keen on.

I'd advise posting any questions in here as it's always good to have a good exchange of views, and I'd like to think that we're somewhat more mature than the graphics card forum!

Knubje · 24 Nov 2010 at 08:58

Wow, thanks for all the replies gents. I appreciate it a great deal! Essentially I am at a potential crossroads in my career. I've been a system admin for nearly 3 years at a respectable employer, albeit a small/medium business.

In that time I've set up 3 hosts currently running vSphere 4.0 and have pretty much fully virtualised my employer's current business-critical services (Exchange, SQL, File, Print, Telecoms, Web, etc).

Essentially, I am wondering what you would say if you had to prove that your respective virtual solutions were resilient. It is a question posed to me at the moment and whilst I have a good idea of my response, I wonder what you would all say?

There is much to talk about when making sure the platform you have in place is resilient and robust enough to deal with any issues that might arise.

Any thoughts or discussion on the matter would be really appreciated!

Cheers!

m4cc45 · 24 Nov 2010 at 13:21

I guess showing that you use HA / DRS (if you do) and getting white papers on it from VMware (who normally use 'executive' kid-speak language and lots of graphs, which execs love) would be where I would start.

I would also find some recommendations (i.e. SAN replication to the DR site) to show that you've considered everything, whilst also showing that you are not daft enough to say that your current solution will never have problems. Perhaps suggest that you could add new hosts and keep the key systems apart (so you could have two DCs always on separate hosts), which would go as far as possible to alleviate any concerns most people would have.

M.

Knubje · 24 Nov 2010 at 16:04

Thanks M4cc45, I was thinking of going down your route.

We do have DRS plans and have actually acted them out for real as a complete off-site test. It was very successful which is always a plus. (One of the reasons I love VMware).

We do use HA to the point of having production machines replicated every few hours or so to other hosts/SAN. This gives us the opportunity to fail over entire hosts or individual VMs if needed. Everything is currently spread across 3 hosts, 3 SANs and 2 FC switches.

We have other backup/restore plans for VMs and data, which are always considered (daily differential backups of VMs, monthly full copies, etc.). Data is sent offsite too.

I have even gone as far as rack placement and power distribution from several different UPSs to the PSUs in each host, to reduce the impact that a power failure might cause.

In terms of the Windows domain we have our Exchange sat on a different host/SAN to our SQL server. All production machines are split across 2 hosts (with one as a complete backup, currently running a test environment). We have configured the SANs and hosts with guidance from white papers to ensure that different LUNs are used for database files and log files. Our DC is on one host and our secondary DC is on another.
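
A split like this (DCs apart, Exchange apart from SQL) can be sanity-checked with a few lines of script. This is only a minimal Python sketch under my own assumptions: the VM, host and group names are made up for illustration, and the placement is typed in by hand rather than pulled from vCenter.

```python
from collections import defaultdict


def anti_affinity_violations(placement, groups):
    """Return (group, host, vms) for every host that holds two or more VMs
    from the same keep-apart group."""
    violations = []
    for group, vms in groups.items():
        by_host = defaultdict(list)
        for vm in vms:
            by_host[placement[vm]].append(vm)
        for host, members in sorted(by_host.items()):
            if len(members) > 1:
                violations.append((group, host, sorted(members)))
    return violations


# Hypothetical inventory: which host each VM currently runs on.
placement = {"dc1": "host1", "dc2": "host2", "exchange": "host1", "sql": "host2"}

# Hypothetical keep-apart groups: VMs in the same group should never share a host.
groups = {
    "domain-controllers": ["dc1", "dc2"],
    "mail-and-db": ["exchange", "sql"],
}

print(anti_affinity_violations(placement, groups))  # prints []
```

An empty list means every group is spread across different hosts. If both DCs ended up on host1, the function would report that group and host.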

I'd like to think that I have covered most bases, and whilst I'm sure there is always something that could go wrong and catch you out, there is no solution that never has problems. There comes an acceptable limit for most businesses, where they will not be prepared to spend tens of thousands on a solution that they might only ever use once in a blue moon.

In terms of resilience I think our systems are able to bounce back from whatever bad times might get thrown our way, be it hardware failure, user error or catastrophic unforeseen circumstances.

Getting it all down on paper to show you've thought of as many possible angles is quite an extensive job!

Shaz]sigh[ · 24 Nov 2010 at 22:00

I'm a virt person.

Enterprise design and implementations with heavy storage integration.

Got my VCAP-DCA in December and hoping to sign up for VCDX4 for defence sometime next year (VMworld Europe would be awesome).

I also do other hypervisors and other sorts of virt

Regarding:

Essentially, I am wondering what you would say if you had to prove that your respective virtual solutions were resilient. It is a question posed to me at the moment and whilst I have a good idea of my response, I wonder what you would all say?

Depends what resiliency they want? Are they wanting HA? Or business continuity? Or DR?

Shaz]sigh[ · 24 Nov 2010 at 22:08

Knubje said:
Thanks M4cc45, I was thinking of going down your route.

We do have DRS plans and have actually acted them out for real as a complete off-site test. It was very successful which is always a plus. (One of the reasons I love VMware).

We do use HA to the point of having production machines replicated every few hours or so to other hosts/SAN. This gives us the opportunity to fail over entire hosts or individual VMs if needed. Everything is currently spread across 3 hosts, 3 SANs and 2 FC switches.

Confuddled by this. DRS is a resource scheduler/black box of magic, not a plan? It's essentially a collection of hosts placed into a virtual resource pool that can automagically balance load/do funkiness dependent on what you plug into it (HA/FT/DPM etc).

Also, HA's sole function is to restart VMs should they become unavailable/unresponsive. It's based on Legato AAM/EMC AutoStart.

Skype is on the left if you need anything in particular.

oddjob62 · 25 Nov 2010 at 00:22

Shaz]sigh[ said:
Confuddled by this. DRS is a resource scheduler/black box of magic, not a plan?

I'm guessing he means "DR" plan and has recently been talking about VMware features too much.

Knubje · 25 Nov 2010 at 08:52

oddjob62 said:
I'm guessing he means "DR" plan and has recently been talking about VMware features too much.

Yeah this is true!

My acronyms are going out of the window as we use DRS as DR here, don't ask why. Anyway, in terms of resilience and building a resilient virtual platform, I would imagine it comes down to proof of covering as many bases as you can.

Everything from disaster recovery to high availability would need to be covered. Has anyone any more thoughts on the matter and what makes a virtual platform resilient?

m4cc45 · 25 Nov 2010 at 09:49

Well, reading what you have put, my initial concern would be that you're not running ESX HA.

If you have 3 hosts connected to 3 SANs (I'm assuming each host is connected to its own SAN, unless you have several HBAs installed) then I would reconfigure it.

I would give all 3 hosts access to the SANs so that you can use HA, whereby the virtual machine moves if you take a host down / it runs out of resources / one of the hosts fails / etc. The reason for doing this is that it makes patching and upgrading ESX a lot easier. The downside is licence costs.
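
Whether the surviving hosts can actually take the load when one host goes down is easy to estimate up front. The following is a rough Python sketch, not how VMware HA does its admission control: it looks at memory only, uses a simple first-fit-decreasing placement, and every host name, VM name and size in it is invented for the example.

```python
def survives_host_failure(host_capacity_gb, vm_memory_gb, placement):
    """For each host, report whether its VMs would fit on the remaining hosts.

    Memory only, first-fit decreasing: a True result means a placement was
    found; a False result means this simple heuristic could not find one.
    """
    results = {}
    for failed in host_capacity_gb:
        # Free memory on every surviving host.
        free = {}
        for host, capacity in host_capacity_gb.items():
            if host == failed:
                continue
            used = sum(vm_memory_gb[vm] for vm, h in placement.items() if h == host)
            free[host] = capacity - used

        # Restart the displaced VMs, biggest first.
        displaced = sorted(
            (vm for vm, h in placement.items() if h == failed),
            key=vm_memory_gb.get,
            reverse=True,
        )
        ok = True
        for vm in displaced:
            need = vm_memory_gb[vm]
            target = next((h for h in free if free[h] >= need), None)
            if target is None:
                ok = False
                break
            free[target] -= need
        results[failed] = ok
    return results


# Hypothetical three-host cluster.
capacity = {"esx1": 64, "esx2": 64, "esx3": 64}
memory = {"exchange": 32, "sql": 32, "dc1": 4, "dc2": 4, "file": 16, "web": 8}
placement = {
    "exchange": "esx1", "dc1": "esx1",
    "sql": "esx2", "dc2": "esx2",
    "file": "esx3", "web": "esx3",
}

print(survives_host_failure(capacity, memory, placement))
```

With these made-up numbers every host can fail and its VMs still fit elsewhere. Shrink the hosts to 40GB each and the same check fails for all three.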

M.

Knubje · 25 Nov 2010 at 13:50

Sorry, I am obviously not making myself clear.

Each host has multiple HBAs which are connected to different Brocade FC switches. Each switch in turn connects to the SANs, which have multiple fibre channel connections. The idea is that if a switch fails, they can still communicate. If a host fails, the other hosts can read the datastore of the failed one and run the machines that are down. If a SAN fails, the other SANs have replicas of the VMs on the one that has gone down.

Overall, it allows us to replicate VMs to other storage and hosts and bring them up in the event of something failing on the original. We aren't using vMotion to do this but a manual process. It takes less than 5 minutes, which according to our business is an acceptable time frame for a loss of service. Saves them buying vMotion I guess.
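
To put that on paper, the layout can be modelled and tested for single points of failure. A minimal Python sketch follows, under my own assumptions: the names and cabling are invented, only switch and SAN failures are tried (host failures are a restart question, not a path question), and it only checks that a path exists, not that the replica on the surviving SAN is current.

```python
def reachable_sans(host, links, failed):
    """SANs the host can still reach through a surviving switch."""
    sans = set()
    for switch in links["host_to_switch"].get(host, ()):
        if switch in failed:
            continue
        for san in links["switch_to_san"].get(switch, ()):
            if san not in failed:
                sans.add(san)
    return sans


def single_points_of_failure(links):
    """Switches or SANs whose loss leaves at least one host with no storage path."""
    hosts = set(links["host_to_switch"])
    switches = set(links["switch_to_san"])
    sans = {san for attached in links["switch_to_san"].values() for san in attached}

    spofs = []
    for component in sorted(switches | sans):
        failed = {component}
        if any(not reachable_sans(host, links, failed) for host in hosts):
            spofs.append(component)
    return spofs


# Hypothetical cabling: every host has an HBA into each switch,
# and every switch connects to every SAN.
links = {
    "host_to_switch": {
        "esx1": ["sw1", "sw2"],
        "esx2": ["sw1", "sw2"],
        "esx3": ["sw1", "sw2"],
    },
    "switch_to_san": {
        "sw1": ["san1", "san2", "san3"],
        "sw2": ["san1", "san2", "san3"],
    },
}

print(single_points_of_failure(links))  # prints []
```

An empty list means no single switch or SAN failure cuts any host off from storage. Take one of the two switches out of the model and the remaining switch is reported straight away.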

Am I going down the right lines?

What is the benefit of DRS, not as in a disaster recovery scenario but as in Distributed Resource Scheduler? How does it work and where is it applicable?

m4cc45 · 26 Nov 2010 at 07:50

I'd be using vMotion and automating as much as possible. There are going to be times when you need to migrate off the hosts and nobody knows what to do. It seems you have spent a lot of cash on getting multiple HBAs, etc., so not buying a licence for vMotion seems strange.

DRS is very good at what it does, especially in a large environment where certain VMs need more resources.

It's hard to say whether it would be good for your environment without actually knowing how many VMs you have, resources available vs resources used, etc., but it's great to have something that levels the VMs across multiple hosts so you get the best performance. It works, if I recall correctly, purely on the CPU utilisation and will balance that across, in your case, the 3 hosts. Also, when you put a host into maintenance mode it will migrate the VMs off, which is great if you schedule a patch install, etc.

M.

ThorpedoUK · 26 Nov 2010 at 21:57

Agree with the above: it takes 5 mins during business hours, but at 3am when everybody is asleep it will take 5 hours and 5 mins.

Get vMotion, or better still get rid of VMware and take on Citrix XenServer; it has greater performance (especially on Windows hosts) and gives you host migration and XenServer pools (clustering) for free.

HA failover is about £3k.

I've used both and am not impressed with vSphere performance. XenServer in a DC environment.

iaind · 26 Nov 2010 at 22:47

I'm not really sure how DRS could be used for DR - surely you want HA, FT or SRM for that?

ThorpedoUK · 26 Nov 2010 at 22:53

Got to be honest, I always understood DRS as a resource manager: if one hypervisor is running out of CPU resource whilst its clustered neighbour is idle, it will move the heaviest VM to the quiet neighbour to balance the utilisation.

good point

iaind · 26 Nov 2010 at 22:58

That's exactly what it does. It also aids placing newly provisioned VMs onto suitable hosts and learns trends over time too.
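
As a toy illustration of that balancing idea (and nothing like the real DRS algorithm, which as far as I know weighs memory as well as CPU and factors in migration cost), here is a small Python sketch. The host names, VM names, CPU figures and the threshold are all made up.

```python
def rebalance(hosts, threshold=10, max_moves=20):
    """Greedy CPU balancing. hosts maps host -> {vm: cpu_demand}.

    Repeatedly moves the heaviest VM that still narrows the gap from the
    busiest host to the quietest one. Mutates hosts and returns the list
    of (vm, source, destination) moves.
    """
    moves = []
    for _ in range(max_moves):
        load = {host: sum(vms.values()) for host, vms in hosts.items()}
        busiest = max(load, key=load.get)
        quietest = min(load, key=load.get)
        gap = load[busiest] - load[quietest]
        if gap <= threshold:
            break  # balanced enough

        # Only VMs smaller than the gap make the pair more even when moved.
        candidates = [vm for vm, cpu in hosts[busiest].items() if 0 < cpu < gap]
        if not candidates:
            break
        vm = max(candidates, key=hosts[busiest].get)
        hosts[quietest][vm] = hosts[busiest].pop(vm)
        moves.append((vm, busiest, quietest))
    return moves


# Hypothetical cluster with CPU demand in arbitrary units.
hosts = {
    "esx1": {"sql": 40, "exchange": 30, "web": 20},
    "esx2": {"print": 10},
    "esx3": {"file": 20},
}

moves = rebalance(hosts)
print(moves)
print({host: sum(vms.values()) for host, vms in hosts.items()})
```

With these numbers it makes three moves (sql, then web, then print) and all three hosts end up at a load of 40. The rule that a VM must be smaller than the gap is what stops it bouncing the same VM back and forth.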

No idea how you could even begin to use it for DR

ThorpedoUK · 26 Nov 2010 at 23:03

lol

When we first set up VMware it was on version 3.5 (ESX), and what made HA really difficult in our environment was the Site A to Site B (DR) SANs. Sure, vMotion, DRS and HA worked great on a single SAN, but add two SANs with a leased line carrying the SnapMirror traffic and HA was a major pita, because the tools that exist now to automatically break the mirror, put the LUNs online, rescan the fibre / iSCSI targets in VMware and do the import were not available then.

tbh it was a major pita and scripted to hell and back

I don't miss those days!! Way better tools are out now to automate all of this.

Shaz]sigh[ · 27 Nov 2010 at 19:34

DRS can be used in a BCP situation with stretched L2 or Cisco OTV etc. HA would be used in DR with stretched L2/OTV again.

Both solutions would need sync mirroring or something like EMC VPLEX.