VMware, Veeam and Offsite Backups?

ajf
For those using VMware and Veeam backup, how are you handling offsite backups?
What hardware/software/media are you using?

Obviously Veeam is fine for quick recovery but we have been struggling to develop a good plan for offsite backups, either via tape or replication to another site.

Speed and the available backup window are the main problems we've encountered, due to the amount of data Veeam sees as 'changed'.
 
Have installed Veeam for loads of customers for offsite replication.

It's a very wide topic. I find the absolute biggest bottleneck is the site link speed. 100Mb/s would be the absolute lowest I'd recommend, and only if budget was very tight.

In terms of hardware offsite, again it varies wildly. We have some customers with a single ESX host with local storage (crap solution), others with multiple ESX hosts and decent network storage (NAS if just for backups, SAN for replication/DR). If it's just for backup, QNAP do some decent NASs with decent amounts of space for a few grand (20+ TB).

Whatever you choose should be extremely simple to set up though - but invest in a good link. Trying to explain to a customer that 10Mb/s is not enough to replicate 20-30 servers in an 8-hour window is painful. I really would argue that 100Mb/s is scraping the barrel!
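A quick sanity check makes the point. This is a rough sketch (the 80% efficiency figure is an assumption; real-world protocol and Veeam overheads vary) of how much changed data a link can shift in a given window:

```python
def max_transfer_gb(link_mbps: float, window_hours: float,
                    efficiency: float = 0.8) -> float:
    """Data in GB a link can move in the window at `efficiency` of line rate."""
    bytes_per_sec = link_mbps * 1_000_000 / 8 * efficiency
    return bytes_per_sec * window_hours * 3600 / 1e9

# A 10Mb/s link in an 8-hour window moves under 30GB of changes;
# 100Mb/s manages roughly ten times that.
print(max_transfer_gb(10, 8))    # 28.8
print(max_transfer_gb(100, 8))   # 288.0
```

If the nightly churn of 20-30 servers exceeds that figure, no amount of scheduling will save you - the link is simply too small.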
 
I have spoken to the guy involved in this for more info as it hasn't been something I have dealt with too much yet.
I was just testing the waters in case the answer was 'good luck'!

It is a bit more than simply the line speed at present as it is something we are still testing.
Even across the internal LAN, we are finding it takes 6 to 7 hours for Veeam to back up around 150GB to a NAS - a QNAP TurboNAS TS-869L.
The network is gigabit.

The setup is as follows:
Main Veeam server.
Daily incremental changes pulled from the SAN (Dell EqualLogic).
Compressed by Veeam.
An additional server runs a Veeam agent, with the NAS attached as local iSCSI.
The main Veeam server then pushes the compressed data to the agent, which writes it to the NAS.

The one question that has come up is that 150GB of daily changes seems an awful lot. There are around 250 networked users.
Total live data is 6TB.
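For what it's worth, those numbers imply the gigabit LAN is nowhere near the bottleneck. A back-of-envelope check (assuming the mid-range 6.5 hours):

```python
def effective_mb_per_s(gb: float, hours: float) -> float:
    """Average throughput in MB/s for a job moving `gb` GB in `hours` hours."""
    return gb * 1000 / (hours * 3600)

print(round(effective_mb_per_s(150, 6.5), 1))  # 6.4
```

Roughly 6.4MB/s against the ~110MB/s a gigabit link can realistically sustain - so the constraint is likely source reads, compression CPU, or NAS write speed rather than the network.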
 
You've got a problem somewhere if you're taking that long to back up 150GB!

Here's what we do to back up circa 20TB of VMs:

Veeam "controller" - running as a VM with 2 vCPUs and 16GB RAM
Veeam proxy - we did have multiple VM proxies but they were hideously destabilising! We've reverted to a single mega-box with two quad-core Xeons and lots of RAM, which gets the job done.
Veeam repository - this is a 3rd server which has local disk and a couple of iSCSI LUNs on a NetApp filer connected over 10GbE.

We get a full backup (an actual, proper full backup and not a synthetic full) done on a Friday night - takes until late on Saturday to complete, final size on disk is about 7TB after deduplication and compression.

Once that is done, we have a 96-slot LTO6 library which scoops up the .vbk files to tape. We keep two weeks of fulls on disk which covers us for the common restore window.

We then run nightly incrementals. We initially used reverse incrementals because that's very advantageous in a particular scenario we were dealing with, but on the whole it is detrimental because your daily tape incrementals are the size of the fulls. Manageable but not ideal - so we reverted to forward incrementals and our daily tape regime works well with that.
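The tape-size point is worth making concrete. A toy model (figures are illustrative assumptions, not our real numbers): under reverse incrementals the .vbk itself is rewritten nightly, so a daily sweep to tape carries a full-sized file, whereas forward incrementals only produce a small .vib:

```python
def daily_tape_gb(mode: str, full_gb: float, inc_gb: float) -> float:
    """Approximate size of the newest backup file swept to tape each night."""
    if mode == "reverse":
        return full_gb   # the .vbk is rewritten every night
    if mode == "forward":
        return inc_gb    # only a new .vib is produced
    raise ValueError(f"unknown mode: {mode}")

print(daily_tape_gb("reverse", 7000, 80))  # 7000 GB to tape nightly
print(daily_tape_gb("forward", 7000, 80))  # 80 GB
```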

Our VMware change rate isn't that great really - our incs are tiny (less than 500GB a week) so when you factor in the odd hiccup with CBT (which happens a lot more than I'd like...) we're easily hitting our window. I'm certain that if we ran any sort of virtualised file server that number would be drastically higher. For comparison, our 30TB of file server data has a change rate of circa 200GB a day, and that is hit by maybe 500 users maximum. Depending on what your users are doing, 150GB a day doesn't seem too high to me.

Bear in mind that we go to tape during the day because we're done with our backups overnight in our actual backup window. Absolutely everything is disk to disk to tape for this reason. We're shipping roughly 50TB to tape week-in, week-out.
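Shipping 50TB a week to tape is itself a scheduling exercise, which is part of why the daytime tape window matters. A rough estimate assuming LTO6's published native speed of 160MB/s (sustained rates depend on keeping the drives streaming):

```python
def tape_hours(tb: float, drives: int, mb_per_s: float = 160.0) -> float:
    """Hours to stream `tb` TB across `drives` drives at `mb_per_s` each."""
    return tb * 1e12 / (drives * mb_per_s * 1e6 * 3600)

print(round(tape_hours(50, 1)))  # 87 hours on a single drive
print(round(tape_hours(50, 4)))  # 22 with four drives in parallel
```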

Have you done anything to tune that iSCSI connection? RSS makes a huge difference to performance (IME), and at 1GbE, Jumbo Frames and Flow Control should both be tuned to give you the best possible throughput.
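On jumbo frames specifically: the raw wire-efficiency gain is real but modest - the bigger win is usually far fewer packets for the CPU to process. A rough calculation (assuming 38 bytes of Ethernet framing overhead and 40 bytes of IP+TCP headers per frame):

```python
def tcp_goodput_fraction(mtu: int, l2_overhead: int = 38,
                         headers: int = 40) -> float:
    """Fraction of Ethernet line rate left for TCP payload at a given MTU."""
    return (mtu - headers) / (mtu + l2_overhead)

print(round(tcp_goodput_fraction(1500), 3))  # 0.949 for standard frames
print(round(tcp_goodput_fraction(9000), 3))  # 0.991 with jumbo frames
```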

For the benefit of others posting in this thread, I have about 2.5TB of Veeam backups coming over the WAN over a variety of link speeds from 10M to 100M but they are forever-incremental and are a last resort really (time to restore would be ridiculous). We get around that by having multiple restore points on local replicas (which works really well). To be quite honest, in a smoking crater situation we'd be restoring those VMs to local compute here and moving everything possible client-side to Citrix while we responded to whatever happened. If you don't have that luxury I'd be thinking long and hard about restoration scenarios and what the business is expecting in terms of RPO and RTO. A week of downtime is a realistic proposition for such a scenario (without a backup plan) and you might as well put your efforts into finding a new job at that point.
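To put numbers on that restore-time point (the 80% efficiency figure is an assumption):

```python
def restore_hours(tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Hours to pull `tb` TB back over a WAN at `efficiency` of line rate."""
    return tb * 1e12 * 8 / (link_mbps * 1e6 * efficiency * 3600)

print(round(restore_hours(2.5, 100)))  # 69 hours over 100Mb/s
print(round(restore_hours(2.5, 10)))   # 694 hours over 10Mb/s
```

Nearly three days over 100Mb/s, and the best part of a month over 10Mb/s - which is exactly why local replicas or local compute at the DR end matter so much.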
 
Veeam backup to local deduplicated disk target (EMC DataDomain)

DataDomain at each branch office, using DataDomain replication to distribute offsite.
 
We currently robocopy our full jobs to the offsite location, but this doesn't work with reverse incrementals, as they touch all the files. Version 7 has a WAN accelerator copy option that you can use to send deduped copies anywhere you want without having to run a separate job. This is something we are going to use soon, once we buy another couple of repository servers.
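The reason robocopy falls over with reverse incrementals is easy to see in a minimal sketch (hypothetical paths and helper, not robocopy itself) of an incremental copy that, like robocopy's default behaviour, skips files whose size and mtime are unchanged - under reverse incrementals the .vbk's mtime changes every night, so the whole full-sized file is recopied on every run:

```python
import os
import shutil

def copy_changed(src_dir: str, dst_dir: str) -> list:
    """Copy files from src_dir to dst_dir, skipping apparently-unchanged ones."""
    copied = []
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src, dst = os.path.join(src_dir, name), os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue
        s = os.stat(src)
        if os.path.exists(dst):
            d = os.stat(dst)
            if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                continue  # same size and mtime: assume unchanged and skip
        shutil.copy2(src, dst)  # copy2 preserves mtime for the next comparison
        copied.append(name)
    return copied
```

A nightly-rewritten .vbk never passes the size+mtime check, so this kind of job degenerates into a full copy every night - the WAN accelerator approach avoids that by shipping only deduped changes.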
 