Any backup/storage engineers here?

psd99 · 16 Oct 2013 at 16:44

Proper random right but someone asked me this today:

what kind of impact would a full, differential and incremental on a customer's server and backup/storage environment?

I understand the difference between the backup types, but I don't know how to answer that question. How would you?
What would you say?

GravyMonster · 16 Oct 2013 at 16:45

Impact in terms of... performance? Data loss? Disaster recovery? All of the above?

Admiral Huddy · 16 Oct 2013 at 16:54

Not sure I understand what you are asking.
Depends on the business policy, surely. The questions would be: "Is speedy recovery critical to the business?" and "Can the business function during backup operations?"

It's something you'd need to ask the customer.

psd99 · 16 Oct 2013 at 17:03

Hi, it is more about performance, Spunkey.

What kind of performance impact will the full backup have on the server's resources and on the environment?
Now, I know I can say high CPU usage, high memory usage and high network utilisation for the full backup, as there is more data.

But how can I answer this with respect to an incremental or a differential? I think the question is also asking about deduplication (something we use),
so there is a DDB (deduplication database) that caches all of the information when the backup runs.

I just wondered how you would all answer this?

Even outside of performance, it would be good to know how you would answer this.

Burned_Alive · 16 Oct 2013 at 17:13

I think you a word, which is pretty important in the context of the question.

But you're missing a lot of other context, mainly, what do you do now?

EDIT: Must've replied late

james_uk · 16 Oct 2013 at 20:05

Ideally you would hope the client is using a backup product which does CBT (Changed Block Tracking). In this way, at every backup window, the software knows exactly which disk blocks have been modified since the last backup snapshot. Only these changed blocks are then shipped off to the backup repository, with minimal load on server disk IOPS, CPU and network bandwidth.

The backup server is then free to do dedupe + other processing using its own resources, as appropriate.
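
To make the CBT idea concrete, here is a minimal sketch in Python. It is a toy model, not how any particular backup product implements it: the `TrackedDisk` class, the 4-byte block size and the method names are all made up for illustration. The point is only that writes mark blocks as changed, and an incremental ships just those blocks.

```python
BLOCK_SIZE = 4  # bytes per block, tiny for illustration only (hypothetical value)


class TrackedDisk:
    """Toy disk that records which blocks were written since the last backup."""

    def __init__(self, size_blocks):
        self.data = bytearray(size_blocks * BLOCK_SIZE)
        self.changed = set()  # indices of blocks modified since the last backup

    def write(self, offset, payload):
        # Apply the write, then mark every block the write touched.
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first, last + 1))

    def incremental_backup(self):
        """Return only the changed blocks, then reset the tracking set."""
        blocks = {
            i: bytes(self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in sorted(self.changed)
        }
        self.changed.clear()
        return blocks


disk = TrackedDisk(size_blocks=8)
disk.write(5, b"abcd")               # touches blocks 1 and 2
first = disk.incremental_backup()    # ships 2 of 8 blocks
disk.write(28, b"zz")                # touches block 7 only
second = disk.incremental_backup()   # ships 1 of 8 blocks
```

The server never has to scan or read the unchanged blocks, which is where the saving in disk IO, CPU and network comes from.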

littlepuppy · 19 Oct 2013 at 13:27

There are / will be many variables at play here, so there is no black and white answer. If you PM me some details I can try and give you a steer.

psd99 · 21 Oct 2013 at 09:45

littlepuppy said:
There are / will be many variables at play here, so there is no black and white answer. If you PM me some details I can try and give you a steer.

I couldn't work out how to PM.

Can you tell me more on here about disk IO, network usage and higher CPU usage,
and how a full backup impacts server and environment resources?

Vtec9k · 21 Oct 2013 at 12:04

If the DDB is working server side, i.e. on the backup server, there will be minimal extra CPU on the client but obviously you're not saving any network bandwidth. Normally this is how it's configured in a LAN environment.

For slower WAN connections you'd want to use client-side dedup, so there would be a large CPU hit plus data storage for the DDB on the client (which could grow pretty large and require a decent amount of IOPS).
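
As a rough illustration of that trade-off, the sketch below compares bytes sent over the network for server-side and client-side dedup. It is a simplified model under stated assumptions: fixed 4-byte chunks, SHA-256 as the fingerprint, and hypothetical function names. Real products use much larger (often variable-size) chunks and also exchange hashes with the server, which is ignored here.

```python
import hashlib

CHUNK = 4  # bytes per chunk, tiny for illustration only (hypothetical value)


def chunks(data):
    """Split data into fixed-size chunks."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def server_side_backup(data, store):
    """Client ships everything; the backup server hashes and dedupes."""
    for c in chunks(data):
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return len(data)  # bytes sent over the network


def client_side_backup(data, store, client_ddb):
    """Client hashes locally and ships only chunks its DDB has not seen."""
    sent = 0
    for c in chunks(data):
        digest = hashlib.sha256(c).hexdigest()  # the CPU cost lands on the client
        if digest not in client_ddb:
            client_ddb.add(digest)              # the DDB grows on the client
            store[digest] = c
            sent += len(c)
    return sent


data = b"AAAABBBBAAAACCCCAAAA"  # 5 chunks, 3 of them unique

server_store = {}
sent_server = server_side_backup(data, server_store)       # 20 bytes on the wire

client_store, ddb = {}, set()
sent_client = client_side_backup(data, client_store, ddb)  # 12 bytes on the wire
```

Both approaches end up storing the same three unique chunks. The difference is where the hashing happens and how much crosses the network.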

Fulls obviously have a large impact on IO, but synthetic fulls are basically just an incremental with extra processing on the backup servers, so they are much quicker and have a lower impact on the production servers.
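
A minimal sketch of what a synthetic full does, assuming backups are modelled as simple path-to-content dictionaries and that `None` marks a deleted file (both are assumptions for this example, not any product's on-disk format). The merge runs entirely on the backup server, so the production server only ever pays for the incrementals.

```python
def synthetic_full(last_full, incrementals):
    """Merge a full with later incrementals, oldest first, on the backup server."""
    merged = dict(last_full)  # copy, so the original full is left untouched
    for inc in incrementals:
        for path, content in inc.items():
            if content is None:       # None marks a deleted file in this sketch
                merged.pop(path, None)
            else:
                merged[path] = content
    return merged


sunday_full = {"a.txt": "v1", "b.txt": "v1", "c.txt": "v1"}
monday_inc = {"a.txt": "v2"}                   # a.txt changed
tuesday_inc = {"b.txt": None, "d.txt": "v1"}   # b.txt deleted, d.txt created

new_full = synthetic_full(sunday_full, [monday_inc, tuesday_inc])
# new_full == {"a.txt": "v2", "c.txt": "v1", "d.txt": "v1"}
```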

A huge amount depends on the environment. For example, are you using VMware with SAN snapshots and a dedicated iSCSI/fibre storage connection? Or are you backing up directly off the physical disks?

psd99 · 21 Oct 2013 at 12:31

Vtec9k said:
If the DDB is working server side, i.e. on the backup server, there will be minimal extra CPU on the client but obviously you're not saving any network bandwidth. Normally this is how it's configured in a LAN environment.

For slower WAN connections you'd want to use client-side dedup, so there would be a large CPU hit plus data storage for the DDB on the client (which could grow pretty large and require a decent amount of IOPS).

Fulls obviously have a large impact on IO, but synthetic fulls are basically just an incremental with extra processing on the backup servers, so they are much quicker and have a lower impact on the production servers.

A huge amount depends on the environment. For example, are you using VMware with SAN snapshots and a dedicated iSCSI/fibre storage connection? Or are you backing up directly off the physical disks?

The DDB sits on the client and in the backup environment. We use huge disk libraries like Isilon and Data Domain, and these use fibre connections (not so sure what type). We back up data off physical drives (customer servers).

What can you tell me further, based on that?

anything I don't mind · 21 Oct 2013 at 13:10

With a full backup you are guaranteed all the data up to the point of the last successful full backup. With incremental you have more of a chance of losing a few days of work. For example, say Monday was the full weekly backup, the backup fails on Tuesday, and on Friday you have to do a restore: you will have to use Monday's tapes. But if you did a full backup every day you could use Thursday's tapes.

http://en.wikipedia.org/wiki/Differential_backup

Differential gets around that problem by backing up, every day of the week, everything that has changed since the weekly full backup, whereas incremental requires all the previous tapes in the chain. This means that on the Friday restore you could use the last successful weekly backup with the last successful differential.

Both incremental and differential are quite risky, because if a full weekly backup fails you are a whole week or more behind on backups.
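
The Monday-to-Friday scenario above can be sketched as a small Python function that works out which backup sets a restore needs. This is an illustrative model only: the `restore_chain` name and the `(label, kind, ok)` tuples are made up, and `ok` here means "this backup set is usable at restore time", following the scenario in the post. Some products handle a failed incremental differently, for instance by having the next run pick up the missed changes.

```python
def restore_chain(backups, scheme):
    """Return the labels of the backups needed for the latest possible restore.

    backups: list of (label, kind, ok), oldest first, kind in {"full", "diff", "inc"}.
    scheme:  "inc" or "diff", the type of backup taken between fulls.
    """
    # Start from the most recent usable full backup.
    start = max(
        (i for i, (_, kind, ok) in enumerate(backups) if kind == "full" and ok),
        default=None,
    )
    if start is None:
        return []
    chain = [backups[start][0]]
    later = backups[start + 1:]
    if scheme == "diff":
        # The full plus the latest usable differential is enough.
        good = [label for label, kind, ok in later if kind == "diff" and ok]
        if good:
            chain.append(good[-1])
    elif scheme == "inc":
        # Every incremental is needed in order; stop at the first unusable one.
        for label, kind, ok in later:
            if kind != "inc" or not ok:
                break
            chain.append(label)
    return chain


week_inc = [("Mon", "full", True), ("Tue", "inc", False),
            ("Wed", "inc", True), ("Thu", "inc", True)]
week_diff = [("Mon", "full", True), ("Tue", "diff", False),
             ("Wed", "diff", True), ("Thu", "diff", True)]

inc_restore = restore_chain(week_inc, "inc")     # ["Mon"]
diff_restore = restore_chain(week_diff, "diff")  # ["Mon", "Thu"]
```

With the same Tuesday failure, the incremental scheme can only get back to Monday, while the differential scheme gets back to Thursday using two backup sets.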