Databases and backups

sunama · 4 Apr 2012 at 18:37

I have a question regarding databases for which the data is important.

For example in an online shop, how would customer data base stored or backed up, such that over a decade after placing an order, the details of that order can still be accessed (and no data would've been lost during the decade?

Are periodic backups advised?
Or are 2 databases created, where the data is stored in both places at once and then regularly checked to ensure that they are both identical?

I'm looking into saving data regarding customer account balances, so it is very important that I do not lose any data whatsoever. Even over a decade old.

If I have to backup a database with an older database (replacing new data with old data), this could be catastrophic as I would end up missing some transactions which may have taken place in the intervening time period.

Is there a website I could read on database backup or redundancy techniques?

Right now, I am thinking along the following lines:

1. whenever a customer makes a transaction, a copy of this transaction is emailed. This way, if the database were to ever be corrupted, it can be rebuilt using the data in the emails. Providing every email gets through and no emails are deleted, this should be rock solid.

How do banks get around this problem?
How do online stores get around this problem?
How do online casinos get around this problem?

Advice needed.

Thanks

davetherave2 · 4 Apr 2012 at 20:05

If this is business critical then full backups should be done nightly and then stored on tapes or other backup device.

You could go down the log shipping route or if you have the hardware clustering the sql server so that if the main server fails then a backup server kicks in.

if you are using microsoft sql server then this is a good place to start looking http://www.sqlteam.com/

for other sql tech look at the manufacturer's website they should be able to give you a load of helpful links and advice on best practices.

oxygene · 4 Apr 2012 at 20:11

I would have thought an MS Access database stored on dropbox would do the job?

Goksly · 4 Apr 2012 at 20:27

Remember unless you test the backups every time you've not got a guaranteed backup!

Back it up daily . All data stored in the same db (dont have one per year etc). Store backups in physically different location.

Dolph · 4 Apr 2012 at 23:18

There's backup solutions of various kinds, and there's also data warehousing, where you transform the data into a structure designed for efficient reporting rather than efficient transaction processing (although taking backups of your data warehouse is still a very vital part).

So to do this part:

I'm looking into saving data regarding customer account balances, so it is very important that I do not lose any data whatsoever. Even over a decade old.

What you could do, instead of storing the whole of everything forever, is create a data model containing a record of every customer's balance at 5pm daily. You could even expand it by including changes (eg credits and debits to the balance) as well.

The best method for you depends a lot on how big and complex your DB is, how many concurrent users, how much changes day to day and so on. I wouldn't recommend the email idea though, sounds horrendous...

sunama · 5 Apr 2012 at 08:28

Thanks for your answers guys.

Just to clarify, I am using MySQL (not MS).

The DB itself is quite simple. My only worry is losing customer data.

I'm still trying to get my head around the following problem and perhaps someone can educate me here:

Lets assume that we are dealing with only 1 customer.
Lets assume that backups are performed daily at 5:00PM.

customer balance at 7:00AM = £0
8:00AM: customer deposits £200 (customer balance = £200)
11:00AM: customer deposits £100 (customer balance = £300)
Database fails and gets wiped. All accounts are restored from previous day's backup, where customer balance was £0.
2:00PM: customer deposits £50 (customer balance £50)

Correct customer balance should be £350, not £50.

Now in this example, you can see that although I backup, once per day, at 5:00PM, in the event of a total DB wipe, I still lose the transactions which occurred between the last backup and the exact time of failure.

How do banks get around this problem?
I'm assuming that they have a secondary DB which is kept offsite, with exact duplicate data? Am I right? Is this the only way to get around this problem?

Andre · 5 Apr 2012 at 09:27

Use RAID-5/6/30/50 disc arrays (with mirrored servers if you are that worried.)

You can always move historic transactions to a "history/archive" table on a trigger if the live table is getting too large/slowing searches down

Email of every transaction is not a good idea at all!

During testing, I have written every call to the database to a file on each machine running the system with a date/time stamp so they can be re-applied when the structure was changed/database re-built

sniffy · 5 Apr 2012 at 09:45

Interesting topic. I don't personally have experience with this but I imagine it's about having multiple databases in different geographical locations with mirrored data as well as periodic snapshots of the database saved in different geographical locations. You can't really mess about when it comes to money though. I would consider contacting a specialist. What if some script kiddie got access to your database and updated his balance with a simple UPDATE statement? Would you even notice that this took place?

AshenShugar1873 · 5 Apr 2012 at 10:13

sunama said:
Thanks for your answers guys.

Just to clarify, I am using MySQL (not MS).

The DB itself is quite simple. My only worry is losing customer data.

I'm still trying to get my head around the following problem and perhaps someone can educate me here:

Lets assume that we are dealing with only 1 customer.
Lets assume that backups are performed daily at 5:00PM.

customer balance at 7:00AM = £0
8:00AM: customer deposits £200 (customer balance = £200)
11:00AM: customer deposits £100 (customer balance = £300)
Database fails and gets wiped. All accounts are restored from previous day's backup, where customer balance was £0.
2:00PM: customer deposits £50 (customer balance £50)

Correct customer balance should be £350, not £50.

Now in this example, you can see that although I backup, once per day, at 5:00PM, in the event of a total DB wipe, I still lose the transactions which occurred between the last backup and the exact time of failure.

How do banks get around this problem?
I'm assuming that they have a secondary DB which is kept offsite, with exact duplicate data? Am I right? Is this the only way to get around this problem?

I'll give you an example for how we do ours and you can modify to suit and see how to avoid the above problem.

Weekly Full Backup - Performed at 2.00am on weekends, complete stand only copy of database

Differential Backup - Performed at 2.00am weekdays, changes from last full back up, saves on space and backup time on large databases, but depends on the full backup being available to restore.

Transacation Log backups - Performed hourly every day. Copy of all changes made to the database in last hour, need to be restored in conjunction with a full and (optionally) a differential backup.

In many situations the transaction logs are backed up more frequently, every 15 mins, every 5 minuntes. This allows almost real time restore of data. So when your database fails, you restore the full back-up, that day's differential, then all the transaction logs up to your point of failure.

You then bring your system back online to just before the point of failure losing little or no data. There is also an option to back-up all un-backed-up but committed transactions in certain failure scenarios.

A more robust system is to have mirroring or clustering.

Mirroring - 2 servers (+ optional witness server), the live does the work and then sends all the committed transactions to the mirror so they stay in sync, if the live goes down the mirror takes over (automatically if you have a witness server). Once the Live server is back online, the witness will sync the transactions back over and can automatically make the Live server the master again. You still do your backups of course.

Clustering - Even more fail safe but expensive and more difficult to set-up, with shared disk arrays and virtual subnets, with the servers in different locations. Again, still do your backups as above.

In summary, in your above scenario you aren't backing up often enough. You don't need full backups every moment of the day to have (nearly) all your data available for restore. If you are dealing with money and bank transfers then you really need to get expert advice and have the system set-up correctly with no loss of transactions.

sunama · 5 Apr 2012 at 16:11

Are there any companies out there who specialise is storing data, in a database, reliably (ie. with guaranteed no data loss), with remote access (read/write)?

I think Amazon were doing something along these lines.
Also with regards to security, will an SSL connection suffice?

It just seems overkill for me to spend huge amounts of time/effort/money to deal with what is likely to be about 10 transactions/day. Though, because money is directly involved I cant afford even one error.

I'm sure that there are ready made solutions available to deal with secure/safe databases. I can't believe that all the online stores which we see pop up, who all have online purchasing, have spent huge sums of money dealing with backups of previous orders and current balances of customers.

sunama · 5 Apr 2012 at 16:13

AshenShugar1873 said:
Mirroring - 2 servers (+ optional witness server), the live does the work and then sends all the committed transactions to the mirror so they stay in sync, if the live goes down the mirror takes over (automatically if you have a witness server). Once the Live server is back online, the witness will sync the transactions back over and can automatically make the Live server the master again. You still do your backups of course.

I like this one the best.
The problem is that if I were to implement and fully test this myself, it would take me a few months. Also, I'd have to sort out a lot of hardware, in multiple locations for this.

In any case, I think I need a more "off-the-shelf" solution to this. There is no way that I can get involved in maintaining a complicated database system. Doing this is a full time job in itself.

sunama · 5 Apr 2012 at 16:52

Andre said:
U
Email of every transaction is not a good idea at all!

What is wrong with this idea?

Proposal
The idea being that the web server (main) database will record all the transactions.

Each transaction shall also be emailed, for the secondary database to update the transactions on itself.

The secondary database which works off of the information in the emails, shall read the emails, noting down the customer id and transaction amount (negative, for withdrawals). It shall then update a local database (this database is held on a local server, in a different location to the main database).

X times/per day (where 1 <= X <= 24), the balance of each account in the main database is compared with the account balance of the secondary database. The balances which differ are marked up for attention from the admin. Any account balances which do not match up, have their account "locked" until the admin investigates and unlocks it.

We also have to think of situations where a customer may illegally bump up his balance by hacking in to the database. If this happens, the system I have described above, should be able to flag this up.

We shall also have to create a system such that when a customer wishes to withdraw money, the request is sent to the primary and the secondary database. If the 2 database report back non-identical account balances, the withdrawal is halted.

Bear in mind I am thinking about aloud here on ways to ensure that account balances are correct and there is nothing suspicious going on (ie. someone hasn't hacked their account to withdraw a huge sum of money).

Xelene · 5 Apr 2012 at 18:41

We have a pair of live databases clustered in one location, so that they are both readable and writeable with the same information. There are another two clustered servers in another location that have the data also but aren't accessible normally.

We archive data over a certain age that won't need to be accessed either onto another DB or onto disk. Our main DB goes back about 8-9 years.

This is doable in Oracle, SQL Server and Oracle with various degrees of ease. Probably other database engones also.

We're also strting to use Mongo where you can achieve something similar with replica sets - but the implementation is quite different.

Licensing costs can get pretty high with the commercial RDBMS; Oracle costs us quite a lot of money.

AshenShugar1873 · 5 Apr 2012 at 18:44

The (usually) free content management systems that come bundled with web space will offer things like shopping carts that you may be able to use for your situation. Drupal and Joomala are two of the more popular. How secure/robust and transaction safe they are you would need to investigate.

Some examples can be found here, one might do what you need.

Joomla payment extensions

sunama · 5 Apr 2012 at 21:33

Thanks Ashen,

I spent a while looking through the available plugin for Joomla. Most of them deal with payment systems, but they don't go into depth on storing data in databases.

So, I had a thought of what you wrote earlier, involving 2 databases (different locations), 1 mirroring the other and a witness (different location from the DBs). I've spent the last 90 minutes writing pseudo code to allow for this.

I'm also writing an algorithm, which adds to the task of the witness. Specifically, the witness program is able to check for "tampering". So lets say somehow a hacker hacks into the DB and bumps up his account balance by adding a new transaction, eg deposit: £500, my witness, will pick up on this and alert the admin.

The way in which a user accesses his/her account balance is now looking more secure, so thanks for planting those ideas.

Thanks to the other people who also weighed in with ideas. Much appreciated.

ZombieFan · 5 Apr 2012 at 21:47

sunama said:
I like this one the best.
The problem is that if I were to implement and fully test this myself, it would take me a few months. Also, I'd have to sort out a lot of hardware, in multiple locations for this.

In any case, I think I need a more "off-the-shelf" solution to this. There is no way that I can get involved in maintaining a complicated database system. Doing this is a full time job in itself.

This is actually relatively quick to set up. The magic google search you are looking for is 'mysql replication'. Here are the online mysql docs to set it up - http://dev.mysql.com/doc/refman/5.6/en/replication.html

We use MS SQL Server in work. I'm not actively involved in the backups, but I know we have a main live server which replicates over to a replicaiton server - i.e. every single transaction performed on the live database is sent over to the replication database in real-time. This means that both databases are identical (give or take a few seconds for the replication lag).

If the live database goes down, it switches over to replication server to prevent data loss.

Along with this, regular nightly backups are performed to tape. Multiple copies of the backup are made and stored both onsite and offsite locations (just in case of fire etc).

[edit] Also, when you have got a backup strategy in place make sure you simulate a database falure to make sure it all works as expected. It's better to find out that you can't restore the data during a test instead of when it goes wrong for real!

Also remember the old saying... "Backups always work. It's restoring the data that fails."

sunama · 5 Apr 2012 at 22:34

One problem with a direct replication is that 'what happens if someone hacks into the primary database and increases their balance OR creates a fake deposit'. This will then get replicated in the 2nd DB.

Now, using the "witness" system, effectively, this program manages the 2 DBs. During the management process it shall also be able to check for any erroneous transactions that have either been altered (ie. in DB1, transaction 123, deposits £500, but in DB2, transaction 123, deposits £50...this may indicate that perhaps someone has hacked in a changed their account balance by effectively adding an extra £450). So, this "witness" program can effectively be packed with all sorts of checking systems.

The direct replicator system, is almost like a blind/dumb version of the witness program.

The direct replication system is faster.
The witness system can be made more intelligent, but is going to be slower (a small price to pay, if it can reduce the chances of fake/altered transactions).

ZombieFan · 5 Apr 2012 at 23:30

I'm now confused as to what you want. Replication allows you to perfectly mirror a database in real-time. It's got nothing to do with database security.

If someone hacks into the live database and performs an update then, yes, this will be replicated since it's keeping the live and backup databases in sync. Otherwise it wouldn't be a proper backup.

sunama · 5 Apr 2012 at 23:57

I want redundancy AND some form of security protection.
I've finished writing up the pseudo-code, which now automatically updates the mainDB and the witness then updates the secondaryDB (only with checks).
If nothing fishy has gone on, then both DBs should be identical. If something fishy has gone on, then the admin is notified and the account is locked, until the admin has unlocked the account.

sniffy · 6 Apr 2012 at 17:06

I wouldn't feel confident using a secondary server to check someone hasn't done an unauthorised modification. If they're inside the primary then your secondary is probably open to them.

The following wouldn't be secure but might give you an idea about possible alternatives. If you logged absolutely every statement ran against the database(s), encrypting each statement using the previous statement (before it was encrypted) as the encryption key, you could run a basic validation check for two given database snapshots. You would encrypt the first entry using some randomly generated string and use this when it comes to decrypting the log (decrypt first entry then you have the key to decrypt the second entry and so on). When you want to validate you take the oldest snapshot, run the decrypted statements against it then check the two snapshots for consistency. This has many flaws, namely the hacker could just listen out for a statement then he has a legit encryption key for your log, but would catch-out someone who managed to sneak an UPDATE into the system or a script kiddie out of his depth. There's no doubt some brilliant solutions to this problem, just wish I had some decent info for you. Hopefully this thread catches the eye of someone with experience doing this stuff for a living.