Databases and backups

I have a question regarding databases for which the data is important.

For example, in an online shop, how would customer data be stored or backed up such that, over a decade after an order was placed, the details of that order can still be accessed (and no data has been lost in the intervening decade)?

Are periodic backups advised?
Or are 2 databases created, where the data is stored in both places at once and then regularly checked to ensure that they are both identical?

I'm looking into saving data regarding customer account balances, so it is very important that I do not lose any data whatsoever, even data over a decade old.

If I ever have to restore a database from an older backup (replacing new data with old data), this could be catastrophic, as I would end up missing transactions which took place in the intervening period.

Is there a website I could read about database backup or redundancy techniques?

Right now, I am thinking along the following lines:

1. Whenever a customer makes a transaction, a copy of the transaction is emailed. This way, if the database were ever to be corrupted, it could be rebuilt using the data in the emails. Provided every email gets through and no emails are deleted, this should be rock solid.

How do banks get around this problem?
How do online stores get around this problem?
How do online casinos get around this problem?

Advice needed.

Thanks
 
Thanks for your answers guys.

Just to clarify, I am using MySQL (not MS).

The DB itself is quite simple. My only worry is losing customer data.

I'm still trying to get my head around the following problem and perhaps someone can educate me here:

Let's assume that we are dealing with only one customer.
Let's assume that backups are performed daily at 5:00PM.

customer balance at 7:00AM = £0
8:00AM: customer deposits £200 (customer balance = £200)
11:00AM: customer deposits £100 (customer balance = £300)
Database fails and gets wiped. All accounts are restored from the previous day's backup, where the customer balance was £0.
2:00PM: customer deposits £50 (customer balance = £50)

Correct customer balance should be £350, not £50.

Now in this example, you can see that although I back up once per day at 5:00PM, in the event of a total DB wipe I still lose the transactions which occurred between the last backup and the exact time of failure.

How do banks get around this problem?
I'm assuming that they have a secondary DB which is kept offsite, with exact duplicate data? Am I right? Is this the only way to get around this problem?
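
For reference, the usual MySQL answer to this gap is the binary log: with binary logging enabled, every committed change since the last full backup is recorded, so a wiped database can be restored from the 5:00PM dump and then rolled forward to the moment of failure by replaying the log. A minimal sketch, assuming the mysqldump/mysqlbinlog client tools are installed, binary logging (log_bin) is enabled, and "shop", the user and the paths are placeholder names:

Code:
# Daily dump that cooperates with MySQL's binary log, so transactions made
# after the dump (the 8:00AM and 11:00AM deposits above) are not lost.
# Assumptions: mysqldump/mysqlbinlog installed, log_bin enabled, "shop" and
# the user/paths below are placeholders.
import datetime
import subprocess

def nightly_dump(db="shop", user="backup_user"):
    stamp = datetime.date.today().isoformat()
    outfile = f"/backups/{db}-{stamp}.sql"
    # --single-transaction : consistent snapshot of InnoDB tables
    # --flush-logs         : start a fresh binary log file at dump time, so
    #                        "everything since this dump" = the newer binlogs
    # --master-data=2      : note the binlog position inside the dump file
    with open(outfile, "w") as f:
        subprocess.run(
            ["mysqldump", "-u", user, "--single-transaction",
             "--flush-logs", "--master-data=2", db],
            stdout=f, check=True)
    return outfile

# Recovery after a wipe:
#   mysql -u root shop < shop-2024-01-01.sql           (restore the dump)
#   mysqlbinlog binlog.000042 binlog.000043 | mysql -u root shop
# i.e. replay the binlog files written since that dump.

The key point is that the binlog files need to be written (or copied) to a different disk or server from the database itself, otherwise a total wipe takes them with it.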
 
Are there any companies out there who specialise in storing data in a database reliably (i.e. with guaranteed no data loss), with remote access (read/write)?

I think Amazon were doing something along these lines.
Also, with regard to security, will an SSL connection suffice?

It just seems overkill to spend huge amounts of time/effort/money to deal with what is likely to be about 10 transactions/day. Though, because money is directly involved, I can't afford even one error.

I'm sure that there are ready-made solutions available for secure/safe databases. I can't believe that all the online stores we see pop up, all with online purchasing, have spent huge sums of money dealing with backups of previous orders and current customer balances.
 
Mirroring - 2 servers (+ optional witness server): the live server does the work and then sends all the committed transactions to the mirror so they stay in sync. If the live goes down, the mirror takes over (automatically if you have a witness server). Once the live server is back online, the witness will sync the transactions back over and can automatically make the live server the master again. You still do your backups, of course.
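
For what it's worth, the closest MySQL equivalent to this is asynchronous replication: the replica replays the primary's binary log so the two servers stay in sync, and backups carry on as normal. A rough sketch of pointing a replica at the primary, assuming the mysql-connector-python driver; the hostnames, credentials and log coordinates are placeholders (the coordinates come from SHOW MASTER STATUS on the primary):

Code:
# Sketch only: configure db2 as a replica of db1 using classic MySQL
# replication. Hostnames, credentials and log coordinates are placeholders.
import mysql.connector

replica = mysql.connector.connect(host="db2.example.com",
                                  user="root", password="secret")
cur = replica.cursor()
cur.execute("""
    CHANGE MASTER TO
        MASTER_HOST     = 'db1.example.com',
        MASTER_USER     = 'repl',
        MASTER_PASSWORD = 'repl_password',
        MASTER_LOG_FILE = 'binlog.000042',
        MASTER_LOG_POS  = 154
""")                      # coordinates taken from SHOW MASTER STATUS on db1
cur.execute("START SLAVE")
replica.close()

Note that, unlike the mirroring described above, plain MySQL replication has no automatic failover; promoting the replica is a manual step (or needs extra tooling).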

I like the mirroring approach the best.
The problem is that if I were to implement and fully test this myself, it would take me a few months. Also, I'd have to sort out a lot of hardware in multiple locations.

In any case, I think I need a more "off-the-shelf" solution to this. There is no way that I can get involved in maintaining a complicated database system. Doing this is a full time job in itself.
 
Email of every transaction is not a good idea at all!

What is wrong with this idea?

Proposal
The idea being that the web server (main) database will record all the transactions.

Each transaction shall also be emailed, so that the secondary database can be updated with it.

The secondary system, which works off the information in the emails, reads each email, noting the customer ID and the transaction amount (negative for withdrawals). It then updates a local database, held on a local server in a different location from the main database.

X times per day (where 1 <= X <= 24), the balance of each account in the main database is compared with the account balance in the secondary database. Any accounts whose balances do not match are flagged for the admin's attention and "locked" until the admin investigates and unlocks them.
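
As a rough illustration of that periodic check, here is a sketch of the comparison step, assuming the mysql-connector-python driver; the hostnames, credentials and the accounts table/columns (customer_id, balance, locked) are placeholder names, and notify_admin() is a hypothetical helper:

Code:
# Compare every customer's balance in the main DB against the secondary DB
# and lock any account where the two disagree. All names are placeholders.
import mysql.connector

def fetch_balances(host):
    conn = mysql.connector.connect(host=host, user="audit",
                                   password="secret", database="shop")
    cur = conn.cursor()
    cur.execute("SELECT customer_id, balance FROM accounts")
    balances = dict(cur.fetchall())
    conn.close()
    return balances

def compare_and_lock():
    main = fetch_balances("db1.example.com")
    secondary = fetch_balances("db2.example.com")
    # mismatch = different balance, or an account present in only one DB
    mismatched = [cid for cid in set(main) | set(secondary)
                  if main.get(cid) != secondary.get(cid)]
    if mismatched:
        conn = mysql.connector.connect(host="db1.example.com", user="audit",
                                       password="secret", database="shop")
        cur = conn.cursor()
        for cid in mismatched:
            cur.execute("UPDATE accounts SET locked = 1 WHERE customer_id = %s",
                        (cid,))
        conn.commit()
        conn.close()
        notify_admin(mismatched)   # hypothetical helper: email the admin

A script like this would simply be run X times per day from a scheduler such as cron.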

We also have to think of situations where a customer might illegally bump up his balance by hacking into the database. If this happens, the system I have described above should be able to flag it.

We shall also have to create a system such that when a customer wishes to withdraw money, the request is sent to both the primary and the secondary database. If the two databases report non-identical account balances, the withdrawal is halted.

Bear in mind I am thinking aloud here about ways to ensure that account balances are correct and there is nothing suspicious going on (i.e. someone hasn't hacked their account to withdraw a huge sum of money).
 
Thanks Ashen,

I spent a while looking through the available plugins for Joomla. Most of them deal with payment systems, but they don't go into depth on storing data in databases.

So, I had a think about what you wrote earlier, involving 2 databases (in different locations), one mirroring the other, and a witness (in a different location from the DBs). I've spent the last 90 minutes writing pseudo-code to allow for this.

I'm also writing an algorithm which adds to the task of the witness. Specifically, the witness program is able to check for "tampering". So let's say a hacker somehow gets into the DB and bumps up his account balance by adding a new transaction (e.g. a £500 deposit); my witness will pick up on this and alert the admin.

The way in which a user accesses his/her account balance is now looking more secure, so thanks for planting those ideas.

Thanks to the other people who also weighed in with ideas. Much appreciated.
 
One problem with direct replication is: what happens if someone hacks into the primary database and increases their balance, or creates a fake deposit? This will then get replicated to the 2nd DB.

Now, using the "witness" system, this program effectively manages the 2 DBs. During the management process it shall also be able to check for any transactions that have been altered (i.e. in DB1, transaction 123 deposits £500, but in DB2, transaction 123 deposits £50... this may indicate that someone has hacked in and changed their account balance by effectively adding an extra £450). So, this "witness" program can effectively be packed with all sorts of checking systems.

The direct replication system is almost like a blind/dumb version of the witness program.

The direct replication system is faster.
The witness system can be made more intelligent, but is going to be slower (a small price to pay, if it can reduce the chances of fake/altered transactions).
 
I want redundancy AND some form of security protection.
I've finished writing up the pseudo-code, which now automatically updates the main DB; the witness then updates the secondary DB (with checks).
If nothing fishy has gone on, then both DBs should be identical. If something fishy has gone on, then the admin is notified and the account is locked, until the admin has unlocked the account.
 
We have to remember that a DB can be hacked/attacked from different angles.
Without spending huge sums of money, we will never be able to keep out the most determined and inventive hacker.
What we can do though, is create a system which will thwart all, but the best hackers.

Initially, my business/idea/service will be quite low-key. While I am a small entity, the best hackers are unlikely to attack me. As the entity grows in size and transactions increase in number and value, I shall upgrade the protection/security (which will obviously get more expensive the stronger the system).

What I can say is that at present there are many SMEs who use the most basic and simple data storage systems. You can even buy off the shelf online shops, which use little to no security protection.

So, my aim is to release the first version of the DB system, followed by an upgraded second version AFTER the amount of money running through the system has increased.
 
Hi Pho,

Initially, both DBs shall be hosted on shared hosting.

Today, I have been thinking further of improvements.
Rather than creating the 2nd DB using a separate program, we shall update both DB1 and DB2 at the same time, as sketched below. In theory, both DBs should be created/updated together and should be identical at all times.
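
A sketch of that dual write, assuming the mysql-connector-python driver; the hostnames, credentials, the transactions table and flag_for_admin() are placeholders:

Code:
# Write each transaction to both databases; if either insert fails, record
# the discrepancy so the checking program / admin can deal with it.
import mysql.connector

HOSTS = ["db1.example.com", "db2.example.com"]

def record_transaction(customer_id, amount):
    results = {}
    for host in HOSTS:
        try:
            conn = mysql.connector.connect(host=host, user="app",
                                           password="secret", database="shop")
            cur = conn.cursor()
            cur.execute(
                "INSERT INTO transactions (customer_id, amount) VALUES (%s, %s)",
                (customer_id, amount))
            conn.commit()
            conn.close()
            results[host] = "ok"
        except mysql.connector.Error as exc:
            results[host] = f"failed: {exc}"
    if any(status != "ok" for status in results.values()):
        flag_for_admin(customer_id, results)   # hypothetical helper

Note there is no true atomicity across the two databases here; if one insert succeeds and the other fails, the DBs temporarily differ, which is exactly the "one DB down" case the checking program is meant to catch.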

What I shall then do is create a program which checks that both DBs are identical. If not, then more checks shall be carried out until the offending transaction(s) have been weeded out.

The accounts to which those transactions belong are then automatically "locked", and an email is automatically generated for the admin to deal with.

I'm actually going to use a re-mix of sniffy's algorithm to create a check-key system, so that even if someone alters the transaction "amount" after the initial transaction has taken place (which would be the most tempting thing to do), the check-key should highlight this.
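
For illustration, one common way to build such a check-key is an HMAC over the transaction's fields, keyed with a secret that is kept off the shared web host: if someone later edits the amount in the table, the stored check-key no longer matches. A minimal sketch (the secret and field names are placeholders, and this is not sniffy's actual algorithm):

Code:
# Check-key sketch: an HMAC of the transaction fields, keyed with a secret
# stored somewhere other than the shared web host. Placeholders throughout.
import hashlib
import hmac

SECRET_KEY = b"keep-this-off-the-web-host"   # placeholder secret

def make_check_key(txn_id, customer_id, amount, timestamp):
    message = f"{txn_id}|{customer_id}|{amount}|{timestamp}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def is_untampered(txn_id, customer_id, amount, timestamp, stored_check_key):
    expected = make_check_key(txn_id, customer_id, amount, timestamp)
    return hmac.compare_digest(expected, stored_check_key)

The check-key is stored alongside each transaction row, and the auditing program recomputes it and flags any row where it no longer matches. This only helps if the hacker cannot read the secret key, which is why it should not live on the same shared host as the DB.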

The other obvious thing a hacker would do is add an extra (fake deposit) transaction. The hacker would need to update both DBs, otherwise my program will spot that there is an extra transaction in one DB. Again, this would "lock" the account that the transaction belongs to, until an admin unlocks it.

Ideally though, I would like another way to ensure that the latest transaction added to the DB is correct/valid. What if the hacker manages to hack both DBs at the same time, adds the fake transaction to both, and also knows the check-key system? He could then add whatever amount he likes to his account balance.
 
[edit] just read your latest post above which makes a lot of my stuff below void :)

Indeed. But those points you made about emails not being a great medium to build a database on are still valid, and have taught me a few things.

What I do need is a way to verify each deposit. I'm not too fussed about the withdrawals (yet), but I am fussed about the deposits.

I want to think of a way to verify that each deposit is actually genuine, just in case a hacker manages to successfully add a fake deposit transaction to both databases (e.g. if he gets hold of the method which adds a new transaction (deposit) to the database).
 
See, I'd hazard a guess that shared hosting is going to be a much easier attack vector. If someone roots the hosting server you're screwed either way.

Indeed. I spoke with TSOHost, who said that the only way to improve security on the DB is to have a dedicated hosting service; that way, they can add a firewall. However, we are talking about increased costs here, and if we assume we are dealing with 5-10 transactions per month initially, it just doesn't seem worth it.

This just sounds like pure replication so just get MySQL to do that for you -

This is not pure replication. Replication is good for redundancy, but useless when it comes to fraud detection.

By using the system which I am proposing, we have 2 DBs which in theory should contain identical data, at all times. If they do not then the admin is flagged.

Why might the 2 DBs have different data?
1. if 1 DB is "down", while the other is "online", only 1 DB will get updated.
2. if a hacker updates or alters 1 DB

In both cases, the admin shall be alerted. In case 1, the admin can make an "adjustment" to correct the difference, so the same "error" doesn't keep alerting him. In case 2, the admin can lock the user's account and investigate what has gone on.

To clarify: if we use simple replication and the hacker alters DB1, DB2 is automatically altered as well, defeating the ability to detect differences between the 2 DBs (as they are always identical).

To me this all seems to point to periodically auditing transactions against your payment gateway. Of course, if that gets hacked, you and a lot of other people have bigger problems.

I'm looking into this right now. I am going to use PayPal. Now, PayPal have an API to retrieve the transactions which have occurred. The easiest way is to have a program periodically fetch the transaction IDs and then check the DB against them. If there are erroneous transactions, then we can alert the admin and lock the account. HOWEVER, this is only available on the Pro account, and seeing as I am going to be using PayPal Standard, this facility is not available.

Now, I have created a program which is able to read a webpage, so it's still possible that I might get my program to load up PayPal's account page and then extract all the transactions which have occurred.

MSSQL can do transaction log shipping (I'm sure MySQL can but I have no idea what they call it) which is kind of like your email idea - it dumps the transaction details into a file which another server can read and apply to itself. You could have them dumped into a read-only VPN/FTP folder which the other server picks up and processes.

I may use emails to validate the transactions, but the problem here is that we are involving too many servers.
Server1. web host and DB1 host
Server2. DB2 host
Server3. email server
Server4. local server running my program which receives the email

The more servers we have, the greater the chance of a point of failure. If somebody has paid money into their account and any of the above 4 servers is out, their transaction will show as pending = not good. A transaction is only considered valid if all the servers involved are working.

The best balance is to have 2 separate databases on Server1 (DB & web host) and Server2 (DB2).

Also consider that if somebody wants to withdraw money, there are additional checks which occur. So, let's assume that a hacker successfully bypasses all the security measures before withdrawal. He may still be caught by a human at the time of withdrawal.

I shall also be using PayPal's IPN system (i.e. when a transaction is successful or cancelled, PayPal automatically generates a message which is sent to a URL/server). This is another way to get hold of the details of the transaction and perhaps hold this information on a separate server.
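
A minimal sketch of that IPN handshake: PayPal POSTs the transaction details to the listener URL, the listener echoes the exact message back to PayPal with cmd=_notify-validate prepended, and the payment is only treated as genuine if PayPal replies "VERIFIED". The web framework around the listener is omitted, record_verified_payment() is a hypothetical helper, and the endpoint URL should be checked against PayPal's current documentation:

Code:
# IPN verification sketch. raw_post_body is the untouched body of PayPal's
# POST to the listener URL. record_verified_payment() is hypothetical.
from urllib import request

PAYPAL_ENDPOINT = "https://ipnpb.paypal.com/cgi-bin/webscr"   # see PayPal docs

def handle_ipn(raw_post_body: bytes) -> bool:
    verification = b"cmd=_notify-validate&" + raw_post_body
    req = request.Request(
        PAYPAL_ENDPOINT, data=verification,
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with request.urlopen(req) as resp:
        answer = resp.read().decode()
    if answer == "VERIFIED":
        record_verified_payment(raw_post_body)   # e.g. write to the separate server
        return True
    return False   # "INVALID" or anything else: do not credit the account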
 
I think I did state earlier that any security system can be compromised. You just have to look at as many of the different ways this may happen as you can and close those doors. Some doors will never be closed.

Emails are not going to be used. This was just an idea thrown up.

PayPal (as far as I am aware) do not have any software which updates your DB. Their part ends once the user has made payment and they have redirected the user back to your website. My problem is: what happens if the user never makes any payment to PayPal and directly manipulates the DB on which the account data is stored? In that scenario PayPal are not actually involved.

Over the last hour, I've been looking at the PayPal API (in conjunction with their Express Checkout service, which has no monthly fees), and I am now looking to use their transaction search facility (API) to verify the transactions which are in the DB.

In an earlier post I stated that the big problem will be if someone gets hold of the method which updates the DBs. By using an auditing program to check the transactions in the DB against the list held on PayPal's servers, we shall be able to find any fake transactions. This goes some way to solving that problem.
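
A sketch of that audit, with the PayPal call abstracted behind a hypothetical fetch_gateway_txn_ids() helper (the exact API depends on the account type); mysql-connector-python and placeholder table/column names are assumed:

Code:
# Cross-check the deposits recorded in the DB against the list of
# transaction IDs the payment gateway knows about.
import mysql.connector

def fetch_db_txn_ids(host="db1.example.com"):
    conn = mysql.connector.connect(host=host, user="audit",
                                   password="secret", database="shop")
    cur = conn.cursor()
    cur.execute("SELECT gateway_txn_id FROM transactions WHERE type = 'deposit'")
    ids = {row[0] for row in cur.fetchall()}
    conn.close()
    return ids

def audit_against_gateway():
    db_ids = fetch_db_txn_ids()
    gateway_ids = fetch_gateway_txn_ids()     # hypothetical: PayPal transaction search
    fake_deposits = db_ids - gateway_ids      # in our DB but unknown to PayPal
    missing = gateway_ids - db_ids            # paid at PayPal but never recorded
    if fake_deposits or missing:
        notify_admin(fake_deposits, missing)  # hypothetical helper: lock + email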

The auditing program shall check the 2 DBs at regular intervals and check whether everything is "ok" or whether an admin needs to be notified (and whether the account should be locked).

An additional (transactions history) check also occurs when a user wishes to withdraw money.

For sure there are companies who have come up with much better ways to avoid fraud, but I think for my purposes I have come up with a simple solution which is ample (two DBs held on separate web servers, an auditing/DB-verifying program held on a third server, and a final check when the user wishes to withdraw money).
 
Hi, thanks for your input.

Personally I'd just have a single transaction table which is audited to a table that cannot be updated / deleted; only inserted

"Which is audited to a table that cannot be updated". By this do you mean that the auditing program will have to "extract" new transactions from the main DB and then add them to the audited DB?

When rows are updated you can use in-built commands to get the user adding that row.

What do you mean by "get the users adding that row"?

For a hacker to successfully alter their balance they'd have to compromise your system (as the db wouldn't be directly accessible from outside) and get the username/password for the web server user's account. To help combat this, you could use encryption between the web server and the db server (and if the account on the web server was a service-only account... the hacker would need to have broken multiple accounts).

The DB(s) must be kept on a web server (I don't want to bring it in-house, because my internet connection isn't reliable enough). Once I can afford to host the DB on a dedicated server (off-site, with a hosting company), then that is viable, but until then I do not want to host the DB myself.

All DBs are held on a shared web host, with no Firewall.
 
By audited I mean a trigger which places any information into another table.

This would surely be done by an external program, right?
And if so, we are back to the original issue of having a 2nd DB/table which is duplicated using an external program.

Or are you thinking of an automatic replication, performed by SQL (without any auditing/checking)?
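
For reference, a trigger of the kind described runs inside MySQL itself rather than in an external program: every UPDATE to the transactions table copies the old and new values, plus who changed them and when, into an append-only audit table. A sketch, with placeholder table/column names, using mysql-connector-python only to run the DDL:

Code:
# Create an audit trigger: any UPDATE on transactions is recorded in
# transactions_audit, a table the application account can only INSERT into
# (enforced with GRANTs). All names are placeholders.
import mysql.connector

DDL = """
CREATE TRIGGER transactions_audit_trg
AFTER UPDATE ON transactions
FOR EACH ROW
    INSERT INTO transactions_audit
        (txn_id, old_amount, new_amount, changed_by, changed_at)
    VALUES
        (OLD.txn_id, OLD.amount, NEW.amount, USER(), NOW())
"""

conn = mysql.connector.connect(host="db1.example.com", user="root",
                               password="secret", database="shop")
conn.cursor().execute(DDL)
conn.close()

Here USER() is the kind of "in-built command" mentioned above: it records which MySQL account made the change.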

Avoid multiple databases, instances etc. Avoid external programs to replicate data.

Without a second DB, we have no redundancy, though? That was the initial problem, right at the start (before we introduced the problem of security). If we were to replicate a database in another location (I mean a perfect copy), then we have the problem that if a hacker somehow manages to update the main DB and change a deposit amount, this will automatically be replicated to the 2nd DB. This is not good, as when the auditing program checks for differences between the 2 DBs, no differences will be found.

The in-built security of operating systems / databases is pretty good.

Agreed. I want to use as many in-built features as possible, without having to custom-code myself (which is time-consuming).

As you're storing transactions just make sure your data is encrypted

Agreed.
 