Database construction help

Associate
Joined
30 Aug 2009
Posts
467
Hi there,

I'm trying to build a database structure for a little side project for work. However, I've never written PHP before, never mind designed a database structure.

I have worked with existing ASP code that had an Access database backend, so I'm not coming at this without the ability to figure things out. But I've never had to build the database structure myself, and what I want to do is so monumentally colossal in scale that I'm struggling with how to structure the damned thing.

The Project:
cURL a website, capture the HTTP response code, and store it in a database that I can then query for SLA reporting and tabular presentation.

The Scale:
cURL every 60 seconds and store the response for, currently, 2,000 domains. Our company has less than 10% market share and we're still growing rapidly, so that 2,000 figure is really going to ramp up.

Something similar:
Watchmen


The issues with using Watchmen are:
  1. It uses a Redis database, and nobody here has production experience with Redis
  2. It's really difficult to get websites into it for monitoring, possibly because nobody here has any production JSON experience
  3. We didn't develop it, so we'll struggle to extend it in the direction we want
  4. It's written for Node.js, and nobody here has production experience with that either

So, I thought it can't be that hard to write something like this myself, and so far the PHP is playing nicely; I just need a storage engine that will cope with the amount of data.
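
For reference, the check itself boils down to something like this (a trimmed-down sketch rather than my exact code; checkUrl is just a name for illustration):

```php
<?php
// Minimal check: issue a request and pull back just the HTTP status code.
function checkUrl($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_NOBODY         => true,  // HEAD request: we only want the code
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final answer
        CURLOPT_TIMEOUT        => 10,    // don't let one slow site stall the sweep
    ));
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code; // 0 means the request itself failed (DNS, timeout, etc.)
}
```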

My initial thinking was three tables:
Data
id (AutoIncrement PK) | url_id (FK) | time_stamp | status_code
URLs
url_id (AutoIncrement PK) | website_url
IPs
ip_address | friendly_name
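
In SQL terms I picture something like the following (a sketch through PDO; the MySQL/InnoDB types are just an assumption on my part since I haven't settled on an engine):

```php
<?php
// Sketch of the three tables above (assuming MySQL; adjust types for other engines).
$pdo = new PDO('mysql:host=localhost;dbname=monitor', 'user', 'pass');

$pdo->exec("
    CREATE TABLE urls (
        url_id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        website_url VARCHAR(255) NOT NULL
    )");

$pdo->exec("
    CREATE TABLE data (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        url_id      INT UNSIGNED NOT NULL,
        time_stamp  DATETIME NOT NULL,
        status_code SMALLINT NOT NULL,
        FOREIGN KEY (url_id) REFERENCES urls (url_id),
        INDEX idx_url_time (url_id, time_stamp)  -- the lookup I'll hit most: history per URL
    )");

$pdo->exec("
    CREATE TABLE ips (
        ip_address    VARCHAR(45) NOT NULL PRIMARY KEY,  -- 45 chars covers IPv6
        friendly_name VARCHAR(100) NOT NULL
    )");
```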

However, I have a feeling that performance might go down the drain, and I'm not sure everything I've put in Data actually belongs there.

My very first thought was to have >2000 columns, so that I'd have one time_stamp entry with the status code for each website URL under its own column.

I'd greatly appreciate any help here as I'm completely new to this. (One thought was to just have a go, but I know the database design is critical to its future scalability.)
 
Associate
Joined
1 Dec 2005
Posts
803
Interesting problem.

Firstly, do you really need the entire history for each URL you're monitoring? At 2,000 URLs checked every 60 seconds, that's 2,000 × 1,440 = 2,880,000 rows created over the course of just one day, which is going to be a problem if you need to be reading from the table as well as writing to it.

Database systems like Microsoft SQL Server are great at handling very large narrow tables, as opposed to wide ones (by wide I mean many columns). A table with many millions of rows will perform well given the right indexing, but indexes are expensive on writes, and at 2,000 checks per minute you're looking at over 33 writes a second, so you're going to need some decent I/O and processing power.

Think about approaching this slightly differently. Rather than recording the state of the URL on every check, only record it when it differs from the previous result. You'd dramatically reduce the amount of data involved and would have no trouble indexing for satisfactory read and write performance.
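
In rough PHP terms, using the Data/URLs tables from your post, that could look like this (just a sketch; in practice you'd cache the last known status per URL in memory or a small current-state table rather than querying for it every time):

```php
<?php
// Record a check result only when it differs from the last stored
// status for that URL (assumes the data/urls tables sketched above).
function recordIfChanged(PDO $pdo, $urlId, $statusCode)
{
    // Most recent status we have on record for this URL.
    $stmt = $pdo->prepare(
        'SELECT status_code FROM data
          WHERE url_id = ?
          ORDER BY time_stamp DESC
          LIMIT 1');
    $stmt->execute(array($urlId));
    $last = $stmt->fetchColumn(); // false if no rows yet

    // First ever check, or the state has changed: write a row.
    if ($last === false || (int) $last !== (int) $statusCode) {
        $ins = $pdo->prepare(
            'INSERT INTO data (url_id, time_stamp, status_code)
             VALUES (?, NOW(), ?)');
        $ins->execute(array($urlId, $statusCode));
    }
}
```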
 
Associate
OP
Joined
30 Aug 2009
Posts
467
Thank you Shad.

This is exactly what I was looking for: a different perspective. It absolutely makes sense to only record the non-200 codes; even with checks only every 60-300 seconds or so, I was still seeing the same problem with the amount of data being written to the DB.

Now I just need to work on my scripting a bit more.
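
For the SLA queries, my thinking is the change-only rows can be paired up: each non-200 row opens an outage and the next 200 row for the same URL closes it. Something like this sketch (downtimeSeconds is just an illustrative name, and it doesn't handle an outage already in progress at the start of the period):

```php
<?php
// Total downtime for one URL over a period, derived from change-only rows.
function downtimeSeconds(PDO $pdo, $urlId, $from, $to)
{
    $stmt = $pdo->prepare(
        'SELECT time_stamp, status_code FROM data
          WHERE url_id = ? AND time_stamp BETWEEN ? AND ?
          ORDER BY time_stamp');
    $stmt->execute(array($urlId, $from, $to));

    $down = 0;
    $downSince = null;
    foreach ($stmt as $row) {
        if ((int) $row['status_code'] !== 200 && $downSince === null) {
            $downSince = strtotime($row['time_stamp']);          // outage starts
        } elseif ((int) $row['status_code'] === 200 && $downSince !== null) {
            $down += strtotime($row['time_stamp']) - $downSince; // outage ends
            $downSince = null;
        }
    }
    if ($downSince !== null) {
        $down += strtotime($to) - $downSince; // still down at period end
    }
    return $down;
}
```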
 