Recording/Analysing data

I’ve picked up a new task at work. In short, it involves reviewing several log files covering a 24-hour period to make sure there are no abnormal results. Some of the errors are known and deemed acceptable, whereas others may indicate a failure. The frequency of the errors must also be monitored. The problem is that there are no guidelines as to what is normal and what is not. It’s all done on experience, but the guy with all the experience is leaving…

My plan is to start grabbing the logs and recording them somewhere so I have a history. From there I will try to flag which things are normal/abnormal, and the ‘normal’ number of occurrences of each error over a period of time.

My problem is that I don’t know the best way to record the data, or how exactly to analyse it! Does anyone have any advice? I’m not looking for something super sophisticated and autonomous with all the bells and whistles, just something that will help me out a bit. It’s unlikely that I’ll be able to install software, so I’d prefer to achieve this in Excel or Access.

Thanks for any advice.
 
What's the data format? How is an error identifiable? I presume this could easily be done in Excel with some formulas (mostly =IF, I presume), sorting, and conditional formatting. Would you be able to share part of the data? It doesn't have to be real.
 
The data is stored in a .log file which can easily be imported into Excel as a CSV. I THINK there is a unique error code on each line indicating the type of issue, although I might be mistaken (I don’t have access to the files currently – I’m just going off memory). If there isn’t an error code, there is probably a generic descriptor that could serve a similar purpose, e.g. “Machine X has failed at site Y” or “Process Z restarted”.
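If there is a code like that on each line, something along these lines could do the counting once the files are to hand. This is only a sketch in Python (assuming some scripting is possible at all); the code pattern and the fallback descriptor handling are guesses at the format:

```python
import re
from collections import Counter

# Hypothetical pattern for an explicit code such as "ERR-104"; the real
# log format may differ, so adjust once the files are available.
CODE_PATTERN = re.compile(r"\b([A-Z]{3}-\d+)\b")

def error_key(line: str) -> str:
    """Return the error code if one is present; otherwise fall back to the
    line itself with numbers collapsed, so "Process 7 restarted" and
    "Process 12 restarted" count as the same kind of event."""
    match = CODE_PATTERN.search(line)
    if match:
        return match.group(1)
    return re.sub(r"\d+", "N", line).strip()

def count_errors(path: str) -> Counter:
    """Count how many times each error key appears in one log file."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if line.strip():
                counts[error_key(line)] += 1
    return counts
```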
 
As long as the errors are consistent (i.e. named uniformly) you'd easily be able to collate them in Excel. It would be fairly simple to flag unexpected errors, or expected errors over a tolerance, as by the sound of it those will be the ones to look out for.

As loms said, to give a formula we'd need to see an example of the data.
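As a rough illustration of the flagging idea, once you have a count per error for a day, anything unknown or over its usual limit can be split out. The codes and tolerances below are made up; this is just a sketch of the logic rather than something to use as-is:

```python
# Hypothetical list of known, acceptable errors and the daily count
# considered normal for each -- the values are placeholders.
EXPECTED = {
    "ERR-104": 20,   # known issue, acceptable up to ~20 times a day
    "ERR-200": 5,
}

def flag(day_counts):
    """Split one day's counts into unexpected errors and known errors
    that exceeded their tolerance."""
    unexpected = {k: v for k, v in day_counts.items() if k not in EXPECTED}
    over_tolerance = {k: v for k, v in day_counts.items()
                      if k in EXPECTED and v > EXPECTED[k]}
    return unexpected, over_tolerance
```

The same split is easy to get in Excel with an =IF against a lookup table of known codes and tolerances.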
 
To look for significant changes in frequency, use statistics to model the historic average and variance of each error's count, then flag anything where the probability of the observed count is below some threshold, e.g. 1%.
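A minimal sketch of that, using only Python's standard library and a normal model of the historic daily counts (counts are discrete, so a Poisson model may fit better; the 1% threshold is just the example figure above):

```python
from statistics import NormalDist, mean, stdev

def looks_abnormal(history, today, p_threshold=0.01):
    """history: daily counts of one error code over past days.
    today: today's count.  Flags a count whose one-sided tail
    probability under a normal model of the history is below the
    threshold, i.e. a day you'd only expect about 1% of the time."""
    if len(history) < 2:
        return False              # not enough history to judge yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu         # history never varied; anything higher stands out
    p = 1 - NormalDist(mu, sigma).cdf(today)
    return p < p_threshold

# Example: an error that normally appears 8-12 times a day
print(looks_abnormal([10, 9, 11, 12, 8, 10, 11], 25))   # True
print(looks_abnormal([10, 9, 11, 12, 8, 10, 11], 12))   # False
```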
 
Excel. Pivot table with count of each error code. As mentioned, if you want to go beyond that you could use stats to separate normal from abnormal variation.
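If scripting does turn out to be an option, the same pivot is a couple of lines with pandas. The file and column names here are assumptions (one row per log line, with a date and an error code); the Excel pivot table gives you exactly the same layout without installing anything:

```python
import pandas as pd

# Assumes a CSV exported from the imported .log file with "date" and
# "error_code" columns -- both names are made up for this sketch.
df = pd.read_csv("errors.csv", parse_dates=["date"])

# Rows = days, columns = error codes, values = number of occurrences.
pivot = pd.crosstab(df["date"].dt.date, df["error_code"])
print(pivot)
```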
 
I used to use a syslog logger and scripts to pick out anything unusual.

I also had a batch file that logged to a text file and then deleted all the normal stuff; whatever was left was reviewed.
You don't have to install anything for batch files, just set them to run on the hour or something.
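A rough equivalent of that "delete the normal stuff, review the rest" filter as a script (the patterns and file names are placeholders for whatever you decide is routine):

```python
# Anything matching a known-normal pattern is dropped; whatever is left
# gets written out for review.
NORMAL_PATTERNS = [
    "heartbeat ok",            # placeholder for routine entries
    "Process Z restarted",     # e.g. a known, acceptable event
]

def unusual_lines(path):
    """Yield every log line that doesn't match a known-normal pattern."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if not any(p in line for p in NORMAL_PATTERNS):
                yield line.rstrip("\n")

with open("for_review.txt", "w", encoding="utf-8") as out:
    for line in unusual_lines("machine.log"):
        out.write(line + "\n")
```

Scheduled to run hourly, that leaves you with just the lines worth a human look.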
 