Screen scraping for a non-programmer

Hi,

Having had many a conversation with the UK Hydrographic Office this week, I now have a license to reproduce their data. However, the data is only on their website and they have no plans to produce a data feed/API in the near future. They have suggested I screen-scrape to get the data, and that is where I need your help.

The information I need is on this page: "http://easytide.ukho.gov.uk/EasyTide/EasyTide/ShowPrediction.aspx?PortID=0033&PredictionLength=7". All I need is the image and the 7 day prediction which is all nicely collected together as a series of tables. Can anyone suggest software to do this that will then update my page once a day?

I am no programmer beyond HTML, CSS and my beloved jQuery. I fear PHP will need to be used, which I have only used in its most basic forms. Is there an easy answer, or will I need to get to grips with another language to make this work the way I need it to? If it helps, I am running on a Linux box and all my pages are .php as I am using includes, so that may be useful info (or not!).

Your help is appreciated. Thanks.
 
Well, the tables are in a nice div with a class of "HWLWPanel", so you could get the page source and drag that out.
The image is more difficult as it's not really a static image; it seems to be generated at request time.
 
Here is a very quick PHP example. It is open to HTML injection, but it works :p

I decided to just read in the whole of the div containing the data tables for ease. You could break it down into individual variables if you needed to, but that is more work.

PHP:
<?php

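// Load HTML - loadHTMLFile() must be called on an instance (the static call fails on PHP 8)
$html = new DOMDocument();
// @ suppresses the warnings libxml raises for the page's malformed markup
@$html->loadHTMLFile("http://easytide.ukho.gov.uk/EasyTide/EasyTide/ShowPrediction.aspx?PortID=0033&PredictionLength=7");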
// Convert to SimpleXML object
$xml = simplexml_import_dom($html);

// Parse image location from HTML
$image = $xml->xpath('//img[@id="_ctl1_imgGraph"]');
// Parse data tables from HTML
$data = $xml->xpath('//div[@id="_ctl1_HWLWTable1_pnlHWLW"]');

// Image / data tables variables for use in current page - assume we only have one so just take the first array index
$imageSrc = 'http://easytide.ukho.gov.uk/EasyTide/EasyTide/'.$image[0]['src'];
$dataTables = $data[0]->asXML();
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">

<html>
	<head>
		<title>Easy Tide Scrape</title>
		<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
		
		<style type="text/css">
			img { border: 0 }
			.HWLWTable { float: left; padding: 10px; }
			.HWLWTableHeaderCell { background-color: pink; }
		</style>
	</head>

	<body>
		<img src="<?php echo htmlentities($imageSrc);?>" alt="Prediction Graph">
		<?php echo $dataTables; ?>
	</body>
</html>
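
The original question asked for the page to update once a day, whereas the script above fetches the EasyTide page on every view. One way to get closer to that is a simple file cache. This is only a minimal sketch: the helper name `cachedFetch`, the cache file location and the 86400-second (one day) lifetime are all assumptions for illustration, and the closure stands in for the DOMDocument/xpath code above.

```php
<?php
// Hypothetical helper: return the cached content, refreshing it via $fetch
// when the cache file is missing or older than $maxAge seconds.
function cachedFetch(string $cacheFile, int $maxAge, callable $fetch): string
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        return file_get_contents($cacheFile);
    }
    $fresh = $fetch();
    // LOCK_EX avoids two page views writing the file at the same time
    file_put_contents($cacheFile, $fresh, LOCK_EX);
    return $fresh;
}

// Placeholder for the real scrape; it would return $dataTables from the script above
$scrape = function (): string {
    return '<div>tide tables</div>';
};

// Assumed cache location and a one-day lifetime
echo cachedFetch(sys_get_temp_dir() . '/easytide.html', 86400, $scrape);
```

A cron job that runs the scrape once a day and writes the file would work too; the cache approach just avoids needing shell access.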
 
GMT

This is just what I was looking for! Really good post pho. :)

Is there any way to compensate for GMT? (EasyTide does not update its data based on GMT!)
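
One possible approach, assuming the times on the scraped page are always in GMT and you want to show UK local time (one hour ahead during BST): pull each time out of the tables and convert it with PHP's DateTime classes. This is a rough sketch only. The helper name `gmtToUkLocal` and the sample date are made up for illustration, and extracting the times from the scraped HTML is not shown.

```php
<?php
// Hypothetical helper: treat $date/$time as GMT and return the UK local time.
function gmtToUkLocal(string $date, string $time): string
{
    $moment = new DateTime("$date $time", new DateTimeZone('UTC'));
    // Europe/London applies the BST offset automatically in summer
    $moment->setTimezone(new DateTimeZone('Europe/London'));
    return $moment->format('H:i');
}

echo gmtToUkLocal('2009-07-15', '06:30'); // prints 07:30 (BST is in effect in July)
```

Outside the summer months the same call returns the time unchanged, since GMT and UK local time then coincide.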
 