Afternoon all.
I'm designing a website for a friend and have hit a bit of a snag. He wants to display dates from another website on his own site, and have them update every month or so. OK, I thought, screen scraping is probably the way forward. So, with a little help from Google, I've put together a simple screen-scraping script in PHP, with a couple of search parameters built in to narrow the page down to the bit I want.
The problem I'm having now is using regular expressions to narrow the search down further to just the dates.
If you look at the HTML below, you'll see the bits I need to extract look like this:
<a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">
Yet I can't seem to find a regular expression that will do the job.
Any help appreciated, or other suggestions on how to accomplish what I want.
PS: So that it only updates once a month (I don't want to spam the site), I was going to save the result in a file with a date stamp, compare that stamp to the current date, and update if the saved month is earlier than the current one. Is this a good idea, or is there a simpler way?
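To make the once-a-month idea concrete, here is a minimal sketch of the check. It assumes the scraped page is saved to a local cache file and that the file's modification time stands in for the date stamp. The function name `cache_is_stale` and the temporary file are made up for illustration and are not part of the script further down.

```php
<?php
// Hypothetical helper: true when the cache file is missing or was last
// written in an earlier calendar month than the timestamp $now.
function cache_is_stale($cacheFile, $now)
{
    if (!file_exists($cacheFile)) {
        return true;
    }
    // Compare "YYYY-MM" strings so a December -> January rollover still counts.
    $savedMonth   = date('Y-m', filemtime($cacheFile));
    $currentMonth = date('Y-m', $now);
    return strcmp($savedMonth, $currentMonth) < 0;
}

// Demonstration with a throwaway file whose timestamp is set by hand.
$cacheFile = tempnam(sys_get_temp_dir(), 'dates');
touch($cacheFile, strtotime('2009-02-27'));

$staleInMarch = cache_is_stale($cacheFile, strtotime('2009-03-01')); // true
$staleInFeb   = cache_is_stale($cacheFile, strtotime('2009-02-28')); // false

unlink($cacheFile);
```

In the real script, the fetch with file_get_contents() and a file_put_contents() to the cache file would run only when the check returns true. Comparing year and month together, rather than the month number alone, avoids a missed update in January.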
Code:
<?php
$url = "http://localhost/temp/test%5b1%5d.html";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
$start = strpos($content,'2009</td>');
$end = strpos($content,'</table>',$start) + 8;
$table = substr($content,$start,$end-$start);
$start1 = strpos($table,' Sustainable Development ');
$end1 = strpos($table,'</tr>',$start1) + strlen('</tr>');
$table2 = substr($table,$start1,$end1-$start1);
if (preg_match_all("href=[\"\']?((?:[^>]|[^\s]|[^"]|[^'])+)[\"\']?", $table2, $matches, PREG_OFFSET_CAPTURE)) {
print_r($matches);
} else {
echo "The search didn't find any results";
}
?>
The HTML I have to play with:
Code:
<tr>
<td class="boxBody"> Sustainable Development </td>
<td class="boxBody2" align="center">
<a href="meetinginfo.cfm?Dateofmeeting=13 January 2009">13</a><br></td>
<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=10 February 2009">10</a><br></td>
<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">10</a><br> <a href="meetinginfo.cfm?Dateofmeeting=24 March 2009">24</a><br></td>
<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=07 April 2009">7</a><br></td>
<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=05 May 2009">5</a><br></td></tr>
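For reference, a minimal sketch of one pattern that would pull the dates out of a row like this. It assumes every link has exactly the form `href="meetinginfo.cfm?Dateofmeeting=DD Month YYYY"` with double quotes, as in the HTML above; the sample string is a shortened copy of one cell. Note that preg_match_all() needs delimiters around the pattern (here `/.../`), and putting the pattern in single quotes avoids having to escape the double quotes.

```php
<?php
// Sample cell copied from the HTML above.
$table2 = '<td class="boxBody2" align="center">'
        . '<a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">10</a><br> '
        . '<a href="meetinginfo.cfm?Dateofmeeting=24 March 2009">24</a><br></td>';

// Capture "day month year" after Dateofmeeting=, up to the closing quote.
$pattern = '/href="meetinginfo\.cfm\?Dateofmeeting=(\d{1,2} [A-Za-z]+ \d{4})"/';

$dates = array();
if (preg_match_all($pattern, $table2, $matches)) {
    $dates = $matches[1]; // group 1 holds only the date strings
} else {
    echo "The search didn't find any results";
}

print_r($dates); // 10 March 2009, 24 March 2009
```

Each captured string can then be passed to strtotime() if a timestamp is needed rather than text. If the markup on the other site is less regular than this sample, parsing it with DOMDocument and reading the href attributes may be more robust than a regular expression.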