PHP and regular expression help

Afternoon all.

I'm designing a website for a friend and have hit a bit of a snag. He wants to display dates from another website on his own, and have them update every month or so. OK, I thought, screen scraping is probably the way forward. So, with a little help from Google, I've put together a simple screen-scraping script in PHP with a couple of search parameters built in to narrow the search down to the bit I want.

The problem I'm having now is using regular expressions to narrow the search down to the bit I want.

If you look at the HTML below, you'll see that the bits I need to extract look like this:

<a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">

Yet I can't seem to find a regular expression that will do the job.
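A minimal sketch of one pattern that works on the sample link, assuming every date sits between "Dateofmeeting=" and the closing double quote of the href. The function name extractMeetingDates is hypothetical. Two things trip up the original attempt: PHP's preg_* functions require delimiters around the pattern (without them preg_match_all emits a warning and returns false), and an alternation of negated classes such as [^>]|[^\s] matches any character at all, so it cannot stop at the quote.

```php
<?php
// Hypothetical helper: returns every date that follows "Dateofmeeting="
// inside an href attribute, e.g. "10 March 2009".
function extractMeetingDates(string $html): array
{
    // The pattern is wrapped in / delimiters; [^"]+ stops at the closing quote.
    if (preg_match_all('/Dateofmeeting=([^"]+)"/', $html, $matches)) {
        return $matches[1]; // capture group 1 holds just the date text
    }
    return [];
}

$link = '<a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">';
print_r(extractMeetingDates($link));
```

Because [^"]+ cannot cross a quote, each link yields exactly one date even when several links share a table cell.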

Any help is appreciated, as are other suggestions on how to accomplish this.


PS: So that it only updates once a month (I don't want to spam the site), I was going to save the results in a file with a date stamp, compare that stamp to the current date, and update if the stored month is earlier than the current one. Is this a good idea, or is there a simpler way?
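A minimal sketch of that monthly check, assuming the scraped dates are written to a local cache file whose modification time serves as the date stamp. The function name needsRefresh and any cache path you pass to it are hypothetical. Comparing "YYYY-MM" strings rather than bare month numbers also handles the December-to-January rollover, where comparing 12 with 1 gets the order wrong.

```php
<?php
// Hypothetical helper: true when there is no cache yet, or when the cache
// was last written in an earlier calendar month than $now.
function needsRefresh(string $cacheFile, ?int $now = null): bool
{
    $now = $now ?? time();
    clearstatcache(); // PHP caches stat() results; make sure we see fresh ones
    if (!is_file($cacheFile)) {
        return true; // nothing cached yet, so fetch
    }
    $cachedMonth  = date('Y-m', filemtime($cacheFile));
    $currentMonth = date('Y-m', $now);
    // "2008-12" sorts before "2009-01", so the year rollover works too.
    return strcmp($cachedMonth, $currentMonth) < 0;
}
```

In the scraper, the fetch and parse would then run only when needsRefresh() returns true, with the results written back using file_put_contents(), which also updates the file's modification time.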


Code:
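<?php

$url = "http://localhost/temp/test%5b1%5d.html";
$raw = file_get_contents($url);

// Strip tabs, newlines, double spaces and control characters so the
// markers below can be found with plain strpos().
$newlines = array("\t", "\n", "\r", "\x20\x20", "\0", "\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));

// Cut out the table that follows the "2009" header cell.
$start = strpos($content, '2009</td>');
$end = strpos($content, '</table>', $start) + strlen('</table>');
$table = substr($content, $start, $end - $start);

// Narrow it down to the "Sustainable Development" row.
$start1 = strpos($table, ' Sustainable Development ');
$end1 = strpos($table, '</tr>', $start1) + strlen('</tr>');
$table2 = substr($table, $start1, $end1 - $start1);

// Capture the text after "Dateofmeeting=" up to the closing quote.
// The pattern needs delimiters (/.../); PREG_SET_ORDER makes
// $matches[$i][1] the date from the $i-th link.
if (preg_match_all('/Dateofmeeting=([^"]+)"/', $table2, $matches, PREG_SET_ORDER)) {
    print_r($matches);
} else {
    echo "The search didn't find any results";
}

?>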

The HTML I have to work with:

Code:
<tr>
<td class="boxBody"> Sustainable Development </td>
<td class="boxBody2" align="center">
<a href="meetinginfo.cfm?Dateofmeeting=13 January 2009">13</a><br></td>

<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=10 February 2009">10</a><br></td>

<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=10 March 2009">10</a><br> <a href="meetinginfo.cfm?Dateofmeeting=24 March 2009">24</a><br></td>

<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=07 April 2009">7</a><br></td>

<td class="boxBody2" align="center"><a href="meetinginfo.cfm?Dateofmeeting=05 May 2009">5</a><br></td></tr>
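As for other ways to do it: PHP's DOM extension (enabled in most builds) can parse the markup so that no regular expression has to cope with attribute quoting. A minimal sketch, swapping in DOMDocument and DOMXPath for the regex; the function name meetingDatesFromHtml is hypothetical, and it assumes the date is always the Dateofmeeting query parameter of the link.

```php
<?php
// Hypothetical helper: parse the HTML, find links carrying Dateofmeeting,
// and read that query-string value from each href.
function meetingDatesFromHtml(string $html): array
{
    $doc = new DOMDocument();
    // The scraped markup is a fragment, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $dates = [];
    foreach ($xpath->query('//a[contains(@href, "Dateofmeeting=")]') as $a) {
        // "meetinginfo.cfm?Dateofmeeting=10 March 2009" -> "Dateofmeeting=10 March 2009"
        $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
        parse_str($query ?? '', $params);
        if (isset($params['Dateofmeeting'])) {
            $dates[] = $params['Dateofmeeting'];
        }
    }
    return $dates;
}
```

It could be called on $table2 from the script above, or on the whole page, since the XPath query only picks up links that carry the Dateofmeeting parameter.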
 
Would someone mind giving me a little more help?

OK, so I got the above working and it's really cool. It puts the dates in a neat little array and I can pull them out easily enough. The problem is that I can only pull them out if I know how many there are, and since the array will keep changing, a hard-coded count will break pretty quickly. If I set the number any higher, it errors on the missing rows, saying there's nothing there.

So, is there a way to tell it to only pull rows that actually contain a date?


Code:
for ($row = 0; $row < 5 ; $row++)
{
echo "<li>".$matches[$row][1]."</li>";
}
 
Code:
$intItems = count($matches);

for ($row = 0; $row < $intItems; $row++) {
   echo "<li>".$matches[$row][1]."</li>";
}

Should do what you want.
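The counter loop relies on $matches having been filled with PREG_SET_ORDER (one entry per link, date at index 1); with the default PREG_PATTERN_ORDER, count($matches) gives the number of capture groups plus one rather than the number of links. The count() assignment also needs its terminating semicolon, otherwise PHP stops with a parse error at the for line. A foreach sidesteps the counter entirely; a minimal sketch, with the function name datesToListItems hypothetical:

```php
<?php
// Hypothetical helper: build <li> items from whatever matches exist.
// Expects $matches from preg_match_all(..., PREG_SET_ORDER), where
// $match[1] is the captured date.
function datesToListItems(array $matches): string
{
    $html = '';
    foreach ($matches as $match) {
        // Skip any entry that has no captured date.
        if (isset($match[1]) && $match[1] !== '') {
            $html .= '<li>' . htmlspecialchars($match[1]) . '</li>';
        }
    }
    return $html;
}

preg_match_all(
    '/Dateofmeeting=([^"]+)"/',
    '<a href="meetinginfo.cfm?Dateofmeeting=07 April 2009">7</a>',
    $matches,
    PREG_SET_ORDER
);
echo datesToListItems($matches);
```

Because foreach visits only the entries that exist, there is no index to run past when the number of dates changes from month to month.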
 
Thank you very much!

It didn't like the $intItems = count($matches) bit for some reason, yet if I put it inline like this, for ($row = 0; $row < count($matches); $row++), it works a charm.

Thanks again :D
 