xpath - getting PHP to display the text content of a remote table cell

craptakular · 2 Feb 2011 at 21:15

Hi,

I'm trying to create a script that will pull in data from a remote html page, more accurately a cell within a table.

I have used the BBC top Premier League goal scorers table for this example, and I am trying to echo the text "Nolan".

However, I get this error with the script below:

Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\wamp\www\scrape.php on line 26

Now from googling, it seems that a 'DOMNodeList' cannot be displayed as a string.

I'm not sure how to proceed. Any ideas?

If I can fix this script, I'm wondering: in its current form, does it execute every time the page loads? If it does, how would I get it to execute only once every hour?
So, if I had 10 people on the page, would it execute 10 times?
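One common way to limit the work to once an hour is to cache the scraped value in a file and only re-run the scrape when that file is older than an hour. The sketch below is only an illustration: the helper name `cachedValue`, the cache file name and the closure standing in for the real scrape are all assumptions, not part of the original script.

```php
<?php
// Hypothetical helper: returns the cached value if the cache file is younger
// than $maxAgeSeconds, otherwise runs $producer and stores its result.
function cachedValue($cacheFile, $maxAgeSeconds, $producer)
{
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    $value = (string) call_user_func($producer);
    file_put_contents($cacheFile, $value, LOCK_EX);

    return $value;
}

// Usage: the closure is a placeholder for the file_get_contents() + XPath work.
$cacheFile = sys_get_temp_dir() . '/top_scorer_cache.txt'; // assumed location
echo cachedValue($cacheFile, 3600, function () {
    return 'Nolan';
});
```

With this shape, ten visitors inside the same hour would trigger one scrape and nine file reads.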

PHP:

    <?php

      $my_url = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html = file_get_contents($my_url);

      $dom = new DOMDocument();

      @$dom->loadHTML($html);

      $xpath = new DOMXPath($dom);

      $my_xpath_query = "/html/body[@id='body']/div[1]/div[@id='blq-container']/div[@id='blq-container-inner']/div[@id='blq-main']/div/table/tbody/tr/td[2]/table[1]/tbody/tr/td[1]/table[1]/tbody/tr[6]/td[1]";

      $text = $xpath->query($my_xpath_query);

      echo "$text";


      ?>
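For reference, `DOMXPath::query()` returns a `DOMNodeList`, so the text has to be read from an individual node rather than echoed directly. A minimal sketch, assuming a small made-up table in place of the real BBC page so that it runs without network access:

```php
<?php
// Made-up stand-in for the downloaded page.
$html = '<html><body><table>'
      . '<tr><td>Player A</td></tr>'
      . '<tr><td>Nolan</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList, not a string.
$nodes = $xpath->query('//table/tr[2]/td[1]');

// Take the first match, guarding against an empty result.
$text = ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : '';

echo $text; // Nolan
```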

craptakular · 3 Feb 2011 at 07:17

Thank you so much!

PHP:

 foreach ($text as $node) 
 { 
    echo $node->nodeValue."<br/>"; 
 }

Your above example worked, but I couldn't get the bottom one to work.

Essentially, in the future I may need to collect up to 30 table cells, all on different pages. What if I created something like this:

PHP:

<?php

      $my_url = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html = file_get_contents($my_url);

      $dom = new DOMDocument();

      @$dom->loadHTML($html);

      $xpath = new DOMXPath($dom);

      $my_xpath_query = "//div[@id='blq-main']//table/tr[6]/td[1]"; 

$text = $xpath->query($my_xpath_query);

 
 foreach ($text as $node)
 {
    echo $node->nodeValue."<br/>";
 } 


//scrape number 2

      $my_url1 = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html1 = file_get_contents($my_url1);

      $dom1 = new DOMDocument();

      @$dom1->loadHTML($html1);

      $xpath1 = new DOMXPath($dom1);

      $my_xpath_query1 = "//div[@id='blq-main']//table/tr[6]/td[1]"; 

$text1 = $xpath1->query($my_xpath_query1);

 
 foreach ($text1 as $node1)
 {
    echo $node1->nodeValue."<br/>";
 } 



      ?>

Is this the most efficient way?
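One possible alternative to copying the block for every cell is to keep a list of URL and XPath pairs and loop over it, downloading each page only once. This is a sketch under assumptions: the function name `scrapeCells`, the job keys and the stub fetcher are invented for illustration, and the stub returns made-up HTML instead of hitting a real site.

```php
<?php
// Hypothetical helper: runs several XPath queries, downloading each URL once.
// $jobs is a list of array('url' => ..., 'xpath' => ...) entries.
// $fetch is any callable returning the HTML for a URL.
function scrapeCells(array $jobs, $fetch = 'file_get_contents')
{
    $xpaths  = array(); // one DOMXPath per URL, reused across queries
    $results = array();

    foreach ($jobs as $key => $job) {
        $url = $job['url'];

        if (!array_key_exists($url, $xpaths)) {
            $html = call_user_func($fetch, $url);
            if ($html === false || $html === '') {
                $xpaths[$url] = null; // remember the failed download
            } else {
                $dom = new DOMDocument();
                @$dom->loadHTML($html);
                $xpaths[$url] = new DOMXPath($dom);
            }
        }

        $results[$key] = null;
        if ($xpaths[$url] !== null) {
            $nodes = $xpaths[$url]->query($job['xpath']);
            if ($nodes !== false && $nodes->length > 0) {
                $results[$key] = trim($nodes->item(0)->nodeValue);
            }
        }
    }

    return $results;
}

// Usage with a stub fetcher so the example runs offline.
$stub = function ($url) {
    return '<table><tr><td>Player A</td></tr><tr><td>Nolan</td></tr></table>';
};
$jobs = array(
    'first'  => array('url' => 'http://example.com/a', 'xpath' => '//table/tr[1]/td[1]'),
    'second' => array('url' => 'http://example.com/a', 'xpath' => '//table/tr[2]/td[1]'),
);
print_r(scrapeCells($jobs, $stub));
```

Leaving out the second argument makes the helper fall back to `file_get_contents`, as in the script above.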

I'll research how to dump this into a database, that's my next objective hehe.

craptakular · 4 Feb 2011 at 09:55

Thanks very much, I'm really new to PHP, it's also my first computer language.

So I assume all I have to learn is how to create a database, get the script you posted to connect to it, and put the data in the right field based on an id that I would set up when creating the database?

Essentially I'm trying to grab the same cell, which is on 30 different pages, so it's the same kind of info just different figures for a different type of product. I would then need to display 6 of these values on my homepage and also display each one on a separate page on my wordpress site.

What I think I need (and I'm more than likely wrong or not understanding) is a script that purely writes to the database: it checks every hour based on the timestamp column and only updates if the value for X has changed. Then mini scripts would each display one table cell from the MySQL database.
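The "only update if the value has changed" part could look roughly like the sketch below, which would be run hourly (for example from cron). Everything named here is an assumption for illustration: the table `scraped_cells`, its columns and the helper `saveIfChanged`. The demo uses an in-memory SQLite database through PDO so that it is self-contained (it assumes the pdo_sqlite extension is available); for MySQL the DSN passed to `new PDO()` would change.

```php
<?php
// Hypothetical helper: stores $value for $id only when it differs from what is
// already saved, and records when the change was seen. Returns true on a write.
function saveIfChanged(PDO $db, $id, $value)
{
    $select = $db->prepare('SELECT value FROM scraped_cells WHERE id = ?');
    $select->execute(array($id));
    $current = $select->fetchColumn(); // false when there is no row yet

    if ($current !== false && (string) $current === (string) $value) {
        return false; // unchanged, nothing to write
    }

    if ($current === false) {
        $write = $db->prepare('INSERT INTO scraped_cells (value, updated_at, id) VALUES (?, ?, ?)');
    } else {
        $write = $db->prepare('UPDATE scraped_cells SET value = ?, updated_at = ? WHERE id = ?');
    }
    $write->execute(array($value, date('Y-m-d H:i:s'), $id));

    return true;
}

// Demo on an in-memory SQLite database (assumed schema).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE scraped_cells (id INTEGER PRIMARY KEY, value TEXT, updated_at TEXT)');

var_dump(saveIfChanged($db, 1, '2400000')); // first sighting: written
var_dump(saveIfChanged($db, 1, '2400000')); // same value: skipped
```

The display scripts would then only need a `SELECT value FROM scraped_cells WHERE id = ?` and never touch the remote site.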

craptakular · 4 Feb 2011 at 20:37

cron I know from my Linux use, so at least that's one thing I can do lol.

Will follow that guide and I have bought some textbooks to help me get the basics.

I'll see if I can work out how to do some "my first" MySQL databases and data entry tomorrow.

craptakular · 6 Feb 2011 at 12:49

I have made good progress in learning how to create a database, add a table and call a selected table, the tigaz guide is great!

However, I have a major headache with XPath. I can pull in data from sources with no namespaces without any issue, but the main source I will need for my proper script uses namespaces. I think this is why it doesn't work at the moment: the PHP script doesn't realise there are namespaces because I haven't defined them!

If I have a namespace of:

Code:

http://www.w3.org/1999/xhtml      x

This is what Xpath Checker firefox plugin outputs:

Code:

http://www.w3.org/1999/xhtml      x


id('tdtestbox')/x:table/x:tbody/x:tr[2]/x:td[2]

As I am not defining the namespace in the PHP script, I am convinced this is why I get no data back. Any ideas?
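For context, a prefix like `x:` only resolves in PHP once it has been registered with `DOMXPath::registerNamespace()`, and the elements only carry that namespace when the page is parsed as XML with `loadXML()`. A minimal sketch, assuming a small well-formed XHTML string made up for the example (a real page may not be well-formed enough for `loadXML()`):

```php
<?php
// Made-up XHTML standing in for the real page; note the default namespace on <html>.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<div id="tdtestbox"><table><tbody>'
       . '<tr><td>a</td><td>b</td></tr>'
       . '<tr><td>c</td><td>d</td></tr>'
       . '</tbody></table></div></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xhtml); // XML parser: elements keep their namespace

$xpath = new DOMXPath($dom);
// Bind the prefix used in the query to the XHTML namespace URI.
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// @id is used instead of id() because no DTD is loaded here.
$nodes = $xpath->query("//x:div[@id='tdtestbox']/x:table/x:tbody/x:tr[2]/x:td[2]");

echo $nodes->item(0)->nodeValue; // d
```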

craptakular · 6 Feb 2011 at 14:59

This page is exactly like the one I am having issues with:

http://www.skysports.com/

id('ss-content')/x:div[1]/x:div[1]/x:div[3]/x:ul[1]/x:li[2]/x:h4/x:a

It has all the funny x: stuff in...

If the XPath is "normal" (i.e. no x: stuff) like the BBC example then it works perfectly. I've been at this since 9 am and my head is going to explode.

Thank you for still helping me!

craptakular · 6 Feb 2011 at 15:39

I still can't get it to work, did you get it working?

craptakular · 6 Feb 2011 at 16:49

http://php.net/manual/en/domxpath.registernamespace.php

In the third comment down on that page, the poster says that namespaces need to be prefixed and the XHTML loaded, which we have done, but that we then cannot use the loadHTML function. Maybe that is the reason it will never work?

craptakular · 6 Feb 2011 at 16:58

This is actually driving me insane now lol... Will try it. If I can't, can I email you?

craptakular · 6 Feb 2011 at 17:07

How would you convert this?

id('tdboxDetails')/x:table/x:tbody/x:tr[2]/x:td[2]

If you get that working I'll send you over some beer money/ enough for a game
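One way such an expression might be converted for a document parsed with `DOMDocument::loadHTML()` is to drop the `x:` prefixes (the HTML parser puts elements in no namespace), drop the `tbody` steps (browsers add them to the live DOM even when the source has none) and replace `id('...')` with an attribute test. The helper name `stripBrowserXPath` is made up for this sketch, and removing `tbody` is an assumption that would be wrong for a page whose source really contains `<tbody>` tags.

```php
<?php
// Hypothetical helper: rewrites an XPath copied from a browser add-on so it can
// be run against a document parsed with DOMDocument::loadHTML().
function stripBrowserXPath($query)
{
    // Drop the "x:" prefix at the start of each step.
    $query = preg_replace('~(^|/)x:~', '$1', $query);
    // Remove tbody steps that only exist in the browser's DOM.
    $query = str_replace('/tbody', '', $query);
    // Replace a leading id('foo') with an attribute test that needs no DTD.
    $query = preg_replace("~^id\\('([^']+)'\\)~", '//*[@id="$1"]', $query);

    return $query;
}

echo stripBrowserXPath("id('tdboxDetails')/x:table/x:tbody/x:tr[2]/x:td[2]");
// //*[@id="tdboxDetails"]/table/tr[2]/td[2]
```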

craptakular · 6 Feb 2011 at 17:37

I can't get it to work, going to fire off an email to you now!

EDIT: Sent one to your hotmail account.

craptakular · 6 Feb 2011 at 18:03

Pho said:
Hmm I haven't received anything. Can you send it to my gmail instead? (akiller@)

sent to your gmail

craptakular · 6 Feb 2011 at 19:07

emailed u back

craptakular · 7 Feb 2011 at 07:22

Well with the help of Pho we can now grab data from any site, except the one I need which only works on WAMP lol....

I thought php was platform independent?

craptakular · 7 Feb 2011 at 19:36

Getting these errors now:

Seems I need to tell the script what character encoding to use. Any ideas how to do this?

PHP:

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: input conversion failed due to input error, bytes 0x9C 0xAC 0xE8 0xAA in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: Attribute border redefined in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: input conversion failed due to input error, bytes 0x9C 0xAC 0xE8 0xAA in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: encoder errorAttValue: " expected in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: Couldn't find end of Start Tag a in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51
£0

PHP:

<?php

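    class Scrape
    {
        private $url;
        private $xpathQuery;
        private $xpathResults;

        // __construct() is used because PHP 8 no longer treats a method
        // named after the class as the constructor.
        public function __construct($url, $query)
        {
            $this->setURL($url);
            $this->setXpathQuery($query);
        }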

        function getURL()
        {
            return $this->url;
        }

        function setURL($url)
        {
            $this->url = $url;
        }

        function getXpathQuery()
        {
            return $this->xpathQuery;
        }

        function setXpathQuery($query)
        {
            $this->xpathQuery= $query;
        }
        
        function getXpathResults()
        {
            return $this->xpathResults;
        }
        
        function setXpathResults($result)
        {
            $this->xpathResults = $result;
        }
        
        function execute()
        {                
            $html = file_get_contents($this->getURL());

            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);                
            $results = $xpath->query($this->getXpathQuery());
            $this->setXpathResults($results);
        }
    }

// Query site
    $scrape1 = new Scrape(' INSERT URL HERE' ,'//td[@id="tdtestbox"]//td[@class="SingleRowTableCell"][2]');
    $scrape1->execute();
   
    // Get result
    // (for some reason I get: Â£ 2,400,000 so we'll remove that bit next)
    $output =$scrape1->getXpathResults()->item(0)->nodeValue;
   
    // Remove everything but numbers from $output
    $output = preg_replace("/\D/", "", $output);
   
    // Show the result
    echo '£'.number_format((double)($output));


    
    ?>

The page I'm attempting to scrape has this:

Code:

http-equiv="content-type" content="text/html; charset=windows-1255"
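Given that declaration, one thing that might help is converting the downloaded HTML to UTF-8 with `iconv()` before handing it to `loadHTML()`, and rewriting the charset in the meta tag so the parser does not try to convert it a second time. This is a sketch, not a confirmed fix for the warnings above: the helper name `loadHtmlAsUtf8` and the sample page are made up, and it assumes the server's iconv knows the `windows-1255` charset.

```php
<?php
// Hypothetical helper: converts a page from its declared charset to UTF-8 and
// updates the <meta> declaration to match before parsing.
function loadHtmlAsUtf8($html, $sourceCharset)
{
    $utf8 = @iconv($sourceCharset, 'UTF-8', $html);
    if ($utf8 === false) {
        $utf8 = $html; // conversion failed, fall back to the raw bytes
    } else {
        $utf8 = str_ireplace('charset=' . $sourceCharset, 'charset=UTF-8', $utf8);
    }

    $dom = new DOMDocument();
    @$dom->loadHTML($utf8);

    return $dom;
}

// Made-up page: byte 0xA3 is the pound sign in windows-1255.
$page = '<html><head><meta http-equiv="content-type" '
      . 'content="text/html; charset=windows-1255"></head>'
      . "<body><table><tr><td>\xA3 2,400,000</td></tr></table></body></html>";

$dom   = loadHtmlAsUtf8($page, 'windows-1255');
$xpath = new DOMXPath($dom);
$cell  = $xpath->query('//td')->item(0)->nodeValue;

echo preg_replace('/\D/', '', $cell); // 2400000
```

The node values come back as UTF-8, so the page that echoes them should also be served as UTF-8 to avoid stray characters in front of the pound sign.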