xpath - getting php to display the text content of a remote table cell

Hi,

I'm trying to create a script that will pull in data from a remote HTML page, or more precisely from a cell within a table.

For this example I have used the BBC Premier League top goal scorers table, and I am trying to echo the text "Nolan".

However, I get this error with the script below:

Catchable fatal error: Object of class DOMNodeList could not be converted to string in C:\wamp\www\scrape.php on line 26

From googling, it seems that a DOMNodeList cannot be converted to a string.

I'm not sure how to proceed. Any ideas?
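Because `DOMXPath::query()` returns a `DOMNodeList` (a collection of matching nodes) rather than a string, one way round the error is to read the text from the individual nodes inside the list. A minimal sketch, using a made-up HTML snippet in place of the real BBC page; `first_match_text()` is a hypothetical helper name, not part of PHP:

```php
<?php
// Sketch: query() gives back a DOMNodeList, so the text has to be read
// from a node inside it. first_match_text() is a hypothetical helper.

function first_match_text(string $html, string $query): ?string
{
    $dom = new DOMDocument();
    // Collect libxml's warnings about messy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query); // a DOMNodeList, not a string

    if ($nodes === false || $nodes->length === 0) {
        return null; // invalid query, or nothing matched
    }
    return trim($nodes->item(0)->nodeValue);
}

// Made-up stand-in for the real scorers table.
$html = '<table><tr><td>Player</td></tr><tr><td>Nolan</td></tr></table>';

echo first_match_text($html, '//table/tr[2]/td[1]'), "\n";
```

Checking `length` before calling `item(0)` avoids a second fatal error when the query matches nothing.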

If I can fix this script, will it, in its current form, execute every time the page loads? So if I had 10 people on the page, would it execute 10 times? If it does, how would I get it to execute only once every hour?
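As written, the script does run on every page load, so 10 visitors means 10 downloads of the remote page. A minimal sketch of one alternative, a simple file cache that only fetches the remote page again once the saved copy is older than an hour. `cached_fetch()` and the cache file name are assumptions, and the web server needs write access to the folder the file lives in:

```php
<?php
// Sketch of a time-based file cache. cached_fetch() is a hypothetical helper;
// it assumes the web server can write to the folder holding $cacheFile.

function cached_fetch(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    // Serve the saved copy while it is younger than $maxAgeSeconds.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return file_get_contents($cacheFile);
    }

    // Otherwise do the slow remote fetch once and save it for later visitors.
    $content = (string) $fetch();
    file_put_contents($cacheFile, $content, LOCK_EX);
    return $content;
}

// Usage: the BBC page is downloaded at most once an hour, however many
// people load your page in that time.
// $html = cached_fetch(__DIR__ . '/top_scorers.html', 3600, function () use ($my_url) {
//     return file_get_contents($my_url);
// });
```

A cron job that runs the scrape on a schedule and stores the result is the other common approach; the cache above needs no server configuration.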


PHP:
    <?php

      $my_url = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html = file_get_contents($my_url);

      $dom = new DOMDocument();

      @$dom->loadHTML($html);

      $xpath = new DOMXPath($dom);

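      // Path copied from Firefox. Note that Firefox adds <tbody> elements
      // that may not be present in the raw HTML that PHP downloads.
      $my_xpath_query = "/html/body[@id='body']/div[1]/div[@id='blq-container']/div[@id='blq-container-inner']/div[@id='blq-main']/div/table/tbody/tr/td[2]/table[1]/tbody/tr/td[1]/table[1]/tbody/tr[6]/td[1]";

      $text = $xpath->query($my_xpath_query);

      // $text is a DOMNodeList (a list of nodes), not a string, so this line
      // triggers the "could not be converted to string" error.
      echo "$text";

      ?>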
Thank you so much!

PHP:
 foreach ($text as $node) 
 { 
    echo $node->nodeValue."<br/>"; 
 }
Your above example worked, but I couldn't get the bottom one to work.

In the future I may need to collect up to 30 table cells, all on different pages. If I created something like this:

PHP:
<?php

      $my_url = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html = file_get_contents($my_url);

      $dom = new DOMDocument();

      @$dom->loadHTML($html);

      $xpath = new DOMXPath($dom);

      $my_xpath_query = "//div[@id='blq-main']//table/tr[6]/td[1]"; 

$text = $xpath->query($my_xpath_query);

 
 foreach ($text as $node)
 {
    echo $node->nodeValue."<br/>";
 } 


//scrape number 2

      $my_url1 = 'http://news.bbc.co.uk/sport1/hi/football/eng_prem/top_scorers/default.stm';

      $html1 = file_get_contents($my_url1);

      $dom1 = new DOMDocument();

      @$dom1->loadHTML($html1);

      $xpath1 = new DOMXPath($dom1);

      $my_xpath_query1 = "//div[@id='blq-main']//table/tr[6]/td[1]";

      $text1 = $xpath1->query($my_xpath_query1);

 foreach ($text1 as $node1)
 {
    echo $node1->nodeValue."<br/>";
 } 



      ?>
Is this the most efficient way?
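Rather than copying the whole script for each cell, one common approach is to put the steps in a function and loop over a list of URL/query pairs, downloading and parsing each page only once even if several cells come from it. A minimal sketch; `scrape_cells()` and its `$loader` parameter (pass `file_get_contents` for real use, or a stub when testing) are hypothetical names:

```php
<?php
// Sketch: one function instead of a copy of the script per cell.
// scrape_cells() and $loader are hypothetical names. $loader turns a URL
// into HTML: pass 'file_get_contents' for real use, or a stub for testing.

function scrape_cells(array $jobs, callable $loader): array
{
    $docs = [];    // url => DOMDocument (or null), so each page is fetched once
    $results = []; // job name => cell text, or null if not found

    foreach ($jobs as $name => [$url, $query]) {
        if (!array_key_exists($url, $docs)) {
            $html = $loader($url);
            $docs[$url] = null;
            if (is_string($html) && $html !== '') {
                $dom = new DOMDocument();
                $previous = libxml_use_internal_errors(true);
                $dom->loadHTML($html);
                libxml_clear_errors();
                libxml_use_internal_errors($previous);
                $docs[$url] = $dom;
            }
        }

        $results[$name] = null;
        if ($docs[$url] !== null) {
            $nodes = (new DOMXPath($docs[$url]))->query($query);
            if ($nodes !== false && $nodes->length > 0) {
                $results[$name] = trim($nodes->item(0)->nodeValue);
            }
        }
    }
    return $results;
}

// Usage with the two scrapes above (same page, so it is downloaded once):
// $cells = scrape_cells([
//     'scrape1' => [$my_url, "//div[@id='blq-main']//table/tr[6]/td[1]"],
//     'scrape2' => [$my_url1, "//div[@id='blq-main']//table/tr[6]/td[1]"],
// ], 'file_get_contents');
// foreach ($cells as $name => $value) {
//     echo $name, ': ', $value, "<br/>";
// }
```

Adding a 30th cell then means adding one line to the `$jobs` array instead of another copy of the script.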

I'll research how to dump this into a database; that's my next objective, hehe.
Thanks very much. I'm really new to PHP; it's also my first programming language.

So I assume all I have to learn is how to create a database, get the script you posted to connect to it, and put the data in the right field based on an ID that I would set up when creating the database?

Essentially I'm trying to grab the same cell from 30 different pages, so it's the same kind of info, just different figures for different products. I would then need to display 6 of these values on my homepage and also display each one on a separate page of my WordPress site.


What I think I need (and I'm more than likely wrong or not understanding) is a script that purely writes to the database, checking every hour based on the timestamp column and only updating if the value for X has changed, plus mini scripts that each display one table cell from the MySQL database.
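That plan (write only when a value changes, then read single values back for display) could be sketched with PDO as below. It is a minimal sketch under assumptions: the table, column and function names are made up, and it uses an in-memory SQLite database only so that it runs standalone. For MySQL you would change the DSN, for example `new PDO('mysql:host=localhost;dbname=your_database', $user, $pass)`, where the database name and credentials are placeholders.

```php
<?php
// Sketch: store a scraped value only when it changes, then read it back.
// Table/column/function names are hypothetical. SQLite in memory is used
// here so the example runs on its own; swap the DSN for MySQL.

function create_table(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS scraped_values (
        cell_name  VARCHAR(64)  PRIMARY KEY,
        cell_value VARCHAR(255) NOT NULL,
        updated_at INTEGER      NOT NULL
    )');
}

/** Save $value under $name; returns true if a row was inserted or changed. */
function save_if_changed(PDO $db, string $name, string $value, int $now): bool
{
    $stmt = $db->prepare('SELECT cell_value FROM scraped_values WHERE cell_name = ?');
    $stmt->execute([$name]);
    $current = $stmt->fetchColumn(); // false when there is no row yet

    if ($current === false) {
        $db->prepare('INSERT INTO scraped_values (cell_name, cell_value, updated_at) VALUES (?, ?, ?)')
           ->execute([$name, $value, $now]);
        return true;
    }
    if ($current === $value) {
        return false; // unchanged: leave the row and its timestamp alone
    }
    $db->prepare('UPDATE scraped_values SET cell_value = ?, updated_at = ? WHERE cell_name = ?')
       ->execute([$value, $now, $name]);
    return true;
}

/** The "mini script" side: fetch one stored value for display. */
function read_value(PDO $db, string $name): ?string
{
    $stmt = $db->prepare('SELECT cell_value FROM scraped_values WHERE cell_name = ?');
    $stmt->execute([$name]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
create_table($db);
```

The hourly writer would call `save_if_changed()` with `time()` for each scraped cell, and each page on the site would call `read_value()` for the one value it shows. Prepared statements with `?` placeholders keep the scraped text from being interpreted as SQL.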
I know cron from using Linux, so at least that's one thing I can do, lol.

I will follow that guide, and I have bought some textbooks to help me get the basics.

I'll see if I can work out how to do some "my first" MySQL databases and data entry tomorrow.
I have made good progress learning how to create a database, add a table and query a table; the Tizag guide is great!

However, I have a major headache with XPath. I can pull in data from sources without namespaces with no issue. However, the main source of info I will need for my proper script uses namespaces, and I think this is why it doesn't work at the moment: the PHP script doesn't realise there are namespaces because I haven't defined them!

The namespace, and the prefix it is bound to, is:

Code:
http://www.w3.org/1999/xhtml      x

This is the XPath that the XPath Checker Firefox plugin outputs:

Code:
id('tdtestbox')/x:table/x:tbody/x:tr[2]/x:td[2]

As I am not defining the namespace in my PHP script, I am convinced this is why I get no data back. Any ideas?
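A likely explanation, offered as a hedge: Firefox handles the page as XHTML, so its XPath tools add a prefix for the `http://www.w3.org/1999/xhtml` namespace. `DOMDocument::loadHTML()` uses libxml's HTML parser, which does not put elements into any namespace, so with `loadHTML()` the usual fix is to drop the `x:` prefixes rather than define them. `DOMXPath::registerNamespace()` is only needed when the page is loaded as XML with `loadXML()`. A small sketch contrasting the two, using made-up markup; it uses `//*[@id='...']` instead of `id('...')` because `id()` depends on the parser treating `id` as an ID-typed attribute, which a plain XML document without a DTD does not do:

```php
<?php
// Sketch contrasting loadXML() and loadHTML() on the same made-up XHTML markup.

$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
       . '<div id="tdtestbox"><table>'
       . '<tr><td>a</td><td>b</td></tr>'
       . '<tr><td>c</td><td>d</td></tr>'
       . '</table></div></body></html>';

// 1) Parsed as XML, the elements really are in the XHTML namespace, so the
//    query needs a prefix bound to that namespace URI. The prefix can be any
//    name; "x" just matches what XPath Checker shows.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xhtml);
$xmlXpath = new DOMXPath($xmlDoc);
$xmlXpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$viaXml = $xmlXpath->query("//*[@id='tdtestbox']/x:table/x:tr[2]/x:td[2]")->item(0)->nodeValue;

// 2) Parsed with loadHTML(), the elements have no namespace, so the same
//    path is written without any prefixes.
$htmlDoc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$htmlDoc->loadHTML($xhtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$htmlXpath = new DOMXPath($htmlDoc);
$viaHtml = $htmlXpath->query("//*[@id='tdtestbox']/table/tr[2]/td[2]")->item(0)->nodeValue;

echo $viaXml, ' ', $viaHtml, "\n";
```

Real-world pages are rarely well-formed enough for `loadXML()`, which is why the scripts in this thread use `loadHTML()`; in that case, leaving the prefixes out is the part that matters.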
This page is exactly like the one I am having issues with:

http://www.skysports.com/

id('ss-content')/x:div[1]/x:div[1]/x:div[3]/x:ul[1]/x:li[2]/x:h4/x:a

It has all the funny x: stuff in it...

If the XPath is "normal" (i.e. no x: prefixes), like the BBC example, then it works perfectly. I've been at this since 9 am and my head is going to explode.

Thank you for still helping me!
How would you convert this?

id('tdboxDetails')/x:table/x:tbody/x:tr[2]/x:td[2]

If you get that working I'll send you some beer money / enough for a game :p
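A sketch of the conversion, assuming the page is loaded with `loadHTML()`: replace `id('...')` with `//*[@id="..."]`, drop the `x:` prefixes, and drop `tbody` steps, since browsers insert `<tbody>` even when the raw HTML has none (View Source, rather than the live DOM, shows whether the page really has it). `browser_xpath_for_loadhtml()` is a hypothetical helper, and its regular expressions only cover paths shaped like the ones in this thread:

```php
<?php
// Sketch: rewrite an XPath copied from Firefox so it suits DOMDocument::loadHTML().
// browser_xpath_for_loadhtml() is a hypothetical helper; its patterns only cover
// paths shaped like id('...')/x:tag[n]/... as shown in this thread.

function browser_xpath_for_loadhtml(string $path, string $prefix = 'x'): string
{
    // id('foo') -> //*[@id="foo"], which does not depend on how the parser
    // types the id attribute.
    $path = preg_replace('#id\(["\x27]([^"\x27]*)["\x27]\)#', '//*[@id="$1"]', $path);

    // loadHTML() puts no elements in a namespace, so drop "x:" from each step.
    $path = preg_replace('#(^|/)' . preg_quote($prefix, '#') . ':#', '$1', $path);

    // Browsers insert <tbody> even when the HTML source has none; drop those steps.
    return preg_replace('#/tbody(\[\d+\])?(?=/|$)#', '', $path);
}

$query = browser_xpath_for_loadhtml("id('tdboxDetails')/x:table/x:tbody/x:tr[2]/x:td[2]");
echo $query, "\n";
```

For the path above this gives `//*[@id="tdboxDetails"]/table/tr[2]/td[2]`. If the page source does contain `<tbody>`, keep that step instead of stripping it.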
Well, with Pho's help we can now grab data from any site, except the one I need, which only works on WAMP, lol...

I thought php was platform independent? :p
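PHP code itself is largely portable, but the environment around it is not: the PHP version, the bundled libxml version and settings such as `allow_url_fopen` often differ between a WAMP install and shared hosting. A small diagnostic sketch to run on both machines and compare; the list of settings is a guess at the usual suspects for this kind of scraping, not a known cause, and `scrape_environment()` is a hypothetical name:

```php
<?php
// Sketch: print settings that commonly differ between a local WAMP install
// and a shared host. Run on both and compare the output.

function scrape_environment(): array
{
    return [
        'php_version'     => PHP_VERSION,
        'libxml_version'  => defined('LIBXML_DOTTED_VERSION') ? LIBXML_DOTTED_VERSION : 'n/a',
        'allow_url_fopen' => (bool) ini_get('allow_url_fopen'), // needed for file_get_contents($url)
        'curl_loaded'     => extension_loaded('curl'),
        'mbstring_loaded' => extension_loaded('mbstring'),
        'iconv_loaded'    => extension_loaded('iconv'),
    ];
}

foreach (scrape_environment() as $setting => $value) {
    echo $setting, ': ', var_export($value, true), "<br/>\n";
}
```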
Getting these errors now:

It seems I need to tell the script which character encoding to use. Any ideas how to do this?

Code:
Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: input conversion failed due to input error, bytes 0x9C 0xAC 0xE8 0xAA in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 2 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: Attribute border redefined in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: input conversion failed due to input error, bytes 0x9C 0xAC 0xE8 0xAA in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: encoder errorAttValue: " expected in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51

Warning:  DOMDocument::loadHTML() [domdocument.loadhtml]: Couldn't find end of Start Tag a in Entity, line: 16 in /home/freeonli/public_html/oxygrab_test1.php on line 51
£0
PHP:
<?php

    Class Scrape
    {
        var $url;
        var $xpathQuery;
        var $xpathResults;

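        // Constructor. PHP 8 no longer treats a method named after the
        // class (the old PHP 4 style) as a constructor, so use __construct.
        function __construct($url, $query)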
        {
            $this->setURL($url);
            $this->setXpathQuery($query);
        }

        function getURL()
        {
            return $this->url;
        }

        function setURL($url)
        {
            $this->url = $url;
        }

        function getXpathQuery()
        {
            return $this->xpathQuery;
        }

        function setXpathQuery($query)
        {
            $this->xpathQuery= $query;
        }
        
        function getXpathResults()
        {
            return $this->xpathResults;
        }
        
        function setXpathResults($result)
        {
            $this->xpathResults = $result;
        }
        
        function execute()
        {                
            $html = file_get_contents($this->getURL());

            $dom = new DOMDocument();
            @$dom->loadHTML($html);
            $xpath = new DOMXPath($dom);                
            $results = $xpath->query($this->getXpathQuery());
            $this->setXpathResults($results);
        }
    }

// Query site
    $scrape1 = new Scrape('INSERT URL HERE', '//td[@id="tdtestbox"]//td[@class="SingleRowTableCell"][2]');
    $scrape1->execute();
   
    // Get result
    // (for some reason I get: £ 2,400,000 so we'll remove that bit next)
    // item(0) is null when the query matches nothing, so check before using it
    $first = $scrape1->getXpathResults()->item(0);
    $output = $first ? $first->nodeValue : '';

    // Remove everything but digits from $output
    $output = preg_replace("/\D/", "", $output);
   
    // Show the result
    echo '£'.number_format((double)($output));


    
    ?>
The page I'm attempting to scrape has this:

Code:
<meta http-equiv="content-type" content="text/html; charset=windows-1255">
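libxml's HTML parser picks the encoding from the page's `<meta>` charset (typically falling back to ISO-8859-1 when there is none), so if the declared charset does not match the actual bytes you get warnings like the ones above and garbled characters such as `£`. A minimal sketch, assuming you know the page's real encoding: convert the HTML to UTF-8 with `iconv()`, remove the original charset `<meta>` tag, and prefix `<?xml encoding="UTF-8">` so libxml reads the result as UTF-8. Whether `iconv()` accepts a given encoding name depends on the server. The bytes in the warnings (such as `0xE8 0xAA`) may hint that the page is really UTF-8 despite its `windows-1255` declaration, so trying `'UTF-8'` first may be worthwhile. `load_html_as_utf8()` is a hypothetical name:

```php
<?php
// Sketch: load HTML whose real encoding is $sourceEncoding as UTF-8.
// load_html_as_utf8() is a hypothetical helper.

function load_html_as_utf8(string $html, string $sourceEncoding): DOMDocument
{
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        // Which encoding names iconv() accepts depends on the server.
        $converted = iconv($sourceEncoding, 'UTF-8', $html);
        if ($converted === false) {
            throw new RuntimeException("Could not convert from $sourceEncoding");
        }
        $html = $converted;
    }

    // Remove any charset declared in a meta tag so it cannot override ours.
    $html = preg_replace('/<meta[^>]+charset=[^>]*>/i', '', $html);

    $dom = new DOMDocument();
    // Collect warnings such as "Attribute border redefined" instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Usage, swapping this in for the body of Scrape::execute():
// $dom = load_html_as_utf8(file_get_contents($this->getURL()), 'UTF-8');
```

Using `libxml_use_internal_errors()` instead of `@` keeps the warnings available through `libxml_get_errors()` if you need to debug later.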