PHP: get content of another website onto mine

Is it possible to have PHP go to another website, e.g. this one, and get specific parts of it, like the titles of the latest 10 threads in the HG&P section?

I ask because I visit many forums but only certain sections of them, and it would be easier to have a site that just lists the parts I want, like an RSS feed: perhaps the titles of the latest 10 threads from one forum, and then the same thing for a section of another site.

I have no clue how you'd even start doing this, or whether it's even legal (copyright etc.), though Google do something similar so maybe it's OK. At a guess you'd get PHP to fetch an address, find the div with a specific name, and then take the first 10 entries if they're there, or something like that.



thanks
 
Yes it's possible, it's called screen scraping. So long as you're reasonable about it (i.e. cache intelligently so you don't hammer the page with a request every few seconds) site owners should neither notice nor care.
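Something along these lines would do for the caching side. This is just a minimal sketch: the temp-dir cache path and the ten-minute TTL are arbitrary choices, not anything a given site dictates.

<?php
// Fetch a URL, but serve a locally cached copy while it's still
// fresh, so the remote site only sees one request per TTL window.
function fetch_cached($url, $ttl = 600) {
    $cacheFile = sys_get_temp_dir() . '/scrape_' . md5($url) . '.html';

    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }

    $html = file_get_contents($url);
    if ($html !== false) {
        file_put_contents($cacheFile, $html);
    }
    return $html;
}
?>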

Be prepared to have a strong knowledge of regex, though.
 
See http://en.wikipedia.org/wiki/Web_scraping

Basically you'll need to grab the content of each URL using (one of) file_get_contents() / cURL / the HTTP_Request PEAR library, then build code to parse the bits you want out of the result.
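As a rough example of the fetch-and-parse step, assuming the thread links on the target page carry a class of "threadtitle" (a made-up name; you'd need to view the real page source and adjust the pattern to match it), it might look like:

<?php
// Grab the page (the forum URL here is a placeholder).
$html = file_get_contents('http://forums.example.com/forumdisplay.php?f=1');

if ($html !== false) {
    // Pull the text out of anchors with the assumed class. Regex on
    // HTML is fragile, so keep the pattern as tight as the markup allows.
    if (preg_match_all('#<a[^>]+class="threadtitle"[^>]*>(.*?)</a>#si',
                       $html, $matches)) {
        foreach (array_slice($matches[1], 0, 10) as $title) { // latest 10
            echo html_entity_decode(strip_tags($title)) . "\n";
        }
    }
}
?>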

Depending on the site it could be fairly easy or a real ballache; it just depends on how regular the markup is and how reliable you want the scraper to be.

Obviously this is the kind of thing RSS was built for, but not all sites provide that kind of resource.

You could always go the cheat-route and load the pages in iFrames ;)
 
Be prepared to have a strong knowledge of regex, though.

You could just use the DOM traversal stuff to get the page contents. In my experience, PHP's regex functions slow right down and sometimes fail when matching large amounts of data.
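For example, something like this with PHP's built-in DOMDocument and DOMXPath (the "threadtitle" class is a placeholder again; check the real markup):

<?php
$html = file_get_contents('http://forums.example.com/forumdisplay.php?f=1');

if ($html !== false) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // forum HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//a[contains(@class, "threadtitle")]');

    $count = 0;
    foreach ($nodes as $node) {
        echo trim($node->textContent) . "\n";
        if (++$count >= 10) break; // only the latest 10 threads
    }
}
?>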
 