Web Scrape Help!

I'm looking for some assistance in scraping the content off a particular website so that I can create a MySQL database of information.

Has anyone got any good guides, or does anyone fancy offering some assistance with this?
 
What I'd do in PHP is use file_get_contents() to fetch the page, then strpos() to locate the bit you want to chop out, then substr() to plonk it into a variable ready to put in the db.

There are probably better ways...
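
As a rough sketch of that fetch, locate and chop approach, here it is in Python rather than PHP: str.find() plays the role of strpos() and slicing plays the role of substr(). The helper name extract_between, the sample HTML and the marker strings are all made up for illustration. A real page would first have to be downloaded (for example with urllib.request.urlopen()) and would need its own markers.

```python
from typing import Optional


def extract_between(html: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between two markers, or None if either is missing.

    Hypothetical helper: str.find() stands in for PHP's strpos(),
    and slicing stands in for substr().
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening marker itself
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end].strip()


# Made-up sample standing in for a downloaded product page.
sample_page = """
<html><body>
<h1 class="product-title"> Example Camera X100 </h1>
<span class="price">499.00</span>
</body></html>
"""

title = extract_between(sample_page, '<h1 class="product-title">', "</h1>")
price = extract_between(sample_page, '<span class="price">', "</span>")
print(title, price)
```

This kind of string chopping is brittle: if the site changes its markup even slightly, the markers stop matching and you get None back, so it's worth checking for that before writing anything to the db.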
 
That vaguely sounds like a plan, but I have no experience of this at all.

Are there any guides or tutorials that you can recommend?

Maybe the easiest thing will be to pay someone to knock this out for me?
 
Yeah, you can try to pick it up. You'd need to learn how to access and write to a db (and how to create a db, tables, etc.), how to deal with variables, and how to use the functions I've mentioned. I doubt there's one tutorial that covers all the bases; you'd need to combine a few, I think.

If you have no PHP (or other language) experience, it would probably not be worth the hassle versus paying someone to do it. Depending on the page being scraped and what you want to get out of it, I doubt it would take very long for someone who knows what they're doing.
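
To give a feel for the database side (creating a table, then writing rows to it), here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name products, its columns and the sample rows are all invented for illustration. With MySQL you'd need a separate driver, and the placeholder style in the queries may differ.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Create a table for the scraped details (hypothetical schema).
conn.execute(
    """
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        price TEXT,
        url   TEXT UNIQUE
    )
    """
)

# Made-up rows standing in for values pulled out of scraped pages.
scraped = [
    ("Example Camera X100", "499.00", "http://example.com/p/1"),
    ("Example Lens 50mm", "129.00", "http://example.com/p/2"),
]

# Parameterised queries keep scraped text from being run as SQL.
conn.executemany(
    "INSERT INTO products (title, price, url) VALUES (?, ?, ?)", scraped
)
conn.commit()

rows = conn.execute("SELECT title, price FROM products ORDER BY id").fetchall()
print(rows)
```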
 
This is the kind of thing that I am looking to scrape.

I'd like to grab each individual product page from, say, this category http://www.warehouseexpress.com/category/basecategory.aspx?cat03=3036 and then read them and write all of the details to a db, including the tabs for stuff like images / reviews / specification etc. I am especially interested in grabbing the individual specification elements.

When scraping images, do you just grab the URL, or can you grab the image itself?

Does that sound like a difficult job?
 
You might want to look at scraping with XPath, as it makes it much easier to access specific page elements. It treats the page as an XML document, so you can use XPath queries to pull out specific data. You can even use the Firebug add-on in Firefox to help you write the XPath queries.

Just google "XPath scraping" plus the programming language you are using (e.g. "PHP XPath scraping").
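
To show roughly what an XPath query looks like in practice, here is a small Python sketch using the standard library's xml.etree.ElementTree. The sample markup, the class names and the spec values are made up for illustration. ElementTree only supports a subset of XPath and needs well-formed markup, so for real pages (which are often not valid XML) you would more likely reach for a tolerant HTML parser such as the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed snippet standing in for a product page.
sample_page = """
<html>
  <body>
    <h1 class="product-title">Example Camera X100</h1>
    <table class="spec">
      <tr><th>Sensor</th><td>24 MP</td></tr>
      <tr><th>Weight</th><td>450 g</td></tr>
    </table>
  </body>
</html>
""".strip()

root = ET.fromstring(sample_page)

# Select the heading by its class attribute.
title = root.findtext(".//h1[@class='product-title']")

# Walk every row of the specification table and pair each label with its value.
specs = {
    row.findtext("th"): row.findtext("td")
    for row in root.findall(".//table[@class='spec']/tr")
}

print(title)
print(specs)
```

The appeal over string chopping is that each query names the element you want (the heading, the rows of the spec table) rather than relying on exact surrounding text.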
 
There's a class file called snoopy.php in the WordPress functions folder that has a lot of useful stuff for content scraping.
 