Web scraping?

Hi there, I need to pull some data from a web page that requires a username and password and uses a few drop-down boxes for navigation.

Would anyone know of a free program I could use to get the data from my website programmatically?

Any ideas, hints or suggestions would be most welcome. Thanks!!
 
Depending on the language, you could use an HTTP request to pull the page. A quick way to get something out of it would then be string manipulation or a regular expression: find the 'container' of the part you want and pull it into a variable to place on your site.
For example, if you have a news scroller in a DIV called 'latestNews', you could look for the start and end of that DIV tag.
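To make the 'latestNews' idea concrete, here's a rough sketch in Python (the thread doesn't settle on a language, so this is just one choice). The function name `extract_div` and the sample page are made up for illustration, and it assumes the container has no nested DIVs inside it. On a real site the `page` string would come from an HTTP request rather than a literal.

```python
import re

def extract_div(html, div_id):
    """Return the inner HTML of the first <div id="div_id">, or None."""
    # Find the opening tag, allowing other attributes around the id.
    opening = re.search(
        r'<div\b[^>]*\bid=["\']' + re.escape(div_id) + r'["\'][^>]*>',
        html,
        re.IGNORECASE,
    )
    if opening is None:
        return None
    start = opening.end()
    # Assumption: the container holds no nested <div>, so the next
    # closing tag is the one that ends it.
    closing = re.compile(r"</div\s*>", re.IGNORECASE).search(html, start)
    if closing is None:
        return None
    return html[start:closing.start()].strip()

# Hypothetical page content; in practice this is the fetched HTML.
page = '<html><body><div id="latestNews"><p>Site relaunch on Friday</p></div></body></html>'
news = extract_div(page, "latestNews")
print(news)
```

This is fine for a quick job on a page you control, but it breaks as soon as the container has DIVs inside it, which is where a proper parser starts to pay off.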
 
Sorry guys, I'm not that clued up on such things. I'm hoping there's a free program I could use to get the data I want.. :(
 
I would go for Perl and Mechanize (WWW::Mechanize) myself... it's quite easy to pick up.

If you are still looking for help later, feel free to send the page to my trust email - if I have a spare 10 minutes I will have a look to see how easy it would be to Perl/Mech.
 
I'm sure you'll be able to find something to help you along, but you'll definitely have to program something to get results.
It's hard to say for certain, though, without seeing what you want to scrape and what language your site is coded in.
 
Use curl and either Regex or tidy/xpath in PHP.

You could use the software my company makes, but it's probably out of your price range :)
 
Easiest way IMO would be to use PHP and cURL. :)

This is the way to go. I taught myself PHP so that I could scrape prices off the Betfair website. It was an interesting journey.

If you don't feel up to the task, you could always go on a site like Freelancer dot com and put up a job description. I'm sure you'd quickly find some bloke in India who would do a decent job for peanuts.
 
Sounds like a good solution. cURL is just a way of making (amongst other things) an HTTP request, isn't it?

You're pulling a webpage from a URL into a text string then searching and lifting the particular part you want?

If so, you can do the same thing in pretty much any language so it all depends on what your setup is.
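Since the original page sits behind a username and password, the request side needs a login step and cookies. Here's a hedged sketch using Python's standard library instead of PHP and cURL. The URL, the form field names `username` and `password`, and the `region` parameter are all hypothetical; the real names have to be read from the login form and the drop-downs in the page's HTML. Drop-down boxes usually end up as ordinary form or query parameters, but that is an assumption about the site.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_session():
    """Build an opener that remembers cookies, so a login carries over."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def build_login_request(login_url, username, password):
    """Build (but do not send) the POST a login form would submit."""
    # Hypothetical field names; read the real ones from the form's HTML.
    form = {"username": username, "password": password}
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(login_url, data=body, method="POST")

session = make_session()
request = build_login_request("https://example.com/login", "alice", "s3cret")

# Against a real site (not run here, example.com has no such form):
#   session.open(request)  # log in; cookies are kept by the session
#   query = urllib.parse.urlencode({"region": "north"})  # a drop-down choice
#   html = session.open("https://example.com/report?" + query).read().decode("utf-8")
```

Once `html` holds the page as a text string, the searching and lifting part is the same in any language.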
 
Sorry for the hijack but is it possible to scrape a number of websites and then show the results on a summary page for each product?
 
PHP and cURL would be a good solution as others have suggested.

I have a script which pulls specific product data from a page and stores it in a MySQL database which is then exported to Excel. The hardest part is stripping the data you want from a page, especially if it's messy.
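For anyone wondering what the store-and-export half of that looks like, here's a minimal sketch. It is not the script described above: it swaps in SQLite for MySQL and a CSV file for the Excel export (Excel opens CSV directly) so it runs with nothing but Python's standard library. The table layout and the sample products are invented.

```python
import csv
import io
import sqlite3

def store_products(conn, products):
    """Insert (name, price) pairs, replacing any earlier row for a name."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products (name TEXT PRIMARY KEY, price REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO products (name, price) VALUES (?, ?)", products
    )
    conn.commit()

def export_csv(conn, out):
    """Write every stored product to `out` as CSV, sorted by name."""
    writer = csv.writer(out)
    writer.writerow(["name", "price"])
    writer.writerows(conn.execute("SELECT name, price FROM products ORDER BY name"))

# In-memory database and made-up rows standing in for scraped data.
conn = sqlite3.connect(":memory:")
store_products(conn, [("Widget", 9.99), ("Gadget", 24.5)])

buffer = io.StringIO()  # swap for open("products.csv", "w", newline="") to get a file
export_csv(conn, buffer)
print(buffer.getvalue())
```

Keying the table on the product name means re-running the scrape updates prices instead of piling up duplicates.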
 
That's where tidy & xpath come in :)
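As a rough illustration of the XPath half of that, here's a sketch in Python rather than PHP. It assumes the page has already been cleaned up into well-formed markup (the job tidy does) and has no XML namespace on it; real tidied XHTML often carries one, which would need handling. The markup and class names are invented. Python's built-in ElementTree only supports a subset of XPath, but it covers this case.

```python
import xml.etree.ElementTree as ET

def product_rows(xhtml):
    """Return (name, price) text pairs for every product list item."""
    root = ET.fromstring(xhtml)
    rows = []
    # Find each <li class="product"> anywhere in the document.
    for item in root.findall(".//li[@class='product']"):
        name = item.findtext("span[@class='name']")
        price = item.findtext("span[@class='price']")
        rows.append((name, price))
    return rows

# Hypothetical, already-tidied page fragment.
tidied = """<html><body><ul>
<li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>
<li class="product"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul></body></html>"""

for name, price in product_rows(tidied):
    print(name, price)
```

Compared with hunting for start and end tags by hand, the query says what you want by structure, so it survives whitespace and attribute-order changes in the page.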
 