PHP cURL POST - Web Scraper

I'm creating a web scraper at work for a database we don't have direct access to, although we can get the data as HTML.

The problem is that it's behind a login. I am assuming I can send prepared POST data (username and password) to trick the ASP script into giving me a cookie.

Then PHP would have to present the cookie, request a certain page and scrape the data into a MySQL table.

I've never really looked into cURL and, if I am honest, I am only a PHP novice. I've looked at Wez Furlong's guide, but I think I need something more basic ;)

Any pointers? (Sorry if this is really simple, it's late on a Friday and I am on a coffee high.)
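
For the scraping step itself, here is a minimal sketch, assuming the report page turns out to be a plain HTML table. The helper name `tableRows()` and the sample markup are made up for illustration; the real page may need a different XPath query.

```php
<?php
// Hypothetical helper: turn every <tr> in an HTML document into an array of cell texts.
function tableRows(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells !== []) { // skip rows that contain only <th> header cells
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample markup, standing in for the real report page.
$sample = '<table><tr><th>Ref</th><th>Total</th></tr>'
        . '<tr><td>Q100</td><td> 12.50 </td></tr>'
        . '<tr><td>Q101</td><td>7.00</td></tr></table>';
print_r(tableRows($sample));
```

Each returned row could then be written to MySQL with a prepared statement, which keeps the scraped text from being interpreted as SQL.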
 
I guess I don't have to use PHP, but it's the language I know.
I'll have a look at that python library when I get on a desktop.
 
Thanks RobH, I am having a play with cURL in Terminal =)

Here is what I've stolen so far.

Code:
<?php
// INIT CURL
$ch = curl_init();

// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, 'http://quote.ashwyk.com/pricing/login.asp');

// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);

// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt($ch, CURLOPT_POSTFIELDS, 'Username=NotOn&Password=YourNelly');

// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

// Setting CURLOPT_RETURNTRANSFER to 1 stops cURL printing the
// result of its query. Instead, curl_exec() returns the result
// as a string rather than the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// EXECUTE 1st REQUEST (FORM LOGIN)
$store = curl_exec($ch);

// SET FILE TO DOWNLOAD
curl_setopt($ch, CURLOPT_URL, 'http://quote.ashwyk.com/pricing/admin/quotes_reporting.asp');

// EXECUTE 2nd REQUEST (FILE DOWNLOAD)
$content = curl_exec($ch);

// CLOSE CURL
curl_close($ch);

echo $content;
?>

This is what I get back:

Code:
<head><title>Object moved</title></head>
<body><h1>Object Moved</h1>This object may be found <a HREF="./">here</a>.</body>

This happens regardless of whether the login and password are correct.

What I want it to do: log in and get the cookie, go to /admin/quotes_reporting.asp, and give me the HTML. :D

Any ideas? I think it's not posting correctly.

Edit: thinking about it, is it not following a redirect?
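
The "Object moved" page looks like the body that IIS sends along with a 302 redirect, so one way to check the redirect theory is to look at the response headers rather than the body. A minimal sketch, assuming the headers have been captured with `CURLOPT_HEADER`; the helper name `redirectTarget()` and the sample header block are made up for illustration.

```php
<?php
// Hypothetical helper: pull the Location header out of a raw HTTP header block.
function redirectTarget(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null; // no redirect in this response
}

// Made-up header block, standing in for what the server might send.
$sample = "HTTP/1.1 302 Object moved\r\nLocation: ./\r\nContent-Length: 121\r\n\r\n";
var_dump(redirectTarget($sample));

// With a live handle the header block could be captured like this (not executed here):
//   curl_setopt($ch, CURLOPT_HEADER, true);   // include headers in the returned string
//   $response   = curl_exec($ch);
//   $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
//   $status     = curl_getinfo($ch, CURLINFO_HTTP_CODE);
//   echo $status, ' -> ', redirectTarget(substr($response, 0, $headerSize));
```

Note that the script echoes `$content`, the result of the second request, so the redirect shown is the one returned for the report page and not necessarily the one returned by the login.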
 
Code:
<?php
// INIT CURL
$ch = curl_init();

// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, 'http://quote.ashwyk.com/pricing/login.asp');

// ENABLE HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);

// ENABLE REDIRECT FOLLOW
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt($ch, CURLOPT_POSTFIELDS, 'Username=NotOn&Password=YourNelly');

// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

// Setting CURLOPT_RETURNTRANSFER to 1 stops cURL printing the
// result of its query. Instead, curl_exec() returns the result
// as a string rather than the usual true/false.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

// EXECUTE 1st REQUEST (FORM LOGIN)
$store = curl_exec($ch);

// SET FILE TO DOWNLOAD
curl_setopt($ch, CURLOPT_URL, 'http://quote.ashwyk.com/pricing/admin/quotes_reporting.asp');

// EXECUTE 2nd REQUEST (FILE DOWNLOAD)
$content = curl_exec($ch);

// CLOSE CURL
curl_close($ch);

echo $content;
?>


Strange.
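
One thing that stands out in the script: `CURLOPT_POST` is still set on the handle when the second `curl_exec()` runs, so the report page is requested with a POST that carries the login fields again. Whether that is what upsets the ASP script is only a guess, but switching back to GET with `CURLOPT_HTTPGET` is cheap to try. A minimal sketch, with the options split into two hypothetical helpers (`loginOptions()` and `pageOptions()`) so the differences between the two requests are easy to see; the field names are copied from the script and the real form may also expect hidden fields.

```php
<?php
// Options for the login POST.
function loginOptions(string $loginUrl, array $fields, string $cookieJar): array
{
    return [
        CURLOPT_URL            => $loginUrl,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // URL-encodes each value
        CURLOPT_COOKIEFILE     => $cookieJar, // read existing cookies, enable cookie handling
        CURLOPT_COOKIEJAR      => $cookieJar, // written out when the handle is closed or freed
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
    ];
}

// Options for the follow-up page request on the same handle.
function pageOptions(string $pageUrl): array
{
    return [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true, // undo CURLOPT_POST so the login fields are not re-sent
    ];
}

$jar   = __DIR__ . '/cookie.txt'; // absolute path, so the working directory does not matter
$login = loginOptions(
    'http://quote.ashwyk.com/pricing/login.asp',
    ['Username' => 'NotOn', 'Password' => 'YourNelly'],
    $jar
);
echo $login[CURLOPT_POSTFIELDS], "\n";

// Usage with a live handle (not executed here):
//   $ch = curl_init();
//   curl_setopt_array($ch, $login);
//   $store = curl_exec($ch);
//   curl_setopt_array($ch, pageOptions('http://quote.ashwyk.com/pricing/admin/quotes_reporting.asp'));
//   $content = curl_exec($ch);
//   if ($content === false) { echo curl_error($ch); }
//   curl_close($ch);
```

Checking for a `false` return and printing `curl_error()` also helps distinguish a failed transfer from a page that simply redirected.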
 
Blargh, the cookie also seems to hold a plain-text username and password, which cURL isn't saving. Would this be the issue?
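
To see what cURL actually saved, the jar file can be read back and compared with the cookies the browser holds. A minimal sketch, assuming the jar is in the Netscape cookie file format that cURL writes; the helper name `cookiesInJar()` and the sample cookie are made up for illustration.

```php
<?php
// Hypothetical helper: read Netscape-format cookie jar contents
// and return the cookies as name => value pairs.
function cookiesInJar(string $jarContents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $jarContents) as $line) {
        // cURL marks HttpOnly cookies with this prefix; other "#" lines are comments.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue;
        }
        $parts = explode("\t", $line);
        if (count($parts) === 7) {
            // Fields: domain, subdomain flag, path, secure, expiry, name, value
            $cookies[$parts[5]] = $parts[6];
        }
    }
    return $cookies;
}

// Made-up jar contents; the cookie name and value are placeholders.
$sample = "# Netscape HTTP Cookie File\n"
        . "quote.ashwyk.com\tFALSE\t/\tFALSE\t0\tEXAMPLESESSION\tabc123\n";
print_r(cookiesInJar($sample));

// With the real file (not executed here):
//   print_r(cookiesInJar(file_get_contents(__DIR__ . '/cookie.txt')));
```

If a cookie the site needs really is missing from the jar, it can be sent by hand with `CURLOPT_COOKIE`, which takes a string of the form `name=value; name2=value2`.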
 