PHP Sessions & Crawlers

Basically, I'm having trouble with a PHP script I've written that logs users' visits: their IP, the pages they view, the date, etc.

When a user first arrives on the site, the script assigns them a random MD5 code and then uses it to track their steps through the site. However, a fair number of visitors (maybe 30-50%) seem unable to store this PHP session code, so each page they view is entered into a different database row instead of being chained as "a -> b -> c" (the output tells me whether the visitor has been on before and what their visitor ID was).

Now, I've tested it in everything (Firefox, IE back to v6, Safari and even Opera Mini on my mobile) and they all get logged correctly. So my question is: is there a problem with my code, or is what I'm seeing web crawlers? My uneducated guess is crawlers, mainly because the visits happen at unusual times for human activity.

And if I'm right, is there any way to avoid logging crawlers (as Google Analytics seemingly does), or is that way too advanced, meaning I need to change my script to rely more on IP addresses and less on session IDs?
 
Analytics doesn't log spider activity because its logging script is fired by JavaScript, which the vast majority of spiders and bots don't run. Mint and a number of other big stats tools work on the same basis. As such, it gives a 'truer' representation of real activity on your site than server-side logging. And since JS runs client-side, it can reveal a lot more about users: browser, OS, screen/window size, etc.
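
If you'd rather keep the logging server-side, a rough alternative to the JavaScript approach is to skip requests whose User-Agent header looks like a bot. This is only a sketch: the function name and the list of substrings are my own assumptions, the list is far from exhaustive, and a bot that sends a browser-like User-Agent will still get through.

```php
<?php
// Hypothetical helper: the name and the pattern list are illustrative only.
function looks_like_crawler(?string $userAgent): bool
{
    // A missing or empty User-Agent is treated as suspicious,
    // since browsers normally send one.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }

    // Substrings that commonly appear in the User-Agent of well-behaved bots.
    $patterns = ['bot', 'crawl', 'spider', 'slurp'];

    foreach ($patterns as $pattern) {
        // stripos() is a case-insensitive substring search.
        if (stripos($userAgent, $pattern) !== false) {
            return true;
        }
    }

    return false;
}

// Typical use in the logging script (log_visit() is a hypothetical stand-in
// for whatever function writes the database row):
//   if (!looks_like_crawler($_SERVER['HTTP_USER_AGENT'] ?? null)) {
//       log_visit();
//   }
```

Matching on substrings errs towards skipping too much rather than too little, which is probably the right way round for visitor stats.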

As for the PHP issue, you might want to look at the way sessions are being persisted on the client side. It may be an issue with session.use_cookies and/or session.use_only_cookies (http://uk3.php.net/manual/en/session.configuration.php), i.e. some clients might not be accepting cookies, so the session ID never comes back and a new session has to be issued on every page view.
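
To check whether that's what is happening, something like the sketch below might help. The two ini settings and the session_* functions are PHP built-ins; the helper names are made up for illustration. The idea is to force cookie-based sessions, then see on each request whether the client actually sent the session cookie back.

```php
<?php
// Hypothetical helper names; only the ini settings and session_* calls are PHP built-ins.

// Configure PHP to carry the session ID in a cookie only, then start the session.
// This has to run before any output is sent to the browser.
function start_cookie_session(): void
{
    ini_set('session.use_cookies', '1');       // send the session ID as a cookie
    ini_set('session.use_only_cookies', '1');  // ignore session IDs passed in the URL
    session_start();
}

// True if the request arrived with a non-empty session cookie, i.e. the client
// stored the cookie on an earlier page view and sent it back.
function client_returned_session_cookie(array $cookies, string $sessionName): bool
{
    return isset($cookies[$sessionName]) && $cookies[$sessionName] !== '';
}

// Typical use inside a page (left as a comment so the sketch has no side effects):
//   start_cookie_session();
//   $returning = client_returned_session_cookie($_COOKIE, session_name());
```

If you log that flag alongside each row, visitors who never return the cookie (crawlers, or browsers with cookies turned off) should stand out as the ones generating a new row on every page view.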
 