What is wrong with my CAPTCHA?

Adz · 30 May 2006 at 09:46

I've had to implement one on ochostreview.co.uk to try to combat the spam. I've adapted someone else's script which, to me, looks quite secure but the spammers are still getting through! Surely this can't actually be a human? You can see the posts at the bottom of the hosts list.

Can anyone tell me where I'm going wrong or give me any pointers? I'm convinced it's a bot as the spam arrives every day without fail at roughly the same time.

Dj_Jestar · 30 May 2006 at 10:07

Often it is a flaw in the code/logic and not the bot being able to read the captcha.

Post your code for the captcha and we should be able to help

Adz · 30 May 2006 at 10:16

There's far too much code to post but I'm positive it's not a flaw in the logic. I've been writing this kind of stuff for many years, including some professional projects.

I've checked, double checked and triple checked. The secret 'word' is completely random and is stored in the session data so it's not possible that it is being read in any way other than via the captcha image.

Dj_Jestar · 30 May 2006 at 10:21

I hate to be captain obvious/devil's advocate, but clearly there is an error somewhere - be it the spam bots are reading the image or an error in the logic

Perhaps it is overly complicated and there is an error somewhere? CAPTCHAs don't need much code.

Without any insight into the code, there's not a lot we can help with.

Adz · 30 May 2006 at 10:29

Actually I must admit I hadn't considered the code for the captcha itself, only my own which is far too simplistic to have any kind of bug in it...

Here it is, largely borrowed from a free open source script with a few obvious modifications:

http://ochostreview.co.uk/image.phps

Thanks for your help

Beansprout · 30 May 2006 at 10:44

What GET/POST requests are the bots passing? Please don't tell me you've got register_globals on.

Edit: And if it's a fault with the script blame me, I pointed you to it

Adz · 30 May 2006 at 10:51

That's a good point actually, I wasn't logging the form POSTs. I am now.

Dj_Jestar · 30 May 2006 at 10:53

Unfortunately I don't have a php ready machine at hand, so can't thoroughly test, but a couple things spring to mind.

Does your site give any indication that it uses the puremango captcha (in normal text, headers, keywords/metas, or GET/POST variables with names common to that script)?

If it does, there may be a common flaw that is being used by the bots.

The other thing that springs to mind is:

Code:

session_id($_GET['session']);
session_start();

Now, the session ID.. is it always in the GET vars, and is it always named 'session'? If it's not, the secret 'word' will be blank.. thus allowing anyone to 'successfully' pass the captcha as a blank hash will equate to true when compared to a blank hash
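
To illustrate the blank-versus-blank point, here is a minimal sketch. It assumes the check hashes both values with md5, which the thread does not confirm, and the name naive_check is hypothetical.

```php
<?php
// Hypothetical check, not the script's real code: compares a hash of the
// submitted answer with a hash of the stored secret word.
function naive_check(?string $stored, ?string $submitted): bool
{
    return md5((string) $submitted) === md5((string) $stored);
}

var_dump(naive_check(null, ''));     // nothing stored, blank answer: passes
var_dump(naive_check('qzkfw', ''));  // a word is stored, blank answer: fails
```

If the secret word was never set, both sides hash the empty string, so any request that sends an empty answer gets through.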

Adz · 30 May 2006 at 11:08

Variable names are all nondescript. Passing the session ID using GET is only for the image display - if the image doesn't display, it doesn't matter, the code is still in place in the main script so no submissions would be accepted.

Edit: Note that I've bodged the image script and set the string to $_SESSION['random'] after the dictionary checks. That variable is set within the main site.

Dj_Jestar · 30 May 2006 at 11:30

Without seeing at least pseudocode (if you are worried about someone leeching the code..) of how your logic goes through the captcha process, we can't help.

If you're that confident it works just fine and is secure, and 'know' it must be OCR that is defeating the captcha, then you'll need to either 'improve' the image, or use a different method altogether to prevent the spam.

Adz · 30 May 2006 at 11:45

I'm dubious about posting the entire code because it's, frankly, crap. I knocked it up on a Sunday afternoon after a trip to the pub. There seem to be a few people on this forum with a grudge against particular companies on the list (namely Clook and Register1) who I'm sure would love to exploit my dodgy code to air their grievances.

Literally all the captcha related code does is...

If ($_SESSION['random']) is not set, generate a random string (letters only) and set it as $_SESSION['random'].

Echo <img src='/image.php?session=<?=session_id()?>' blah />

Then when the form is posted back, compare the code sent to $_SESSION['random']. If they match, accept the submission.
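
The first two steps above could look roughly like this. It is only a sketch: $session stands in for $_SESSION, the function names and the five-letter length are assumptions, and random_int needs PHP 7 or later (a 2006 script would more likely have used mt_rand).

```php
<?php
// Hypothetical helper: create the secret word once per session.
function ensure_challenge(array &$session, int $length = 5): string
{
    if (empty($session['random'])) {
        $letters = 'abcdefghijklmnopqrstuvwxyz';
        $word = '';
        for ($i = 0; $i < $length; $i++) {
            // Pick one random letter at a time.
            $word .= $letters[random_int(0, strlen($letters) - 1)];
        }
        $session['random'] = $word;
    }
    return $session['random'];
}

// Hypothetical helper: build the image tag that passes the session ID via GET.
function captcha_image_tag(string $sessionId): string
{
    return "<img src='/image.php?session=" . rawurlencode($sessionId) . "' alt='captcha' />";
}
```

Note that the word only exists once ensure_challenge has run, i.e. once the form has actually been displayed for that session.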

lookitsjonno · 30 May 2006 at 12:04

do you kill $_SESSION after each individual post?

if you don't then a bot could just use the first captcha over and over again.

Dj_Jestar · 30 May 2006 at 12:16

jonno.co.uk said:
do you kill $_SESSION after each individual post?

if you don't then a bot could just use the first captcha over and over again.

^ Would have been my next suggestion
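
A minimal sketch of a single-use check, assuming the secret word lives in $_SESSION['random'] as described earlier in the thread. $session stands in for $_SESSION and the function name is hypothetical.

```php
<?php
// Hypothetical helper: verify the answer and discard the word in one step.
function check_and_consume(array &$session, string $submitted): bool
{
    $expected = (string) ($session['random'] ?? '');

    // Single use: the word is cleared whether or not the answer matched,
    // so a solved captcha cannot be replayed on later posts.
    unset($session['random']);

    return $expected !== '' && hash_equals($expected, $submitted);
}
```

Clearing on every attempt, not just on success, also stops a bot from guessing repeatedly against the same word.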

Adz · 30 May 2006 at 12:21

I don't and that had occurred to me, I'll change that, but it doesn't explain how they got the first one... There doesn't appear to be any sign of a brute force attack.

lookitsjonno · 30 May 2006 at 13:02

maybe it was first filled in manually (by a human) and since then they have just kept the session alive by sending a request every X minutes?

toastyman · 30 May 2006 at 13:03

Why don't you log the IP to see if all the attacks are coming from one place? Then you could block that IP accordingly as a deterrent...

Adz · 30 May 2006 at 13:08

toastyman said:
Why don't you log the IP to see if all the attacks are coming from one place? Then you could block that IP accordingly as a deterrent...

They're coming from random addresses - they've obviously got some kind of network of drone machines. It's not even good spam.

I'm now killing the random string after every successful submission. We'll see if that helps.

Sic · 30 May 2006 at 13:21

toastyman said:
Why don't you log the IP to see if all the attacks are coming from one place? Then you could block that IP accordingly as a deterrent...

i tried that on mine. there's little point as they're more than likely coming from dynamic IPs. grr. good luck Adz - i'd be interested in taking a look at your code if/when you get it sorted...i'm after a CAPTCHA that works, as mine doesn't

Beansprout · 30 May 2006 at 13:26

Adz said:
I don't and that had occurred to me, I'll change that, but it doesn't explain how they got the first one... There doesn't appear to be any sign of a brute force attack.

Check for hits to the image from other sites - what they commonly do is throw the images onto other free spammy websites and get the visitors to fill it out.
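
One way to spot that, as a rough sketch: look at the Referer header on requests to the image script. The function name is hypothetical, and the Referer can be missing or forged, so this is only useful for logging, not as a security check.

```php
<?php
// Hypothetical helper: true when a Referer is present and names another host.
function is_foreign_referer(?string $referer, string $ownHost): bool
{
    if ($referer === null || $referer === '') {
        return false; // no Referer sent: cannot tell either way
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // Referer without a usable host: treat as suspicious
    }
    // Exact host comparison; a "www." variant would count as foreign.
    return strcasecmp($host, $ownHost) !== 0;
}
```

In image.php this could be called with $_SERVER['HTTP_REFERER'] ?? null and the site's own host name, writing a log line whenever it returns true.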

Adz · 31 May 2006 at 13:17

I'm sure you'll all be very amused to hear that it was my dodgy code at fault.

What was happening was that I was initialising the random string the first time someone viewed the form. However, our spammers weren't viewing the form, they were making a single POST request. This launched the script straight into the form processing function *before* showing the form and initialising the variable, thus the random string was still blank. As someone pointed out earlier, blank when compared to blank will always match.
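
A minimal sketch of a guard against this, not the site's actual code: refuse any POST that arrives before a secret word has been issued. $session stands in for $_SESSION, and the function name and the 'code' form field are hypothetical.

```php
<?php
// Hypothetical request handler: returns a short status string.
function handle_request(array &$session, string $method, array $post): string
{
    if ($method === 'POST') {
        $expected = $session['random'] ?? '';
        $given = $post['code'] ?? '';

        // A direct POST with no prior form view has no word in the session.
        // Reject it outright instead of comparing blank with blank.
        if (!is_string($expected) || $expected === '' || !is_string($given)) {
            return 'rejected: no challenge was issued for this session';
        }
        return hash_equals($expected, $given) ? 'accepted' : 'rejected: wrong code';
    }
    return 'show form';
}
```

With this in place, a single POST carrying an empty code is rejected because the stored word is blank, while a normal form view followed by a correct answer is still accepted.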

I'm such a noob.

Thanks everyone.