OCR, CAPTCHAs and auto-form completion

Hey guys,

I've very recently undertaken a research project to develop CAPTCHA technology. I'm curious if anyone could shed some light on the attacker's perspective. What software do they use to carry out the character recognition process? Do they just feed the image into something like FineReader, or is there something else they use? At the moment I'm not too concerned about how the whole process works; I'm interested in how a CAPTCHA image is automatically processed into a string of text that can be passed to the form. Specific program names and links would be brilliant. Email me at sniffy 15 at hotmail couk if the content is slightly on the dodgy side :) I am searching but getting a lot of garbage. I'm hoping someone has experience with this and can point me in the right direction.

I'd be interested to hear anything on the subject even if it isn't strictly related to my OP so please post :)
 
Oh man, that's amazing. Thank you so much :)

I've been having a play with FineReader. Damn this thing is smart :p
 
Indeed, that has come up in my initial research. Their poor English skills are a possible angle of attack, but I haven't thought up anything decent. I was thinking of obtaining a collection of short English sentences, removing a word that anyone reasonably competent at English could work out, then asking for the word. There are so many problems with that though. Firstly, I can't think of a sensible way to write an algorithm that determines which word should be removed. It needs to be somewhat random but ultimately a sensible choice that won't have many possible replacements. Secondly, I can't think of a way to obtain a collection of short English sentences that attackers won't have access to. Thirdly, the variety of words in the English language might make the tests quite hard for legitimate users, which is a big no-no. It's a bitch :/
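For what it's worth, the "remove a word and ask for it" idea can be roughed out in a few lines. This is only a sketch under assumptions: the stop-word list, the minimum word length and the function names `make_cloze` and `check` are all made up for illustration. It picks a random non-trivial word, which does nothing to solve the real problem above of choosing a word with few sensible replacements.

```python
import random

# Hypothetical stop-word list: words judged too ambiguous to blank out.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "is", "and", "it"}
PUNCTUATION = ".,!?"


def make_cloze(sentence, rng=random):
    """Blank out one randomly chosen word; return (challenge, answer)."""
    words = sentence.split()
    # Candidate positions: longer words that are not in the stop list.
    candidates = [
        i for i, w in enumerate(words)
        if len(w.strip(PUNCTUATION)) > 3
        and w.strip(PUNCTUATION).lower() not in STOP_WORDS
    ]
    if not candidates:
        raise ValueError("no suitable word to remove")
    i = rng.choice(candidates)
    core = words[i].strip(PUNCTUATION)
    # Keep any trailing punctuation so the sentence still reads naturally.
    words[i] = words[i].replace(core, "____", 1)
    return " ".join(words), core.lower()


def check(answer, response):
    """Case-insensitive comparison, ignoring surrounding whitespace."""
    return response.strip().lower() == answer


if __name__ == "__main__":
    challenge, answer = make_cloze("The cat sat on the warm windowsill.")
    print(challenge)
```

Note that `check` accepts exactly one answer, so a legitimate user who types a perfectly reasonable synonym fails the test, which is the third problem mentioned above.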
 
They use a similar one in the zen cart forums, which asks for specific information, such as
What is 2+2?
Type the third word in this sentence

etc.
There are very very few variations though!
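A rough sketch of how questions of that kind could be generated rather than hand-written. This is not how the zen cart forums do it (I don't know their implementation); the function names and the two templates are assumptions for illustration only.

```python
import random

ORDINALS = ["first", "second", "third", "fourth", "fifth"]


def arithmetic_question(rng):
    """Return a question like 'What is 2+2?' and its answer as a string."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return f"What is {a}+{b}?", str(a + b)


def nth_word_question(rng):
    """Return 'Type the <nth> word in this sentence' and the expected word."""
    n = rng.randrange(len(ORDINALS))
    question = f"Type the {ORDINALS[n]} word in this sentence"
    # The answer is taken from the question text itself.
    return question, question.split()[n].lower()


def make_question(rng=random):
    """Pick one of the question templates at random."""
    template = rng.choice([arithmetic_question, nth_word_question])
    return template(rng)


if __name__ == "__main__":
    for _ in range(3):
        print(make_question())
```

Even with random numbers and positions there are only two templates here, so a few lines of parsing would answer every question; the randomness alone doesn't buy much.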
 
The problem with those CAPTCHA implementations is that they aren't randomly generated. More effort needs to be put into thinking up sensible questions than into collecting a dictionary of answers, which can be sourced cheaply from cheap labour.

Its success is in its lack of success, so to speak :) In other words, the fact that it isn't used by many sites is the only reason it currently works.
 