CAPTCHAs

2009-09-17

in Geek stuff, Internet matters, Security

Like many web users, I am of two minds about Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs). On the one hand, I see their importance in fighting several types of spam. In particular, they are an important defence against the spam blogs that have become so prevalent recently. Each of these sites is built around a high-value keyword: it trawls through real blogs, copies their content, and republishes it. To Google, this looks like a real blog specializing in that keyword. People find it through Google searches, and sometimes end up clicking the ads that are invariably strewn across these robot-created sites.

When it comes to creating new blogs and email accounts, I find CAPTCHAs entirely reasonable.

Where I object is to more mundane uses, such as vetting comments on blogs. Using a CAPTCHA can seriously annoy readers, especially those who have poor vision or who use browser add-ons like NoScript for extra security. To me, when blog owners choose CAPTCHAs as a security feature, they are saying that they are happy to waste the time of all of their commenters, rather than invest a bit of their own in setting up a spam filtering system and occasionally checking it for false positives and false negatives. If your blog gets 5,000 comments a day, you have a good excuse. If it gets fewer than 20, a combination of Akismet and some .htaccess rules should be just fine.

reCAPTCHA (which Google recently purchased) has at least two redeeming features. For one, it does useful work. Unlike most CAPTCHAs, which simply garble text for users to decipher, reCAPTCHA uses text from real documents being scanned. It gives users two words to decipher: one known word to perform the CAPTCHA function, and one unknown word whose transcription helps digitize the document. This leads directly to the second good feature: since these words come from text that the best optical character recognition (OCR) software available has already scanned and failed to read, they are fundamentally protected against automated CAPTCHA attacks. Of course, an attacker can always pay real people a small fee to solve the puzzles. reCAPTCHA is thus relatively robust against automated attack, with the additional benefit of adding to the sum of useful digitized information.
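For the curious, the two-word idea can be sketched in a few lines of Python. This is only my own illustration, not reCAPTCHA's actual code: the class name, the `votes_needed` threshold, and the rule of accepting a transcription once enough users agree on it are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def normalize(text):
    """Compare answers case-insensitively, ignoring stray whitespace."""
    return text.strip().lower()


class TwoWordCaptcha:
    """Hypothetical sketch: one known control word, one unknown word."""

    def __init__(self, votes_needed=3):
        # Assumed rule: an unknown word counts as transcribed once this
        # many passing users have typed the same reading for it.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown word id -> readings

    def check(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes the challenge."""
        if normalize(control_guess) != normalize(control_answer):
            # Failed the known word: treat as a bot and discard both answers.
            return False
        # Passed the known word: record the reading of the unknown word.
        self.votes[unknown_id][normalize(unknown_guess)] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed reading, or None if agreement is too thin."""
        readings = self.votes.get(unknown_id)
        if not readings:
            return None
        word, count = readings.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Only the control word decides whether the user gets through. The unknown word is never graded; its answer is simply kept as a vote, which is why a human's effort on it turns into digitized text.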

Hopefully, future CAPTCHA systems will be less annoying for users and more difficult for computers to game. Experimental forms have included tasks like picking out only kittens from photos showing a number of types of animals. This is apparently a task that is easy for humans, but quite beyond the capability of automatic image recognition software.

Personally, I prefer to think of them as Computer Automated Person Checking Algorithms. That name lacks the Turing shout-out, but is more concise and comprehensible.

{ 4 comments… read them below or add one }

R.K. September 19, 2009 at 3:06 pm

Another possible advantage of reCAPTCHA is that the people working to foil it are actually doing useful work – they are building the better book scanning software of the future.

Hacks don’t usually produce broader societal benefits.

. August 9, 2010 at 10:10 am

ReCAPTCHA.net Now Vulnerable to Algorithmic Attack

“reCAPTCHA.net algorithms have been developed to solve the current CAPTCHA at an efficacy of 30%. The algorithms were disclosed at DEFCON 18 over the weekend and have since been made available online. Also available is a video demonstration of random reCAPTCHA.net CAPTCHAs being subjected to the algorithms.” There’s probably an excellent Firefox plugin to render this page’s color scheme more bearable. Note: the PowerPoint presentation linked opens fine in OpenOffice, and the video speaks for itself.

. October 26, 2011 at 7:47 pm

Dr Yan’s group looked at a popular CAPTCHA technique known as “crowding characters together” (CCT) in which letters simply overlap. CCTs were considered a hard computer science problem, and no algorithm had yet been capable of disentangling the twists and skews of layered text, whereas the human visual cortex performs the task swiftly. The team’s method can pick out the telltale holes in letters like “a” or “p”, the vertical dashes in “t” and “f” or dots in “i” or “j”. It also captures letters like “s” with three horizontal segments on top of each other (and distinguishes these from “e” or “a”, which have a similar property, by dismissing characters where lines intersect). Their assorted techniques recognise anywhere between half and nearly all letters and numbers, depending on the particular CAPTCHA algorithm in use.

Will S. February 1, 2013 at 12:15 am

“Hopefully, future CAPTCHA systems will be less annoying for users and more difficult for computers to game. ”

Nope

More annoying for users, and also less difficult to game, given increases in computer power.
