Most people are familiar with the captcha system known as reCAPTCHA. Its slogan, “Stop Spam, Read Books”, hints at its purpose, which is to have human minds “OCR” text from books that are damaged or in poor condition.
OCR stands for Optical Character Recognition. It’s the process of software recognising letters, numbers and words and outputting them as digital text. The process is tricky though, and unless the text is in pristine condition, software-based OCR can fail. By contrast, the human mind is excellent at recognising things. Our brain continually attempts to make sense of the visual stimulus it receives, which is why optical illusions work.
reCAPTCHA offers users a word of damaged text to solve. The common question here is: how does it know if we got the word right? The crucial part is that the system always displays two words. One of the words (the one on the left) is computer-generated and already known to the system. The word on the right is the unknown text to be recovered.
The method works on the theory that if you get the first word (the known word) correct, then your response for the second word can be trusted too. The word you recovered is then added to the digital reproductions being compiled this way.
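The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA’s actual code: the function name, its parameters, and the choice to ignore case and surrounding spaces are all assumptions made for the example.

```python
def check_response(known_word, control_answer, unknown_answer):
    """Return the guess for the unknown word if the control word was typed
    correctly, otherwise None.

    Hypothetical helper: the comparison rules (ignore case and surrounding
    whitespace) are an assumption, not documented reCAPTCHA behaviour.
    """
    def normalise(text):
        return text.strip().lower()

    # The known word acts as the test: a wrong answer means we cannot
    # trust anything else the user typed.
    if normalise(control_answer) != normalise(known_word):
        return None

    # The control word was right, so the second answer is accepted as a
    # candidate reading of the unknown word.
    return normalise(unknown_answer)
```

A user who types the known word correctly has their second answer kept as a candidate, while a user who gets the known word wrong has both answers discarded.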
There are other ways the system makes sure your response can be trusted, but the company has not divulged these secrets.
It’s important that these methods stay secret, and that they change from time to time, so that people can’t work out the system and solve only the first word while typing garbage for the second.
It’s a good guess that the system shows the same captcha to more than one person. If each person gets the first word right, the captcha goes into a “best of 3” situation, where 3 accepted responses are compared. If at least two of the second-word responses match, that word is used in the recovery process. If none match, the word goes back into the queue for another try.
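The “best of 3” guess could look something like the sketch below. Since the real process has not been published, everything here is hypothetical: the function name, the sample size of 3, and the threshold of 2 matching responses simply mirror the guess described above.

```python
from collections import Counter


def resolve_unknown_word(trusted_guesses, sample_size=3, min_agree=2):
    """Pick the agreed reading of an unknown word, or None if there is none.

    trusted_guesses: second-word answers from users who already got the
    known word right. Hypothetical sketch of the guessed voting scheme;
    the real system may work quite differently.
    """
    if len(trusted_guesses) < sample_size:
        return None  # not enough accepted responses to compare yet

    sample = trusted_guesses[:sample_size]
    # Find the most frequent answer and how many times it was given.
    word, count = Counter(sample).most_common(1)[0]

    # Accept the word only if enough people agree; otherwise the caller
    # would put the captcha back in the queue for another try.
    return word if count >= min_agree else None
```

With the responses "upon", "upon" and "up0n" the function returns "upon". With three different responses it returns None, which stands for sending the word back into the queue.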
People have noted recently that the reCAPTCHA system seems to be getting much harder to solve, with increasingly corrupt words on display. There are two theories behind this: a) the puzzles were being solved by software, so the complexity of the displayed words has been increased; b) following the theory mentioned above, words for which fewer than 2 of the 3 responses matched are being re-served for another attempt, causing an increasing number of “difficult” captcha puzzles.
The reCAPTCHA system also had to offer a way for blind users to pass the puzzles. With the aid of screen reader software, they can click an audio button to be played a corrupt, distorted audio version of the puzzle.
For more information about the now Google-owned reCAPTCHA, visit reCAPTCHA.