Brett_Tabke - 2:42 pm on Jan 12, 2011 (gmt 0)
> how can the challenge be compared to the response?
- reCaptcha only uses real words from a dictionary.
- Google compares the answer to a real word.
- They use two words, and most of the time only one of them must match.
- One of the words is known to get a very high percentage of correct human answers (it is easy to read).
- The other word is unknown. It is there to collect answers.
- A match can be off by one character.
- OCR errors are often character swaps (1 for l, o for 0, or I for l). You can compare them and assume that "1ike" is actually "like".
- Semantic comparison. Syntax analysis would be a good check: plug the word back into the sentence it came from and see whether it fits.
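A rough Python sketch of the checks in that list, under some assumptions of mine: the look-alike table covers only the swaps named above, "off by 1 character" is read as an edit distance of at most one, and `normalize`, `within_one_edit` and `check_response` are hypothetical names, not reCaptcha's actual code. The sentence/syntax check is left out.

```python
from collections import Counter

# Hypothetical look-alike table: only the swaps named above (1/l, I/l, 0/o).
# It is applied to BOTH strings, so look-alikes collapse to one character.
OCR_LOOKALIKES = str.maketrans({"1": "l", "I": "l", "0": "o"})


def normalize(word: str) -> str:
    """Trim, fold OCR look-alikes, then lowercase."""
    return word.strip().translate(OCR_LOOKALIKES).lower()


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one insert, delete, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = j = 0
    edited = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        if edited:
            return False  # second difference: more than one edit
        edited = True
        if len(a) == len(b):
            i += 1  # substitution: advance both strings
        j += 1      # otherwise skip the extra character in b
    return True


def check_response(control_word: str, control_answer: str,
                   unknown_answer: str, votes: Counter) -> bool:
    """Pass the user if the known (control) word matches within one edit.

    Only a passing response has its answer for the unknown word
    recorded as a vote; a failing response records nothing.
    """
    if not within_one_edit(normalize(control_word), normalize(control_answer)):
        return False
    votes[normalize(unknown_answer)] += 1
    return True


votes = Counter()
print(check_response("like", "1ike", "morning", votes))  # True: 1ike folds to like
print(check_response("like", "bake", "evening", votes))  # False: two edits away
print(votes)  # Counter({'morning': 1})
```

Folding look-alikes on both sides means a real capital I (as in "It") still matches itself; the cost is that genuinely different words like "ill" and "lll" become indistinguishable, which is the tradeoff of trusting that 1ike is like.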
After the word is shown to XX number of people, the answers are compared. If 91% think the word is Y and the rest of the answers don't match each other or are ambiguous, then it is a safe bet that the word is Y and that it is a real word. If they get a bad word, they can send it to a human editor for review.
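The consensus step could look something like this. It is only a sketch: `resolve_unknown_word` and its return labels are hypothetical, 0.91 is just the example figure from above, and `min_views` stays a required argument because the number of people (XX) isn't given. Answers are assumed to be normalized already.

```python
from collections import Counter


def resolve_unknown_word(answers, min_views, threshold=0.91):
    """Decide what an unknown word is from the answers collected for it.

    Returns ("pending", None) until at least min_views answers exist,
    ("accept", word) when the most common answer reaches the threshold share,
    and ("human_review", None) when no answer does (a "bad word").
    """
    if not answers or len(answers) < min_views:
        return ("pending", None)
    word, count = Counter(answers).most_common(1)[0]
    if count / len(answers) >= threshold:
        return ("accept", word)
    return ("human_review", None)


# min_views=20 is purely illustrative.
answers = ["lighthouse"] * 19 + ["lightbouse"]
print(resolve_unknown_word(answers, min_views=20))  # ('accept', 'lighthouse')
```

Using a share of all answers, rather than a raw count, means scattered non-matching answers can't outvote a strong majority, and anything that never reaches the threshold falls through to a human.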
One aside: think about all the text that the Google machine has seen through the book scanning system. Think about how that could be poured back into, oh, say, a search engine. Semantic analysis, quality of verbiage, human vs. machine generated... whew. That will bake your noodle for a while.