Google has acquired reCAPTCHA, a company that provides CAPTCHAs to help protect more than 100,000 websites from spam and fraud.
Another good one is to require a person to enter a specific word that can be found by clicking on a link.
[edited by: sgietz at 8:39 pm (utc) on Sep. 16, 2009]
well, today it's a kind of greyish white. and tonight, it'll be black with little white dots. A good captcha question is one with only one right answer ;)
sgietz, I kind of agree, but it depends how sensitive an app is to non-human behaviour. I could reload your page repeatedly until it shows the "sky color" question again. Being able to recognize a string and respond with a corresponding answer is not a Turing test.
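To make the reload point concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the three-question pool, the function names, and the idea that the site picks a question at random on each page load. It is not how any particular site works, just the general shape of the weakness.

```python
import random

# Hypothetical question pool; a real site may have more, but the pool is finite.
QUESTION_POOL = {
    "What color is the sky?": "blue",
    "How many legs does a cat have?": "4",
    "What is 2 + 3?": "5",
}


def serve_question(rng):
    """Simulate one page load: the site shows a random question from its pool."""
    return rng.choice(list(QUESTION_POOL))


def check_answer(question, answer):
    """Site-side check: plain string comparison against the stored answer."""
    return QUESTION_POOL[question] == answer.strip().lower()


def bot_attempt(known_answers, rng, max_reloads=50):
    """Reload until a question the bot already knows appears, then answer it.

    Returns the number of page loads used, or None if the bot never got through.
    """
    for loads in range(1, max_reloads + 1):
        question = serve_question(rng)
        if question in known_answers and check_answer(question, known_answers[question]):
            return loads
    return None
```

A bot that has been taught a single question/answer pair only needs that question to come up once, so a small pool mostly costs it a few extra page loads. A bot that knows nothing gets nowhere, which is why this approach still stops untargeted bots.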
But that's going off-topic into a discussion that has come up many [webmasterworld.com] times [webmasterworld.com] before [webmasterworld.com].
I'm curious what Google plans to do with reCAPTCHA. It was designed to digitize books - are they going to leave it as is, or will they enhance it? CAPTCHAs are often used on pages that contain sensitive information. Do you want a hotlinked script to Google on the same page as a registration form? Is it really a beacon that they'll use to sniff out URLs and analytics? Will they keep their noses out of the DOM?
And oh oh oh just wait: next month you're all going to get ads showing in your captcha box. ba-dum-pshh! thank you thank you
It's not perfect, but image captchas are pretty weak these days, and the simple question approach may in fact be better.
A funny captcha I saw was a complex equation on a math forum. I guess that weeds out some bots (and most humans) :)
...words come from scanned archival newspapers and old books. Computers find it hard to recognize these words because the ink and paper have degraded over time, but by typing them in as a CAPTCHA, crowds teach computers to read the scanned text.
That is so cool. So, as people solve CAPTCHAs they are also training the software to solve the CAPTCHAs ...
but wait, that means that eventually the software will be able to solve all the CAPTCHAs .oO
hmmm, better start working on the next thing now =)
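For anyone wondering how "crowds teach computers" might work in practice, here is a rough Python sketch of the agreement idea: a word the OCR could not read is shown to several people, and a reading is accepted only once enough of them agree. The function name and both thresholds are made up for illustration; they are not reCAPTCHA's actual values.

```python
from collections import Counter


def accept_transcription(answers, min_votes=3, min_share=0.6):
    """Return the agreed reading of an unknown word, or None if there is no consensus.

    min_votes and min_share are hypothetical thresholds chosen for this sketch.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if len(normalized) < min_votes:
        return None  # not enough people have seen this word yet
    word, count = Counter(normalized).most_common(1)[0]
    if count / len(normalized) >= min_share:
        return word
    return None  # answers are too scattered to trust any of them
```

For example, accept_transcription(["morning", "Morning ", "morning", "mourning"]) settles on "morning", while three people giving three different answers settles on nothing.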
Would Google be getting access to that many more sites? Probably hooked into them through one tool or another already.
Much more powerful is simply having a Google account and being logged in. Might as well let them install a keystroke logger. That's about all that's missing.
Live tracking of website traffic is one piece of the puzzle. Live tracking of specific users' movements another. Pretty good one-two KO.
Wonder how long it'll be until they take this beyond text captchas and use the newly available human computing power to improve Google Image Search through integrating something like this [images.google.com...] with reCAPTCHA.
von Ahn was, I believe, the inventor of the ESP Game, which is licensed by Google and is in fact used as the core of Google Image Labeler... so it's already integrated.
...newly available human computing power...
I'm guessing that von Ahn's expertise in this area is a good part of the reason that Google bought the company. von Ahn gave a Google TechTalk in July, 2006, on the subject of "Human Computation". It's available on Google Video....
Human Computation [video.google.com]
Google Video
51:31 TRT
The video struck me at the time I saw it as providing lots of clues about how Google looks at things...
anyone played Google Image Labeler game yet?
[webmasterworld.com...]
...I think that many of the principles of identification involved are being used by Google in lots of areas... in everything from Google Co-Op and Custom Search Engines to Google's assessment of link anchor text....
The next move will be to move the recaptcha server name to recaptcha.google.com.
The implication of this of course is that it creates a google domain cookie so that it can serve as a super cookie.
The readers here will recognise that this is the way cookies are designed to work. As long as they set the cookie's path to /, then the one cookie can be used to track across search, gmail, adsense, adwords, and any page featuring a recaptcha.
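A small Python sketch of the matching rules a browser applies when deciding whether to send a cookie, following the domain-match and path-match rules in RFC 6265. The function names are invented for this example, and recaptcha.google.com is only the speculated hostname from this post, not a confirmed one.

```python
def domain_matches(request_host, cookie_domain):
    """True if the host equals the cookie's domain or is a subdomain of it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


def path_matches(request_path, cookie_path):
    """True if cookie_path equals request_path or is a directory prefix of it."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # request_path is strictly longer here, so the index below is valid.
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


def cookie_is_sent(host, path, cookie_domain, cookie_path="/"):
    """Would a browser attach this cookie to a request for host + path?"""
    return domain_matches(host, cookie_domain) and path_matches(path, cookie_path)
```

With a cookie scoped to google.com and path /, cookie_is_sent("recaptcha.google.com", "/challenge", "google.com") and cookie_is_sent("mail.google.com", "/", "google.com") are both true, while a request to example.com gets nothing. Note that what matters is the host the captcha script is fetched from, not the host of the blog that embeds it.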
With the right javascript, it will be possible to identify down to which comment belongs to which cookie. It is then trivial to scrape the comment, and the sitename/email address attributed to that comment and associate them with a particular cookie and machine. Come to think of it, javascript is not required if the page is scraped often enough to note differences over time correlated with issuance of the recaptcha + super cookie.
If you then login to your adwords, adsense, gmail account from another machine, the correlation can be updated.
See how deep this goes?
All blogs that use recaptcha will see some reduction in postings as privacy aware visitors decide not to comment. They may even see a decline in traffic from people who just don't want to be indexed as visitors.
The drop might be tiny, but it would be nice if it was more like falling off a cliff.
Remember the google cookie expiry in 2037?
How about what color is the blue sky?
Actually, the sky is colorless. It's the refraction and scattering of the blue light that creates the illusion that the sky is blue. You may test it out at night.
If I ever have to answer a CAPTCHA with an answer like that I'm leaving the internet...
But on topic - I've got a feeling this will be more closely related to identifying images rather than Google spying on everyone who completes a CAPTCHA, unless they place ads within CAPTCHAs!
potentially a lot, or nothing. They're worth nothing themselves, but they can be worth something as a means to accomplish something else.
Like, say, using a script to open ten thousand hotmail accounts to use as spam delivery agents. How much money is in it depends on what they're selling, how much they're spamming, and how well people respond to the mailout.
I suppose someone might crack a CAPTCHA to do some vandalism, but that's an odd scenario.
Google scans a lot of books as part of Google Books. Many of the books they scan are very old and will have damage, so perhaps the work of the reCAPTCHA project will help improve the algos that drive their OCR technology.
There are, however, a lot of other theories. Being able to track users is one of them. One thing that might be very important is the ability to follow a user through a captcha code and see what they do afterwards. Say a user is posting a comment on a blog: they have to get past the captcha first. By being able to work out which user places which comments, Google will be able to work out quite a lot about the user and possibly use this data to display more personalised advertising.
Mack.
How much money is there in these spam registrations?
I don't know whether this is about scanning books or tracking users, but there's another possibility: using the captcha to ask "is this spam?", or "is this true?" in the case of a statement. Something like Microsoft's Page Hunt, except cut down to a captcha size.
Google's idea of rotating images is interesting, but completely absurd. I won't put my visitors through that. I think if we all sat down and really thought about it, we would come up with ingenious (and ridiculously simple) ways to filter out bots.