Google ReCaptcha v3: invisible data slurping shark

Once again Google wants to slurp even more of your visitor data, yet again under the guise of a free valuable service...
* Google’s new reCAPTCHA has a dark side [fastcompany.com], 27-June-2019.

The latest version of the bot detector reCaptcha is invisible to users
...
We’ve all tried to log into a website or submit a form only to be stuck clicking boxes of traffic lights or storefronts or bridges in a desperate attempt to finally convince the computer that we’re not actually a bot.
...
But last fall, Google launched a new version of the tool, with the goal of eliminating that annoying user experience entirely.
...
“You have to understand what behavior on the site should be and mimic that well enough to fool us,” he says. “That’s a really hard problem versus the general problem of, ‘Pretend like I’m a human.'” Website administrators then get access to their visitors’ risk scores and can decide how to handle them
...
According to two security researchers who’ve studied reCaptcha, one of the ways that Google determines whether you’re a malicious user or not is whether you already have a Google cookie installed on your browser.
...
To make this risk-score system work accurately, website administrators are supposed to embed reCaptcha v3 code on all of the pages of their website, not just on forms or log-in pages. Then, reCaptcha learns over time how their website’s users typically act, helping the machine learning algorithm underlying it to generate more accurate risk scores. Because reCaptcha v3 is likely to be on every page of a website, if you’re signed into your Google account there’s a chance Google is getting data about every single webpage you go to that is embedded with reCaptcha v3—and there may be no visual indication on the site that it’s happening, beyond a small reCaptcha logo hidden in the corner.

This is a type of bot behaviour detection that a few of us have been working on and using for several years, and it can work very well. In my case I'm using Redis streams to watch, identify, and act on data in real time while simultaneously saving the data in Postgres for later and ongoing machine learning analysis, the results of which are fed back into the real-time engine.
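For anyone wondering what the real-time half of such a setup might look like, here is a minimal sketch of one behavioural signal, a per-visitor sliding-window request rate. It is not my production engine: the class name, window, and limit are made up for illustration, and a plain in-memory deque stands in for the event stream so the snippet runs on its own. Note that here a higher score means more bot-like.

```python
from collections import defaultdict, deque


class SlidingWindowScorer:
    """Toy per-visitor risk scorer (hypothetical names and thresholds)."""

    def __init__(self, window_seconds=10.0, max_requests=20):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        # visitor_id -> timestamps of that visitor's recent requests
        self._events = defaultdict(deque)

    def record(self, visitor_id, timestamp):
        """Record one request; return a risk score in [0.0, 1.0]."""
        recent = self._events[visitor_id]
        recent.append(timestamp)
        # Evict requests that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while recent and recent[0] <= cutoff:
            recent.popleft()
        # Score is the fraction of the allowed rate used, capped at 1.0.
        return min(1.0, len(recent) / self.max_requests)


scorer = SlidingWindowScorer(window_seconds=10.0, max_requests=4)
for t in (0.0, 1.0, 2.0, 3.0):
    score = scorer.record("visitor-a", t)
print(score)
```

In a real deployment the timestamps would arrive from the stream consumer rather than a loop, and the limits would come out of the offline analysis instead of being hard-coded.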

I'm quite certain that Google is far better at this than I am; however, I get the benefit without giving up my visitor data. There have been conversations about this new Google 'feature' for some months now, and it has been brought up that reCaptcha v3 shifts legal responsibility from Google to the site: with reCaptcha v1 and v2, whether a visitor 'failed' or was passed through was Google's decision; with v3, Google just provides a score and the site decides. Fastco noted this as well:

Google did not address any potential privacy problems and insisted that reCaptcha v3 is a matter of corporate responsibility.
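To make "the site decides" concrete, here is a rough sketch of the kind of policy a site would have to write for itself. It assumes the verification response has already been fetched and parsed into a dict with `success`, `score` (0.0 to 1.0, higher meaning more likely human), and `action` fields; the function name, the thresholds, and the three outcomes are hypothetical choices, not anything Google prescribes.

```python
def decide(verify_response, expected_action, allow_at=0.7, challenge_at=0.3):
    """Map a parsed v3 verification response to a site-chosen outcome.

    Thresholds and outcome names are illustrative assumptions only.
    """
    # A failed verification or a mismatched action is treated as hostile.
    if not verify_response.get("success"):
        return "block"
    if verify_response.get("action") != expected_action:
        return "block"

    score = verify_response.get("score", 0.0)
    if score >= allow_at:
        return "allow"
    if score >= challenge_at:
        return "challenge"  # e.g. ask for email confirmation
    return "block"


print(decide({"success": True, "score": 0.9, "action": "login"}, "login"))
```

Whatever numbers a site picks, the point stands: the cut-off, and the consequences of getting it wrong, now belong to the site.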

And as usual Google totally can't get their story straight on what they are doing, will do, or may do with the data collected... Ah, poor Google has so many heads it's hard to keep the PR story straight... The telephone (aka Chinese whispers) game defeats tech behemoth yet again!

iamlost
