Forget security – Google's reCAPTCHA v2 is exploiting users for profit | Web puzzles don't protect against bots, but humans have spent 819 million unpaid hours solving them
Research Findings:
reCAPTCHA v2 is not effective in preventing bots and fraud, despite its intended purpose
reCAPTCHA v2 can be defeated by bots 70-100% of the time
reCAPTCHA v3, the latest version, is also vulnerable to attacks and has been beaten 97% of the time
reCAPTCHA interactions impose a significant cost on users, with an estimated 819 million hours of human time spent on reCAPTCHA over 13 years, which corresponds to at least $6.1 billion USD in wages
Google has potentially profited $888 billion from cookies [created by reCAPTCHA sessions] and $8.75–32.3 billion per sale of its total labeled data set
Google should bear the cost of detecting bots, rather than shifting it to users
"The conclusion can be extended that the true purpose of reCAPTCHA v2 is a free image-labeling labor and tracking cookie farm for advertising and data profit masquerading as a security service," the paper declares.
In a statement provided to The Register after this story was filed, a Google spokesperson said: "reCAPTCHA user data is not used for any other purpose than to improve the reCAPTCHA service, which the terms of service make clear. Further, a majority of our user base have moved to reCAPTCHA v3, which improves fraud detection with invisible scoring. Even if a site were still on the previous generation of the product, reCAPTCHA v2 visual challenge images are all pre-labeled and user input plays no role in image labeling."
The objective of reCAPTCHA (or any captcha) isn't really to detect bots. It's more about stopping automated requests and rate limiting. A captcha is 'defeated' if the time to solve it, whether by human or bot, is less than expected. Humans are very slow, so they can't beat it anyway.
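The rate-limiting framing above can be illustrated with a standard token-bucket limiter. This is a generic sketch of the technique, not anything reCAPTCHA specifically implements; all names and parameters here are illustrative:

```python
# Minimal token-bucket rate limiter: the goal is not to identify bots
# perfectly, but to cap how fast any one client can make requests.
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = burst         # maximum burst size
        self.tokens = burst           # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A captcha plays a similar role: it adds a per-request cost, so whether the client is human or bot matters less than how fast requests can be issued.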
I thought captchas worked by presenting some known-good examples, some known-bad examples, and a few examples that aren't labeled yet. The model is then trained on which of the uncertain examples the user selects.
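A rough sketch of the scheme this commenter describes (note that Google's statement above disputes this for v2 images): each challenge mixes control tiles with known answers and candidate tiles with unknown answers, and a candidate's label is accepted once enough users who passed the controls agree. All class and parameter names here are hypothetical:

```python
# Hypothetical "control + candidate" labeling scheme: trust a user's votes
# on unknown tiles only when they answered all known tiles correctly.
from collections import Counter

class LabelingCaptcha:
    def __init__(self, required_votes=3):
        self.required_votes = required_votes
        self.votes = {}       # candidate tile id -> Counter of labels
        self.accepted = {}    # candidate tile id -> agreed label

    def submit(self, control_answers, candidate_answers):
        """control_answers: {tile_id: (user_label, known_label)}
           candidate_answers: {tile_id: user_label}
           Returns True if the user passed the challenge."""
        # The challenge passes only if every known-answer tile is correct.
        passed = all(user == known
                     for user, known in control_answers.values())
        if passed:
            # Record this user's votes on the unknown tiles.
            for tile, label in candidate_answers.items():
                counter = self.votes.setdefault(tile, Counter())
                counter[label] += 1
                top_label, count = counter.most_common(1)[0]
                if count >= self.required_votes:
                    self.accepted[tile] = top_label
        return passed
```

For example, two independent users both passing the controls and tagging the same unknown tile as "bus" would (with `required_votes=2`) promote that tile to a labeled training example.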
Also, it's very evident what's being trained: first it was obscured words for OCR, then Google Maps screenshots for object detection, and now you see them with clearly machine-generated images.
[…] reCAPTCHA […] isn’t to detect bots. It is more of stopping automated requests […]
which is bots. Bots make automated requests, and anything that makes automated requests can be called a bot (e.g. web crawlers are called bots too and, if well behaved, also respect robots.txt, which has "bots" in its name for this very reason; "bot" is short for "robot").
Using different words doesn't change the reality behind them, but it may suggest that someone is trying to put something over on the other party.