Text-Based reCAPTCHA Defeated by an AI Bot
Anyone who browses the web is familiar with the anti-robot technology called CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), which blocks us from reaching our destination until we solve a text-based puzzle.
The technology originated at Carnegie Mellon University, and its reCAPTCHA implementation was later acquired by Google. The current version drops the text-based ‘puzzle’ in favor of image-matching challenges.
This change is timely, because the old text-based CAPTCHA is apparently no longer effective at telling a human apart from an automated script, following a breakthrough by a partnership between Peking University in China and Lancaster University in the UK. The researchers from both universities have broken the earlier assumption that only a thinking human being can read the deliberately distorted text.
The artificial intelligence system they developed proved accurate enough to solve text-based CAPTCHA puzzles automatically. It may even be considered better than a human at the task, as it solves a CAPTCHA in an average of just 0.05 seconds on a typical PC.
The specially designed AI utility mimics the basic processes of the human brain: it is a ‘thinking engine’ that learns how to distinguish characters and picks up their variations as it goes along. The researchers use teaching methods akin to teaching a young child to recognize things. This approach, called machine learning, lets the AI learn from the variety of samples submitted to its database, so it can reproduce those variations and compare them with the text CAPTCHA presented to it on screen.
With an advanced machine learning algorithm, a sample as small as 500 CAPTCHAs is enough for a reasonable level of accuracy. Of course, because the algorithm learns as it goes along, the more time researchers invest in ‘teaching’ the AI, the better it tells characters apart.
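To make the idea of learning from a small labeled sample more concrete, here is a minimal Python sketch. It is not the researchers' system, whose internals this article does not describe. It swaps in a much simpler technique, a nearest-centroid classifier, and trains it on made-up 5x5 character bitmaps with random pixel noise. The glyphs, the noise level, and every function name below are assumptions chosen purely for illustration.

```python
import random

# Hypothetical 5x5 bitmaps for three characters (illustrative only,
# not real captcha data).
GLYPHS = {
    "L": ["10000", "10000", "10000", "10000", "11111"],
    "T": ["11111", "00100", "00100", "00100", "00100"],
    "O": ["11111", "10001", "10001", "10001", "11111"],
}


def to_vector(rows):
    """Flatten a bitmap given as strings into a list of 0/1 pixels."""
    return [int(c) for row in rows for c in row]


def distort(vec, rng, flip_prob=0.1):
    """Flip each pixel with probability flip_prob to imitate visual noise."""
    return [1 - p if rng.random() < flip_prob else p for p in vec]


def make_samples(n, rng):
    """Generate n noisy (pixels, label) pairs from the clean glyphs."""
    labels = sorted(GLYPHS)
    samples = []
    for _ in range(n):
        label = rng.choice(labels)
        samples.append((distort(to_vector(GLYPHS[label]), rng), label))
    return samples


def train(samples):
    """'Learn' one average pixel vector (centroid) per character."""
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, p in enumerate(vec):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(model, vec):
    """Return the label whose centroid is closest to vec."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(model, key=lambda lab: dist(model[lab]))


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(model, v) == lab for v, lab in samples) / len(samples)


rng = random.Random(0)                # fixed seed so runs are repeatable
model = train(make_samples(500, rng)) # 500 samples, echoing the figure above
test_set = make_samples(200, rng)     # unseen noisy samples for evaluation
print(f"accuracy on unseen samples: {accuracy(model, test_set):.2f}")
```

Real captchas add rotation, overlapping characters and background clutter, so a toy like this says nothing about how well any actual scheme resists attack. It only shows the basic loop of learning from labeled examples and then classifying unseen ones.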
Before this breakthrough, CAPTCHA-solving bots still failed on a number of CAPTCHAs even after tens of thousands of samples had been submitted to them for ‘learning.’
“We show for the first time that an adversary can quickly launch an attack on a new text-based captcha scheme with very low effort,” emphasized Zheng Wang, a Senior Lecturer at Lancaster University.
Google’s release of image-based CAPTCHA is, coincidentally, a fitting response to this AI that solves text-based CAPTCHAs. Many websites that used to rely on text-based CAPTCHAs have been using image-based versions for at least a year now. CAPTCHA has historically been used to deliberately ‘slow down’ form submissions. On sites that have not replaced text-based CAPTCHAs with image-based ones, the technique developed by the UK and Chinese universities could bypass this protection and flood them with bot-based sign-ups.
“It allows an adversary to launch an attack on services, such as Denial of Service attacks or sending spam or phishing messages, to steal personal data or even forge user identities. Given the high success rate of our approach for most of the text captcha schemes, websites should be abandoning captchas,” explained Guixin Ye, who led the study.