Hi fellas, today is the first article of a long series dedicated to CAPTCHAs. I will show you how to properly analyze and bypass several types of CAPTCHA. Below is a non-exhaustive list of the subjects that will be covered:
- OCR
- Randomness applicability
- Audio CAPTCHA
- 3D
- reCAPTCHA
Each article will be structured as follows: a theoretical introduction (because theory is the crux of any decent hacker), an academic analysis showing weaknesses and fallacies, and a practical example.
Before diving in, I would like to give a quick historical introduction to CAPTCHAs.
The concept, later named CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), was first proposed by Dr. Moni Naor in 1996 to protect online services facing ticket scalping, DDoS attacks, automatic URL submission, chat spam, … The idea was simple: create a test that is computationally impossible for a computer to solve but trivial for humans.
Several proposals were made, each based on tasks that humans handle effortlessly every day:
- Gender recognition
- Facial expression understanding
- Handwriting understanding
- Speech recognition
- Disambiguation in sentences
- …
One year later, Andrei Broder from AltaVista designed the first OCR-based CAPTCHA, implementing the following process:
- Generation of a pseudo-random string
- Selection of a random string from the previous output
- Randomization of the background
- Randomization of the appearance
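To make these four steps concrete, here is a minimal Python sketch of such a generation pipeline. It is not Broder's or AltaVista's actual implementation: the alphabet, lengths, font names and angle ranges are hypothetical values chosen for illustration, step 2 is interpreted (as an assumption) as picking a random substring of the generated string, and rendering to a real image is left out, so the output only describes what would be drawn.

```python
import random
import string

# Hypothetical parameters, chosen for illustration only.
ALPHABET = string.ascii_uppercase + string.digits
POOL_LENGTH = 32       # length of the pseudo-random string (step 1)
CHALLENGE_LENGTH = 6   # length of the string shown to the user (step 2)
FONTS = ["serif", "sans-serif", "monospace"]  # hypothetical font names


def generate_challenge(rng: random.Random) -> dict:
    # Step 1: generate a pseudo-random string.
    pool = "".join(rng.choice(ALPHABET) for _ in range(POOL_LENGTH))

    # Step 2: select a random substring of it as the expected answer
    # (assumed interpretation of "selection from the previous output").
    start = rng.randrange(POOL_LENGTH - CHALLENGE_LENGTH + 1)
    answer = pool[start:start + CHALLENGE_LENGTH]

    # Step 3: randomize the background, here as a random set of noise dots
    # on a hypothetical 200x60 canvas.
    dot_count = rng.randint(50, 150)
    background = [(rng.randrange(200), rng.randrange(60)) for _ in range(dot_count)]

    # Step 4: randomize the appearance of each character.
    glyphs = [
        {
            "char": c,
            "font": rng.choice(FONTS),
            "angle": rng.uniform(-30, 30),   # rotation in degrees
            "scale": rng.uniform(0.8, 1.2),  # size multiplier
        }
        for c in answer
    ]
    return {"answer": answer, "background": background, "glyphs": glyphs}


if __name__ == "__main__":
    challenge = generate_challenge(random.Random(42))
    print(challenge["answer"], len(challenge["background"]), "noise dots")
```

Keeping the answer separate from the rendering parameters makes it clear what a solver has to undo (background noise and per-character distortion) to recover the answer.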
A dozen or so CAPTCHAs were created following the proposals of those two researchers, e.g. HumanAuth, ASSIRA, Teabag, HotCAPTCHA (which is, by the way, quite funny), …
However, none of them was really successful, and most lasted less than six months, due to the emergence of neural networks and of machine and deep learning. Moreover, human farms make it difficult to design a scheme robust enough to counter this bane. Lastly, CAPTCHAs suffer from a real lack of security focus: they became a profitable business, and companies, in order to stay competitive and profitable, overlooked the security of their products. As Auguste Kerckhoffs stated in 1883, a system must remain secure even if everything about it except the key is public. In other words, there is no security through obscurity, and proprietary engines and algorithms that rely on it are doomed.
Worse, once a CAPTCHA is broken, its vendor has two options: increase the distortion, randomization, … or design a new scheme. For financial reasons, the first option is usually chosen, which reduces the usability of the CAPTCHA while allowing hackers to easily adapt their tools to the new version.
Nowadays, from a security point of view, CAPTCHAs are completely useless, but they at least enforce a minimum security baseline on the internet. Without them, it would be chaos.
That is it for today. I hope you will enjoy this series as much as I do.
P.S.: If you would like me to study a specific type of CAPTCHA, just ask.
Best,
Nitrax