CAPTCHA - Introduction

Hi fellas, today is the first article of a long series dedicated to CAPTCHAs. I will show you how to properly analyze and bypass several types of CAPTCHA. Below is a non-exhaustive list of the subjects that will be covered:

Each article will be structured as follows: an academic analysis showing weaknesses and fallacies (because theory is the crux of any decent hacker), followed by a practical example.

Before diving in, here is a brief historical introduction to CAPTCHAs.

The CAPTCHA concept (Completely Automated Public Turing test to tell Computers and Humans Apart) was proposed by Dr. Moni Naor in 1996 in order to secure online services facing ticket scalping, DDoS attacks, automatic URL submission, chat spamming, … The idea was simple: create something that is computationally infeasible for a computer to solve but trivial for humans.

Several proposals have been made, each built on a task that humans handle every day:

  • Gender recognition
  • Facial expression understanding
  • Handwriting understanding
  • Speech recognition
  • Disambiguation in sentences
  • …

One year later, Andrei Broder from AltaVista designed the first OCR-based CAPTCHA, implementing the following process:

  • Generation of a pseudo-random string
  • Selection of a random string from the previous output
  • Randomization of the background
  • Randomization of the appearance
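
The four steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not Broder's actual implementation: the function name `make_challenge`, the alphabet, the lengths and every numeric range are hypothetical, "selection of a random string" is read here as picking a random slice of the generated string, and the drawing itself is left out (the function returns a description that an imaging library would render).

```python
import random
import string


def make_challenge(rng, pool_len=32, answer_len=6):
    """Build a description of an OCR-style CAPTCHA (no actual drawing)."""
    alphabet = string.ascii_uppercase + string.digits

    # Step 1: generate a pseudo-random string.
    pool = "".join(rng.choice(alphabet) for _ in range(pool_len))

    # Step 2: select a random string from the previous output
    # (assumption: a contiguous slice of the pool).
    start = rng.randrange(pool_len - answer_len + 1)
    answer = pool[start:start + answer_len]

    # Step 3: randomize the background (hypothetical parameters).
    background = {
        "color": tuple(rng.randrange(256) for _ in range(3)),
        "noise_lines": rng.randint(3, 8),
    }

    # Step 4: randomize the appearance of each character
    # (hypothetical parameters).
    glyphs = [
        {
            "char": char,
            "rotation_deg": rng.randint(-30, 30),
            "font_size": rng.randint(24, 36),
            "y_offset": rng.randint(-5, 5),
        }
        for char in answer
    ]

    return {"answer": answer, "background": background, "glyphs": glyphs}


# SystemRandom draws from the OS entropy source, which is harder to
# predict than the default Mersenne Twister generator.
challenge = make_challenge(random.SystemRandom())
```

Passing the generator in as `rng` keeps the sketch easy to test with a seeded `random.Random`, while a real deployment would want an unpredictable source such as `random.SystemRandom`.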

A dozen CAPTCHAs have been created following the proposals of those two researchers, e.g. HumanAuth, Asirra, Teabag, HotCAPTCHA (which is, by the way, quite funny :grin:) …
However, none of them was really successful, and each lasted less than 6 months, largely because of the rise of neural networks and, more broadly, machine learning and deep learning. Moreover, human farms make it difficult to produce a scheme robust enough to counteract this bane. Lastly, CAPTCHAs suffer from a real lack of security concern. Indeed, they became a profitable business, and companies, to stay competitive and profitable, overlooked the security aspect of their products. As Auguste Kerckhoffs stated in 1883, a system should stay secure even when everything about it except the key is public knowledge; in other words, there is no security through obscurity, so closed engines and secret algorithms are doomed.
The worst part is that once a CAPTCHA is broken, the company has two choices: increase the distortion, randomization, … or design a new scheme. For financial reasons, the first option is often chosen, which reduces the usability of the CAPTCHA and allows hackers to easily adapt their tools to the new version.

Nowadays, CAPTCHAs offer little protection against a determined attacker, but they at least ensure a minimum security standard across the internet by stopping casual automated abuse. Indeed, without them, it would be chaos.

That is it for today. I hope you will enjoy this series as much as I do.

P.S.: If you want me to study a specific type of CAPTCHA, ask me.



The CAPTCHA that CloudFlare presents when you try to use TOR would be nice! :smile_cat:


Awesome introduction dude! I really like the level of depth you’ve used, it was very easy to read, and I enjoyed reading through it! Nice work :wink:


@oaktree It is the new reCAPTCHA created by Google! I added it to the list.


@Nitrax: I’ve gotten great at identifying rivers, street signs, and storefronts.


Does cracking a CAPTCHA involve image processing, or is it based on analysing the algorithm used to create it? Some clarity on cracking would be really appreciated. Great post!




Can’t wait mate! :smiley:

@Cal0X The creation algorithm will be studied in any case, in order to better understand the engine and to detect potential flaws that could be exploited later. However, some CAPTCHAs, such as HumanAuth, only require image processing to be broken, and further study would be pointless. Consequently, it will depend on the targeted CAPTCHA and its complexity.


I myself have gotten good at identifying cupcakes. Coincidentally, I receive food-based CAPTCHAs whenever I’m hungry.
