Crypto Algs (Part 1.1): Cracking Caesar

Yesterday I dropped an article about the Caesar Cipher encryption “scheme”. Now, it’s time to break it.

This one will be easy. Think about it. Our key can only be between 0 and 25. Sounds like a brute-forcable problem.


Method 1: Brute Force

First, here are the encryption and decryption functions we’ll be using.

encrypt.cpp

#include <vector>
#include <string>
#include <iostream>

namespace Caesar {

    std::string encrypt(const std::string& str, const int& key) {
        
        std::string s = str;

        for (auto& c : s) {
            if (c >= 'a' && c <= 'z') {

                c = ((c - 'a' + key) % 26) + 'a';

            } else if (c >= 'A' && c <= 'Z') {

                c = ((c - 'A' + key) % 26) + 'A';

            }
        }

        return s;
    }

    std::string decrypt(const std::string& str, const int& key) {
        return encrypt(str, 26 - key);
    }
};

Straight forward, right? If not, read my last article again.

Now let’s get down to decrypting with brute force. We’re just going to use a simple for loop:

brute.cpp

#include "./encrypt.cpp"

int main() {
    std::string s; std::getline(std::cin, s);
    std::cout << std::endl;
    
    for (int i = 1; i < 26; i++) {
        std::cout << "[ " << i << " ] "; 
        std::cout << Caesar::decrypt(s, i) << std::endl;
    }
    return 0;
}

Let’s say I encrypt a message.

$ ./caesar-cipher
Hello World
5
Mjqqt Btwqi

Now let’s metaphorically swallow the key and then try to unlock our message via brute force.

$ ./brute
Mjqqt Btwqi

[ 1 ] Lipps Asvph
[ 2 ] Khoor Zruog
[ 3 ] Jgnnq Yqtnf
[ 4 ] Ifmmp Xpsme
[ 5 ] Hello World
[ 6 ] Gdkkn Vnqkc
[ 7 ] Fcjjm Umpjb
[ 8 ] Ebiil Tloia
[ 9 ] Dahhk Sknhz
[ 10 ] Czggj Rjmgy
[ 11 ] Byffi Qilfx
[ 12 ] Axeeh Phkew
[ 13 ] Zwddg Ogjdv
[ 14 ] Yvccf Nficu
[ 15 ] Xubbe Mehbt
[ 16 ] Wtaad Ldgas
[ 17 ] Vszzc Kcfzr
[ 18 ] Uryyb Jbeyq
[ 19 ] Tqxxa Iadxp
[ 20 ] Spwwz Hzcwo
[ 21 ] Rovvy Gybvn
[ 22 ] Qnuux Fxaum
[ 23 ] Pmttw Ewztl
[ 24 ] Olssv Dvysk
[ 25 ] Nkrru Cuxrj

A quick look-through reveals our message, “Hello World,” with a key of 5.

Now, that’s not all that efficient. Plus, we must burden our eyes with looking through each of the 25 results.

Let’s make the computer do the work instead.


Method 2: Frequency Analysis

#include "./encrypt.cpp"

#define REG_MAX 4 // the index of 'e' in alpha ('a' = 0)

int main() {
    std::string s; std::getline(std::cin, s);

    std::vector<unsigned int> freq(26, 0);

    /*
        Collect frequencies
    */
    for (auto& c : s) {
        if (c >= 'a' && c <= 'z') {
            freq[c - 'a']++;
        } else if (c >= 'A' && c <= 'Z') {
            freq[c - 'A']++;
        }
    }

    // find most frequent letter
    unsigned int max_idx = 0;
    for (int i = 1; i < 26; i++) {
        if (freq[i] > freq[max_idx]) {
            max_idx = i;
        }
    }

    int key = max_idx - REG_MAX;
    while (key < 0) {
        key += 26;
    }

    std::cout << Caesar::decrypt(s, key) << std::endl;
}

We count the occurrences of each letter and store it in freq. Taking advantage of the fact that ‘e’ is the most common letter in English, finding the most frequent letter and calculating its distance from ‘e’ reveals the key.

This will sometimes fail on smaller sample sets. For example, “Hello” only has ‘e’ once, but ‘l’ appears twice. I’ll be using a larger input to ensure that we have enough data to make a good key assumption.

Here’s the ciphertext we’ll be using:

Wg obm ct wh fsoz? W asob, zccy oh hvwg. Zccy oh wh! O kcfzr piwzh cb tobhogm. Gmbhvshwq sachwcbg wb hvs tcfa ct dwzzg. Dgmqvczcuwqoz koftofs wb hvs tcfa ct orjsfhwgwbu. Awbr-ozhsfwbu qvsawqozg wb hvs tcfa ct… tccr! Pfowbkogvwbu gsawbofg wb hvs tcfa ct asrwo. Qcbhfczzsr wgczohsr pippzsg wb hvs tcfa ct gcqwoz bshkcfyg. Fsoz? Mci kobh hc hozy opcih fsozwhm? Ks vojsb’h zwjsr wb obmhvwbu fsachszm qzcgs hc wh gwbqs hvs hifb ct hvs qsbhifm. Ks hifbsr wh ctt, hccy cih hvs pohhsfwsg, gboqysr cb o pou ct UACg kvwzs ks hcggsr hvs fsabobhg wb hvs sjsf-sldobrwbu Riadghsf ct hvs viaob qcbrwhwcb. Ks zwjs wb pfobrsr vcigsg hforsaofysr pm qcfdcfohwcbg piwzh cb pwdczof biapsfg xiadwbu id obr rckb cb rwuwhoz rwgdzomg, vmdbchwnwbu ig wbhc hvs pwuusgh gziapsf aobywbr vog sjsf gssb. Mci vojs hc rwu dfshhm rssd, ywrrc, pstcfs mci qob twbr obmhvwbu fsoz. Ks zwjs wb o ywburca ct pizzgvwh. O ywburca mci’js zwjsr wb tcf tof hcc zcbu. Gc rcb’h hszz as opcih bch pswbu fsoz. W’a bc zsgg fsoz hvob hvs tiqywbu psst dohhm wb mcif Pwu Aoq.

I’ve given you all the tools you need. You can decrypt it yourself. Bonus brownie points if you get the key, too!

Later…
@oaktree

So, do you recognize the message? How about where it’s from?

5 Likes

Great job again oak! Really look forward to when you do something like this with an encryption like AES and so on!

Using brute force is probably a lot more efficient than using frequency analysis in this situation. Usually, the effectiveness of a method is measured by the time it would take to break ciphered messages, not the overall average performance of the method on other ciphering algorithms. Taking into consideration that you only need to shift the characters 25 times, it conserves so much more time than having to perform a frequency analysis and then manually deciphering the ciphered text one character at a time. In fact, it would be considered pretty much overkill.

1 Like

Well, @dtm: We can evaluate the time complexities of each method:

Brute Force

for (int i = 1; i < 26; i++) {
    std::cout << "[ " << i << " ] "; 
    std::cout << Caesar::decrypt(s, i) << std::endl;
}
//...

25 times, we decrypt a string of n characters, giving us a time complexity of 25n, which simplifies to O( n ).


Frequency Analysis

/*
Collect frequencies
*/
for (auto& c : s) {
    if (c >= 'a' && c <= 'z') {
        freq[c - 'a']++;
    } else if (c >= 'A' && c <= 'Z') {
       freq[c - 'A']++;
    }
}

We iterate through each character here, which takes n time.

unsigned int max_idx = 0;
for (int i = 1; i < 26; i++) {
    if (freq[i] > freq[max_idx]) {
        max_idx = i;
    }
}

Here, we iterate through the freq array, which is always 26 elements. Starting from element 1 to 25 gives us a constant time, O ( 1 ).

std::cout << Caesar::decrypt(s, key) << std::endl;

Next, we decrypt, which takes n time.

The total time for frequency analysis thus comes to n + n + 25, which is simplified through the limit process to n. Thus, frequency analysis of the Caesar Cipher takes O( n ) time.


Conclusion

Looking back, we see that the derived time complexities are identical; they both run in linear time, O( n ). Thus, either method may be used. However, frequency analysis is cheaper, because it is more deterministic. A brute force approach requires the end user to sift through all of the results. Why do that?

Further, the un-simplified time complexities favor frequency analysis, because its raw complexity is 2n + 25, compared to brute force’s 25n complexity.

So, actually, frequency analysis is better. Note that brute force is certainly more reliable for smaller inputs, since determining the key with small inputs based on character frequency is not always “for sure.”

Another time, I may have agreed with you, @dtm, but what I’ve shown above proves otherwise.

3 Likes

Though the time complexity you have shown may be true, the frequency analysis is entirely based on the assumption of statistical values, i.e. the frequency of the occurrence of a letter where e is the highest, t being second, et cetera. Consider this: What would happen if you attempted to break a ciphered text which has been manipulated to not conform to the statistic? Would a frequency analysis be considered reliable? Theory and practice are two sides of the same coin, they are not equivalent.

Speaking of statistics, is it not possible to have a program to detect common words to potentially find the correct deciphered text? This would easily remove the need for a person to manually look through each and every possible result.

1 Like

Well I assume if you had some sort of dictionary set up you could set a program like that. Only thing is that the plain text might actually be in another language. Meaning that if it isn’t it would just seem like gibberish to the program and it would move on. So you could actually have what would be a valid deciphered text, except since it is in another lang it just looks like cipher text and it moves on. Many people actually use different languages in their encryptions, to make it just a bit harder to figure out. I suppose however you would be right in it being more convenient.

1 Like

Or you could just have the program detect most words regardless of language, except putting something like that together for a cipher such as the Caesar cipher would just take up time, as the cipher itself is simple enough to crack with the aforementioned program @oaktree set up.

@dtm:
Of course. There exist infinite samples. Of those, some n samples wouldn’t conform to the statistics. That is why I picked out a rather large sample for this article. If you attempt a frequency analysis of a ciphered “hello”, you will fail to break the cipher, because there are two ls and only one e.

Yes, you could also scan the brute-force results for common words like “the” or “and.” While I did not show that here, it can certainly be done.

I had setup a Caesar cracking tool some time ago too and I used a combination of both techniques: First I used frequency analysis to order the 26 keys. Then I printed out the first 80 digits (Only the first 80, because it should be enough to see if it’s rubbish or not) of every decrypted text in the previously calculated order. With this method you are able to identify the right key as easy as possible for the user.

Of course this takes more time than normal brute-force, but I think the losses are low enough - when we speak about Caeser cipher -, that it can be seen as useful :wink:.

1 Like

This topic was automatically closed after 30 days. New replies are no longer allowed.