Crypto Algs (Part 1.2): Cracking Caesar [Brute Force + Dictionary]

oaktree · July 17, 2016, 11:30pm

Sometimes, frequency analysis may fail, as @dtm pointed out.

Let’s send a message to our buddy, Bob. We want to tell him about an animal. Upon receiving the message and no key, Bob is confused. What does “rkssotm” mean!? He tries to use our frequency analysis program and this is his result: “dweeafy.”

What is “dweeafy?” That means nothing to Bob. We see that our program did as intended. It found the most frequent letter in “rkssotm,” s, and calculated the offset from e. As you see, such a method is not a silver bullet.

Let’s build out a dictionary scanner to help us figure out what to make of the message.

brute2.cpp

#include "dict-scan.cpp"
#include "encrypt.cpp"
#include <utility>

#define RED "\033[0;31m"
#define NORMAL "\033[0;39m"

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cout << "Usage: ./brute2 /path/to/dict\n";
        exit(1);
    }

    std::string s; std::getline(std::cin, s);
    
    std::vector<std::string> dictionary;
    if ( !DictionaryScanner::load(std::string(argv[1]), dictionary) ) {
        exit(1);
    } else {
        std::cout << RED << "[*] Loaded dictionary at "
            << NORMAL << argv[1] << std::endl;
    }

    std::pair<int,int> max_found(0,0); // key, num_found
    std::string tmp;
    int found;
    std::cout << RED << "[*] Scanning brute forces...\n" << NORMAL;

    for (int i = 0; i < 26; i++) {
        tmp = Caesar::decrypt(s, i);
        found = DictionaryScanner::scan( tmp, dictionary );
        if (found > max_found.second) {
            max_found.second = found;
            max_found.first = i;
        }
    }

    std::cout << RED << "[ key: " << max_found.first << ", found: " << max_found.second
                << " ] " << NORMAL << Caesar::decrypt(s, max_found.first) << std::endl;
    return 0;
}

Woah. A lot happens in that file, huh?

Let’s run through it.

Take input from the user: std::string s; std::getline(std::cin, s);
Load the specified dictionary into a vector of strings.
Keep track of the maximum words found over all iterations. Remember the key used to get that max, too: std::pair<int,int> max_found(0,0); and the following for loop.
Print out the result.

That’s all there is to it. But what is under the hood, in the DictionaryScanner namespace?

The first thing we encounter is load(...), which (obviously) loads the dictionary into a vector for quick access.

bool load(const std::string& file_path, std::vector<std::string>& dictionary)  {
    
    std::ifstream fp(file_path);

    if (!fp) {
        std::cout << "[*] Error opening dictionary file at " << file_path
            << ".\n";
        return false;
    }

    std::string tmp;
    while(fp >> tmp) {
        dictionary.push_back(tmp);
    }

    fp.close();

    return true;
}

Next, let’s take a look at the scan(...) subroutine.

int scan(const std::string& text, const std::vector<std::string>& dictionary) {
    
    int found = 0;

    for (auto& s : dictionary) {
        //std::cout << "checking " << s << std::endl;
        if ( find(text, s) )
            found++;
    }

    return found;
}

We see that it is just keeping track of how many words are found in the dictionary. But, it relies on a function called find(...), which seems to take two parameters: the cipher- or plaintext, and a word from the dictionary.

find(...) is implemented by us (me), not a standard library. I got a weird compile error, okay? And I was too lazy to figure that out, so I made my own function. I understand that this is like popping a tire and then reinventing the wheel just to avoid a mechanic.

bool find (const std::string& text, const std::string& s) {
    size_t text_len = text.length(), s_len = s.length();

    int j,k;
    for (int i = 0; i < text_len; i++) {
        j = i;
        k = 0;
        while(k < s_len && j < text_len && tolower(text[j]) == tolower(s[k])) {
            /*std::cout << "\ttext[j] == s[k]: " << text[j]
                << " == " << s[k] << std::endl;*/
            j++;k++;
        }
        if (k == s_len && j <= text_len)
            return true;
    }

    return false;
}

The function – case-insensitively – checks for the first (if any) occurrence of a word in the text.

Okay, so Bob has just spent the last 3 hours of his life writing all of this code, just because he wants to avoid reading over 25 different versions of “rkssotm.”

Let’s load this dictionary, adapted from this wordlist I found online.

Okay:

$ ./brute2 ./animals.txt
rkssotm
[*] Loaded dictionary at ../animals.txt
[*] Scanning brute forces...

cute lemming

[ key: 6, found: 1 ] lemming

Awwwww, a lemming! How cute! Bob now feels like that 3 hours of coding (with breaks for finger-icing) was all worth it!

Conclusion

I wasn’t going to do this post until @dtm brought up such a good point. Frequency analysis isn’t the do all end all. Neither is this. You could get some weird false-positives for some words.

Some improvements to the code would be

Make sure the word you are finding is not some segment of a word that doesn’t exist. This could happen when scanning for it, or other small words.
Handle plurality, maybe? You’d think this would be easy, but if your dictionary only has singular animals, you can’t just tack on an s. For example, Ox.

Later…
@oaktree

Cromical · July 18, 2016, 4:31am

Good job mate. Bob was a smart guy.

0x00pf · July 18, 2016, 5:16pm

pastebin with the whole code?

oaktree · July 18, 2016, 5:26pm

For @0x00pf or anyone else:

brute2.cpp
dict-scan.cpp
encrypt.cpp
freq-analysis.cpp

pry0cc · July 19, 2016, 10:35pm

Am I smelling a cheeky github repo?

oaktree · January 21, 2018, 12:37am

This topic was automatically closed after 30 days. New replies are no longer allowed.