# Crypto Algs (Part 1.2): Cracking Caesar [Brute Force + Dictionary]

(oaktree) #1

Sometimes, frequency analysis may fail, as @dtm pointed out.

Let’s send a message to our buddy, Bob. We want to tell him about an animal. Upon receiving the message and no key, Bob is confused. What does “rkssotm” mean!? He tries to use our frequency analysis program and this is his result: “dweeafy.”

What is “dweeafy?” That means nothing to Bob. We see that our program did as intended. It found the most frequent letter in “rkssotm,” `s`, and calculated the offset from `e`. As you see, such a method is not a silver bullet.

Let’s build out a dictionary scanner to help us figure out what to make of the message.

brute2.cpp

``````#include "dict-scan.cpp"
#include "encrypt.cpp"
#include <utility>

#define RED "\033[0;31m"
#define NORMAL "\033[0;39m"

int main(int argc, char** argv) {
if (argc != 2) {
std::cout << "Usage: ./brute2 /path/to/dict\n";
exit(1);
}

std::string s; std::getline(std::cin, s);

std::vector<std::string> dictionary;
if ( !DictionaryScanner::load(std::string(argv[1]), dictionary) ) {
exit(1);
} else {
std::cout << RED << "[*] Loaded dictionary at "
<< NORMAL << argv[1] << std::endl;
}

std::pair<int,int> max_found(0,0); // key, num_found
std::string tmp;
int found;
std::cout << RED << "[*] Scanning brute forces...\n" << NORMAL;

for (int i = 0; i < 26; i++) {
tmp = Caesar::decrypt(s, i);
found = DictionaryScanner::scan( tmp, dictionary );
if (found > max_found.second) {
max_found.second = found;
max_found.first = i;
}
}

std::cout << RED << "[ key: " << max_found.first << ", found: " << max_found.second
<< " ] " << NORMAL << Caesar::decrypt(s, max_found.first) << std::endl;
return 0;
}
``````

Woah. A lot happens in that file, huh?

Let’s run through it.

1. Take input from the user: `std::string s; std::getline(std::cin, s);`
2. Load the specified dictionary into a `vector` of `string`s.
3. Keep track of the maximum words found over all iterations. Remember the key used to get that max, too: `std::pair<int,int> max_found(0,0);` and the following `for` loop.
4. Print out the result.

That’s all there is to it. But what is under the hood, in the DictionaryScanner namespace?

The first thing we encounter is `load(...)`, which (obviously) loads the dictionary into a `vector` for quick access.

``````bool load(const std::string& file_path, std::vector<std::string>& dictionary)  {

std::ifstream fp(file_path);

if (!fp) {
std::cout << "[*] Error opening dictionary file at " << file_path
<< ".\n";
return false;
}

std::string tmp;
while(fp >> tmp) {
dictionary.push_back(tmp);
}

fp.close();

return true;
}
``````

Next, let’s take a look at the `scan(...)` subroutine.

``````int scan(const std::string& text, const std::vector<std::string>& dictionary) {

int found = 0;

for (auto& s : dictionary) {
//std::cout << "checking " << s << std::endl;
if ( find(text, s) )
found++;
}

return found;
}
``````

We see that it is just keeping track of how many words are found in the dictionary. But, it relies on a function called `find(...)`, which seems to take two parameters: the cipher- or plaintext, and a word from the dictionary.

`find(...)` is implemented by us (me), not a standard library. I got a weird compile error, okay? And I was too lazy to figure that out, so I made my own function. I understand that this is like popping a tire and then reinventing the wheel just to avoid a mechanic.

``````bool find (const std::string& text, const std::string& s) {
size_t text_len = text.length(), s_len = s.length();

int j,k;
for (int i = 0; i < text_len; i++) {
j = i;
k = 0;
while(k < s_len && j < text_len && tolower(text[j]) == tolower(s[k])) {
/*std::cout << "\ttext[j] == s[k]: " << text[j]
<< " == " << s[k] << std::endl;*/
j++;k++;
}
if (k == s_len && j <= text_len)
return true;
}

return false;
}
``````

The function – case-insensitively – checks for the first (if any) occurrence of a word in the `text`.

Okay, so Bob has just spent the last 3 hours of his life writing all of this code, just because he wants to avoid reading over 25 different versions of “rkssotm.”

Okay:

``````\$ ./brute2 ./animals.txt
rkssotm
[*] Scanning brute forces...
``````

``````[ key: 6, found: 1 ] lemming
``````

Awwwww, a lemming! How cute! Bob now feels like that 3 hours of coding (with breaks for finger-icing) was all worth it!

# Conclusion

I wasn’t going to do this post until @dtm brought up such a good point. Frequency analysis isn’t the do all end all. Neither is this. You could get some weird false-positives for some words.

Some improvements to the code would be

1. Make sure the word you are `find`ing is not some segment of a word that doesn’t exist. This could happen when scanning for `it`, or other small words.
2. Handle plurality, maybe? You’d think this would be easy, but if your dictionary only has singular animals, you can’t just tack on an `s`. For example, Ox.

Later…
@oaktree

6 Likes

#2

Good job mate. Bob was a smart guy.

2 Likes

(pico) #3

pastebin with the whole code?

0 Likes

(oaktree) #4

For @0x00pf or anyone else:

2 Likes