Continuing the discussion from [KEYGENME - EASY] Cracking Your First Program:
As requested by @pry0cc, this is a crash course on reverse engineering. Before reading this, you should try to solve @dtm challenge:
If you didn’t manage or you want to see somebody else solution keep reading.
Our Target
I will start writing our target program. You can try what is described in this paper against the original challenge binaries the process is the same. Showing the C code at the beginning makes the explanation easier. Also, you will note that the code from this example is roughly the same that the one you see in the original challenge.
This is my free implementation of @dtm’s original challenge.
#include <stdio.h>
#define PEXIT(s) {printf ("%s",s); return 0;}
int
main (void)
{
FILE *f;
char buffer[16];
int n;
if (((f = fopen ("key.txt", "rb")) != NULL) &&
(n = fread (buffer, 1, 16, f)) == 16)
{
for (;n >= 0 && buffer[--n] == 'a';);
if (n < 0) PEXIT("Key is correct\n");
}
PEXIT("Wrong Key\n");
}
Fine, let’s compile it for a 32bits platform (as the original challenge) and then, let’s strip all symbols from it to make the process a bit more difficult. In this case, removing the symbols means that we will have to find the main
function ourselves.
gcc -m32 -o kg kg.c
strip -s kg
EXERCISE:
You can usereadelf -s
to check the symbols in the program before and after running thestrip
command. Note how themain
function is gone after stripping the binary.
Gathering information
In this case it is not really needed, but in general, you will want to gather as much information about the target as you want… you may also want to run it on a VM and actually take a snapshot before you start… specially if you do dynamic analysis and you do not know what the sample does (backdoor, worm, virus,…)
Some tools you may want to use: file
, strings
, readelf -a
and also objdump
.
In this case, we already know everything for this program. After all, we wrote it ourselves, so let’s go straight into the reverse stuff.
EXERCISE:
Run the tools mentioned above on the program you just compile.
Reversing with gdb
Let’s start this tutorial reversing the sample using gdb
.
Let’s start:
$ gdb ./kg
(… gdb messages here…)
(gdb)
As this sample does not have symbols (that’s normal in real life) we do not know right away where our main
function is (yep, I have already said that). But we can ask gdb
about the program entry point:
(gdb) info file
(…)
Entry point: 0x80483f0
(…)
Sometimes you cannot even get this information (when using sstrip -z
for instance). In those cases… well, let’s leave those for another post.
Now we can start disassembling the entry point. If you have been following the “Programming for Wannabes” course you should know that the main
function is not the first function the OS executes when we run a program. You don’t know… go and read the “Programming for Wannabe” series now!. The main
function gets called from the entry point (the entry point is usually named _start
).
Let’s see what we can find at our entry point:
(gdb) x/15i 0x80483f0
0x80483f0: xor %ebp,%ebp
0x80483f2: pop %esi
0x80483f3: mov %esp,%ecx
0x80483f5: and $0xfffffff0,%esp
0x80483f8: push %eax
0x80483f9: push %esp
0x80483fa: push %edx
0x80483fb: push $0x8048620
0x8048400: push $0x80485b0
0x8048405: push %ecx
0x8048406: push %esi
0x8048407: push $0x80484ed
0x804840c: call 0x80483d0 <__libc_start_main@plt>
0x8048411: hlt
0x8048412: xchg %ax,%ax
NOTE: The
x
command on gdb dumps memory. After the slash you can indicate the number of items to dump and the format. In this case we are usingi
that indicatesgdb
to dump instructions. You can also use the commanddisassemble
:
(gdb) disassemble 0x80483f0,+30
In this case the second parameter is the size in bytes of the dump, instead of the number of instructions to dump. You can use the flag
/r
to also dump the machine code.
Well, as you see not all symbols have been removed. Being dynamically linked, the program has to keep some information to be able to find the functions provided by other libraries. So the call
instruction at 0x804840c
is the one that is going to run main
and therefore, it receives as a parameter, among other things, the pointer to the main
function. In this case it is the last parameter pushed into the stack… So, let’s take a look to the main
function!
NOTE: You will find this kind of sequences in many programs out there… you will learn to easily identify them as you keep practising.
The main
Function
We have finally found the main
function. It is at position 0x80484ed
. And this is how it does look like:
(gdb) x/50i 0x80484ed
0x80484ed: push %ebp
0x80484ee: mov %esp,%ebp
0x80484f0: and $0xfffffff0,%esp
0x80484f3: sub $0x30,%esp
0x80484f6: mov %gs:0x14,%eax
0x80484fc: mov %eax,0x2c(%esp)
0x8048500: xor %eax,%eax
0x8048502: movl $0x8048640,0x4(%esp)
0x804850a: movl $0x8048643,(%esp)
0x8048511: call 0x80483e0 <fopen@plt>
0x8048516: mov %eax,0x18(%esp)
0x804851a: cmpl $0x0,0x18(%esp)
0x804851f: je 0x8048587
0x8048521: mov 0x18(%esp),%eax
0x8048525: mov %eax,0xc(%esp)
0x8048529: movl $0x10,0x8(%esp)
0x8048531: movl $0x1,0x4(%esp)
0x8048539: lea 0x1c(%esp),%eax
0x804853d: mov %eax,(%esp)
0x8048540: call 0x80483a0 <fread@plt>
0x8048545: mov %eax,0x14(%esp)
0x8048549: cmpl $0x10,0x14(%esp)
0x804854e: jne 0x8048587
0x8048550: cmpl $0x0,0x14(%esp)
0x8048555: js 0x804856d
0x8048557: subl $0x1,0x14(%esp)
0x804855c: lea 0x1c(%esp),%edx
0x8048560: mov 0x14(%esp),%eax
0x8048564: add %edx,%eax
0x8048566: movzbl (%eax),%eax
0x8048569: cmp $0x61,%al
0x804856b: je 0x8048550
0x804856d: cmpl $0x0,0x14(%esp)
0x8048572: jns 0x8048587
0x8048574: movl $0x804864b,(%esp)
0x804857b: call 0x80483b0 <puts@plt>
0x8048580: mov $0x0,%eax
0x8048585: jmp 0x8048598
0x8048587: movl $0x804865a,(%esp)
0x804858e: call 0x80483b0 <puts@plt>
0x8048593: mov $0x0,%eax
0x8048598: mov 0x2c(%esp),%ecx
0x804859c: xor %gs:0x14,%ecx
0x80485a3: je 0x80485aa
0x80485a5: call 0x8048390 <__stack_chk_fail@plt>
0x80485aa: leave
0x80485ab: ret
0x80485ac: xchg %ax,%ax
0x80485ae: xchg %ax,%ax
0x80485b0: push %ebp
This is our whole main
function. You can see the calls to all the functions we used in our C program (fopen
, fread
, puts
… yes, gcc
automatically converts a printf
with a single static string parameter into a puts
call). All those functions are provided by libc and therefore, dynamically linked. You can refer to the great paper from @_py on the topic to know more about this process:
Now, we can start reversing each function one by one.
Calling fopen
Let’s skip to address 0x8048502
. The instructions before just set the stack to store our local variables. You will see that type of code all the time just ignore it for this simple case. So at 0x8048502
we have
0x8048502: movl $0x8048640,0x4(%esp)
0x804850a: movl $0x8048643,(%esp)
0x8048511: call 0x80483e0 <fopen@plt>
0x8048516: mov %eax,0x18(%esp)
0x804851a: cmpl $0x0,0x18(%esp)
0x804851f: je 0x8048587
The first thing to do is to put the function parameter in the stack (pointed by register ESP
). At offset 0 in the stack (%esp)
we store one pointer, and at offset 4 0x4(%esp)
or [ESP + 0x4]
if you prefer, another pointer. Let’s see to what are those pointers are pointing to:
(gdb) x/s 0x8048640
0x8048640: "rb"
(gdb) x/s 0x8048643
0x8048643: "key.txt"
NOTE: Type
s
for thegdb
dump commandx
will dump a zero-terminated string
Nice, those are actually the two parameters passed to the fopen
function. Then we see how the program stores the value of eax
at offset 0x18
in the stack. 0x18(%esp)
actually means; Give me the content of ESP + 0x18
… it’s called indexed addressing mode).
Let’s create our local variable tables, to keep track of them to help us during the reversing process. For now we only have one:
ESP + 0x18 -> FILE *f;
Finally, the function checks the return value. If it is 0 (or NULL
in this case, check the C code above), the program will jump to 0x8048587
.
From these 6 assembly lines we have got the following information:
- The key has to be stored in a file named ‘key.txt’ in the current directory
- It is open in binary read mode (this is actually irrelevant for this case)
Good, that is the file we have to create. Now we need to figure out what do we have to put in it.
Calling fread
The next function we find is fread
. We can follow the same process to reverse it. I’m not going to repeat it here, you should manage yourself. This is the part that pushes the parameters and calls the function:
0x8048521: mov 0x18(%esp),%eax
0x8048525: mov %eax,0xc(%esp)
0x8048529: movl $0x10,0x8(%esp)
0x8048531: movl $0x1,0x4(%esp)
0x8048539: lea 0x1c(%esp),%eax
0x804853d: mov %eax,(%esp)
0x8048540: call 0x80483a0 <fread@plt>
0x8048545: mov %eax,0x14(%esp)
Carefully looking at this code and knowing the prototype of the fread
function (man fread
) we can infer that:
ESP + 0x18 -> FILE *f; (we already knew that)
ESP + 0x1c -> buffer
ESP + 0x14 -> n;
EXERCISE: Good through the code instruction by instruction and draw the content of the stack (whatever is indexed from
%esp
). Identify the parameters passed to the function and the local variables we identified in the table above.
Again the result of the function (the number of items read) is stored at offset 0x14
in the stack. After that, we find the check of the number of bytes read (take a look to the C code if you do not believe me).
0x8048549: cmpl $0x10,0x14(%esp)
0x804854e: jne 0x8048587
....
0x8048587: movl $0x804865a,(%esp)
0x804858e: call 0x80483b0 <puts@plt>
0x8048593: mov $0x0,%eax
0x8048598: mov 0x2c(%esp),%ecx
0x804859c: xor %gs:0x14,%ecx
0x80485a3: je 0x80485aa
0x80485a5: call 0x8048390 <__stack_chk_fail@plt>
0x80485aa: leave
0x80485ab: ret
You can clearly see the comparison of the value returned by fread
and stored at offset 0x14
against the value 0x10
(16 in decimal). If the two values are Non-Equal
(that’s they are different ), the program will jump to 0x8048587
. This is the same address we jumped to in the case of a problem with fopen
(see above). At that position, we will find the puts
call with the message shown when the key is wrong:
(gdb) x/s 0x804865a
0x804865a: "Wrong Key"
The rest of the code is just the stack cleaning and the return from the main function.
The Key Check
The code left in the midle of our ASM listing is the key check. For a real program, the key generation is a lot more complex that these few lines, but let’s start with this simple example to master the basics:
0x8048550: cmpl $0x0,0x14(%esp)
0x8048555: js 0x804856d
0x8048557: subl $0x1,0x14(%esp)
0x804855c: lea 0x1c(%esp),%edx
0x8048560: mov 0x14(%esp),%eax
0x8048564: add %edx,%eax
0x8048566: movzbl (%eax),%eax
0x8048569: cmp $0x61,%al
0x804856b: je 0x8048550
0x804856d: cmpl $0x0,0x14(%esp)
0x8048572: jns 0x8048587
0x8048574: movl $0x804864b,(%esp)
0x804857b: call 0x80483b0 <puts@plt>
0x8048580: mov $0x0,%eax
0x8048585: jmp 0x8048598
If you remember our original C code, we are reusing the variable n
, the one storing the number of bytes read from fread
, as our loop counter. We go from n
(n - 1
actually) to -1
. So the end loop condition consists on checking against 0
and do a js
(Jump if Sign
)… In other words, we will jump (leave the loop) if our counter is negative. Otherwise, we decrease the counter by 1 unit, set edx
to point to our buffer (check fread
above), eax
to our counter/index, get the value from the buffer and compare it with 0x61
that is ASCII for the lowercase a
letter. Confused… OK, let’s do it again in slow-mo.
First, our local variables table:
ESP + 0x18 -> FILE *f;
ESP + 0x1c -> char* buffer
ESP + 0x14 -> int n;
Then the code and its interpretation with the help of our local variables table:
0x8048550: subl $0x1,0x14(%esp) ; n--
0x8048555: lea 0x1c(%esp),%edx ; edx = &buffer
0x8048559: mov 0x14(%esp),%eax ; eax = n
0x804855d: add %edx,%eax ; edx = &buffer + n = &buffer[n]
0x804855f: movzbl (%eax),%eax ; eax = buffer[n]
0x8048562: cmp $0x61,%al ; if (al == 'a')
0x8048564: je 0x8048549 ; continue;
So, we are checking that our buffer is filled with n
a
’s. If any of the characters read from the file is other than an a
, the code will continue like this:
0x8048566: cmpl $0x0,0x14(%esp)
0x804856b: jns 0x8048580 ; if (n > 0) goto Fail_msg_and_exit
0x804856d: movl $0x804864b,(%esp)
0x8048574: call 0x80483b0 <puts@plt> ; puts "Key is Correct"
0x8048579: mov $0x0,%eax
0x804857e: jmp 0x8048591 ; return 0
We check again the value of n
… If it is positive (jns
), that means that we have left the loop before processing the 0x10
characters and the key in the file is wrong, so we have to jump to the code showing the Wrong Key
message. Otherwise we show the ‘Key is Correct’ string and we jump to the exit code to leave the main
function.
So, this is it for a simple program like this.
Note that you may get different addresses in your system and even slightly different code, depending on your gcc or libc version. But the important thing is the process and that you should be able to reproduce.
Also note that this is static analysis. We haven’t run the code at all. In this case it is not needed, but sometimes it will save you a lot of time (think for instance on a crypter…)
Reversing with Radare2
The process above is a bit tedious, and actually, there is no fun on it once you know what you are doing. It is interesting to do it at the beginning to fully understand what is going on. And it is also good to know how to do it when your super-fancy tool fails. However, if you are in a hurry or this is your everyday work, you will want to use more advanced tools. Radare2 is one of those tools.
Radare2 will automate a lot of things for us. This is a very brief introduction on how to reverse this specific challenge using this tool.
First you load the program in radare2 (the program is called r2
)
$ r2 ./kg
-- Setup dbg.fpregs to true to visualize the fpu registers in the debugger view.
[0x080483f0]>
Yes, you get a different message every time you start the program.
You can see that it automatically detects the entry point 0x080483f0
and starts from there. We have already saved a few keystrokes.
After that you will usually ask radare to analyse the binary. That is the a
command family. Just type a
and press ENTER to get a list of the options. For this example, running aa
will locate most of the information we need… including the main
function (usually you will do the aaaa
to get as much as you can from the analysis):
[0x080483f0]> aa
[x] Analyze all flags starting with sym. and entry0 (aa)
[0x080483f0]>
And now we can just look at the main
function
[0x080483f0]> pdf @main
/ (fcn) main 191
| ; DATA XREF from 0x08048407 (main)
| 0x080484ed 55 push ebp
| 0x080484ee 89e5 mov ebp, esp
| 0x080484f0 83e4f0 and esp, 0xfffffff0
| 0x080484f3 83ec30 sub esp, 0x30
| 0x080484f6 65a114000000 mov eax, dword gs:[0x14] ; [0x14:4]=1
| 0x080484fc 8944242c mov dword [esp + 0x2c], eax
| 0x08048500 31c0 xor eax, eax
| 0x08048502 c74424044086. mov dword [esp + 4], 0x8048640 ; [0x8048640:4]=0x6b006272 ; "rb" @ 0x8048640
| 0x0804850a c70424438604. mov dword [esp], str.key.txt ; [0x8048643:4]=0x2e79656b LEA str.key.txt ; "key.txt" @ 0x8048643
| 0x08048511 e8cafeffff call sym.imp.fopen
| 0x08048516 89442418 mov dword [esp + 0x18], eax
| 0x0804851a 837c241800 cmp dword [esp + 0x18], 0
| ,=< 0x0804851f 7466 je 0x8048587
| | 0x08048521 8b442418 mov eax, dword [esp + 0x18] ; [0x18:4]=0x80483f0 section..text
| | 0x08048525 8944240c mov dword [esp + 0xc], eax
| | 0x08048529 c74424081000. mov dword [esp + 8], 0x10 ; [0x10:4]=0x30002
| | 0x08048531 c74424040100. mov dword [esp + 4], 1
| | 0x08048539 8d44241c lea eax, [esp + 0x1c] ; 0x1c
| | 0x0804853d 890424 mov dword [esp], eax
| | 0x08048540 e85bfeffff call sym.imp.fread
| | 0x08048545 89442414 mov dword [esp + 0x14], eax
| | 0x08048549 837c241410 cmp dword [esp + 0x14], 0x10 ; [0x10:4]=0x30002
| ,==< 0x0804854e 7537 jne 0x8048587
| .---> 0x08048550 837c241400 cmp dword [esp + 0x14], 0
| ,====< 0x08048555 7816 js 0x804856d
| |||| 0x08048557 836c241401 sub dword [esp + 0x14], 1
| |||| 0x0804855c 8d54241c lea edx, [esp + 0x1c] ; 0x1c
| |||| 0x08048560 8b442414 mov eax, dword [esp + 0x14] ; [0x14:4]=1
| |||| 0x08048564 01d0 add eax, edx
| |||| 0x08048566 0fb600 movzx eax, byte [eax]
| |||| 0x08048569 3c61 cmp al, 0x61 ; 'a'
| |`===< 0x0804856b 74e3 je 0x8048550
| `----> 0x0804856d 837c241400 cmp dword [esp + 0x14], 0
| ,===< 0x08048572 7913 jns 0x8048587
| ||| 0x08048574 c704244b8604. mov dword [esp], str.Key_is_correct ; [0x804864b:4]=0x2079654b LEA str.Key_is_correct ; "Key is correct" @ 0x804864b
| ||| 0x0804857b e830feffff call sym.imp.puts
| ||| 0x08048580 b800000000 mov eax, 0
| ,====< 0x08048585 eb11 jmp 0x8048598
| |```-> 0x08048587 c704245a8604. mov dword [esp], str.Wrong_Key ; [0x804865a:4]=0x6e6f7257 LEA str.Wrong_Key ; "Wrong Key" @ 0x804865a
| | 0x0804858e e81dfeffff call sym.imp.puts
| | 0x08048593 b800000000 mov eax, 0
| | ; JMP XREF from 0x08048585 (main)
| `----> 0x08048598 8b4c242c mov ecx, dword [esp + 0x2c] ; [0x2c:4]=0x280009 ; ','
| 0x0804859c 65330d140000. xor ecx, dword gs:[0x14]
| ,=< 0x080485a3 7405 je 0x80485aa
| | 0x080485a5 e8e6fdffff call sym.imp.__stack_chk_fail
| `-> 0x080485aa c9 leave
\ 0x080485ab c3 ret
EXERCISE:
Try to run thepdf @main
command before running theaa
command and see what happens.
So what do we see in here?
- On the left side we see some ASCII graphing indicating the flow of the program. Roughly the jumps the program will do.
- It has already assigned symbols to all function calls
- It is showing as a comment after the assembly, the content of the memory pointed by any pointer in the code. For instance, take a look to the two
fopen
parameters at the beginning of the program - It will also assign symbols to program data. See how it creates the symbols
str.Key_is_correct
and symbolstr.Wrong_key
.
Overall, we are just getting a pretty good overview of the program just with this two commands (aa
and pdf @main
).
Now you need to read the assembly just like before. If this is not easy enough for you, just jump into the radare2 visual mode. First set the main function (s main
) and then type V
(has to be capital)+ENTER. Then press V
again… there you go!
EXERCISE:
Crack the program so it always shows the “Key is correct” message
EXERCISE:
Recompile for 64bits (remove -m32 from the compilation command) and repeat the process.
Now go and Reverse’em all!