The hardest GNU/Linux reverse engineering in the 0x00sec CTF 2017 event was OBFTEANATION. Only 2 person solved this challenge during the competition, so I guess some of you have been waiting for this write up.
I just wrote it to be hard but I didn’t try to solve it myself until now. Of course it is not fair because I wrote it so I know where to look. I mean it would be way easier for me than it was for you. In all honesty I’m not sure I could solve it myself if I were in the competition. Anyway, I will try to follow the same approach I did for [Shuffled Love] ([0x00CTF] Shuffled Love Write-Up) and try to explain the different concepts and techniques used in the challenge. I will do my best to make reasonable assumption not based on my inside knowledge of the challenge but based on what could be spotted by anyone.
Analysing the binary
As usually, we will start analysing the binary using the standard tools:
$ file sc04 sc04: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=d716025f49379937e5875f45ce36eaf8fc8e595f, stripped
So we are facing a 64-bit GNU/Linux stripped dynamically linked binary. Trying
strings does not show anything useful and
gdb cannot deal with the file:
$ objdump -d sc04 objdump: sc04: File truncated [email protected]:/tmp$ gdb -q sc04 "/tmp/sc04": not in executable format: File truncated
radare2 however can open it and you could just start working on the binary with that tool. However, as I said at the beginning, let’s try to learn as much as we can from this challenge.
So, let’s start trying to figure out what is wrong with this binary and how we could fix it.
Fixing the binary
readelf against the binary produces the following output:
$ readelf -h sc04 ELF Header: Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 Class: ELF64 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: Advanced Micro Devices X86-64 Version: 0x1 Entry point address: 0x400690 Start of program headers: 64 (bytes into file) Start of section headers: 2105984 (bytes into file) Flags: 0x0 Size of this header: 64 (bytes) Size of program headers: 56 (bytes) Number of program headers: 9 Size of section headers: 64 (bytes) Number of section headers: 25901 Section header string table index: 28 readelf: Error: Reading 0x194b40 bytes extends past end of file for section headers
readelf shows most of the ELF header fields but it reports an error at the end, saying that it failed to read the section headers. Looking at its output we see that the number of headers reported by the tool is
25901 that is a lot for a normal binary. So let’s start fixing that entry.
Normally, the section table of an ELF binary is located at the end of the file. From the header info we can see that the section table starts at offset
2105984 and that each section header requires 64 bytes. This last value could also be fake but if you check any other 64bits ELF in your system you will find out that 64 bytes is the usual value for this parameter.
Taking into account that the file is
2107840 bytes long and, as we said, the section table is at the end of the file, we can conclude that the number of sections in the binary is:
(2107840 - 2105984) / 64 = 29 (0x1d)
Now we can try to patch the binary with the right number of sections and see what happens:
$ cp sc04 sc04.patch $ echo -en "\x1d\x00" | dd of=sc04.patch seek=60 count=2 bs=1 conv=notrunc
Note: You can easily get ELF header field offsets in this [wiki page] (https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header)
Now you can try again
gdb and everything will look normal. So we can start!
A Decoy Challenge
If you just start reversing the binary the normal way, you will find a relatively easy challenge. You can go and solve it to improve your skills but, once you are done, you will find a flag like this:
The flags says it clearly: “This is not the flag you’re looking for”. If you tried to enter this flag you would find out that it was not correct. That means that the real code to reverse is somewhere else.
To get to the decoy challenge you had probably analysed the entry point and extracted the
main function. There is not much there except the
__libc_start_main function that actually call the real
main (as you already know). However this function does a few other things. Let’s look at its prototype once again (what?.. you haven’t look at it before?).
int __libc_start_main(int *(main) (int, char * *, char * *), int argc, char * * ubp_av, void (*init) (void), void (*fini) (void), void (*rtld_fini) (void), void (* stack_end));
Do you see the
fini parameters?. Well these are initialisation function that get executed before and after
main respectively. So the program should do something before running the normal
main, and that
init function is a reasonable place to look at.
The entry point for this challenge looks like this:
gdb-peda$ x/20i 0x400690 0x400690: xor ebp,ebp 0x400692: mov r9,rdx 0x400695: pop rsi 0x400696: mov rdx,rsp 0x400699: and rsp,0xfffffffffffffff0 0x40069d: push rax 0x40069e: push rsp 0x40069f: mov r8,0x4011f0 0x4006a6: mov rcx,0x401180 0x4006ad: mov rdi,0x400daa 0x4006b4: call 0x400650 <[email protected]> 0x4006b9: hlt
0x400daa is the address of the decoy
RCX contains the pointer to the
init function, that in this case is
Assuming that this is a normal dynamic binary, the
init function is normally set up to
__libc_csu_init. Well, this will be obvious as soon as you look at the code of this function.
Finding the initial code
So, in case you do not know what
__libc_csu_init does, you can google the details, but roughly, the function goes through the pointers stored in a special binary section named
.init_array and runs all the functions in there.
It looks like the right place to look. Actually, unless something was modified on
__libc_csu_init itself, there is no other place to look for the code that I’m aware of, at least at this early stage… Of course, if you do not find anything useful following the next steps you should start looking to the code of those functions.
As our binary is stripped, we will have to find the pointer to the
.init_array section manually. Let’s look at the
$ x/50i 0x401180 0x401180: push r15 0x401182: mov r15d,edi 0x401185: push r14 0x401187: mov r14,rsi 0x40118a: push r13 0x40118c: mov r13,rdx 0x40118f: push r12 0x401191: lea r12,[rip+0x200c70] # 0x601e08 0x401198: push rbp 0x401199: lea rbp,[rip+0x200c78] # 0x601e18 0x4011a0: push rbx 0x4011a1: sub rbp,r12 0x4011a4: xor ebx,ebx 0x4011a6: sar rbp,0x3 0x4011aa: sub rsp,0x8 0x4011ae: call 0x4005e0 (...)
You can see two pointers in there. The first one is the one we want. Why?, well, because we have already seen this code a bunch of times. An exercise you can do is to write a simple hello world program and compile it without stripping the symbols. Then load it on your preferred debugger and you will see the references to these sections clearly in the disassembly.
Anyway, let’s check what is in the
gdb-peda$ x/4a 0x601e08 0x601e08: 0x400750 0x400a1c 0x601e18: 0x400730 0x0
The first function is always there. Be free to further research how a binary is started up on a GNU/Linux system, but for the sake of brevity (we still have to talk about a few things), the function we are interested on is the one at
Analysing the constructor
The functions executed by
__libc_csu_init are usually named constructors, because they are executed at the very beginning of the program.
The function is a bit long so we will be analysing it in pieces as we have done in other write-ups. Let’s take a look to our constructor:
0x400a1c: push rbp 0x400a1d: mov rbp,rsp 0x400a20: sub rsp,0x20 0x400a24: mov DWORD PTR [rbp-0x20],0x0 0x400a2b: mov eax,0xb62 0x400a30: mov DWORD PTR [rbp-0x1c],eax 0x400a33: mov QWORD PTR [rbp-0x18],0x3df512 0x400a3b: mov QWORD PTR [rbp-0x10],0x3e8ce9 0x400a43: mov QWORD PTR [rbp-0x8],0x3ded36 0x400a4b: add QWORD PTR [rbp-0x8],0x212ca 0x400a53: mov DWORD PTR [rbp-0x20],0x0 0x400a5a: jmp 0x400aa0
Nothing special here. The usual function prologue and initialisation of some local vars. As we can see this function does not seem to receive any parameter. Let’s write down our local variables.
[rbp-0x8] Var1 = 0x3ded36 + 0x212ca = 0x400000 [rbp-0x10] Var2 = 0x3e8ce9 [rbp-0x18] Var3 = 0x3df512 [rbp-0x1c] Var4 = 0xb62 [rbp-0x20] Var5 = 0
Then we find an unconditional jump that may indicate the start of a loop. Let’s check it:
0x400a5a: jmp 0x400aa0 0x400a5c: mov eax,DWORD PTR [rbp-0x20] ; Var5 0x400a5f: movsxd rdx,eax 0x400a62: mov rax,QWORD PTR [rbp-0x8] ; Var1 0x400a66: add rax,rdx 0x400a69: movzx eax,BYTE PTR [rax] 0x400a6c: cmp al,0xc3 0x400a6e: jne 0x400a9c 0x400a70: mov eax,DWORD PTR [rip+0x201716] ; 0x60218c 0x400a76: mov edx,DWORD PTR [rbp-0x20] ; Var5 0x400a79: movsxd rcx,edx 0x400a7c: mov rdx,QWORD PTR [rbp-0x8] ; Var1 0x400a80: add rdx,rcx 0x400a83: cdqe 0x400a85: mov QWORD PTR [rax*8+0x6021a0],rdx ; global g 0x400a8d: mov eax,DWORD PTR [rip+0x2016f9] ; 0x60218c 0x400a93: add eax,0x1 0x400a96: mov DWORD PTR [rip+0x2016f0],eax ; 0x60218c 0x400a9c: add DWORD PTR [rbp-0x20],0x1 ; Var5 ++ 0x400aa0: mov eax,DWORD PTR [rbp-0x20] ; Var5 0x400aa3: cmp eax,DWORD PTR [rbp-0x1c] ; Var4 0x400aa6: jge 0x400ab3 0x400aa8: mov eax,DWORD PTR [rip+0x2016de] ; 0x60218c 0x400aae: cmp eax,0x8 0x400ab1: jle 0x400a5c
We have identified the loop counter as
Var5 and we will rename it to
i. We have also found a global var array and a global variable at
0x60218c. We can also infer that
Var1 is a pointer that we will rename to
Let’s look for a sec to the instructions related to the global vars:
0x400a85: mov QWORD PTR [rax*8+0x6021a0],rdx ; global g 0x400a8d: mov eax,DWORD PTR [rip+0x2016f9] # 0x60218c 0x400a93: add eax,0x1 0x400a96: mov DWORD PTR [rip+0x2016f0],eax # 0x60218c
For the first one you can see how register
rax is multiplied by 8, meaning that we are accessing a
QWORD array. The next 3 instructions increments in 1 the value of a global
DWORD a 32bits integer. Let’s call the first global
g and the second global
Let’s update our variables table
Globals 0x6021a0 qword g 0x60218c dword g_cn
Locals [rbp-0x8] ptr = 0x3ded36 + 0x212ca = 0x400000 [rbp-0x10] Var2 = 0x3e8ce9 [rbp-0x18] Var3 = 0x3df512 [rbp-0x1c] len = 0xb62 [rbp-0x20] i = 0
With all this information we can conclude that this loop iterates from 0 to
0xb62 looking for value
0xc3 and it finished either when we complete this loop or when we find 9
What does this mean?. If you look carefully, you will notice that the search starts at address
0x400000 that is usually the starting address for the
.text segment. Check it with
readelf. Knowing that
0xc3 is the opcode of the assembly mnemonic
RET, what this loop does is to find 9 address in the text code containing a
Analysing the Constructor 2
Let’s continue with the analysis of our constructor.
0x400ab3: add QWORD PTR [rbp-0x10],0x17fe0 ; Var2 += 0x17fe0 0x400abb: push QWORD PTR [rbp-0x10] ; PUSH Var2 0x400abe: mov DWORD PTR [rbp-0x20],0xffffffff ; i = -1; 0x400ac5: add DWORD PTR [rbp-0x20],0x1 ; i++
The next thing we find is an update of
Var2 that is immediately pushed in the stack, followed by what looks like part of a loop. We will find soon if that is the case.
Before continuing, let’s update the value of
[rbp-0x10] Var2 = 0x3e8ce9 + 0x17fe0 = 0x400CC9
OK, that looks like a text code address again. It is early to make assumption on this, but the fact that this value has been pushed in the stack is important. Let’s keep analysing the code.
0x400ac9: mov rax,QWORD PTR [rip+0x2016b0] ; 0x602180 <environ> 0x400ad0: mov edx,DWORD PTR [rbp-0x20] ; EDX = i 0x400ad3: movsxd rdx,edx ; RDX = i 0x400ad6: shl rdx,0x3 ; RDX >>=3 | RDX = RDX *8 0x400ada: add rax,rdx ; RAX = &environ[i*8] 0x400add: mov rax,QWORD PTR [rax] ; RAX = environ[i*8] 0x400ae0: test rax,rax ; TEST 0 0x400ae3: jne 0x400aee ; Jump if not NULL 0x400ae5: mov rax,QWORD PTR [rip+0x2016bc] ; 0x6021a8 -> g 0x400aec: jmp rax ; JMP g
In this case
gdb has already identified the
environ variable. This variable is a global that contains an array of strings storing all the environmental variables accessible to this process. The code above is checking that the item
environ[i] is not NULL. In case it is null, the program will jump to one of the address found in the initial loop that we know points to a
So the code above is leaving the function, but… Do you remember the
PUSH we have seen some instructions before?. Sure, that is the address where the function is returning:
0x400CC9. It is not actually returning to
__libc_csu_init but to this other address. Let’s see what does it contain:
gdb-peda$ x/50i 0x400CC9 0x400cc9: jmp 0x400cd4 0x400ccb: mov rax,QWORD PTR [rip+0x2014de] # 0x6021b0 0x400cd2: jmp rax 0x400cd4: leave 0x400cd5: ret
OK, it effectively is returning after all. Therefore, if the loop that goes through all the environmental variables gets to the end of the
environ global var, the program will just return normally.
Let’s see what happens otherwise:
0x400aee: mov rax,QWORD PTR [rip+0x20168b] # 0x602180 <environ> 0x400af5: mov edx,DWORD PTR [rbp-0x20] 0x400af8: movsxd rdx,edx 0x400afb: shl rdx,0x3 0x400aff: add rax,rdx 0x400b02: mov rax,QWORD PTR [rax] ; RAX = environ[i*8] 0x400b05: test rax,rax 0x400b08: je 0x400ccb
This code just checks if
environ[i*8] is NULL. If it is null, the program jumps to
0x400ccb. Let’s see what is in there:
gdb-peda$ x/20i 0x400ccb 0x400ccb: mov rax,QWORD PTR [rip+0x2014de] # 0x6021b0 0x400cd2: jmp rax 0x400cd4: leave 0x400cd5: ret
Does this sound familiar?. Yes, this is again a
RET instruction. Or in other words, if the content of
environ[i*8] is equal to NULL, we will return in the exact same way that we had described above.
Back to the main loop, let’s look at the next chunk of code:
0x400b0e: mov rax,QWORD PTR [rip+0x20166b] # 0x602180 <environ> 0x400b15: mov edx,DWORD PTR [rbp-0x20] 0x400b18: movsxd rdx,edx 0x400b1b: shl rdx,0x3 0x400b1f: add rax,rdx 0x400b22: mov rax,QWORD PTR [rax] 0x400b25: mov rdi,rax 0x400b28: call 0x400630 <[email protected]> 0x400b2d: cmp rax,0xa 0x400b31: jne 0x400ccb
I bet you can directly read this by now. Exactly!. The function is also returning if the length of
environ[i*8] is different of 10 (
So far we can say that the program is looking for an environmental variable of 10 characters!.
Figuring Out the Env Variable
The rest of the loop is composed of a sequence of similar instruction blocks. Let’s analyse one of them and then you can work out the rest!
0x400b37: mov rax,QWORD PTR [rip+0x201642] # 0x602180 <environ> 0x400b3e: mov edx,DWORD PTR [rbp-0x20] 0x400b41: movsxd rdx,edx 0x400b44: shl rdx,0x3 0x400b48: add rax,rdx 0x400b4b: mov rax,QWORD PTR [rax] 0x400b4e: movzx eax,BYTE PTR [rax] ; EAX = environ[i*8] 0x400b51: cmp al,0x4d ; EAX == 0x4d == 'M' 0x400b53: jne 0x400ac5
So this code is checking that the first character of
environ[i*8] is equal to ‘M’… We are finally getting somewhere.
If you go through the whole sequence of instructions like this one, you will find out that the program is looking for the environmental variable… I won’t say it :P… you will have to find it yourself!
When the character found does not match, the program will jump to
0x400ac5: add DWORD PTR [rbp-0x20],0x1 ; i++ 0x400ac9: mov rax,QWORD PTR [rip+0x2016b0] # 0x602180 <environ> 0x400ad0: mov edx,DWORD PTR [rbp-0x20] ; EDX = i 0x400ad3: movsxd rdx,edx ; RDX = EDX 0x400ad6: shl rdx,0x3 ; RDX *= 8 0x400ada: add rax,rdx ; RAX = &environ[i*8] 0x400add: mov rax,QWORD PTR [rax] ; RAX = environ[i*8] 0x400ae0: test rax,rax ; NULL? 0x400ae3: jne 0x400aee ; Repeat
That looks like part of the loop we identified before. Very good, so we just need to figure out what happens when the proper environmental variable has been set:
0x400caf: mov rax,QWORD PTR [rbp-0x18] ; RAX = Var3 0x400cb3: add rax,0x212ca ; RAX = Var3 + 0x212ca 0x400cb9: mov QWORD PTR [rbp-0x8],rax ; ptr = RAX 0x400cbd: mov rax,QWORD PTR [rip+0x2014dc] ; 0x6021a0 g 0x400cc4: push QWORD PTR [rbp-0x8] ; PUSH Var3 + 0x212ca 0x400cc7: jmp rax ; JMP g 0x400cc9: jmp 0x400cd4 ; 0x400ccb: mov rax,QWORD PTR [rip+0x2014de] ; 0x6021b0 0x400cd2: jmp rax 0x400cd4: leave 0x400cd5: ret
OK, so this is the thing. This code will push in the stack
Var3 + 0x212ca or in other words
0x409FB3, that again is an address in the
text segment. Then it will jump into one of the
RET instruction in the global array
g, in this case
g, effectively calling the function located at that address.
Did we find the challenge?.. Let’s see
Note: In case you were using
radare2, the values in the comparisons for each character will just be shown there and just browsing through the function, seeing those characters and the reference to
environ, will be enough for you to find the variable you have to set.
So, let’s see if all this effort brought us somewhere.
$ VAR=VALUE /tmp/sc04 Congrats!. You have found the real challenge!!! Let's Play Password :
Pretty good. We have found the real challenge!!!
Now we can do two things. Keep going as per now reverse the function at
0x409FB3 or using the debugger to find the code we are interesting in. I will just follow the second case as this is going to be a pretty long write-up. You can try yourself to follow the reverse path for you to practice. Be free to ask in the comments if you find problems reversing that function.
The debugger path is easy. Just run the program in the debugger and press CTRL+C when the program asks for the password. Alternatively you can run the program normally and attach to it using the
-pid PROCESS_PID flag. I used this last path as, for some reason, setting the environmental variable in the shell and launching
gdb didn’t work for me. If anybody know why just comment below, so I do not have to figure that out
Facing the real challenge
$ VAR=VALUE ./sc04.patch (... in another terminal in a far away galaxy ...) $ sudo gdb -pid /tmp/sc04.patch (... messages...) gdb-peda $ bt #0 0x00007fb74ec42330 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007fb74ebcd5b0 in _IO_new_file_underflow (fp=0x7fb74ef16640 <_IO_2_1_stdin_>) at fileops.c:613 #2 0x00007fb74ebce53e in __GI__IO_default_uflow (fp=0x7fb74ef16640 <_IO_2_1_stdin_>) at genops.c:435 #3 0x00007fb74ebc2274 in __GI__IO_getline_info ([email protected]=0x7fb74ef16640 <_IO_2_1_stdin_>, [email protected]=0x7ffcca1c86e0 "", n=0x7f, [email protected]=0xa, [email protected]=0x1, [email protected]=0x0) at iogetline.c:69 #4 0x00007fb74ebc2368 in __GI__IO_getline ([email protected]=0x7fb74ef16640 <_IO_2_1_stdin_>, [email protected]=0x7ffcca1c86e0 "", n=<optimised out>, [email protected]=0xa, [email protected]=0x1) at iogetline.c:38 #5 0x00007fb74ebc1206 in _IO_fgets (buf=0x7ffcca1c86e0 "", n=<optimised out>, fp=0x7fb74ef16640 <_IO_2_1_stdin_>) at iofgets.c:56 #6 0x0000000000402175 in ?? () #7 0x00007fb74eb74f45 in __libc_start_main (main=0x402115, argc=0x1, argv=0x7ffcca1c8858, init=<optimised out>, fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7ffcca1c8848) at libc-start.c:287 #8 0x00000000004006b9 in ?? ()
We have interrupted the program while waiting for the user input so we get a pretty long stack backtrace. Quickly examining it we can see that frame
#6 is the one interesting for us.
gdb-peda$ x/20i 0x0000000000402175 0x402175: lea rax,[rbp-0x90] 0x40217c: mov rdi,rax 0x40217f: call 0x402000 0x402184: mov eax,0x0 0x402189: mov rcx,QWORD PTR [rbp-0x8] 0x40218d: xor rcx,QWORD PTR fs:0x28 0x402196: je 0x40219d 0x402198: call 0x400640 <[email protected]> 0x40219d: leave 0x40219e: ret
OK, you can now try to find the beginning of the function to get more context, but, if you are in a hurry, it is pretty safe to assume that, the user input will be stored at
[rbp-0x90] and that it will be passed to the function at
0x402000. After that, the function just returns and, looking to the stack backtrace, the program will just finish. So, that function is the one we have to actually reverse.
We are getting close. The interesting function is again pretty long and we will go once again by chunks
gdb-peda$ x/100i 0x402000 0x402000: push rbp 0x402001: mov rbp,rsp 0x402004: sub rsp,0x20 0x402008: mov QWORD PTR [rbp-0x18],rdi 0x40200c: mov DWORD PTR [rbp-0x8],0x0 0x402013: mov rax,QWORD PTR [rbp-0x18] 0x402017: mov rdi,rax 0x40201a: call 0x400630 <[email protected]> 0x40201f: mov edx,DWORD PTR [rip+0x20007b] # 0x6020a0 0x402025: movsxd rdx,edx 0x402028: cmp rax,rdx 0x40202b: ja 0x40203b 0x40202d: mov rax,QWORD PTR [rbp-0x18] 0x402031: mov rdi,rax 0x402034: call 0x400630 <[email protected]> 0x402039: jmp 0x402041 0x40203b: mov eax,DWORD PTR [rip+0x20005f] # 0x6020a0 0x402041: mov DWORD PTR [rbp-0x4],eax
By now, you should be able to analyse this one by yourself. It basically calculates the size of the user input (
RDI stored in local
[rbp-0x18]) and compares it to the value of the global at
0x6020a0 and stores at
[rbp-0x4] the maximum of both values.
0x402044: mov DWORD PTR [rbp-0xc],0x0 0x40204b: jmp 0x40207f 0x40204d: mov eax,DWORD PTR [rbp-0xc] 0x402050: cdqe 0x402052: mov rax,QWORD PTR [rax*8+0x6020c0] 0x40205a: mov edx,DWORD PTR [rbp-0xc] 0x40205d: movsxd rcx,edx 0x402060: mov rdx,QWORD PTR [rbp-0x18] 0x402064: add rdx,rcx 0x402067: movzx edx,BYTE PTR [rdx] 0x40206a: movsx ecx,dl 0x40206d: mov edx,DWORD PTR [rbp-0xc] 0x402070: mov rsi,QWORD PTR [rbp-0x18] 0x402074: mov edi,ecx 0x402076: call rax 0x402078: add DWORD PTR [rbp-0x8],eax 0x40207b: add DWORD PTR [rbp-0xc],0x1 0x40207f: mov eax,DWORD PTR [rbp-0xc] 0x402082: cmp eax,DWORD PTR [rbp-0x4] 0x402085: jl 0x40204d 0x402087: cmp DWORD PTR [rbp-0x8],0x0 0x40208b: jne 0x4020f5 0x40208d: mov DWORD PTR [rbp-0xc],0x0 0x402094: jmp 0x4020e8
Then we find a new loop. The counter is at
[rbp-0xc] and it runs as many iterations as calculated in the previous block. If we carefully look at this block we will see that it basically calls a series of functions (
jmp rax) whose addresses are stored in a global variable at
0x6020c0, The return values of this functions are then accumulated in
We can see the check against 0 towards the end of the function. You can further analyse the function, but in order to keep this shorter, I can tell you that we need to provide an input that will make the value of
Before starting the analysis of the different functions, let’s look at the parameters used to call all those functions.
Par1 -> EDI = user_input[i]
Par2 -> ESI = user_input
Par3 -> RDX = i
So we are calling a function for each character in the user provided password!.
Reversing the function array
So, the first thing we have to do is to get the addresses of the functions in that global array:
gdb-peda$ x/20a 0x6020c0 0x6020c0: 0x40219f 0x4024d5 0x6020d0: 0x40278f 0x402ac5 0x6020e0: 0x402daf 0x40309c 0x6020f0: 0x40337d 0x403656 0x602100: 0x40390c 0x403c67 0x602110: 0x403f8b 0x40421b 0x602120: 0x4044d7 0x4047cb 0x602130: 0x404ab5 0x404dbe 0x602140: 0x4050ae 0x4053a5 0x602150: 0x0 0x0
There are a few… If you look at the first one you will find a pretty long function doing all kinds of operations. This looks like a great candidate for using symbolic execution. But as I’m not familiar with that type of analysis, I have to try to solve this in a different way.
As we said, we want the function to return 0. So let’s check what the first function returns:
0x4024cb: mov eax,DWORD PTR [rbp-0x28] 0x4024ce: add rsp,0x58 0x4024d2: pop rbx 0x4024d3: pop rbp 0x4024d4: ret
So, it is returning the content of local variable
[rbp-0x28]. Now, let’s identify were the parameters are stored:
0x40219f: push rbp 0x4021a0: mov rbp,rsp 0x4021a3: push rbx 0x4021a4: sub rsp,0x58 0x4021a8: mov eax,edi 0x4021aa: mov QWORD PTR [rbp-0x60],rsi 0x4021ae: mov DWORD PTR [rbp-0x58],edx 0x4021b1: mov BYTE PTR [rbp-0x54],al
What leads us to the following table:
[rbp-0x54] = Par1 = user_input[i]
[rbp=0x58] = Par3 = i (string index)
[rbp-0x60] = Par2 = user_input
Now, we can try to isolate the code that modifies the local var that is returned and, that way, try to get a simpler version of the function. Let’s find the instructions that modifies that local variable:
gdb-peda$ x/260i 0x40219f (...) 0x4021b4: mov DWORD PTR [rbp-0x3c],0x0 0x4021bb: mov eax,DWORD PTR [rbp-0x3c] 0x4021be: mov DWORD PTR [rbp-0x28],eax (...) 0x402307: movzx eax,BYTE PTR [rax] 0x40230a: movsx eax,al 0x40230d: sub eax,0x2d 0x402310: add DWORD PTR [rbp-0x28],eax (...) 0x4023c9: movsx eax,BYTE PTR [rbp-0x54] 0x4023cd: xor eax,DWORD PTR [rbp-0x58] 0x4023d0: mov DWORD PTR [rbp-0x28],eax ; 0x4023d3: mov DWORD PTR [rbp-0x3c],0x56 0x4023da: mov eax,DWORD PTR [rbp-0x3c] 0x4023dd: xor DWORD PTR [rbp-0x28],eax (...) 0x402476: xor DWORD PTR [rbp-0x28],0x26 (...) 0x4024a4: mov eax,DWORD PTR [rbp-0x58] 0x4024a7: mov DWORD PTR [rbp-0x20],eax 0x4024aa: mov eax,DWORD PTR [rbp-0x20] 0x4024ad: xor DWORD PTR [rbp-0x28],eax (...) 0x4024cb: mov eax,DWORD PTR [rbp-0x28] (.. and then returns ...)
We can see that the third block just override the value on the local var, invalidating any previous operation. After all that, only xor operations are performed until the function returns. So, this first function can be simplified to:
0x4023c9: movsx eax,BYTE PTR [rbp-0x54] ; EAX = user_input[i] 0x4023cd: xor eax,DWORD PTR [rbp-0x58] ; XOR = user_input[i] ^ i 0x4023d0: mov DWORD PTR [rbp-0x28],eax ; Val = user_input[i] ^ i 0x4023d3: mov DWORD PTR [rbp-0x3c],0x56; 0x4023da: mov eax,DWORD PTR [rbp-0x3c] ; EAX = 0x56 0x4023dd: xor DWORD PTR [rbp-0x28],eax ; Val = (user_input[i] ^ i) ^0x56 (...) 0x402476: xor DWORD PTR [rbp-0x28],0x26; Val = (user_input[i] ^ i) ^0x56 ^ 0x26 (...) 0x4024a4: mov eax,DWORD PTR [rbp-0x58] ; EAX = i 0x4024a7: mov DWORD PTR [rbp-0x20],eax 0x4024aa: mov eax,DWORD PTR [rbp-0x20]; EAX = i 0x4024ad: xor DWORD PTR [rbp-0x28],eax; Val = (user_input[i] ^ i) ^0x56 ^ 0x26 ^i
The first function implements the following expression:
Val = (user_input[i] ^ i) ^0x56 ^ 0x26 ^i = user_input[i] ^ 0x56 ^ 0x26 = user_input ^ 0x70
Therefore, the first key value is
p and we are just in front of an obfuscated XOR encoder. You can take a look to another function to be sure, but I can tell you that that is the case for all the functions we have to analyse.
Getting the key and then the flag
Knowing that the checking is just a XOR operation we can easily get the password setting a breakpoint just before calling each of the functions, setting
EDI to 0 (that is the only value used by the functions for its calculation) and then running the function. The result of any XOR operation against 0 is the other operator, i.e. the password. This way we can easily get the password and, after entering it also get the flag.
The proper way to do this, I guess, is using a gdb script. If you want to write one, remember that you can set the value of a register with the command:
set $edi = 0
So, this was it for the OBFTEANATION challenge. The name comes from the two main difficulties it has. The first one is the obfuscation of the password checking which, at the end was not that hard. The TEA is related to the fact that the program was crypted with XXTEA. Using the dynamica analysis, jumping straight into debugging the program allow us to skip that part, but it was there.
As I said at the beginning I didn’t try to solve the challenge myself after writing it… I just wrote something that I hoped it would be hard enough to be fun for more advanced contestants. This means that I have just solved it now in order to be able write this, so there are many chances that this is not the best way of solving it. Be free to come up with an alternative write-up explaining how did you solve it. I will be interested on seeing alternative ways.
Finally, in principle I will not write write-ups for HexaPASS and challenge-004. They will be pretty much a repetition of Shuffled Love and this one. However if any of you have issues with then, just let me know.