(pico) #1

The hardest GNU/Linux reverse engineering in the 0x00sec CTF 2017 event was OBFTEANATION. Only 2 person solved this challenge during the competition, so I guess some of you have been waiting for this write up.

I just wrote it to be hard but I didn’t try to solve it myself until now. Of course it is not fair because I wrote it so I know where to look. I mean it would be way easier for me than it was for you. In all honesty I’m not sure I could solve it myself if I were in the competition. Anyway, I will try to follow the same approach I did for [Shuffled Love] ([0x00CTF] Shuffled Love Write-Up) and try to explain the different concepts and techniques used in the challenge. I will do my best to make reasonable assumption not based on my inside knowledge of the challenge but based on what could be spotted by anyone.

Analysing the binary

As usually, we will start analysing the binary using the standard tools:

$ file sc04
sc04: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=d716025f49379937e5875f45ce36eaf8fc8e595f, stripped

So we are facing a 64-bit GNU/Linux stripped dynamically linked binary. Trying strings does not show anything useful and objdump and gdb cannot deal with the file:

$ objdump -d sc04
objdump: sc04: File truncated
[email protected]:/tmp$ gdb -q  sc04
"/tmp/sc04": not in executable format: File truncated

radare2 however can open it and you could just start working on the binary with that tool. However, as I said at the beginning, let’s try to learn as much as we can from this challenge.

So, let’s start trying to figure out what is wrong with this binary and how we could fix it.

Fixing the binary

Running readelf against the binary produces the following output:

$ readelf -h sc04
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400690
  Start of program headers:          64 (bytes into file)
  Start of section headers:          2105984 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         25901
  Section header string table index: 28
readelf: Error: Reading 0x194b40 bytes extends past end of file for section headers

readelf shows most of the ELF header fields but it reports an error at the end, saying that it failed to read the section headers. Looking at its output we see that the number of headers reported by the tool is 25901 that is a lot for a normal binary. So let’s start fixing that entry.

Normally, the section table of an ELF binary is located at the end of the file. From the header info we can see that the section table starts at offset 2105984 and that each section header requires 64 bytes. This last value could also be fake but if you check any other 64bits ELF in your system you will find out that 64 bytes is the usual value for this parameter.

Taking into account that the file is 2107840 bytes long and, as we said, the section table is at the end of the file, we can conclude that the number of sections in the binary is:

(2107840 - 2105984) / 64 = 29 (0x1d)

Now we can try to patch the binary with the right number of sections and see what happens:

$ cp sc04 sc04.patch
$ echo -en "\x1d\x00" | dd of=sc04.patch seek=60 count=2 bs=1 conv=notrunc

Note: You can easily get ELF header field offsets in this [wiki page] (https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#File_header)

Now you can try again readelf, objdump or gdb and everything will look normal. So we can start!

A Decoy Challenge

If you just start reversing the binary the normal way, you will find a relatively easy challenge. You can go and solve it to improve your skills but, once you are done, you will find a flag like this:


The flags says it clearly: “This is not the flag you’re looking for”. If you tried to enter this flag you would find out that it was not correct. That means that the real code to reverse is somewhere else.

To get to the decoy challenge you had probably analysed the entry point and extracted the main function. There is not much there except the __libc_start_main function that actually call the real main (as you already know). However this function does a few other things. Let’s look at its prototype once again (what?.. you haven’t look at it before?).

int __libc_start_main(int *(main) (int, char * *, char * *), 
    	              int argc, char * * ubp_av, 
                      void (*init) (void), void (*fini) (void), 
                      void (*rtld_fini) (void), void (* stack_end));

Do you see the init and fini parameters?. Well these are initialisation function that get executed before and after main respectively. So the program should do something before running the normal main, and that init function is a reasonable place to look at.

The entry point for this challenge looks like this:

gdb-peda$ x/20i 0x400690
   0x400690:	xor    ebp,ebp
   0x400692:	mov    r9,rdx
   0x400695:	pop    rsi
   0x400696:	mov    rdx,rsp
   0x400699:	and    rsp,0xfffffffffffffff0
   0x40069d:	push   rax
   0x40069e:	push   rsp
   0x40069f:	mov    r8,0x4011f0
   0x4006a6:	mov    rcx,0x401180
   0x4006ad:	mov    rdi,0x400daa
   0x4006b4:	call   0x400650 <[email protected]>
   0x4006b9:	hlt

0x400daa is the address of the decoy main, and RCX contains the pointer to the init function, that in this case is 0x401180.

Assuming that this is a normal dynamic binary, the init function is normally set up to __libc_csu_init. Well, this will be obvious as soon as you look at the code of this function.

Finding the initial code

So, in case you do not know what __libc_csu_init does, you can google the details, but roughly, the function goes through the pointers stored in a special binary section named .init_array and runs all the functions in there.

It looks like the right place to look. Actually, unless something was modified on __libc_start_main or __libc_csu_init itself, there is no other place to look for the code that I’m aware of, at least at this early stage… Of course, if you do not find anything useful following the next steps you should start looking to the code of those functions.

As our binary is stripped, we will have to find the pointer to the .init_array section manually. Let’s look at the init function:

$ x/50i 0x401180
   0x401180:	push   r15
   0x401182:	mov    r15d,edi
   0x401185:	push   r14
   0x401187:	mov    r14,rsi
   0x40118a:	push   r13
   0x40118c:	mov    r13,rdx
   0x40118f:	push   r12
   0x401191:	lea    r12,[rip+0x200c70]        # 0x601e08
   0x401198:	push   rbp
   0x401199:	lea    rbp,[rip+0x200c78]        # 0x601e18
   0x4011a0:	push   rbx
   0x4011a1:	sub    rbp,r12
   0x4011a4:	xor    ebx,ebx
   0x4011a6:	sar    rbp,0x3
   0x4011aa:	sub    rsp,0x8
   0x4011ae:	call   0x4005e0

You can see two pointers in there. The first one is the one we want. Why?, well, because we have already seen this code a bunch of times. An exercise you can do is to write a simple hello world program and compile it without stripping the symbols. Then load it on your preferred debugger and you will see the references to these sections clearly in the disassembly.

Anyway, let’s check what is in the .init_array section:

gdb-peda$ x/4a 0x601e08
0x601e08:	0x400750	0x400a1c
0x601e18:	0x400730	0x0

The first function is always there. Be free to further research how a binary is started up on a GNU/Linux system, but for the sake of brevity (we still have to talk about a few things), the function we are interested on is the one at 0x400a1c.

Analysing the constructor

The functions executed by __libc_csu_init are usually named constructors, because they are executed at the very beginning of the program.

The function is a bit long so we will be analysing it in pieces as we have done in other write-ups. Let’s take a look to our constructor:

   0x400a1c:	push   rbp
   0x400a1d:	mov    rbp,rsp
   0x400a20:	sub    rsp,0x20
   0x400a24:	mov    DWORD PTR [rbp-0x20],0x0
   0x400a2b:	mov    eax,0xb62
   0x400a30:	mov    DWORD PTR [rbp-0x1c],eax
   0x400a33:	mov    QWORD PTR [rbp-0x18],0x3df512
   0x400a3b:	mov    QWORD PTR [rbp-0x10],0x3e8ce9
   0x400a43:	mov    QWORD PTR [rbp-0x8],0x3ded36
   0x400a4b:	add    QWORD PTR [rbp-0x8],0x212ca
   0x400a53:	mov    DWORD PTR [rbp-0x20],0x0
   0x400a5a:	jmp    0x400aa0

Nothing special here. The usual function prologue and initialisation of some local vars. As we can see this function does not seem to receive any parameter. Let’s write down our local variables.

[rbp-0x8]    Var1 = 0x3ded36 + 0x212ca = 0x400000
[rbp-0x10]   Var2 = 0x3e8ce9
[rbp-0x18]   Var3 = 0x3df512
[rbp-0x1c]   Var4 = 0xb62
[rbp-0x20]   Var5 = 0

Then we find an unconditional jump that may indicate the start of a loop. Let’s check it:

   0x400a5a:	jmp    0x400aa0
   0x400a5c:	mov    eax,DWORD PTR [rbp-0x20]		; Var5
   0x400a5f:	movsxd rdx,eax
   0x400a62:	mov    rax,QWORD PTR [rbp-0x8]          ; Var1
   0x400a66:	add    rax,rdx
   0x400a69:	movzx  eax,BYTE PTR [rax]
   0x400a6c:	cmp    al,0xc3
   0x400a6e:	jne    0x400a9c
   0x400a70:	mov    eax,DWORD PTR [rip+0x201716]      ; 0x60218c
   0x400a76:	mov    edx,DWORD PTR [rbp-0x20]		 ; Var5  
   0x400a79:	movsxd rcx,edx
   0x400a7c:	mov    rdx,QWORD PTR [rbp-0x8]		 ; Var1
   0x400a80:	add    rdx,rcx
   0x400a83:	cdqe
   0x400a85:	mov    QWORD PTR [rax*8+0x6021a0],rdx	; global g[]
   0x400a8d:	mov    eax,DWORD PTR [rip+0x2016f9]     ; 0x60218c
   0x400a93:	add    eax,0x1
   0x400a96:	mov    DWORD PTR [rip+0x2016f0],eax      ; 0x60218c
   0x400a9c:	add    DWORD PTR [rbp-0x20],0x1		 ; Var5 ++
   0x400aa0:	mov    eax,DWORD PTR [rbp-0x20]		 ; Var5
   0x400aa3:	cmp    eax,DWORD PTR [rbp-0x1c]		 ; Var4
   0x400aa6:	jge    0x400ab3
   0x400aa8:	mov    eax,DWORD PTR [rip+0x2016de]      ; 0x60218c
   0x400aae:	cmp    eax,0x8
   0x400ab1:	jle    0x400a5c

We have identified the loop counter as Var5 and we will rename it to i. We have also found a global var array and a global variable at 0x60218c. We can also infer that Var1 is a pointer that we will rename to ptr .

Let’s look for a sec to the instructions related to the global vars:

   0x400a85:	mov    QWORD PTR [rax*8+0x6021a0],rdx	; global g[]
   0x400a8d:	mov    eax,DWORD PTR [rip+0x2016f9]        # 0x60218c
   0x400a93:	add    eax,0x1
   0x400a96:	mov    DWORD PTR [rip+0x2016f0],eax        # 0x60218c

For the first one you can see how register rax is multiplied by 8, meaning that we are accessing a QWORD array. The next 3 instructions increments in 1 the value of a global DWORD a 32bits integer. Let’s call the first global g[] and the second global g_count.

Let’s update our variables table

0x6021a0     qword g[]
0x60218c     dword g_cn
[rbp-0x8]    ptr = 0x3ded36 + 0x212ca = 0x400000
[rbp-0x10]   Var2 = 0x3e8ce9
[rbp-0x18]   Var3 = 0x3df512
[rbp-0x1c]   len = 0xb62
[rbp-0x20]   i = 0

With all this information we can conclude that this loop iterates from 0 to 0xb62 looking for value 0xc3 and it finished either when we complete this loop or when we find 9 0xc3 values.

What does this mean?. If you look carefully, you will notice that the search starts at address 0x400000 that is usually the starting address for the .text segment. Check it with readelf. Knowing that 0xc3 is the opcode of the assembly mnemonic RET, what this loop does is to find 9 address in the text code containing a RET instruction.

Analysing the Constructor 2

Let’s continue with the analysis of our constructor.

   0x400ab3:	add    QWORD PTR [rbp-0x10],0x17fe0	; Var2 += 0x17fe0
   0x400abb:	push   QWORD PTR [rbp-0x10]		; PUSH Var2
   0x400abe:	mov    DWORD PTR [rbp-0x20],0xffffffff	; i = -1;
   0x400ac5:	add    DWORD PTR [rbp-0x20],0x1		; i++

The next thing we find is an update of Var2 that is immediately pushed in the stack, followed by what looks like part of a loop. We will find soon if that is the case.

Before continuing, let’s update the value of Var2

[rbp-0x10] Var2 = 0x3e8ce9 + 0x17fe0 = 0x400CC9

OK, that looks like a text code address again. It is early to make assumption on this, but the fact that this value has been pushed in the stack is important. Let’s keep analysing the code.

   0x400ac9:	mov    rax,QWORD PTR [rip+0x2016b0]        ; 0x602180 <environ>
   0x400ad0:	mov    edx,DWORD PTR [rbp-0x20]		   ; EDX = i
   0x400ad3:	movsxd rdx,edx	     			   ; RDX = i
   0x400ad6:	shl    rdx,0x3				   ; RDX >>=3 | RDX = RDX *8
   0x400ada:	add    rax,rdx				   ; RAX = &environ[i*8]
   0x400add:	mov    rax,QWORD PTR [rax]		   ; RAX = environ[i*8]
   0x400ae0:	test   rax,rax	     			   ; TEST 0 
   0x400ae3:	jne    0x400aee				   ; Jump if not NULL
   0x400ae5:	mov    rax,QWORD PTR [rip+0x2016bc]        ; 0x6021a8 -> g[1]
   0x400aec:	jmp    rax	     			   ; JMP g[1]

In this case gdb has already identified the environ variable. This variable is a global that contains an array of strings storing all the environmental variables accessible to this process. The code above is checking that the item environ[i] is not NULL. In case it is null, the program will jump to one of the address found in the initial loop that we know points to a RET instruction.

So the code above is leaving the function, but… Do you remember the PUSH we have seen some instructions before?. Sure, that is the address where the function is returning: 0x400CC9. It is not actually returning to __libc_csu_init but to this other address. Let’s see what does it contain:

gdb-peda$ x/50i 0x400CC9
   0x400cc9:	jmp    0x400cd4
   0x400ccb:	mov    rax,QWORD PTR [rip+0x2014de]        # 0x6021b0
   0x400cd2:	jmp    rax
   0x400cd4:	leave
   0x400cd5:	ret

OK, it effectively is returning after all. Therefore, if the loop that goes through all the environmental variables gets to the end of the environ global var, the program will just return normally.

Let’s see what happens otherwise:

   0x400aee:	mov    rax,QWORD PTR [rip+0x20168b]        # 0x602180 <environ>
   0x400af5:	mov    edx,DWORD PTR [rbp-0x20]
   0x400af8:	movsxd rdx,edx
   0x400afb:	shl    rdx,0x3
   0x400aff:	add    rax,rdx
   0x400b02:	mov    rax,QWORD PTR [rax]		  ; RAX = environ[i*8]
   0x400b05:	test   rax,rax
   0x400b08:	je     0x400ccb

This code just checks if environ[i*8] is NULL. If it is null, the program jumps to 0x400ccb. Let’s see what is in there:

gdb-peda$ x/20i 0x400ccb
   0x400ccb:	mov    rax,QWORD PTR [rip+0x2014de]        # 0x6021b0
   0x400cd2:	jmp    rax
   0x400cd4:	leave
   0x400cd5:	ret

Does this sound familiar?. Yes, this is again a RET instruction. Or in other words, if the content of environ[i*8] is equal to NULL, we will return in the exact same way that we had described above.

Back to the main loop, let’s look at the next chunk of code:

   0x400b0e:	mov    rax,QWORD PTR [rip+0x20166b]        # 0x602180 <environ>
   0x400b15:	mov    edx,DWORD PTR [rbp-0x20]
   0x400b18:	movsxd rdx,edx
   0x400b1b:	shl    rdx,0x3
   0x400b1f:	add    rax,rdx
   0x400b22:	mov    rax,QWORD PTR [rax]
   0x400b25:	mov    rdi,rax
   0x400b28:	call   0x400630 <[email protected]>
   0x400b2d:	cmp    rax,0xa
   0x400b31:	jne    0x400ccb

I bet you can directly read this by now. Exactly!. The function is also returning if the length of environ[i*8] is different of 10 (0x0a).

So far we can say that the program is looking for an environmental variable of 10 characters!.

Figuring Out the Env Variable

The rest of the loop is composed of a sequence of similar instruction blocks. Let’s analyse one of them and then you can work out the rest!

   0x400b37:	mov    rax,QWORD PTR [rip+0x201642]        # 0x602180 <environ>
   0x400b3e:	mov    edx,DWORD PTR [rbp-0x20]
   0x400b41:	movsxd rdx,edx
   0x400b44:	shl    rdx,0x3
   0x400b48:	add    rax,rdx
   0x400b4b:	mov    rax,QWORD PTR [rax]
   0x400b4e:	movzx  eax,BYTE PTR [rax]		; EAX = environ[i*8][0]
   0x400b51:	cmp    al,0x4d	    			; EAX == 0x4d == 'M'
   0x400b53:	jne    0x400ac5

So this code is checking that the first character of environ[i*8] is equal to ‘M’… We are finally getting somewhere.

If you go through the whole sequence of instructions like this one, you will find out that the program is looking for the environmental variable… I won’t say it :P… you will have to find it yourself!

When the character found does not match, the program will jump to 0x400ac5:

   0x400ac5:	add    DWORD PTR [rbp-0x20],0x1		; i++
   0x400ac9:	mov    rax,QWORD PTR [rip+0x2016b0]        # 0x602180 <environ>
   0x400ad0:	mov    edx,DWORD PTR [rbp-0x20]		; EDX = i
   0x400ad3:	movsxd rdx,edx	     			; RDX = EDX
   0x400ad6:	shl    rdx,0x3				; RDX *= 8
   0x400ada:	add    rax,rdx				; RAX = &environ[i*8]
   0x400add:	mov    rax,QWORD PTR [rax]	        ; RAX = environ[i*8]
   0x400ae0:	test   rax,rax	     			; NULL?
   0x400ae3:	jne    0x400aee				; Repeat

That looks like part of the loop we identified before. Very good, so we just need to figure out what happens when the proper environmental variable has been set:

   0x400caf:	mov    rax,QWORD PTR [rbp-0x18]		; RAX = Var3
   0x400cb3:	add    rax,0x212ca   			; RAX = Var3 + 0x212ca
   0x400cb9:	mov    QWORD PTR [rbp-0x8],rax		; ptr = RAX
   0x400cbd:	mov    rax,QWORD PTR [rip+0x2014dc]     ; 0x6021a0 g[0]
   0x400cc4:	push   QWORD PTR [rbp-0x8]		; PUSH Var3 + 0x212ca
   0x400cc7:	jmp    rax   	 			; JMP g[0]
   0x400cc9:	jmp    0x400cd4				; 
   0x400ccb:	mov    rax,QWORD PTR [rip+0x2014de]     ; 0x6021b0
   0x400cd2:	jmp    rax
   0x400cd4:	leave
   0x400cd5:	ret

OK, so this is the thing. This code will push in the stack Var3 + 0x212ca or in other words 0x409FB3, that again is an address in the text segment. Then it will jump into one of the RET instruction in the global array g, in this case g[0], effectively calling the function located at that address.

Did we find the challenge?.. Let’s see

Note: In case you were using radare2, the values in the comparisons for each character will just be shown there and just browsing through the function, seeing those characters and the reference to environ, will be enough for you to find the variable you have to set.


So, let’s see if all this effort brought us somewhere.

$ VAR=VALUE /tmp/sc04
Congrats!. You have found the real challenge!!! Let's Play
Password :

Pretty good. We have found the real challenge!!!

Now we can do two things. Keep going as per now reverse the function at 0x409FB3 or using the debugger to find the code we are interesting in. I will just follow the second case as this is going to be a pretty long write-up. You can try yourself to follow the reverse path for you to practice. Be free to ask in the comments if you find problems reversing that function.

The debugger path is easy. Just run the program in the debugger and press CTRL+C when the program asks for the password. Alternatively you can run the program normally and attach to it using the -pid PROCESS_PID flag. I used this last path as, for some reason, setting the environmental variable in the shell and launching gdb didn’t work for me. If anybody know why just comment below, so I do not have to figure that out :slight_smile:

Facing the real challenge

Let’s start:

$ VAR=VALUE ./sc04.patch
(... in another terminal in a far away galaxy ...)
$ sudo gdb -pid /tmp/sc04.patch

(... messages...)
gdb-peda $ bt
#0  0x00007fb74ec42330 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fb74ebcd5b0 in _IO_new_file_underflow (fp=0x7fb74ef16640 <_IO_2_1_stdin_>) at fileops.c:613
#2  0x00007fb74ebce53e in __GI__IO_default_uflow (fp=0x7fb74ef16640 <_IO_2_1_stdin_>) at genops.c:435
#3  0x00007fb74ebc2274 in __GI__IO_getline_info ([email protected]=0x7fb74ef16640 <_IO_2_1_stdin_>,
    [email protected]=0x7ffcca1c86e0 "", n=0x7f, [email protected]=0xa, [email protected]=0x1,
    [email protected]=0x0) at iogetline.c:69
#4  0x00007fb74ebc2368 in __GI__IO_getline ([email protected]=0x7fb74ef16640 <_IO_2_1_stdin_>, [email protected]=0x7ffcca1c86e0 "",
    n=<optimised out>, [email protected]=0xa, [email protected]=0x1) at iogetline.c:38
#5  0x00007fb74ebc1206 in _IO_fgets (buf=0x7ffcca1c86e0 "", n=<optimised out>, fp=0x7fb74ef16640 <_IO_2_1_stdin_>)
    at iofgets.c:56
#6  0x0000000000402175 in ?? ()
#7  0x00007fb74eb74f45 in __libc_start_main (main=0x402115, argc=0x1, argv=0x7ffcca1c8858, init=<optimised out>,
    fini=<optimised out>, rtld_fini=<optimised out>, stack_end=0x7ffcca1c8848) at libc-start.c:287
#8  0x00000000004006b9 in ?? ()

We have interrupted the program while waiting for the user input so we get a pretty long stack backtrace. Quickly examining it we can see that frame #6 is the one interesting for us.

gdb-peda$  x/20i 0x0000000000402175
   0x402175:	lea    rax,[rbp-0x90]
   0x40217c:	mov    rdi,rax
   0x40217f:	call   0x402000
   0x402184:	mov    eax,0x0
   0x402189:	mov    rcx,QWORD PTR [rbp-0x8]
   0x40218d:	xor    rcx,QWORD PTR fs:0x28
   0x402196:	je     0x40219d
   0x402198:	call   0x400640 <[email protected]>
   0x40219d:	leave
   0x40219e:	ret

OK, you can now try to find the beginning of the function to get more context, but, if you are in a hurry, it is pretty safe to assume that, the user input will be stored at [rbp-0x90] and that it will be passed to the function at 0x402000. After that, the function just returns and, looking to the stack backtrace, the program will just finish. So, that function is the one we have to actually reverse.

Reversing 0x402000

We are getting close. The interesting function is again pretty long and we will go once again by chunks

gdb-peda$ x/100i 0x402000
   0x402000:	push   rbp
   0x402001:	mov    rbp,rsp
   0x402004:	sub    rsp,0x20
   0x402008:	mov    QWORD PTR [rbp-0x18],rdi
   0x40200c:	mov    DWORD PTR [rbp-0x8],0x0
   0x402013:	mov    rax,QWORD PTR [rbp-0x18]
   0x402017:	mov    rdi,rax
   0x40201a:	call   0x400630 <[email protected]>
   0x40201f:	mov    edx,DWORD PTR [rip+0x20007b]        # 0x6020a0
   0x402025:	movsxd rdx,edx
   0x402028:	cmp    rax,rdx
   0x40202b:	ja     0x40203b
   0x40202d:	mov    rax,QWORD PTR [rbp-0x18]
   0x402031:	mov    rdi,rax
   0x402034:	call   0x400630 <[email protected]>
   0x402039:	jmp    0x402041
   0x40203b:	mov    eax,DWORD PTR [rip+0x20005f]        # 0x6020a0
   0x402041:	mov    DWORD PTR [rbp-0x4],eax

By now, you should be able to analyse this one by yourself. It basically calculates the size of the user input (RDI stored in local [rbp-0x18]) and compares it to the value of the global at 0x6020a0 and stores at [rbp-0x4] the maximum of both values.

   0x402044:	mov    DWORD PTR [rbp-0xc],0x0
   0x40204b:	jmp    0x40207f
   0x40204d:	mov    eax,DWORD PTR [rbp-0xc]
   0x402050:	cdqe
   0x402052:	mov    rax,QWORD PTR [rax*8+0x6020c0]
   0x40205a:	mov    edx,DWORD PTR [rbp-0xc]
   0x40205d:	movsxd rcx,edx
   0x402060:	mov    rdx,QWORD PTR [rbp-0x18]
   0x402064:	add    rdx,rcx
   0x402067:	movzx  edx,BYTE PTR [rdx]
   0x40206a:	movsx  ecx,dl
   0x40206d:	mov    edx,DWORD PTR [rbp-0xc]
   0x402070:	mov    rsi,QWORD PTR [rbp-0x18]
   0x402074:	mov    edi,ecx
   0x402076:	call   rax
   0x402078:	add    DWORD PTR [rbp-0x8],eax
   0x40207b:	add    DWORD PTR [rbp-0xc],0x1
   0x40207f:	mov    eax,DWORD PTR [rbp-0xc]
   0x402082:	cmp    eax,DWORD PTR [rbp-0x4]
   0x402085:	jl     0x40204d
   0x402087:	cmp    DWORD PTR [rbp-0x8],0x0
   0x40208b:	jne    0x4020f5
   0x40208d:	mov    DWORD PTR [rbp-0xc],0x0
   0x402094:	jmp    0x4020e8

Then we find a new loop. The counter is at [rbp-0xc] and it runs as many iterations as calculated in the previous block. If we carefully look at this block we will see that it basically calls a series of functions (jmp rax) whose addresses are stored in a global variable at 0x6020c0, The return values of this functions are then accumulated in [rbp-0x8].

We can see the check against 0 towards the end of the function. You can further analyse the function, but in order to keep this shorter, I can tell you that we need to provide an input that will make the value of [rbp-0x8] zero.

Before starting the analysis of the different functions, let’s look at the parameters used to call all those functions.

Par1 -> EDI = user_input[i]
Par2 -> ESI = user_input
Par3 -> RDX = i

So we are calling a function for each character in the user provided password!.

Reversing the function array

So, the first thing we have to do is to get the addresses of the functions in that global array:

gdb-peda$ x/20a 0x6020c0
0x6020c0:	0x40219f	0x4024d5
0x6020d0:	0x40278f	0x402ac5
0x6020e0:	0x402daf	0x40309c
0x6020f0:	0x40337d	0x403656
0x602100:	0x40390c	0x403c67
0x602110:	0x403f8b	0x40421b
0x602120:	0x4044d7	0x4047cb
0x602130:	0x404ab5	0x404dbe
0x602140:	0x4050ae	0x4053a5
0x602150:	0x0	0x0

There are a few… If you look at the first one you will find a pretty long function doing all kinds of operations. This looks like a great candidate for using symbolic execution. But as I’m not familiar with that type of analysis, I have to try to solve this in a different way.

As we said, we want the function to return 0. So let’s check what the first function returns:

   0x4024cb:	mov    eax,DWORD PTR [rbp-0x28]

   0x4024ce:	add    rsp,0x58
   0x4024d2:	pop    rbx
   0x4024d3:	pop    rbp
   0x4024d4:	ret

So, it is returning the content of local variable [rbp-0x28]. Now, let’s identify were the parameters are stored:

   0x40219f:	push   rbp
   0x4021a0:	mov    rbp,rsp
   0x4021a3:	push   rbx
   0x4021a4:	sub    rsp,0x58
   0x4021a8:	mov    eax,edi
   0x4021aa:	mov    QWORD PTR [rbp-0x60],rsi
   0x4021ae:	mov    DWORD PTR [rbp-0x58],edx
   0x4021b1:	mov    BYTE PTR [rbp-0x54],al

What leads us to the following table:

[rbp-0x54] = Par1 = user_input[i]
[rbp=0x58] = Par3 = i (string index)
[rbp-0x60] = Par2 = user_input

Now, we can try to isolate the code that modifies the local var that is returned and, that way, try to get a simpler version of the function. Let’s find the instructions that modifies that local variable:

gdb-peda$ x/260i 0x40219f
   0x4021b4:	mov    DWORD PTR [rbp-0x3c],0x0
   0x4021bb:	mov    eax,DWORD PTR [rbp-0x3c]
   0x4021be:	mov    DWORD PTR [rbp-0x28],eax
   0x402307:	movzx  eax,BYTE PTR [rax]
   0x40230a:	movsx  eax,al
   0x40230d:	sub    eax,0x2d
   0x402310:	add    DWORD PTR [rbp-0x28],eax
   0x4023c9:	movsx  eax,BYTE PTR [rbp-0x54]
   0x4023cd:	xor    eax,DWORD PTR [rbp-0x58]
   0x4023d0:	mov    DWORD PTR [rbp-0x28],eax ;

   0x4023d3:	mov    DWORD PTR [rbp-0x3c],0x56
   0x4023da:	mov    eax,DWORD PTR [rbp-0x3c]
   0x4023dd:	xor    DWORD PTR [rbp-0x28],eax
   0x402476:	xor    DWORD PTR [rbp-0x28],0x26
   0x4024a4:	mov    eax,DWORD PTR [rbp-0x58]
   0x4024a7:	mov    DWORD PTR [rbp-0x20],eax
   0x4024aa:	mov    eax,DWORD PTR [rbp-0x20]
   0x4024ad:	xor    DWORD PTR [rbp-0x28],eax
   0x4024cb:	mov    eax,DWORD PTR [rbp-0x28]
(.. and then returns ...)

We can see that the third block just override the value on the local var, invalidating any previous operation. After all that, only xor operations are performed until the function returns. So, this first function can be simplified to:

   0x4023c9:	movsx  eax,BYTE PTR [rbp-0x54]	; EAX = user_input[i]
   0x4023cd:	xor    eax,DWORD PTR [rbp-0x58]	; XOR = user_input[i] ^ i
   0x4023d0:	mov    DWORD PTR [rbp-0x28],eax ; Val = user_input[i] ^ i

   0x4023d3:	mov    DWORD PTR [rbp-0x3c],0x56; 
   0x4023da:	mov    eax,DWORD PTR [rbp-0x3c] ; EAX = 0x56
   0x4023dd:	xor    DWORD PTR [rbp-0x28],eax ; Val = (user_input[i] ^ i) ^0x56
   0x402476:	xor    DWORD PTR [rbp-0x28],0x26; Val = (user_input[i] ^ i) ^0x56 ^ 0x26
   0x4024a4:	mov    eax,DWORD PTR [rbp-0x58] ; EAX = i
   0x4024a7:	mov    DWORD PTR [rbp-0x20],eax
   0x4024aa:	mov    eax,DWORD PTR [rbp-0x20]; EAX = i
   0x4024ad:	xor    DWORD PTR [rbp-0x28],eax; Val = (user_input[i] ^ i) ^0x56 ^ 0x26 ^i

The first function implements the following expression:

Val = (user_input[i] ^ i) ^0x56 ^ 0x26 ^i = user_input[i] ^ 0x56 ^ 0x26 
    = user_input[0] ^ 0x70

Therefore, the first key value is 0x70 or p and we are just in front of an obfuscated XOR encoder. You can take a look to another function to be sure, but I can tell you that that is the case for all the functions we have to analyse.

Getting the key and then the flag

Knowing that the checking is just a XOR operation we can easily get the password setting a breakpoint just before calling each of the functions, setting EDI to 0 (that is the only value used by the functions for its calculation) and then running the function. The result of any XOR operation against 0 is the other operator, i.e. the password. This way we can easily get the password and, after entering it also get the flag.

The proper way to do this, I guess, is using a gdb script. If you want to write one, remember that you can set the value of a register with the command:

set $edi = 0

Final Words

So, this was it for the OBFTEANATION challenge. The name comes from the two main difficulties it has. The first one is the obfuscation of the password checking which, at the end was not that hard. The TEA is related to the fact that the program was crypted with XXTEA. Using the dynamica analysis, jumping straight into debugging the program allow us to skip that part, but it was there.

As I said at the beginning I didn’t try to solve the challenge myself after writing it… I just wrote something that I hoped it would be hard enough to be fun for more advanced contestants. This means that I have just solved it now in order to be able write this, so there are many chances that this is not the best way of solving it. Be free to come up with an alternative write-up explaining how did you solve it. I will be interested on seeing alternative ways.

Finally, in principle I will not write write-ups for HexaPASS and challenge-004. They will be pretty much a repetition of Shuffled Love and this one. However if any of you have issues with then, just let me know.


Thanks for the writeup @0x00pf!

I spent quite a bit of time with that challenge but in the end I was stuck because I didn’t know how to patch the ELF header and wasn’t able to run it in gdb as a consequence (I found the environment variable checking code but failed at actually executing it with the MAIN= 31173 environ variable).
After reading how to patch the binary I spent a bit more time with figuring this challenge out and came up with the following gdb script to get the password (note that I didn’t look into the “call rax” calls but worked with the idea in mind that the calculations behind them might vary, which is why my script tries every characters until the result is 0):

import string

gdb.execute("set pagination off") # disable y/n questions
bruteForce = " "+string.ascii_letters+string.digits+" .,-_+!$&(){}" #list with all characters to to test (+filler at the beginning)
password = ['a']*18 #password list
for globalIndex in range(19): 
    mainIndex = -1
    for c in bruteForce:
        if mainIndex >= 0: #only edit the pasword after checking if the index is correct
            password[mainIndex] = c
        gdb.execute("unset environment")
        gdb.execute("set environment MAIN=31173") #set environment variables to get to the challenge
        gdb.execute("b *fgets") # breat at fgets and only set breakpoints when reached to not crash
        gdb.execute("run <<< $(python -c \"print '"+''.join(password)+"'\")")
        gdb.execute("b *0x402064") # ecx = index to read
        gdb.execute("b *0x402078") # eax = return value from call rax
        gdb.execute("continue") # continue from fgets
        for i in range(globalIndex-1): #skip already checked characters 
        indexRead = int(gdb.parse_and_eval("$ecx"))
        value = int(gdb.parse_and_eval("$eax"))
        if(value==0):  #eax == 0 => correct input character
            gdb.execute("d")  #reset breakpoints
            if(mainIndex != indexRead): #index seems of fix it
                mainIndex = indexRead
        gdb.execute("d") #reset breakpoints
print(''.join(password)) #output password
gdb.execute("set pagination on")

The script is pretty ugly but it results in the right password:

Starting program: /media/sf_SharedSpace/current/0x00ctf/sc04.patch 
Congrats!. You have found the real challenge!!! Let's Play
Password : p1C0bfU5K4t0R-2o1T

I didn’t know about the init and fini parameters of __libc_start_main nor the .init_array section, I also never worked on a challenge that involved the environment variable, thanks for the great challenge, I learned a lot! :smiley: