I was messing around with C and decided to replicate a buffer overflow attack. What is a buffer overflow?
In simple terms, a buffer overflow is when input goes past the "cap" that the input is allowed to have.
In more technical terms, a buffer overflow happens when a program writes more data into a buffer than was allocated for it.
Here is a simple program to demonstrate.
#include <stdio.h> // Include the standard I/O library, which declares printf and gets.
int main(){ // Define the main function. The "int" means it returns an integer (zero here, to signal success).
    int key = 0; // Declare an integer variable "key" and set it to zero (the name "key" can be anything).
    char typy[5]; // Create a character array named "typy" with room for 5 bytes: 4 characters plus the terminating '\0' ("typy" can be any name you want).
    printf("Type something: "); // Print a prompt asking the user for input.
    gets(typy); // Read a line from stdin into typy with no length check. This is the bug being demonstrated; gets was removed from the language in C11, so newer compilers may warn about it or refuse to compile it.
    printf("%s", typy); // Print what you typed (the data stored in typy). The "%s" keeps the input from being interpreted as a format string.
    printf("\n"); // Print a new line to make it look neater when the program exits (purely cosmetic; it keeps my Linux terminal tidy).
    if (key){ // If the value of key is true (non-zero)
        printf("It worked\n"); // Print "It worked". Notice the newline character; once again, it keeps my terminal neat.
    }
    return 0; // Exit the program cleanly. main returns this integer to the operating system, which is why it is declared as "int main()".
}
Now save the program and compile it with your favorite compiler.
For example, I would run gcc buffer_overflow_demo.c and then run the result with ./a.out.
Let’s walk through what the program does. First, it asks you to type something. Type a word of fewer than five characters, such as “dog”, and the program prints dog back to you. Everything runs as expected.
Now run it again and this time type something longer than the buffer, such as “0x00sec Rocks”. What happens? This time it should also print the line “It worked” (whether it does depends on how your compiler lays out the stack). Why does that happen? The character array ran out of allocated room for what you typed, so the input "overflowed" past the end of the array and overwrote the variable "key", making it non-zero. Remember that anything non-zero is true. When you typed dog the first time, the input fit within the allocated memory, so nothing was overwritten.
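To make the mechanism visible without relying on where a particular compiler puts its locals, here is a minimal sketch that models the situation with a struct. The names struct frame, simulate_overflow and bytes_before_key are made up for this illustration, and the layout (buffer first, key right after it) is an assumption chosen to match the case where the overflow works. The copy is capped at the size of the struct so the example stays well defined, unlike the real gets call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for illustration: the buffer sits directly below key,
   which is the arrangement the overflow in the post relies on. */
struct frame {
    char typy[5];
    int key;
};

/* Copies input (including its terminating NUL) into the frame the way an
   unchecked gets() would, but never past the end of the struct, so the
   demonstration stays well defined. Returns the resulting value of key. */
int simulate_overflow(const char *input)
{
    struct frame f;
    size_t n;

    assert(input != NULL);
    memset(&f, 0, sizeof f);   /* key starts at 0, as in the post */

    n = strlen(input) + 1;     /* gets() also stores the NUL */
    if (n > sizeof f)
        n = sizeof f;          /* stop at the end of our model */

    memcpy(&f, input, n);      /* bytes past typy[4] spill onward */
    return f.key;
}

/* Number of input bytes written before the first byte of key is touched. */
size_t bytes_before_key(void)
{
    return offsetof(struct frame, key);
}
```

Calling simulate_overflow("dog") returns 0, while simulate_overflow("0x00sec Rocks") returns a non-zero value because some of the extra bytes land in key. On many platforms bytes_before_key() is larger than 5, since the compiler pads the struct so that key is aligned; the first few overflowing bytes then fall into the padding.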
I hope that helped you understand what a buffer overflow is, how it works, and how it is carried out.
As always,
This is a buffer overflow vulnerability, but not the kind that can modify the value of a variable. Get a pen and draw the stack on paper. The order in which the variables are declared in the source code says it all.
I’m not sure if there’s a guaranteed way of knowing how the stack will turn out, since that depends entirely on the compiler and what it believes is the optimal layout. But yes, most, if not all, of my encounters were the other way around.
Apparently, gcc sorts local variables on the stack according to their type and not their declaration order. It looks like ints go first and arrays go last. It does not matter where you put the int; it will always go first, at least right now with gcc.
Sure, compiler optimization plays a big role, but I’ve messed with these kinds of buffer overflows and was unable to change the value of a given variable unless the buffer was declared in a specific way/order.
Just an fyi to those who aren’t getting the expected result.
@oaktree yes, padding is indeed applied, but in this case it matters more which variable is stored below or above the other. If the buffer is stored below the “key” variable, it can indeed overflow and affect key’s value; otherwise the affected data would be instruction pointers, frame pointers, auxiliary vectors and much more (keep in mind that the stack grows downwards while the buffer fills upwards).
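For anyone who wants a quick check of which case applies on their own toolchain, here is a rough sketch that compares the addresses of the two locals. The function name buffer_is_below_key is made up for this example, and the result only describes this function under your compiler and flags; the layout of the original program may differ.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if the buffer is stored at a lower address than key,
   0 otherwise. The answer is specific to this compiler, these flags,
   and this function; it is not a general rule. */
int buffer_is_below_key(void)
{
    volatile int key = 0;          /* volatile keeps both objects in memory */
    volatile char typy[5] = {0};

    /* Compare as integers; relational operators on pointers to
       unrelated objects are not defined by the C standard. */
    uintptr_t key_addr = (uintptr_t)&key;
    uintptr_t buf_addr = (uintptr_t)&typy[0];

    printf("typy at %p, key at %p\n", (void *)&typy[0], (void *)&key);
    return buf_addr < key_addr;
}
```

If it returns 1, writing past the end of typy moves toward key, which is the arrangement the demo needs. If it returns 0, the overflow heads away from key and toward whatever else sits above the buffer.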
@oaktree, IMO, the best way to find out is through the legend GDB itself. Here’s a sample of ASM code if you want to play around with it. Nothing crazy, pretty self-explanatory, but it will help you visualise the whole upwards/downwards trick:
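global _start
section .text
_start:
    mov eax, 0x66778899
    mov ebx, 0x0
    mov ecx, 0x0
    push ax                 ; push the low 16 bits of eax (0x8899); esp drops by 2
    pop bx                  ; bx = 0x8899; esp is back where it started
    push eax                ; push all 32 bits; esp drops by 4
    pop ecx                 ; ecx = 0x66778899
    push word [sample]      ; push 2 bytes from memory: 0xbbaa (little-endian)
    pop ecx                 ; pops 4 bytes after a 2-byte push: low half is 0xbbaa, high half is whatever sat above it on the stack
    push dword [sample]     ; push 4 bytes from memory: 0xddccbbaa
    pop edx                 ; edx = 0xddccbbaa
    ; exit syscall
    mov eax, 1              ; sys_exit on 32-bit Linux
    mov ebx, 0              ; exit status 0
    int 0x80
section .data
sample: db 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x11, 0x22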
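If you don't have an assembler handy, the values that the word and dword pushes read from sample can be worked out in C. This is only a sketch of the little-endian byte order x86 uses; load_le16 and load_le32 are helper names invented for this example, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The same bytes as the `sample` label in the assembly listing. */
const uint8_t sample[8] = {0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x11, 0x22};

/* Reads 2 bytes the way a little-endian CPU such as x86 does:
   the byte at the lowest address becomes the least significant byte. */
uint16_t load_le16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for 4 bytes. */
uint32_t load_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Here load_le16(sample) gives 0xbbaa and load_le32(sample) gives 0xddccbbaa, which are the values you should see land on the stack when stepping through the listing in GDB.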
It’s arch-dependent. For instance, PA-RISC stack grows upwards… Never worked with one though.
Thinking about it as a physical stack may be confusing. It’s better to consider it a LIFO (Last In, First Out) collection. Then it doesn’t matter in which direction it grows; the important thing is that the last item in is the first one out.
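The LIFO point can be sketched in C with a toy stack whose backing array can be filled in either direction. Everything here (struct int_stack, stack_init, stack_push, stack_pop, CAPACITY) is invented for illustration and has nothing to do with how a real CPU stack is implemented; it only shows that the pop order is the same whichever way the storage grows.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 8

/* A tiny fixed-size stack of ints. `grows_down` picks the direction in which
   the backing array is filled; the caller never sees the difference. */
struct int_stack {
    int slots[CAPACITY];
    size_t count;
    int grows_down;
};

void stack_init(struct int_stack *s, int grows_down)
{
    assert(s != NULL);
    s->count = 0;
    s->grows_down = grows_down;
}

/* Maps the n-th pushed element to a slot index for the chosen direction. */
static size_t slot_for(const struct int_stack *s, size_t n)
{
    return s->grows_down ? CAPACITY - 1 - n : n;
}

/* Returns 1 on success, 0 if the stack is full. */
int stack_push(struct int_stack *s, int value)
{
    if (s->count == CAPACITY)
        return 0;
    s->slots[slot_for(s, s->count)] = value;
    s->count++;
    return 1;
}

/* Returns 1 and stores the most recently pushed value in *out,
   or returns 0 if the stack is empty. */
int stack_pop(struct int_stack *s, int *out)
{
    if (s->count == 0)
        return 0;
    s->count--;
    *out = s->slots[slot_for(s, s->count)];
    return 1;
}
```

Pushing 1, 2, 3 and then popping three times yields 3, 2, 1 whether grows_down is 0 or 1.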