Linux Internals - Binary Execution Analysis

_py · November 8, 2016, 4:48pm

Hello everyone, I hope your day has been great. Last time we dove into Dynamic Linking Internals. It’s time to dig even deeper though. Have you ever wondered, “OK, I can run my program, but what’s going on behind the scenes that makes it possible?” If you’ve never thought about that, you should! And if you have, don’t click away. Today’s topic is how our binaries get executed. I won’t go full technical mode on you, so that even peeps without any prior knowledge of the subject can grasp the general idea behind the process. I might skip some details that I don’t consider important, but if you want to learn more about ELF binaries, feel free to DM me, google it, or look up the ELF specs.

Note: This paper revolves around Linux binaries, so if you are a Windows dude / dudette, I apologise. Still, I hope you can benefit from it. It’s also more of a write-up of research I’ve been doing, so I’ll update it as I learn more. Feel free to point out any mistakes I might have made and I’ll correct them asap.

Without further ado, grab some healthy snack and enjoy!

If you have ever programmed in C / C++ / ASM (and probably many other languages that I’m not aware of as I’m writing this), you have definitely typed “./executable” in your terminal in order to run your binary. But what happened between the moment you pressed ENTER and the moment you saw your program’s output?

When you execute a binary from the terminal, the shell first fork()s a child process (one process spawning another), and the child then calls the execve() system call (sys_execve() inside the kernel) to load the executable’s image. Let’s have a look at execve()’s man page:

execve() does not return on success, and the text, data, bss, and stack
of the calling process are overwritten by that of the program loaded.

execve() will clean up the address space and load your binary’s segments into memory (RAM). By segments I just mean chunks of 0s and 1s that the binary needs in order to execute. To be more precise, the text and data segments (including .bss, when the program has one) are mapped from the file, and the kernel sets up a fresh stack; the heap starts out empty and grows as the program allocates memory. Have no fear if you are not familiar with some of these terms; a crash course on them is coming shortly, so sit tight.
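To make the fork() + execve() dance concrete, here is a minimal sketch of what a shell roughly does when you run a program. The helper name spawn_and_wait is my own invention for illustration, and real shells do a lot more (job control, signals, PATH lookup), so treat it as a sketch rather than how any particular shell works.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` roughly like a shell would and
 * return the child's exit code (-1 if something went wrong). */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* duplicate the calling process */
    if (pid < 0)
        return -1;                   /* fork() failed */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execve(path, argv, environ);
        /* Only reached if execve() failed, since it never returns on success. */
        perror("execve");
        _exit(127);
    }

    /* Parent: wait for the child, like a shell waiting on a foreground job. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with "/bin/sh" and the arguments { "sh", "-c", "exit 7", NULL } should hand back 7, which is the child's exit status as seen by the parent.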

Let’s write down some pseudosteps for the binary execution process:

1. Find the executable’s segments.
2. If the executable is dynamically linked, make sure to load the shared libraries as well.
3. Load the binary.
4. Initialize the stack with the appropriate arguments.
5. Kernel passes the ball to the entry point (that can be either the executable or the dynamic linker, which will apply relocations and link the main program with the requested runtime libraries before the binary gets executed).
6. Run program, run!

As promised, let’s have a crash course on some terms I mentioned above so the reading can be much smoother, shall we?

##Crash Course in Segments

Segment sounds like a fancy term but, like almost every fancy term, it’s way simpler than it seems. As I described above, segments are just contiguous blocks of bytes, each with its own interpretation and properties. They are essential to the program loader, but what do I mean by all this crap?

###Text Segment

The text segment contains the machine-language instructions of your binary. An important note is that the text segment is mapped read-only (and executable), so a bug or an attacker can’t simply overwrite the program’s code while it runs.

###Data Segment

The data segment contains initialized global and static variables.

###BSS Segment

The bss segment, aka Block Started by Symbol, is the one containing the uninitialised global and static variables. I like thinking of it as the “Bullshit Size Segment” since its size is “bullshit” as in, there is nothing behind it in the file; it just chills right after the data segment. Note that the .bss segment takes up no space on disk, which makes sense, right? Why store a bunch of zeros in the file when the loader can simply hand the program zero-filled memory at load time?

###Stack

The stack is a dynamically growing and shrinking segment containing the so-called stack frames, which are practically just the bookkeeping for each function being called (local variables, return address, saved frame pointer etc.). You should invest some time in understanding the stack’s mechanisms since it’s a fundamental concept in Computer Science.

###Heap
The heap is an area where you can allocate memory from dynamically (during runtime). If you want to learn more about the heap, @oaktree has made an extended series regarding the heap mechanisms and you should definitely check it out if you haven’t.

Segments are described by the so-called Program Header Table, which, simply put, is an array of structs containing info on each segment’s size, load address, permissions and much more. That was a brief overview of segments; if you want to know more, the ELF specs have everything you need.
In somewhat fancy terms, segments “divide” the virtual address space of a process.

Just to be sure though, let’s have a visual representation of the aforementioned terms since I’m a huge fan of being able to visualize concepts.

+--------------+
| Kernel Space |
+--------------+
|      ...     |
+--------------+
|     Stack    |  <-- growing downwards
+--------------+
|     .so      |  <-- memory mappings (i.e shared objects/libraries)
+--------------+
|     Heap     |  <-- growing upwards
+--------------+
|     .bss     |  <-- Uninitialized global/static vars
+--------------+
|     .data    |  <-- Initialized static/global vars
+--------------+
|     .text    |  <-- Machine instructions (e.g. mov dis, dat)
+--------------+
|      ...     |
+--------------+
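If you would rather see the diagram with real numbers, the sketch below prints one representative address from each region. The variable and function names are made up for the example, and the exact addresses (and even some of the ordering) depend on your architecture, ASLR and whether the binary is position-independent, so don't expect the same numbers twice.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* ends up in .data */
int uninitialized_global;      /* ends up in .bss, zero-filled at load time */

/* Print one address per region of the diagram above. */
void print_layout(void)
{
    int local = 0;                      /* lives on the stack */
    void *heap_block = malloc(16);      /* small allocations come from the heap */

    /* Going through uintptr_t avoids a pedantic warning about
     * converting a function pointer to an object pointer. */
    printf(".text  %#lx\n", (unsigned long)(uintptr_t)print_layout);
    printf(".data  %p\n", (void *)&initialized_global);
    printf(".bss   %p\n", (void *)&uninitialized_global);
    printf("heap   %p\n", heap_block);
    printf("stack  %p\n", (void *)&local);

    free(heap_block);
}
```

On a typical x86-64 Linux box the stack address comes out far above the heap one, with .text, .data and .bss sitting close together, which matches the picture.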

There you have it! Now we can move on and feel confident knowing some nitty-gritty details.

Phew! Alright, after this small break let’s go back to business. Let’s analyze each step I mentioned above one by one and try to make sense out of it.

###Find the executable’s segments

In order to load the executable’s segments, we first need to find out the address ranges they use. There are a couple of ways to figure that out. One is to parse the program header table, which, as I said before, includes all kinds of intel on each segment; this is what the loader itself relies on. The other, handy for inspecting a process that is already running, is to read the /proc/<PID>/maps file (/proc/self/maps is a shortcut for the process doing the reading), which lists every mapped address range along with its permissions. Just execute your program and check it yourself if you don’t believe me.
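As a quick illustration of the /proc route, here is a small sketch that dumps the current process's mappings and counts how many of them are executable. The function name dump_maps is hypothetical, and the parsing only looks at the first three fields of each line (address range and permissions).

```c
#include <stdio.h>

/* Print every mapping of the current process and return how many
 * of them are executable (-1 if /proc/self/maps can't be opened). */
int dump_maps(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    char line[512];
    int exec_count = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];                  /* e.g. "r-xp" plus the terminator */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) == 3) {
            if (perms[2] == 'x')
                exec_count++;
            fputs(line, stdout);
        }
    }
    fclose(f);
    return exec_count;
}
```

In the output you should be able to spot the binary's own text and data mappings, the [heap] and [stack] entries, and the shared libraries in between.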

###Linker, linker where you at?

Figuring out whether the binary is dynamically linked is a piece of cake. The execve() man page spoils it: a dynamically linked ELF executable carries a PT_INTERP segment that names its interpreter, i.e. the dynamic linker. An alternative way is to use the “file” command. Keep in mind that there are cases, such as malware, where you can be fooled about the existence of a dynamic linker, but let’s not be too hardcore for now. Once the dynamic linker is found, all that is required is to load it, and that process is described below.
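Here is a rough sketch of that check done by hand: walk the program headers of a file and look for PT_INTERP. The helper elf_interp is my own name, and it assumes a 64-bit ELF with the same byte order as the machine running it; anything else is simply reported as an error.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the PT_INTERP path of a 64-bit ELF into `out`.
 * Returns 1 if an interpreter was found, 0 if there is none (static binary),
 * -1 on I/O errors or if the file is not a 64-bit ELF. */
int elf_interp(const char *path, char *out, size_t outsz)
{
    if (outsz == 0)
        return -1;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        goto done;

    result = 0;                          /* valid ELF, no interpreter seen yet */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type != PT_INTERP)
            continue;

        /* The segment's contents are the NUL-terminated interpreter path. */
        size_t n = ph.p_filesz < outsz ? (size_t)ph.p_filesz : outsz - 1;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 || fread(out, 1, n, f) != n) {
            result = -1;
            break;
        }
        out[n] = '\0';
        result = 1;
        break;
    }
done:
    fclose(f);
    return result;
}
```

Pointing it at a dynamically linked binary on a typical x86-64 glibc system should give you a path like /lib64/ld-linux-x86-64.so.2, which is the same interpreter the “file” command reports.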

###Load the binary

The process of loading the binary is practically the same as the one before. In other words, it comes down to parsing the program header table in order to figure out each segment’s address, size and alignment. Each loadable segment is of type PT_LOAD, and the program loader maps it according to its p_vaddr (the virtual address it wants to live at), p_memsz (the amount of memory the segment takes up) and p_align (google memory alignment) values. Once those are known, all the loader needs to do is work out the total span, from the start of the first PT_LOAD segment to the end of the last one (its p_vaddr plus p_memsz), rounded out to the alignment, reserve that much address space and map each segment into place. Once you figure out what p_align is, it will make total sense.
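The size calculation is easier to follow in code. Below is a minimal sketch, assuming page-granular mappings, of how the total span of the PT_LOAD segments could be computed; load_span is a name I made up, and a real loader handles more corner cases than this.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: bytes of address space needed to map all PT_LOAD segments,
 * from the page holding the lowest p_vaddr to the end of the page holding
 * the highest p_vaddr + p_memsz. `page` must be a power of two. */
uint64_t load_span(const Elf64_Phdr *ph, size_t count, uint64_t page)
{
    uint64_t lo = UINT64_MAX, hi = 0;

    for (size_t i = 0; i < count; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;                                   /* only loadable segments count */
        uint64_t start = ph[i].p_vaddr & ~(page - 1);   /* round down to a page */
        uint64_t end = (ph[i].p_vaddr + ph[i].p_memsz + page - 1) & ~(page - 1); /* round up */
        if (start < lo)
            lo = start;
        if (end > hi)
            hi = end;
    }
    return hi > lo ? hi - lo : 0;
}
```

For example, with 4 KiB pages, one PT_LOAD at 0x400000 of size 0x1234 and another at 0x601e10 of size 0x300 span from 0x400000 to 0x603000, i.e. 0x203000 bytes.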

###Stack Initialization

The loading part is done. Now it’s time to initialize the stack. Why initialize the stack though? Well, before the binary starts executing, the stack has to hold quite a bit of information so that the binary and the dynamic linker can co-operate and get the job done. In particular, the stack has a specific layout and alignment. Below is an illustration of what the stack looks like:

+--------------+
|     ...      |  <-- Higher addresses
+--------------+
|  env strings |  <-- Environment variable strings
+--------------+
|     ...      | 
+--------------+
| argv strings |  <-- Argument strings
+--------------+
|     ...      |  
+--------------+
|     auxv     |  <-- Elf_Aux vector
+--------------+
|     ...      |
+--------------+
|     envp     |  <-- Pointers to environment strings
+--------------+
|     ...      |
+--------------+
|     argv     |  <-- Pointers to argument strings
+--------------+
|     argc     |  <-- Number of cmd arguments
+--------------+
|     ...      |  <-- Lower addresses (stack pointer at entry)
+--------------+

You probably already know most of the stack elements except for Elf_Aux, aka the auxiliary vector. This vector is unknown to most people, yet it’s essential since it passes information from the kernel to the dynamic linker. What kind of information, you may ask? The vector is actually an array of structs of type ElfN_auxv_t, where N = 32 or 64. Specifically:

typedef struct
{
    uint64_t a_type;       /* Entry type (one of the AT_* constants) */
    union
      {
          uint64_t a_val;  /* Integer value that goes with a_type */
      } a_un;
} Elf64_auxv_t;

a_val simply holds the value that goes with each entry, so let’s focus on a_type. As I was saying, the stack is initialized in such a way that the dynamic linker can complete its task. Thus, the info needed by the dynamic linker is:

AT_BASE   - The base address at which the dynamic linker (interpreter) was loaded
AT_PHDR   - The address of the executable's program header table in memory
AT_PAGESZ - The size of a page in memory
AT_PHNUM  - The number of program headers
AT_ENTRY  - The entry point of the executable
AT_FLAGS  - Runtime flags for the dynamic linker

The above “types” are the values the a_type member can take, and each of them corresponds to an integer constant whose exact value isn’t important right now. As you can see, the dynamic linker needs a bit of help in order to find its way around the running process. Moving on to the final step.
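Before that final step, here is a quick way to peek at these entries from inside a running program. It's a small sketch using getauxval() from <sys/auxv.h> (available in glibc and musl); print_auxv is just a name I picked for the example.

```c
#include <elf.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Print the auxiliary vector entries listed above for the current process.
 * getauxval() returns 0 for entries that are not present, so AT_BASE
 * shows up as 0 in a statically linked binary. */
void print_auxv(void)
{
    printf("AT_BASE   = %#lx\n", getauxval(AT_BASE));
    printf("AT_PHDR   = %#lx\n", getauxval(AT_PHDR));
    printf("AT_PAGESZ = %lu\n",  getauxval(AT_PAGESZ));
    printf("AT_PHNUM  = %lu\n",  getauxval(AT_PHNUM));
    printf("AT_ENTRY  = %#lx\n", getauxval(AT_ENTRY));
    printf("AT_FLAGS  = %#lx\n", getauxval(AT_FLAGS));
}
```

If you just want a dump without writing any code, running a dynamically linked program with the LD_SHOW_AUXV=1 environment variable set makes glibc's dynamic linker print the whole vector.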

###Kernel passes the ball to the entry point

Pretty straightforward stuff. If the binary is dynamically linked, the entry point is that of the dynamic linker itself, so it can complete its task first and then jump to the program’s own entry point, which it finds in AT_ENTRY. Otherwise, nothing extra is done: execution starts directly at the e_entry value of the ELF header struct.
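To tie e_entry and AT_ENTRY together, the sketch below reads e_entry straight from the ELF header on disk and prints it next to the AT_ENTRY value the kernel put in the auxiliary vector. The function names are hypothetical, and as before it assumes a 64-bit ELF in the host's byte order.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/auxv.h>

/* Read e_entry from a 64-bit ELF file; returns 0 on success, -1 on failure. */
int read_e_entry(const char *path, uint64_t *entry)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    Elf64_Ehdr eh;
    int ok = fread(&eh, sizeof eh, 1, f) == 1 &&
             memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
             eh.e_ident[EI_CLASS] == ELFCLASS64;
    fclose(f);
    if (!ok)
        return -1;

    *entry = eh.e_entry;
    return 0;
}

/* Compare the on-disk entry point with the runtime one from the auxv. */
void print_entry_points(void)
{
    uint64_t on_disk;
    if (read_e_entry("/proc/self/exe", &on_disk) != 0)
        return;
    printf("e_entry  (file)   = %#llx\n", (unsigned long long)on_disk);
    printf("AT_ENTRY (memory) = %#lx\n", getauxval(AT_ENTRY));
}
```

For a non-PIE executable the two values should match; for a position-independent one, AT_ENTRY is e_entry plus whatever base address the kernel picked for the binary, so expect them to differ.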

###RUN, PROGRAM RUN!

Tada! This is it boys and girls. A lot of info was thrown at you but if you re-read it again and again, I’m sure it will start making sense (keep in mind it took me weeks to fully understand those concepts). If you are reading this sentence, you are a true champ. Thank you for taking the time to read my paper and I hope you gained even a tiny bit of knowledge out of it. If you have any questions or future post recommendations regarding ELF Internals, please let me know. I’m already planning to cover more aspects of it.

Later,
@_py

oaktree · November 8, 2016, 7:31pm

Great stuff.

Could you elaborate on this, though? Or point me to your source?

_py · November 8, 2016, 7:57pm

Simply put, if a binary is dynamically linked, the execution starts first in the dynamic linker’s text segment (the entry point in its .text segment) in order to do the appropriate linking, address patching etc, which makes sense, right? First, you gotta take care of the relocations via the code included in the dynamic linker’s text segment and THEN off to the entry point of the main program. Otherwise, if the binary isn’t dynamically linked, there is no need to pass the execution to the dynamic linker or any shared library and the binary starts within its text segment (in particular, at the address where e_entry points at). I hope it makes more sense now. I’ll edit that part probs. I apologize for the confusion.

I can write a paper describing all the section / program header structs and ELF headers if you are confused about them.

oaktree · November 8, 2016, 10:49pm

Thank you. I do think that an overview of the Elf Header structs would be great, as well.

_py · November 8, 2016, 11:11pm

Have a look at this link for a more in-depth explanation with actual code references:

“The execution start address for the program is also set to be the entry point of the interpreter, rather than that of the program itself. When the execve() system call completes, execution then begins with the ELF interpreter, which takes care of satisfying the linkage requirements of the program from user space — finding and loading the shared libraries that the program depends on, and resolving the program’s undefined symbols to the correct definitions in those libraries. Once this linkage process is done (which relies on a much deeper understanding of the ELF format than the kernel has), the interpreter can start the execution of the new program itself, at the address previously recorded in the AT_ENTRY auxiliary value.”

Valentine · November 9, 2016, 3:26pm

hmmmm… might just have to read the other papers, (I’m lost). XD

_py · May 1, 2017, 8:38am

pry0cc · May 1, 2017, 4:57pm

Why is this unlisted? @_py

_py · May 1, 2017, 5:05pm

@pry0cc It’s not up to my standards anymore. I don’t like it that much. I was having some self-reflection. I’ll make an improved version of it in the future. It isn’t as in-depth as I want it to be.
