Linux Internals ~ Dynamic Linking Wizardry

_py · September 13, 2016, 4:50pm

UPDATE: Please refer to Linux Internals - The Art Of Symbol Resolution for a more detailed and updated analysis of the concept.

Hey peeps! I hope you’re all doing great. It’s been a long time since my last post. Anyway, today I present to you something completely different from what you’ve been used to seeing from me. This is more of a self-research/study write-up. I’m nowhere close to being an expert on this subject, but I busted my ass trying to understand the inner workings of Linux executables and it finally started making sense. So I thought “Hey, why not share my findings? Not everyone knows what’s going on under the hood.” Without further ado, let’s get right into the amazing world of Dynamic Linking!

###Prerequisites

-Basic C Programming
-Pointers
-Knowledge of the ELF would make your reading much smoother
-Knowledge of Linkers and Loaders
-Patience
-Logic
-Will

If some of you are missing a few of the prerequisites, have no fear, because I will try to explain everything as simply as I can. Even if you don’t have a background in the low-level world, I’ll do my best so that after you finish reading this paper you will have at least a basic feel for it.

Disclaimer: As I mentioned before, this paper is a result of my own experimentation and study, so if I’m explaining something falsely or inaccurately, feel free to point out any mistake. We all learn by doing and failing.

##Relocations

According to the ELF(5) man pages:

Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must have information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process’s program image. Relocation entries are these data.

Let’s forget about that definition for a while and make a simpler version of it by experimenting, shall we? Hopefully, by the end of this post you will have made your own conclusion without needing to read any wikipedia link or specification. I won’t dig deep into relocations since today’s topic is dynamic linking but I’ll explain as much as it’s needed so we can connect the pieces of the puzzle together. Relocations are crucial when it comes to dynamic linking. You can think of it as a binary patching mechanism which provides intel to the dynamic linker in order to resolve symbol definitions. I’m referring to “symbols” but you may be wondering, what does that even mean?

extern int i;
puts();

THAT simple. Imagine those 2 lines of code as being part of a .c file. puts() and ‘i’ are symbols. Specifically, they are names which need to be resolved by the dynamic linker. What do I mean by that? Your computer works with addresses, not names. puts() belongs to the well-known libc library, aka a shared object, which means it’s not defined in our source file. So how can the code of puts() be executed if we haven’t written any code for that function ourselves? Well, here comes the dynamic linker, which will find the address of puts() in libc and patch the location named by the relocation entry that was left in our executable for the puts() symbol, so that the call can finally reach the function. I will explain the process in much more detail shortly.

As for the extern int i; line: there isn’t much difference in the resolution process, so I’ll let you research that one on your own in case you’ve never heard of extern.

Let’s have a look at the 64-bit relocation structs:

Version 1:

typedef struct {
    /* Offset to the location that requires relocation */
    Elf64_Addr r_offset;
    /* 
     * 1. Info about the index of the symbol in the symbol table.
     * 2. Type of relocation that needs to be applied.
    */
    uint64_t   r_info;
} Elf64_Rel;

Version 2:

typedef struct {
    Elf64_Addr r_offset;
    uint64_t r_info;
    int64_t  r_addend; /* Constant addend used in relocation calculations */
} Elf64_Rela;

Alright, I think I gave you a kickstart on relocations so you can look them up later on.

##The Art Of Dynamic Linking

After this small relocation introduction, it’s time to dig a bit deeper. If you haven’t understood their purpose yet, no worries, I’m about to illustrate an example with actual code. Here’s our tiny source file.

Note: The example is built and examined as a 32-bit executable (hence the -m32 flag below).

int main(void) {
    puts("Hey!");
    return 0;
}

Let’s try to compile it with gcc -m32 -o name name.c .

Hmm, I don’t know about you, but I’m getting a warning that puts() isn’t declared in my source file. Well, I’m feeling adventurous, so I’ll give it a shot and try to run the executable.

Look at that! I didn’t define puts() anywhere in my source file, I didn’t include any header file and yet the code ran smoothly. So what happened? To the assembler haters, don’t click away, it’s simpler than you think. Let the journey begin.

##Lazy Binding via PLT and GOT

Welcome to the meat of this paper. The PLT (Procedure Linkage Table) and GOT (Global Offset Table) are sections included in executables and shared libraries. Our main focus is on executables right now. When a program makes use of a shared library function, such as puts() or gets(), which are not resolved until runtime, we are in need of a mechanism that resolves the addresses of those shared functions. This mechanism isn’t just a simple call instruction.

Don’t freak out! That’s the code of our main function via the use of the objdump -D command. We are only interested in the call 80482e0 <puts@plt> instruction.

Looks like the call to puts() leads to the address 0x80482e0, which is the PLT table entry for puts(). You don’t believe me?

Think of PLT and GOT as arrays with indices/entries. Each GOT entry holds an address, while each PLT entry is a small code stub that jumps through its GOT entry. PLT is specialized in function symbols while GOT is used for both variables and functions.

Moving on, there is an indirect jump (jmp *0x804a00c) to the address stored at 0x804a00c. That location is the GOT entry that will end up holding the address of the puts() function in the libc library. Keep that address in mind. But wait, the address of puts() hasn’t been resolved yet, so where will it jump to?

####Enter Lazy Binding

All I mean by lazy binding is this:

The dynamic linker will not resolve every function at load time, but instead, it will resolve the functions while they are being called during runtime through the help of its buddies who have made it possible, PLT and GOT. Let’s have a relocation throwback.

Note: The R_386_JUMP_SLOT is a relocation type for the PLT/GOT entries. For more details into relocations make sure you check out the ELF specifications [5].

Interesting! Did you notice a familiar offset? You didn’t? Alright, let me zoom in for you.

C’mon, it should ring a bell now! It doesn’t? Dayum, it’s the address that the indirect jump in the PLT entry reads from, i.e. the GOT entry! In other words, that relocation entry is shouting loud and clear “Find the address of puts() in libc and patch the location 0x804a00c in the process image with the address of puts().” Let me refresh your memory.

Ok, time to focus! If you’ve been sleeping while reading thus far, it’s time to wake up. As you have probably noticed, the relocation offset is the same as the address that the puts() PLT entry jumps through. Since puts() is being called for the first time, the dynamic linker has to resolve its address, and it’s going to accomplish that by placing that address in the GOT entry for puts(). As I said earlier, the GOT is filled with address entries, after the dynamic linker gets its address resolution job done of course. Let’s have a look at the 0x804a00c address.

As you can see, the address belongs to the GOT section of our program. Let’s zoom in once again.

The e6 82 is part of an address. Specifically, it’s the low half of the address 0x080482e6. Why does it look like that? Well, my machine is little endian, which means the least significant byte is stored first, so the four bytes in memory are e6 82 04 08. objdump -D only makes it look like the address was cut in half: it tries to disassemble the GOT as if it were code, so the remaining 04 08 bytes end up on the next line as part of another bogus instruction. Either way, the entry does point to the 0x80482e6 address. Anyway, let’s have a look at the PLT section of our program once again.

Do you see what I see? The 0x80482e6 address belongs to the second instruction in the PLT entry for puts() (in this case, push $0x0). So, jmp *0x804a00c reads the GOT entry at 0x804a00c, finds 0x80482e6 in it and lands on the push $0x0 instruction. That push instruction plays an important role: it tells the dynamic linker which symbol needs resolving by pushing the offset of the puts() relocation entry within the .rel.plt section on the stack. That offset is 0x0 because the relocation entry for puts() is the first one in .rel.plt. Now, if you remember what I told you before, GOT and PLT are like arrays with indices/entries. But even though puts() owns the first relocation entry, its address is actually stored at the 4th entry in the GOT, the GOT[3]. Why is that? That’s because the previous entries are reserved for dynamic linking purposes.

GOT[0] - Holds the address of the dynamic section (.dynamic) of the ELF file, which contains important dynamic-linking info.

GOT[1] - Holds the address of a structure called link_map, which the dynamic linker uses for symbol resolution. It describes the loaded object (here, our executable) rather than puts() itself.

GOT[2] - Holds the address of the dynamic linker's resolver function, aka _dl_runtime_resolve(), which resolves the symbol address for the shared library function.

The last instruction in the PLT entry for puts() is a jmp 80482d0. That is the address of the first PLT entry in the executable, PLT[0]. Let’s take a look at it.

Hang in there, we are almost done. The first instruction pushes the contents of the second GOT entry (GOT[1]), i.e. the link_map pointer, on the stack. Finally, the jmp *0x804a008 is an indirect jump through the third GOT entry (GOT[2]), which contains the address of the _dl_runtime_resolve() function, thus passing control to the dynamic linker, which resolves the symbol’s address. Right after that, the dynamic linker patches the GOT entry for puts() with the help of the relocation and jumps to puts() itself. Meaning, the next time there is a call to puts() in our program, there won’t be any lazy binding process; instead, the jmp in the PLT entry will branch right into the function’s code.

##Summary

Let’s sum up the dynamic linking process to make our life easier:

-Call puts() by jumping into its PLT entry.
-Indirect jump through the GOT entry for puts().
-On the first call, that GOT entry contains an address which points back to the 2nd instruction of the PLT entry.
-Push the offset of the relocation entry for puts() (0x0) on the stack, so that the dynamic linker knows which symbol to resolve and which GOT entry to patch.
-Jump into the first entry in PLT.
-Push the contents of GOT[1], the pointer to the link_map structure, on the stack.
-Jump through GOT[2] to the _dl_runtime_resolve() function in order to resolve the address of puts() in libc.
-Patch the GOT entry for puts() and jump to puts().

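In conclusion, lazy binding speeds up load time, because symbols are resolved only when they are first used. That’s the dynamic linker’s default behavior, but it can be switched off by setting the LD_BIND_NOW environment variable to a non-empty value (e.g. LD_BIND_NOW=1 ./name), which makes the dynamic linker resolve everything at load time.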

Oh well, that’s been it. I know it was a lot to take in. I did my best not to bombard you with completely technical terms. If you are reading this sentence, you are a true champion. Thank you for taking the time to read my paper and I hope you gained even a small piece of knowledge from it. Any kind of feedback would be much appreciated. If you have any questions, feel free to comment them down below or PM me. I will provide some reference links in case you found this paper interesting.

Reference Links:
[1] http://phrack.org/issues/58/5.html
[2] https://www.technovelty.org/linux/plt-and-got-the-key-to-code-sharing-and-dynamic-libraries.html
[3] https://github.com/mewrev/dissection
[4] http://www.airs.com/blog/archives/41
[5] http://refspecs.linuxbase.org/elf/elf.pdf

Later,
@_py

0x00pf · September 13, 2016, 5:57pm

Brilliant post @_py

I see some serious heavy wizardry in this post.

_py · September 13, 2016, 6:01pm

I appreciate the feedback @0x00pf! I hope my explanation didn’t confuse you.

anon79434934 · September 13, 2016, 6:53pm

You know an article is good when I understand it.

-Phoenix750

shahril · May 10, 2017, 4:56pm

Thanks for the paper @_py!

Reading your newest article Linux Internals - The Art Of Symbol Resolution plus reading this again really makes sense, as I’m able to understand it in just one shot! I love how you introduced the concept of GOT through array indices [0][1][2], which really makes sense for me.

It’s actually e6 82 04 08 in the little-endian. It’s on the first and second line.

This is because the instruction jmp *0x804a00c will take the 4 bytes value at the 0x804a00c address. You can imagine that this instruction is the same as jmp dword ptr [0x804a00c].

But this is an old write up of yours, so I guess you already know about this thing.

Cheers, thanks again for the paper!

_py · May 10, 2017, 6:28pm

Thank you so much for the encouraging comment @shahril!

This is because the instruction jmp *0x804a00c will take the 4 bytes value at the 0x804a00c address. You can imagine that this instruction is the same as jmp dword ptr [0x804a00c].

You are right about that. That’s the Intel version of the instruction (which I prefer way more than AT&T’s tbh). If I remember correctly, the official term is indirect jump. Weirdly enough, objdump cut the address in half, if you noticed, while in my recent write-up GDB shows it fully. That’s why I was surprised in the beginning.

I love how you introduced the concept of GOT through array indices [0][1][2], which really makes sense for me.

I’m really glad! The ELF specs refer to the binary structures as tables (i.e symbol table, global offset table, relocation table), but in reality they are arrays either containing C structs or pointers.

If you enjoyed learning about GOT/PLT you might be interested in having a look at my most recent write-up on Bypassing ASLR via Format String Bug, where I abuse the GOT to redirect code execution. I’m hoping to release another write-up soon where I’ll be abusing PLT in order to leak addresses via ROP.

Thank you once again for taking the time to read them both. I hope you developed a mental model as to how the linking internals work.

Cheers!

levi · November 8, 2017, 6:32am

thanks for the post

actually it’s the offset of the relocation entry for the “puts” function in the relocation array for the “PLT”. As shown by running “readelf -r obj”, the relocation entry for “puts” is at offset 0x0 from the beginning of the .rel.plt section.

again thanks for the post and ur effort trying to share knowledge

_py · November 8, 2017, 6:36am

It’s fully documented in my updated version, but thanks for the note.