Dissecting and exploiting ELF files

ricksanchez · June 26, 2018, 11:02am

Preface

Hi folks!
For quite some time there was no article from my side.
Life kept me busy with all sorts of things, but here is a little something until some cooler project emerges.
This article will focus on explaining the ELF file format.
While this may seem like a really boring and very theory heavy research topic I actually had a lot of fun during my time digging through the available literature and trying things out on my own.

This topic offers a huge amount of information, and I am by no means an expert since the following chapters are all knowledge obtained through self studies.
I’ll try to keep it short and precise and convey my message in a fun way with practical examples.
That said let’s dive right into the table of contents!
The article will be split into the following parts:

1. General information
2. The ELF file format dissected
   2.1. A practical example
3. A behavior study on the file size on x64
4. Misc
5. Final conclusion

Required skills

basic understanding of C, assembly, unix systems
about half an hour of time

General information

So first of all why would you and I want to bother learning a specified file format that was adopted as a system default in UNIX systems almost 20 years ago?

 “If you know the enemy and know yourself, you need not fear the result of a hundred battles.
 If you know yourself but not the enemy, for every victory gained you will also suffer a defeat.
 If you know neither the enemy nor yourself, you will succumb in every battle.”  - Sun Tzu

You might think what the hell am I talking about and why would I include this quote here?
You hopefully will realize what I realized when digging into this topic more and more and hence my reasoning here.
To me having this quote is no sign of being fancy and feeling superior. I just realized that even after quite some time it always comes back to this: “You know nothing”!
And I want to change that to get to at least the level of “You know something”.

So if you want to dive into reverse engineering/binary exploitation in UNIX flavored systems you will have to study the internals of such a system.
And one essential part of these are ELF files, since they are used for executables, shared libraries, object files, core-dump files, and even the kernel boot image!

So is this article only for people interested in reverse engineering?

No! If you’re a curious mind or wanting to learn more about UNIX flavored systems in general you’re at the right place (I hope…)!

With that being said let’s directly jump into the beefy part of this article.

The ELF file format dissected

Note: I will not and cannot present every detail of the ELF file format in this article.
The topic is a true rabbit hole and I really suggest doing your own research and reading.
I will add several sources at the end of this that helped me greatly! I recommend continuing from there if this got you interested.

Let’s start with a general layout on how a typical ELF file is structured:

   Linking View            Execution View

+-----------------+     +-----------------+
|  ELF header     |     |  ELF header     |
+-----------------+     +-----------------+
|  Program header |     |  Program header |
|  table (opt.)   |     |  table          |
+-----------------+     +-----------------+
|    Section 1    |     |                 |
+-----------------+     |    Segment 1    |
|       ...       |     |                 |
+-----------------+     +-----------------+
|    Section n    |     |                 |
+-----------------+     |    Segment 2    |
|       ...       |     |                 |
+-----------------+     +-----------------+
|       ...       |     |       ...       |
+-----------------+     +-----------------+
|  Section header |     |  Section header |
|  table          |     |  table (opt.)   |
+-----------------+     +-----------------+

As you can see an ELF file has at least 2 headers that are always present.

* ELF header (Elf32_Ehdr/Elf64_Ehdr) and the program header (Elf32_Phdr/Elf64_Phdr struct), or
* ELF header (Elf32_Ehdr/Elf64_Ehdr) and the section header (Elf32_Shdr/Elf64_Shdr struct)
-> The Elf32 and Elf64 variants correspond to the class of the file, i.e. a 32-bit (e.g. x86) or a 64-bit (e.g. x64) binary.

The two different views can be roughly distinguished by the following definitions:

The linking view is divided into sections and is used when linking of a library and program takes place. The sections contain information about object files like data, instructions, debugging information, symbols or relocation information.
The execution view is divided into segments and is as the name suggests used during program execution. The segments go hand in hand with the program header table as shown later.

Let’s take a closer look at the core elements. I’ll be focusing on x64 from now on, but for x86 this can be done analogously. The main difference between the two architectures is the size of the fields and therefore of the structures.

The ELF header

First of all we do have the ELF header that is 64 bytes big on 64-bit machines and defined as follows:

[...]
/* 64-bit ELF base types. */
typedef __u64	Elf64_Addr;     /* 8 byte (unsigned) */
typedef __u16	Elf64_Half;     /* 2 byte (unsigned) */
typedef __s16	Elf64_SHalf;    /* 2 byte (signed) */
typedef __u64	Elf64_Off;      /* 8 byte (unsigned) */
typedef __s32	Elf64_Sword;    /* 4 byte (signed) */
typedef __u32	Elf64_Word;     /* 4 byte (unsigned) */
typedef __u64	Elf64_Xword;    /* 8 byte (unsigned) */
typedef __s64 Elf64_Sxword;     /* 8 byte (signed) */

[...]
#define EI_NIDENT	16

typedef struct elf64_hdr {
  unsigned char	e_ident[EI_NIDENT];	/* ELF "magic number" */
  Elf64_Half e_type;        /* Object file type */
  Elf64_Half e_machine;     /* Architecture */
  Elf64_Word e_version;     /* Object file version */
  Elf64_Addr e_entry;	   /* Entry point virtual address */
  Elf64_Off e_phoff;	   /* Program header table file offset */
  Elf64_Off e_shoff;      /* Section header table file offset */
  Elf64_Word e_flags;       /* Processor-specific flags */
  Elf64_Half e_ehsize;      /* ELF header size in bytes */
  Elf64_Half e_phentsize;   /* Program header table entry size */
  Elf64_Half e_phnum;       /* Program header table entry count */
  Elf64_Half e_shentsize;   /* Section header table entry size */
  Elf64_Half e_shnum;       /* Section header table entry count */
  Elf64_Half e_shstrndx;    /* Section header string table index */
} Elf64_Ehdr;

[...]

The header structure is not difficult to grasp. It sits within the very first bytes of the file and serves as a kind of road map for the binary.
Just from this information we can already conclude a lot about the binary.
Let’s quickly go through the non address/size ones:

e_ident: initial magic bytes that tell the OS how to interpret and decode the file’s contents.
e_type: identifies the file type. e.g: an executable or shared object file, …
e_machine: specifies the required architecture for a file. e.g: x86-64, ARM, MIPS, …
e_version: usually set to 1
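
To make the e_ident part a bit more tangible, here is a minimal Python sketch that decodes those first 16 bytes. It is not the parser.py used later in this article; the function name decode_ident and the lookup tables are made up for illustration and only cover a few common values. The sample bytes are what a little-endian 64-bit System V binary typically starts with.

```python
# Small lookup tables for a few common e_ident values (far from complete).
EI_CLASS = {1: "32-bit", 2: "64-bit"}
EI_DATA = {1: "little-endian", 2: "big-endian"}
EI_OSABI = {0: "System V", 3: "Linux"}


def decode_ident(ident: bytes) -> dict:
    """Decode the 16-byte e_ident array at the start of an ELF file."""
    if len(ident) != 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": EI_CLASS.get(ident[4], "unknown"),
        "data": EI_DATA.get(ident[5], "unknown"),
        "version": ident[6],
        "osabi": EI_OSABI.get(ident[7], "unknown"),
        "abiversion": ident[8],
    }


# Hypothetical sample: magic, 64-bit, little-endian, version 1, System V, padding.
sample = bytes.fromhex("7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00")
print(decode_ident(sample))
```

For a real file the 16 bytes would simply come from open(path, "rb").read(16).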

The remaining fields specify offsets, sizes and addresses for the section header table and program header table, which we will discuss next. I’ll come back to these ELF header fields in due time.

The program header(s)

A program header describes a segment within a binary and is necessary for program loading.
These segments (1 or more) are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory.
Since it helps in creating a process image, a program header table is mandatory for executable files and shared objects, but is optional for relocatable files (Linking View vs Execution View).
Program headers do not exist in relocatable objects because these *.o files are meant to be linked into an executable but not meant to be loaded directly into memory!

The program header structure is specified as follows:

[...]
typedef struct elf64_phdr {
  Elf64_Word p_type;        /* Segment type */
  Elf64_Word p_flags;       /* Segment flags */
  Elf64_Off p_offset;		/* Segment file offset */
  Elf64_Addr p_vaddr;		/* Segment virtual address */
  Elf64_Addr p_paddr;		/* Segment physical address */
  Elf64_Xword p_filesz;		/* Segment size in file */
  Elf64_Xword p_memsz;		/* Segment size in memory */
  Elf64_Xword p_align;		/* Segment alignment, file & memory */
} Elf64_Phdr;
[...]

p_type: identifies the type of the segment. e.g.: loadable segment, dynamic linking tables, …
p_flags: specifies the attributes/permissions of the current segment. e.g.: 0x6 for R+W permissions (PF_X = 0x1, PF_W = 0x2, PF_R = 0x4)
p_offset: contains the offset of the segment from the beginning of the file
p_vaddr: contains the virtual address of the segment in memory
p_paddr: reserved for systems with physical addressing
p_filesz: contains the size of the file image of the segment
p_memsz: contains the size of the memory image of the segment
p_align: the alignment of the segment in the file and in memory, given as a power of 2
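
As a small illustration of p_flags and p_align, here is a hedged Python sketch. The helper names describe_flags and alignment_ok are made up for this example. The flag bits (PF_X = 0x1, PF_W = 0x2, PF_R = 0x4) and the rule that p_vaddr and p_offset must be congruent modulo p_align for loadable segments come from the ELF specification.

```python
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # permission bits stored in p_flags


def describe_flags(p_flags: int) -> str:
    """Turn the p_flags bitmask into a readable permission string."""
    names = []
    if p_flags & PF_R:
        names.append("read")
    if p_flags & PF_W:
        names.append("write")
    if p_flags & PF_X:
        names.append("execute")
    return ", ".join(names) or "none"


def alignment_ok(p_offset: int, p_vaddr: int, p_align: int) -> bool:
    """Check the alignment constraint of a loadable segment."""
    if p_align in (0, 1):  # 0 and 1 both mean "no alignment required"
        return True
    is_power_of_two = (p_align & (p_align - 1)) == 0
    return is_power_of_two and p_vaddr % p_align == p_offset % p_align


print(describe_flags(0x5))  # read, execute
print(describe_flags(0x6))  # read, write
print(alignment_ok(0x40, 0x40, 0x8))
```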

Since a program can have multiple program segments there are n program headers within a binary.
The ELF header gives us all the information about where those are and how many of them exist:

e_phoff: Points to the start of the program header table
e_phentsize: Contains the size of one program header table entry.
e_phnum: Contains the number of entries in the program header table.

Note: We can easily calculate the size of the whole program header table by doing e_phentsize * e_phnum
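
The arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the example values (e_phoff = 0x40, e_phentsize = 0x38, e_phnum = 9) are the ones found in the /bin/ls header that is dissected later in this article.

```python
def program_header_offsets(e_phoff: int, e_phentsize: int, e_phnum: int) -> list:
    """File offset of every entry in the program header table."""
    return [e_phoff + i * e_phentsize for i in range(e_phnum)]


def program_header_table_size(e_phentsize: int, e_phnum: int) -> int:
    """Total size of the program header table in bytes."""
    return e_phentsize * e_phnum


offsets = program_header_offsets(0x40, 0x38, 9)
print([hex(o) for o in offsets])
print(program_header_table_size(0x38, 9))  # 504
```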

There are five very common program header types that I will introduce now:

PT_LOAD

An executable will always have at least one of these. It’s describing a loadable segment that is mapped into memory upon loading it.
If we take a dynamically linked ELF executable we generally have 2 of these segments right away:

one for the text segment (program code)
another one for the data segment for global vars, and other dynamic linking information

These are going to be mapped into memory with a memory alignment specified by p_align.

PT_DYNAMIC

This segment is present in dynamically linked ELF executables, because it contains the information the dynamic linker needs.
It holds tagged values and pointers for e.g.:

shared libraries that are linked
address/location of the GOT
information about relocation entries

PT_NOTE

This segment is rather vendor/system specific as it can contain extra information.
Other programs can use these fields to check for conformance or compatibility for example.
This segment can hold any number of entries.
Each of the entries is an array of 4-byte words in the byte order of the target processor.

PT_INTERP

This segment contains the location and size of a null terminated string that describes the program loader.
This is typically the dynamic linker.

For example in my local /bin/ls such a PT_INTERP segment can be found at the offset 0x238 and has the size 0x1c

$ hd -n 28 -s 568 /bin/ls
00000238  2f 6c 69 62 36 34 2f 6c  64 2d 6c 69 6e 75 78 2d  |/lib64/ld-linux-|
00000248  78 38 36 2d 36 34 2e 73  6f 2e 32 00              |x86-64.so.2.|
00000254
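
Reading that string programmatically is a matter of slicing and cutting at the null byte. Below is a minimal Python sketch; read_interp is a hypothetical helper, and fake_file is only a stand-in buffer that mimics the layout above (0x238 bytes of filler followed by the interpreter path), not a real binary.

```python
def read_interp(data: bytes, offset: int, size: int) -> str:
    """Extract the null terminated interpreter path of a PT_INTERP segment."""
    raw = data[offset:offset + size]
    return raw.split(b"\x00", 1)[0].decode("ascii")


# Stand-in for the file contents: filler up to 0x238, then the string.
interp = b"/lib64/ld-linux-x86-64.so.2\x00"
fake_file = bytes(0x238) + interp

print(hex(len(interp)))
print(read_interp(fake_file, 0x238, 0x1c))
```

With a real binary, data would be open("/bin/ls", "rb").read() and the offset and size would come from that file's own PT_INTERP program header.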

PT_PHDR

This one holds the location and size of the program header table itself.

The Section header(s)

As shown earlier, program headers describe the segments that are necessary for program execution.
Each of these segments holds either code or data that is divided into sections.
So in short a section header table references the location and size of these sections and is mainly used for linking/debugging purposes.
Section headers are not needed for correct program execution whereas program headers are!
That’s the case because section headers don’t map any memory layout for the binary.
So you can happily strip away the section header table from a binary and it still will execute just fine, but debugging/reversing will be more difficult.

When taking a closer look at sections it appears each can hold code or data, for example program data, such as global variables, or dynamic linking information that is necessary for the linker.
This makes clear why not having them in a binary makes the debugging process more difficult.

A section header is defined as follows:

[...]
typedef struct elf64_shdr {
  Elf64_Word sh_name;		/* Section name, index in string table */
  Elf64_Word sh_type;		/* Type of section */
  Elf64_Xword sh_flags;		/* Miscellaneous section attributes */
  Elf64_Addr sh_addr;		/* Section virtual addr at execution */
  Elf64_Off sh_offset;		/* Section file offset */
  Elf64_Xword sh_size;		/* Size of section in bytes */
  Elf64_Word sh_link;		/* Index of another section */
  Elf64_Word sh_info;		/* Additional section information */
  Elf64_Xword sh_addralign;	/* Section alignment */
  Elf64_Xword sh_entsize;	/* Entry size if section holds table */
} Elf64_Shdr;
[...]

The structure is very similar to the program header one, so I will only point out these fields that are different.

sh_name: An offset to a string in the .shstrtab section that represents the name of this section
sh_link: Points to another section
sh_info: Contains extra information about the section. Interpretation depends on section type
sh_entsize: Contains the size of each entry, for sections that contain fixed-size entries. Otherwise, this field contains zero.

Once again let’s quickly walk through the most common sections and what they are for!

.text

This section is a code section that contains the program’s instructions.

.rodata

This one contains read-only data for example strings from code like this printf("Wassup 0x00sec?\n");.
Moreover since it contains read-only data it must reside in a read only segment of the binary.

.plt

The procedure linkage table (PLT) contains information for the dynamic linker to be able to call functions from used shared libraries.

.data

This section resides in the data segment and contains data such as initialized global variables.

.bss

This section is similar to .data, except that it contains uninitialized global data.

.got.plt

The global offset table (GOT) works together with the PLT to dynamically resolve and access imported shared library functions.
This one is often attacked in GOT-overwrite exploits.

.dynsym

This section contains information about dynamic symbols imported from shared libraries e.g. printf from libc.
These are dynamically loaded at runtime.

.dynstr

.dynstr contains the string table for dynamic symbols that are each null terminated.

.rel.*

Any relocation section has information about certain parts of a binary that need to be adjusted/modified by the linker or at runtime.

.hash

This one contains a hash table with the purpose of being able to look up symbols.

.symtab

The .symtab section contains all symbols from .dynsym as well as local symbols for the executable such as global vars, or local functions with type ElfN_Sym.
.symtab is not loaded into memory because it is not necessary for runtime. It’s mainly for debugging and linking purposes.

.strtab

.strtab contains the symbol string table that is referenced by an entry within ElfN_Sym structs.

.shstrtab

.shstrtab contains the section header string table that is used to resolve names for each section.
More precisely, it holds the string values for the sh_name field from the section header struct.
They can be accessed by adding the sh_name value as an index/offset to the sh_offset of this section.
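
The lookup itself is simple: start at the given offset inside the string table and read up to the next null byte. Here is a minimal Python sketch. The function section_name is made up for illustration, and the shstrtab blob is a hypothetical, shortened string table (real ones contain many more names), so only its layout matters.

```python
def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Resolve sh_name, an offset into .shstrtab, to the section's name."""
    end = shstrtab.index(b"\x00", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# Hypothetical string table: a leading null byte, then null terminated names.
shstrtab = b"\x00.shstrtab\x00.interp\x00.text\x00"

print(section_name(shstrtab, 0x01))
print(section_name(shstrtab, 0x0b))
```

In a real file the blob would be the sh_size bytes found at the sh_offset of the section whose index is e_shstrndx.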

.ctors/.dtors

The .ctors (constructors) and .dtors (destructors) sections contain function pointers to initialization and finalization code that is to be executed before and after the actual main() body of program code.

Interim conclusion 1

This was a lot of information already, but I hope you’re still following along.
If I were to summarize each of these 3 constructs I’d say the following:

An ELF header resides at the beginning and holds a road map describing the file’s organization.
A program header within the program header table is necessary for an executable file to map the binary correctly into memory.
A section within the section header table holds object file information for the linking view like instructions, data, symbol table, relocation information, and more, but is optional for executable files.

A practical example

So what can we do with all this information about how an ELF file is structured and which bytes are stored at which place?
The first project that came to my mind was to take apart any x86/x64 ELF binary and parse the magic out of it.
Then I remembered we already got exactly this in form of readelf on basically any UNIX flavored system…
It looks at all the relevant byte positions of a binary and turns the values found there into a more human readable output…
That did not stop me from trying to do the exact same thing, so I fired up my editor and hacked away at some super beautiful Python code step by step.

Let’s go through the theory from the previous section with the aid of an example to clear things up.
To make things easy we will take a look at /bin/ls as our test binary.
In my case it’s looking like this:

$ file /bin/ls
/bin/ls: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=9567f9a28e66f4d7ec4baf31cfbf68d0410f0ae6, stripped

As we’re dealing with a 64-bit binary we need to keep in mind the appropriate data types and with that the overall size of each structure!
So the ELF header, the initial data structure to be found in an ELF binary is 64 bytes long.
If we take a look at the corresponding hexdump of the first 64 bytes we can see the following:

$ hd -n 64 /bin/ls
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  03 00 3e 00 01 00 00 00  50 58 00 00 00 00 00 00  |..>.....PX......|
00000020  40 00 00 00 00 00 00 00  a0 03 02 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  09 00 40 00 1c 00 1b 00  |....@.8...@.....|

If you recall the Elf64_Ehdr struct from earlier you can easily translate the bytes to the appropriate fields.
In this header it roughly will look like this:

e_ident[EI_NIDENT]:
    e_ident[EI_MAG0,..,EI_MAG3] byte 0-3 -> 7f 45 4c 46
    e_ident[EI_CLASS] byte 4 -> 02
    e_ident[EI_DATA] byte 5 -> 01
    e_ident[EI_VERSION] byte 6 -> 01
    e_ident[EI_OSABI] byte 7 -> 00
    e_ident[EI_ABIVERSION] byte 8 -> 00
    e_ident[EI_PAD] byte 9-15 -> 00 00 00 00 00 00 00
e_type: byte 16-17 -> 03 00
e_machine: byte 18-19 -> 3e 00
e_version: byte 20-23 -> 01 00 00 00
e_entry: byte 24-31 -> 50 58 00 00 00 00 00 00
e_phoff: byte 32-39 -> 40 00 00 00 00 00 00 00
e_shoff: byte 40-47 -> a0 03 02 00 00 00 00 00
e_flags: byte 48-51 -> 00 00 00 00
e_ehsize: byte 52-53 -> 40 00
e_phentsize: byte 54-55 -> 38 00
e_phnum: byte 56-57 -> 09 00
e_shentsize: byte 58-59 -> 40 00
e_shnum: byte 60-61 -> 1c 00
e_shstrndx: byte 62-63 -> 1b 00
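
This byte-to-field mapping can also be done mechanically with Python's struct module. The following is a minimal sketch and not the parser.py used in this article: the format string is my own translation of the Elf64_Ehdr struct shown earlier, the Ehdr tuple name is made up, and raw is the 64 bytes from the hexdump above typed in by hand.

```python
import struct
from collections import namedtuple

# "<" = little-endian without padding, 16s = e_ident,
# H = Elf64_Half, I = Elf64_Word, Q = Elf64_Addr / Elf64_Off
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"
Ehdr = namedtuple(
    "Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

# The first 64 bytes of /bin/ls as shown in the hexdump above.
raw = bytes.fromhex(
    "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 "
    "03 00 3e 00 01 00 00 00 50 58 00 00 00 00 00 00 "
    "40 00 00 00 00 00 00 00 a0 03 02 00 00 00 00 00 "
    "00 00 00 00 40 00 38 00 09 00 40 00 1c 00 1b 00"
)

hdr = Ehdr._make(struct.unpack(EHDR_FORMAT, raw))
print(struct.calcsize(EHDR_FORMAT))  # 64
print(hex(hdr.e_entry), hex(hdr.e_phoff), hex(hdr.e_shoff))
print(hdr.e_phnum, hdr.e_shnum, hdr.e_shstrndx)
```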

This obviously is not quite human readable.
Luckily certain bytes have a specific meaning and hence can be replaced by a fitting ASCII string representation:

$ python3 parser.py -e /bin/ls
                    ELF HEADER
------------------  -------------------------------------
e_ident_EI_MAG      7f 45 4c 46 (valid ELF magic)
e_ident_EI_CLASS    64-bit
e_ident_EI_DATA     little-endian
e_ident_EI_VERSION  1 (current version)
e_ident_EI_OSABI    System V
e_ident_EI_PAD      0x0
e_type              ET_DYN (Shared object file)
e_machine           x86-64
e_version           0x1
e_entry             0x5850
e_phoff             0x40 (64 bytes into this file)
e_shoff             0x203a0 (132000 bytes into this file)
e_flags             0x0
e_ehsize            0x40 (64 bytes)
e_phentsize         0x38 (56 bytes)
e_phnum             0x9 (9)
e_shentsize         0x40 (64 bytes)
e_shnum             0x1c (28)
e_shstridx          0x1b (27)
--------------------  -------------------------------------

The same approach can be taken for the program headers and section headers within the binary.
The only difference being that we usually have multiple of each.

Let’s walk through an example for both file sections.

Starting with one program header in /bin/ls.
In the ELF header we found out that each program header has a size of 56 bytes and the first one starts 64 bytes into the file.

$ hd -n 56 -s 64 /bin/ls
00000040  06 00 00 00 05 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000050  40 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |@.......@.......|
00000060  f8 01 00 00 00 00 00 00  f8 01 00 00 00 00 00 00  |................|
00000070  08 00 00 00 00 00 00 00                           |........|
00000078

So let’s identify the values again!

p_type: bytes 0-3 -> 06 00 00 00
p_flags: bytes 4-7 -> 05 00 00 00
p_offset: bytes 8-15 -> 40 00 00 00 00 00 00 00
p_vaddr: bytes 16-23 -> 40 00 00 00 00 00 00 00
p_paddr: bytes 24-31 -> 40 00 00 00 00 00 00 00
p_filesz: bytes 32-39 -> f8 01 00 00 00 00 00 00
p_memsz: bytes 40-47 -> f8 01 00 00 00 00 00 00
p_align: bytes 48-55 -> 08 00 00 00 00 00 00 00
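
Again, the same translation can be sketched with Python's struct module. This is an illustration and not the article's parser.py: the format string is my reading of the Elf64_Phdr struct from earlier, PT_NAMES only lists the first few generic segment types, and raw is the 56 bytes from the hexdump above.

```python
import struct

# Two Elf64_Word fields followed by six 8-byte fields, little-endian.
PHDR_FORMAT = "<IIQQQQQQ"
PT_NAMES = {0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP",
            4: "PT_NOTE", 5: "PT_SHLIB", 6: "PT_PHDR"}

# The first program header of /bin/ls as shown in the hexdump above.
raw = bytes.fromhex(
    "06 00 00 00 05 00 00 00 40 00 00 00 00 00 00 00 "
    "40 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 "
    "f8 01 00 00 00 00 00 00 f8 01 00 00 00 00 00 00 "
    "08 00 00 00 00 00 00 00"
)

(p_type, p_flags, p_offset, p_vaddr,
 p_paddr, p_filesz, p_memsz, p_align) = struct.unpack(PHDR_FORMAT, raw)

print(PT_NAMES.get(p_type, hex(p_type)))  # PT_PHDR
print(hex(p_flags), hex(p_offset), p_filesz, p_align)
```

Note that p_filesz comes out as 504, which is exactly e_phentsize * e_phnum = 56 * 9, as expected for the segment describing the program header table itself.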

The hexdump values nicely translate to the specified program header fields.
When translating these values into a human readable form again we can format it like this:

$ python3 parser.py -p /bin/ls
          FOUND PROGRAM HEADER
--------  ------------------------------
p_type    PT_PHDR
p_offset  0x40 (64 bytes into this file)
p_vaddr   0x40
p_paddr   0x40
p_filesz  0x1f8 (504 bytes)
p_memsz   0x1f8 (504 bytes)
p_flags   read, execute
p_align   0x8
--------------------------  ----------------------

Last but not least let’s do one final walk through for a section header.
It will be the same approach but for the sake of completeness let’s do it!
One of them is located at the following file offset range:

$ hd -s 132064 -n 64 /bin/ls
000203e0  0b 00 00 00 01 00 00 00  02 00 00 00 00 00 00 00  |................|
000203f0  38 02 00 00 00 00 00 00  38 02 00 00 00 00 00 00  |8.......8.......|
00020400  1c 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00020410  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00020420

Note: The ELF header gives us all the tools to calculate the file offset of each section header again:
e_shoff + i * e_shentsize, where the index i runs from 0 to e_shnum - 1 and is the variable part to access each section header! The one above is the entry with index 1 (132000 + 1 * 64 = 132064).
Now back to the same old game!

sh_name: bytes 0-3 -> 0b 00 00 00
sh_type: bytes 4-7 -> 01 00 00 00
sh_flags: bytes 8-15 -> 02 00 00 00 00 00 00 00
sh_addr: bytes 16-23 -> 38 02 00 00 00 00 00 00
sh_offset: bytes 24-31 -> 38 02 00 00 00 00 00 00
sh_size: bytes 32-39 -> 1c 00 00 00 00 00 00 00
sh_link: bytes 40-43 -> 00 00 00 00
sh_info: bytes 44-47 -> 00 00 00 00
sh_addralign: bytes 48-55 -> 01 00 00 00 00 00 00 00
sh_entsize: bytes 56-63 -> 00 00 00 00 00 00 00 00
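
For completeness, here is the matching Python sketch for a section header, including the offset calculation that leads to the 132064 passed to hd above. As before this is an illustration rather than the article's parser.py: section_header_offset is a made-up helper, the format string is my translation of the Elf64_Shdr struct, and raw holds the 64 bytes from the hexdump.

```python
import struct

# Word, Word, 4 x 8-byte fields, Word, Word, 2 x Xword, little-endian.
SHDR_FORMAT = "<IIQQQQIIQQ"


def section_header_offset(e_shoff: int, e_shentsize: int, index: int) -> int:
    """File offset of the section header with the given index."""
    return e_shoff + index * e_shentsize


# Section header with index 1 of /bin/ls as shown in the hexdump above.
raw = bytes.fromhex(
    "0b 00 00 00 01 00 00 00 02 00 00 00 00 00 00 00 "
    "38 02 00 00 00 00 00 00 38 02 00 00 00 00 00 00 "
    "1c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
    "01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

(sh_name, sh_type, sh_flags, sh_addr, sh_offset,
 sh_size, sh_link, sh_info, sh_addralign, sh_entsize) = struct.unpack(SHDR_FORMAT, raw)

print(section_header_offset(0x203a0, 0x40, 1))  # 132064
print(hex(sh_name), sh_type, hex(sh_addr), hex(sh_offset), hex(sh_size))
```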

It’s time to convert our found bytes into a human readable form for one last time:

$ python3 parser.py -s /bin/ls
              FOUND SECTION HEADER
------------  --------------------------------------------
sh_name       .interp
sh_type       SHT_PROGBITS (Program data)
sh_flags      SHF_ALLOC (Occupies memory during execution)
sh_addr       0x238
sh_offset     0x238 (568 bytes into this file)
sh_size       0x1c (28 bytes)
sh_link       0x0
sh_info       0x0
sh_addralign  0x1
sh_entsize    0x0
------------  --------------------------------------------

This is it!
I hope this little system binary walkthrough highlighted the most important bits and pieces from the prior theory heavy part.

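Note: One thing I did not mention before is that we have to keep the endianness in mind to read multi-byte values correctly. In the little-endian binary above the least significant byte comes first, e.g. a0 03 02 00 is 0x203a0.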
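
A quick Python sketch shows how much the byte order matters. It uses the eight e_shoff bytes from the ELF header above; the helper byte_order is made up for illustration and maps the EI_DATA value (1 = little-endian, 2 = big-endian) to the argument int.from_bytes expects.

```python
def byte_order(ei_data: int) -> str:
    """Map e_ident[EI_DATA] to a byte order name usable by int.from_bytes."""
    return {1: "little", 2: "big"}[ei_data]


raw = bytes.fromhex("a0 03 02 00 00 00 00 00")  # e_shoff bytes of /bin/ls

print(hex(int.from_bytes(raw, byte_order(1))))  # 0x203a0
print(hex(int.from_bytes(raw, byte_order(2))))  # 0xa003020000000000
```

Only the first interpretation yields an offset that actually lies inside the file.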

Interim conclusion 2

What I just walked through step by step with you is in the end exactly what readelf does.
So why did I do this?
In my opinion solely relying on tools might work most of the time, but understanding the core concepts of something ultimately deepens your knowledge and helps you gain an advantage.
Furthermore if you actively work on little projects like these you might find further points of interest within the same project to take a deeper look at as I will next.

So should you write your own implementation of toolXY?
When it comes to system binaries I’d say no. They’ve been added to the default system installation routine for a reason.
They’ve gone through many iterations and code improvements already and are well tested.
If you really should find a flaw or have a brilliant idea to extend a tool try contributing to the tool in question instead so everyone might be able to benefit from it later!

A behavior study on the file size on x64

Disclaimer: 
The following idea below is taken from: [A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux]! See below for the source!
He did an awesome job with his article and I translated it to the x64 arch.
All props to him!

So after all that ELF file theory and digging around the ABI and internals in the previous sections I wondered what would be the smallest valid ELF file with at least the most basic functionality.
This obviously would be something like a print statement so we can see some output on the screen.
So let’s test how small we can go with a simple hello world program written in C.

My initial hello world file, HW_printf.c, was something like this for the sake of having some starting reference:

#include <stdio.h>

int main(void) {
    printf("5");
    return 0;
}

Let’s compile this with the usual flags since it’s good enough for this experiment:

gcc -Wall HW_printf.c -o HW_printf

Note: gcc -Wall enables most of the commonly used compiler warning messages.

$ ./HW_printf
5%

So it does what we had in mind. A simple ‘message’ is printed upon executing our file.
To check the size of our program we can use the UNIX utility wc -c to count the bytes of our binary:

$ wc -c HW_printf
8296 HW_printf

Our first iteration of a hello world program is close to 8.3k bytes in size.
So what happens if we remove the library import and printf statement and just bluntly return a constant in our main function?

int main(void) {
    return 5;
}

This pretty much looks as minimal as we can go.
This surely should have gotten rid of a lot of bytes right?

$ wc -c HW_only_return
8168 HW_only_return

Not as much as one would expect to be honest. We only got rid of 128 measly bytes…
I’m sure we can do better!
But first let’s see if our program still works as intended!

The correct execution can be tested with the following command:

$ ./HW_only_return; echo $?
5

Since we’re only returning 5 and not really printing it to console we have to use this little workaround to get the output back on screen.

Note: ‘$?’ is the return code from the last run process that we echo to console output.
We’re explicitly returning 5 from main() in our program (the value is passed back in the eax/rax register, not on the stack), and it becomes the exit status that echo $? prints.
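
The same exit status can be observed from a script instead of the shell. The Python sketch below is only an illustration: it spawns a second Python interpreter that exits with status 5 as a stand-in for our HW_only_return binary, so it runs without the compiled program being present.

```python
import subprocess
import sys

# Stand-in child process: a Python one-liner that exits with status 5.
# With the real binary this would be subprocess.run(["./HW_only_return"]).
result = subprocess.run([sys.executable, "-c", "import sys; sys.exit(5)"])

# returncode holds what the shell exposes as $?
print(result.returncode)  # 5
```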

Next up let’s make use of our compiler flags and strip the binary from before and see how much optimization the compiler does for us!

$ gcc -Wall -s HW_only_return.c -o HW_only_return_stripped

Note: gcc -s removes all symbol table and relocation information from the executable.

$ wc -c HW_only_return_stripped
6048 HW_only_return_stripped

The stripping actually removed over 2000 bytes, which is quite impressive if you put that in comparison.
Our binary shrank by around 25%! But is this all we can do?
We’re quite limited with what we can do when staying with C actually. We just have a one liner left in our program.
So let’s go even lower!
Time to create some rudimentary ASM hello_world program that does what we want!

; HW_1.asm              ; comment
BITS 64                 ; BITS directive to specify processor mode
GLOBAL main             ; GLOBAL directive exports 'main' so the linker can see the symbol
SECTION .text           ; SECTION directive changes which section of the output file the code will be assembled into
main:
        mov rax, 5
        ret

We can then build and test it with:

$ nasm -f elf64 HW_1.asm
$ gcc -Wall -s HW_1.o -o HW_1_asm
$ ./HW_1_asm; echo $?
5

So this assembly code does what we want!
How about its size?

$ wc -c HW_1_asm
6048 HW_1_asm

That’s… rather not fun. Our super low level wizardry didn’t do anything at all!
This brings the following questions to the table:

Why did the assembly version not get rid of any overhead that we attributed to the C programming language?
What other option do we have to reduce the file size of an ordinary 64 bit elf file?

A possible answer for the first question is that we were using the main() interface in our assembly code.
By using this the linker adds some things like an OS specific interface that is calling main() in the end.
This induces some code overhead, which we do not want nor need!
To get rid of that we need to avoid using a main()-like function construct.
Let’s modify our assembly code as follows:

; HW_3.asm
BITS 64
GLOBAL _start       ; _start default entry point for linker
SECTION .text
_start:             ; new code
    xor rax, rax
    inc al
    mov bl, 5
    int 0x80

When we built the program before, the linker included such a _start function by default too, so why not make use of it by writing our very own?
This _start function is basically just a symbol for the linker to locate the entry point of our program.

So let’s compile and see if our assembly is working and especially if our modification actually did result in a smaller elf file!

$ nasm -f elf64 HW_3.asm
$ gcc -Wall -s -nostartfiles HW_3.o -o HW_3_asm
$ ./HW_3_asm; echo $?
5

Note: -nostartfiles: do not use the standard system startup files when linking.
The standard system libraries are used normally, unless -nostdlib or -nodefaultlibs is used.
We do need this option because of our very own _start function

Let’s quickly inspect the program within GDB:

LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────
*RAX  0x1c
 RBX  0x5
*RCX  0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
*RDX  0x7ffff7de59a0 ◂— push   rbp
*RDI  0x7ffff7ffe170 —▸ 0x555555554000 ◂— jg     0x555555554047
*RSI  0x7ffff7ffe700 ◂— 0
 R8   0x0
 R9   0x0
*R10  0x7ffff7ffe170 —▸ 0x555555554000 ◂— jg     0x555555554047
*R11  0x206
*R12  0x555555554250 ◂— xor    rax, rax
*R13  0x7fffffffdec0 ◂— 0x1
 R14  0x0
 R15  0x0
 RBP  0x0
 RSP  0x7fffffffdec0 ◂— 0x1
*RIP  0x555555554250 ◂— xor    rax, rax
─────────────────────────────────────────────────────────────────[ DISASM ]─────────────────────────────────────────────────────────────────
 ► 0x555555554250    xor    rax, rax
   0x555555554253    mov    dl, 0
   0x555555554255    int    0x80
   0x555555554257    add    byte ptr [rax], al
   0x555555554259    add    byte ptr [rax], al
   0x55555555425b    add    byte ptr [rax], al
   0x55555555425d    add    byte ptr [rax], al
   0x55555555425f    add    byte ptr [rax], al
   0x555555554261    add    byte ptr [rax], al
   0x555555554263    add    byte ptr [rax], al
   0x555555554265    add    byte ptr [rax], al
─────────────────────────────────────────────────────────────────[ STACK ]──────────────────────────────────────────────────────────────────
00:0000│ r13 rsp  0x7fffffffdec0 ◂— 0x1
01:0008│          0x7fffffffdec8 —▸ 0x7fffffffe249 ◂— 0x756e2f656d6f682f ('/home/la')
02:0010│          0x7fffffffded0 ◂— 0x0
03:0018│ rcx      0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
04:0020│          0x7fffffffdee0 —▸ 0x7fffffffe2ac ◂— 0x464e4f435f474458 ('XDG_CONF')
05:0028│          0x7fffffffdee8 —▸ 0x7fffffffe2e3 ◂— 0x50454c45545f434c ('LC_TELEP')
06:0030│          0x7fffffffdef0 —▸ 0x7fffffffe2fc ◂— 0x5f6e653d474e414c ('LANG=en_')
07:0038│          0x7fffffffdef8 —▸ 0x7fffffffe30d ◂— 0x313d4c564c4853 /* 'SHLVL=1' */
───────────────────────────────────────────────────────────────[ BACKTRACE ]────────────────────────────────────────────────────────────────
 ► f 0     555555554250
   f 1                1
   f 2     7fffffffe249
   f 3                0
Breakpoint *0x555555554250
pwndbg>

We can clearly see our code in the [ DISASM ] area of pwndbg, but there is no real return address!
So how does our program exit properly?
The magic is in the int 0x80 instruction.
On Linux systems this software interrupt triggers a system call.
Depending on the values in certain registers the kernel will handle different needs.
In our case we would like to exit the program with a status code of 5.

If we take a look at the syscall numbers (the 32-bit table, which is the one int 0x80 uses) we can see

#define __NR_exit                 1

This is exactly what we want.
With that in mind look back at the assembly code.
We’re preparing our rax register to hold the value 1 to invoke _exit with the return value being set in rbx!
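
To watch an exit status travel from a dying process to its parent without touching assembly, here is a small Python sketch. It is only a loose analogy: os._exit goes through the C library rather than int 0x80, and the helper name run_child is made up for this example.

```python
import subprocess
import sys

# Child process: terminate immediately with status 5, comparable to
# placing 5 in rbx before issuing the exit syscall in the assembly.
CHILD_CODE = "import os; os._exit(5)"


def run_child() -> int:
    """Run the child interpreter and return the exit status the parent sees."""
    result = subprocess.run([sys.executable, "-c", CHILD_CODE])
    return result.returncode


if __name__ == "__main__":
    # This is the value a shell would report via `echo $?`.
    print(run_child())
```

The parent never sees any return address either. It just collects the status the kernel stored when the child exited.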

So what is the result of our optimization?

$ wc -c HW_2_asm
4832 HW_2_asm

By using a custom _start entry point coupled with a fitting setup for our registers and a syscall we did get rid of another 1.2k bytes!

And the file is indeed still a valid ELF 64-bit binary!

$ file HW_3_asm
HW_3_asm: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=12c38b5a5c12a71d1d30f6501ab0fd3c6b96def7, stripped

So what now?
We reached a point where we almost halved the size of our initial C program: 8296 bytes back then compared to only 4832 bytes now!
Now we could only try to optimize our assembly by using shorter/different instructions, which in my opinion is close to impossible by now.
And even if we did find a ‘smarter’ assembly way it would most likely result in a file size reduction of <10 bytes.
That said I played around a bit more with that idea but could not find a way to significantly reduce the size of our binary even further.
But is this the smallest we can go…?

How about we code our ELF binary from the ground up for some more experience?
All we need is the official ELF file specification for x64 binaries, which we can pull directly from the Linux GitHub repo.

All we really need is the ELF header and the ELF program header, because the section header is optional for executables as we learned earlier and hence unwanted overhead!
So in the end our new program should only consist of the following structures:

/* 64-bit ELF base types. */
typedef __u64	Elf64_Addr;
typedef __u16	Elf64_Half;
typedef __s16	Elf64_SHalf;
typedef __u64	Elf64_Off;
typedef __s32	Elf64_Sword;
typedef __u32	Elf64_Word;
typedef __u64	Elf64_Xword;
typedef __s64 Elf64_Sxword;

[...]

typedef struct elf64_hdr {
  unsigned char	e_ident[EI_NIDENT];
  Elf64_Half e_type;
  Elf64_Half e_machine;
  Elf64_Word e_version;
  Elf64_Addr e_entry;
  Elf64_Off e_phoff;
  Elf64_Off e_shoff;
  Elf64_Word e_flags;
  Elf64_Half e_ehsize;
  Elf64_Half e_phentsize;
  Elf64_Half e_phnum;
  Elf64_Half e_shentsize;
  Elf64_Half e_shnum;
  Elf64_Half e_shstrndx;
} Elf64_Ehdr;

[...]

typedef struct elf64_phdr {
  Elf64_Word p_type;
  Elf64_Word p_flags;
  Elf64_Off p_offset;
  Elf64_Addr p_vaddr;
  Elf64_Addr p_paddr;
  Elf64_Xword p_filesz;
  Elf64_Xword p_memsz;
  Elf64_Xword p_align;
} Elf64_Phdr;
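
As a quick sanity check on these two structs, their sizes can be computed with Python's struct module. This is just a sketch: the format strings are my own translation of the C types above (EI_NIDENT being 16), and the constant names are made up for the example.

```python
import struct

# Little-endian ("<") layouts with no implicit padding.
# 16s = e_ident, H = Elf64_Half, I = Elf64_Word,
# Q = Elf64_Addr / Elf64_Off / Elf64_Xword.
EHDR_FMT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr
PHDR_FMT = "<IIQQQQQQ"           # Elf64_Phdr

EHDR_SIZE = struct.calcsize(EHDR_FMT)
PHDR_SIZE = struct.calcsize(PHDR_FMT)

if __name__ == "__main__":
    print(EHDR_SIZE, PHDR_SIZE)  # prints "64 56"
```

These are the same 64 and 56 bytes that readelf -h reports as the size of the header and the size of one program header.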

These two C structs can be directly translated to some assembly:

; HW_4.asm
BITS 64
ehdr:                                          ; ELF64_Ehdr
        db  0x7F, "ELF", 2, 1, 1, 0            ; e_ident
times 8 db  0                                  ; EI_PAD
        dw  3                                  ; e_type
        dw  0x3e                               ; e_machine
        dd  1                                  ; e_version
        dq  _start                             ; e_entry
        dq  phdr - $$                          ; e_phoff
        dq  0                                  ; e_shoff
        dd  0                                  ; e_flags
        dw  ehdrsize                           ; e_ehsize
        dw  phdrsize                           ; e_phentsize
        dw  1                                  ; e_phnum
        dw  0                                  ; e_shentsize
        dw  0                                  ; e_shnum
        dw  0                                  ; e_shstrndx

ehdrsize    equ $ - ehdr

phdr:                                          ; ELF64_Phdr
        dd  1                                  ; p_type
        dd  5                                  ; p_flags
        dq  0                                  ; p_offset
        dq  $$                                 ; p_vaddr
        dq  $$                                 ; p_paddr
        dq  filesize                           ; p_filesz
        dq  filesize                           ; p_memsz
        dq  0x1000                             ; p_align

phdrsize    equ $ - phdr

_start:
        xor al, al
        inc al
        mov bl, 5
        int 0x80

filesize    equ $ - $$

Note: the translation between C data types and assembly data directives is pretty straightforward:
db (define byte) - allocates 1 byte
dw (define word) - allocates 2 bytes
dd (define doubleword) - allocates 4 bytes
dq (define quadword) - allocates 8 bytes
dt (define ten bytes) - allocates 10 bytes

Note: in NASM, $ refers to the current address and $$ refers to the address of the start of the current section.

If we build and run it via:

$ nasm -f bin -o HW_4.out HW_4.asm && chmod +x HW_4.out
$ ./HW_4.out; echo $?
5

SWEET! We just built a super minimal file, which seems to execute how we would expect it.
What does the file command say about our little file?

$ file HW_4.out
HW_4.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, corrupted section header size

It seems to recognize it as a valid ELF file with only a corrupted section header size!
That sounds totally logical, since we did not include any…
Let’s check the behavior in GDB:

LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
───────────────────────────────────────────────────────────────[ REGISTERS ]────────────────────────────────────────────────────────────────
 RAX  0x1
 RBX  0x5
 RCX  0x0
 RDX  0x0
 RDI  0x0
 RSI  0x0
 R8   0x0
 R9   0x0
 R10  0x0
 R11  0x0
 R12  0x0
 R13  0x0
 R14  0x0
 R15  0x0
 RBP  0x0
 RSP  0x7fffffffdec0 ◂— 0x1
*RIP  0x7ffff7ffe07f ◂— int    0x80
─────────────────────────────────────────────────────────────────[ DISASM ]─────────────────────────────────────────────────────────────────
   0x7ffff7ffe078    xor    rax, rax
   0x7ffff7ffe07b    inc    al
   0x7ffff7ffe07d    mov    bl, 0
 ► 0x7ffff7ffe07f    int    0x80 <SYS_write>
        fd: 0x0
        buf: 0x0
        n: 0x0
   0x7ffff7ffe081    add    byte ptr [rax], al
   0x7ffff7ffe083    add    byte ptr [rax], al
   0x7ffff7ffe085    add    byte ptr [rax], al
   0x7ffff7ffe087    add    byte ptr [rax], al
   0x7ffff7ffe089    add    byte ptr [rax], al
   0x7ffff7ffe08b    add    byte ptr [rax], al
   0x7ffff7ffe08d    add    byte ptr [rax], al
─────────────────────────────────────────────────────────────────[ STACK ]──────────────────────────────────────────────────────────────────
00:0000│ rsp  0x7fffffffdec0 ◂— 0x1
01:0008│      0x7fffffffdec8 —▸ 0x7fffffffe249 ◂— 0x756e2f656d6f682f ('/home/la')
02:0010│      0x7fffffffded0 ◂— 0x0
03:0018│      0x7fffffffded8 —▸ 0x7fffffffe278 ◂— 0x544145535f474458 ('XDG_SEAT')
04:0020│      0x7fffffffdee0 —▸ 0x7fffffffe2ac ◂— 0x464e4f435f474458 ('XDG_CONF')
05:0028│      0x7fffffffdee8 —▸ 0x7fffffffe2e3 ◂— 0x50454c45545f434c ('LC_TELEP')
06:0030│      0x7fffffffdef0 —▸ 0x7fffffffe2fc ◂— 0x5f6e653d474e414c ('LANG=en_')
07:0038│      0x7fffffffdef8 —▸ 0x7fffffffe30d ◂— 0x313d4c564c4853 /* 'SHLVL=1' */
───────────────────────────────────────────────────────────────[ BACKTRACE ]────────────────────────────────────────────────────────────────
 ► f 0     7ffff7ffe07f
   f 1                1
   f 2     7fffffffe249
   f 3                0
pwndbg>
[Inferior 1 (process 17169) exited with code 05]
Warning: not running or target is remote
pwndbg>

It does look pretty similar to before. All our code is there, the registers are set accordingly and the exited with code 05 message indicates that it indeed did what it should have done!
But what about the file size?

$ wc -c HW_4.out
128 HW_4.out

Only 128 bytes! That’s a massive reduction!
Remember? Our first candidate had 8296 bytes, this one only 128 with the same program behavior!
The only way to reduce the file size even further would be to remove fields of the ELF header or the ELF program header, assuming they are not actually needed even though they are specified!
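
Where do the 128 bytes come from? Here is a rough Python sketch that packs the same field values as HW_4.asm and adds things up. It assumes NASM's flat binary output starts at address 0 (no org directive), so $$ is 0 and _start sits right behind the two headers. The opcode bytes are hand-assembled, and build_image is just a name for this example, not one of the article's tools.

```python
import struct

EHDR_FMT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr, little-endian, unpadded
PHDR_FMT = "<IIQQQQQQ"           # Elf64_Phdr

# xor al, al ; inc al ; mov bl, 5 ; int 0x80
CODE = bytes([0x30, 0xC0, 0xFE, 0xC0, 0xB3, 0x05, 0xCD, 0x80])


def build_image() -> bytes:
    """Pack ELF header, one program header and the code back to back."""
    ehsize = struct.calcsize(EHDR_FMT)
    phsize = struct.calcsize(PHDR_FMT)
    entry = ehsize + phsize              # _start follows both headers
    filesize = entry + len(CODE)
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(EHDR_FMT, ident, 3, 0x3E, 1, entry, ehsize, 0, 0,
                       ehsize, phsize, 1, 0, 0, 0)
    phdr = struct.pack(PHDR_FMT, 1, 5, 0, 0, 0, filesize, filesize, 0x1000)
    return ehdr + phdr + CODE


if __name__ == "__main__":
    print(len(build_image()))  # 64 + 56 + 8 = 128
```

So the headers alone account for 120 of the 128 bytes, which is why only trimming or reusing header fields can shrink the file any further.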

Interim conclusion 3

So what are the key takeaways for this section?
Should you learn to write assembly code to save space?
Not at all. Space is not a limitation on current systems, at least in the server/desktop segment.
But even when it comes to embedded devices and IoT, where system resources are often heavily limited, this is not a valid choice anymore.
The process to write a fully functional application is tedious and takes wayyyy longer to test and debug too.

So what is it that you should take home from this section?
For the most part that would be: be curious!
Make the environment you work in your own and see how things work internally to find vectors for optimization or even a way for exploitation.

I hope you enjoyed this somewhat theory heavy dive into the ELF file format and can now understand the inner workings a bit better.

Misc (aka ‘Super secure ELF encryption’)

Last but not least I wanted to mention a fun little finding.
As shown in the first section of this ELF file introduction, the ELF standard specifies a 16 byte e_ident field that starts with the magic bytes and includes an 8 byte unused padding field e_ident[EI_PAD] right within the ELF header.

So what would happen if we write ‘random’ bytes to that position?
Would it break the ELF binary?

I’ll just end this article with the following code snippets to answer both these questions:

$ file testbins/*
testbins/f1: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f2: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f3: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f4: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f5: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped
testbins/f6: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=270d4e4155c688b53260fd8fef55b6922b6e81d0, not stripped

As you can see I have 6 valid and totally legit 64 bit ELF binaries.
Notice how all have the same hash value?
So they should be identical right?

Their functionality is nothing fancy.
You just have to believe me that for all 6 binaries the behavior when executing is this:

$ ./testbins/f1 
Hello World by @0x00rick!%

readelf -h spits out the following:

$ readelf -h f1
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 55 72 6c 20 6c 62 68 21
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       85
  Type:                              DYN (Shared object file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x540
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6448 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)

That looks pretty normal to me… except wait the magic bytes above show some strange behavior…

$ python3 elf_crypter.py -d /home/lab/GIT/ELF_magic/elf_enc/testbins
[!] decrypting padding bytes of f1: Hey you!
[!] decrypting padding bytes of f2: Listen..
[!] decrypting padding bytes of f3: It is I!
[!] decrypting padding bytes of f4: encry...
[!] decrypting padding bytes of f5: ...ption
[!] decrypting padding bytes of f6: RIIIIICK
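
For the curious: the last 8 bytes on the Magic line of the readelf output for f1 are plain ASCII, and running them through ROT13 gives exactly the string printed for f1. The sketch below shows that. Whether elf_crypter.py really uses ROT13 for all six files is an assumption on my side, and the function names are made up.

```python
import codecs

# e_ident bytes copied from the "Magic:" line of the readelf output for f1.
MAGIC = bytes.fromhex("7f454c460201010055726c206c626821")


def padding_bytes(e_ident: bytes) -> bytes:
    """Return the last 8 bytes of the 16-byte e_ident array."""
    return e_ident[8:16]


def decode_padding(e_ident: bytes) -> str:
    """Read the padding as ASCII and apply ROT13 (an assumed scheme)."""
    return codecs.decode(padding_bytes(e_ident).decode("ascii"), "rot13")


if __name__ == "__main__":
    print(decode_padding(MAGIC))  # prints "Hey you!"
```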

So this is most likely the smartest way to store your master key on your system: split between valid binaries in 8 byte chunks.
Even if people find the chunks, they still need to put them together in the correct order…

On a more serious note these binaries show that these padding bytes are indeed unused in the current standard!
We are free to play around with that space as much as we want.
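
Writing into that space is a simple byte splice. Here is a minimal sketch that works on an in-memory copy of a binary. The name with_padding is hypothetical and this is not the code from elf_crypter.py. Also note that readelf reports the first of these 8 bytes as the ABI version (the 85 above is 0x55), so tools do look at that one byte.

```python
def with_padding(image: bytes, payload: bytes) -> bytes:
    """Return a copy of an ELF image with e_ident[8:16] replaced by payload."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if len(payload) != 8:
        raise ValueError("payload must be exactly 8 bytes")
    # Keep magic, class, data, version and OS/ABI; swap only the last 8 bytes.
    return image[:8] + payload + image[16:]


if __name__ == "__main__":
    # Stand-in for a real file: ELF magic followed by zeroes (64 bytes total).
    fake_header = b"\x7fELF" + bytes(60)
    patched = with_padding(fake_header, b"RIIIIICK")
    print(patched[8:16])
```

For a real binary you would read the file as bytes, pass it through this function and write the result to a new file, so the original stays intact.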

But wait… if these 8 bytes are unused… couldn’t we stuff some opcodes in there and reduce the file size of our binary even further?!
If we take a quick look back at our latest attempt:

_start:
        xor al, al
        inc al
        mov bl, 5
        int 0x80

That looks like it could fit in 8 bytes of space!
Let’s assemble it:

"\x30\xc0\xfe\xc0\xb3\x05\xcd\x80"

And let’s change up our assembly code as follows:

; HW_5.asm
BITS 64
ehdr:                                          ; ELF64_Ehdr
        db  0x7F, "ELF", 2, 1, 1, 0            ; e_ident
        db  0x30, 0xc0, 0xfe, 0xc0, 0xb3, 0x05, 0xcd, 0x80 ; EI_PAD
        dw  3                                  ; e_type
[...]
_start:
        jmp ehdr + 8

[...]

Let’s test it!

$ nasm -f bin -o HW_5.out HW_5.asm && chmod +x HW_5.out
$ ./HW_5.out; echo $?
5

Hell yes!
What about its size?

$ wc -c HW_5.out
122 HW_5.out

We got rid of another 6 bytes!
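
The 6 bytes add up if the 8 byte exit stub at _start is replaced by a 2 byte short jump (opcode EB plus a signed 8-bit displacement) back into the header. The sketch below rebuilds that layout in Python. It assumes the elided parts of HW_5.asm are the same as in HW_4.asm and that NASM picked the short jump encoding, which the 122 byte result suggests. short_jmp and build_image are names made up for this example.

```python
import struct

EHDR_FMT = "<16sHHIQQQIHHHHHH"   # Elf64_Ehdr, little-endian, unpadded
PHDR_FMT = "<IIQQQQQQ"           # Elf64_Phdr

# xor al, al ; inc al ; mov bl, 5 ; int 0x80 (now stored in e_ident[8:16])
PAYLOAD = bytes([0x30, 0xC0, 0xFE, 0xC0, 0xB3, 0x05, 0xCD, 0x80])


def short_jmp(src: int, dst: int) -> bytes:
    """Encode a short jmp located at src that lands on dst (EB + rel8)."""
    rel = dst - (src + 2)            # relative to the end of the 2-byte jump
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jump")
    return bytes([0xEB, rel & 0xFF])


def build_image() -> bytes:
    """Pack the headers with the payload in e_ident, followed by the jump."""
    ehsize = struct.calcsize(EHDR_FMT)
    phsize = struct.calcsize(PHDR_FMT)
    entry = ehsize + phsize          # _start still follows both headers
    code = short_jmp(entry, 8)       # jmp ehdr + 8
    filesize = entry + len(code)
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + PAYLOAD
    ehdr = struct.pack(EHDR_FMT, ident, 3, 0x3E, 1, entry, ehsize, 0, 0,
                       ehsize, phsize, 1, 0, 0, 0)
    phdr = struct.pack(PHDR_FMT, 1, 5, 0, 0, 0, filesize, filesize, 0x1000)
    return ehdr + phdr + code


if __name__ == "__main__":
    print(len(build_image()))  # 64 + 56 + 2 = 122
```

The jump only works because the single loadable segment covers the whole file with read and execute permissions (p_flags is 5), so the header bytes are executable too.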

Interim conclusion 4

This part should just be seen as a fun shenanigan at the end that you can come up with while digging into a research topic.
The key takeaway for you in this section should be that, besides all this theory heavy research and learning, you should never forget to have fun and be creative in your solutions and attempts!

Final conclusion

This is it. If you reached this point here you’ve read through it all.
Thanks for taking the time!

So what did we learn this time?

Structure of an ELF binary and its most common internals
Analysis of a default system binary and writing of a parser
File size analysis with some assembly wizardry to reduce the file size of a very simple program by about 98%
Random shenanigans while understanding the ELF file format

Lastly, all code examples and tools written for this article can be found on my GitHub here.
Do not expect any pretty code. All of it has the ‘PoC’-stamp and just works (at the moment).

That marks the end of this article and I hope you enjoyed reading through it.
As always I appreciate feedback of any kind.

Sources

anon79434934 · June 26, 2018, 3:16pm

Bookmarked for future reference

0x00pf · June 26, 2018, 3:56pm

Great post @ricksanchez!

You can go a bit farther using C (Programming for Wannabes. Part II)… but we all know C is for wimps

Also this The Price of Scripting: DietLibC vs ASM … we had the same idea!

Congrats!

ricksanchez · June 26, 2018, 6:00pm

oops… Directly added those to the sources section!
How could I not have known that these exist!
