(Part 1) ELF and the ARM processor: Writing a simple ARM debugger in C

Hello! Welcome to my first tutorial about the ELF format, the ARM processor, and how to code a debugger for it.

Ever wondered how a debugger works? Ever wondered how a debugger works on an ARM processor?

In this series of articles, I will walk through, from start to finish, how to code a simple debugger for the ARM platform. There are many differences, some subtle and some not, between x86 and ARM internals and in the way each uses the ptrace() syscall.

In this first article, I am going to cover how to access and print the ELF section and program headers so you can take a closer look at what goes on under the hood. However, this is not an ELF or C programming tutorial. I will briefly cover the most important parts, but you should be prepared to use the man pages if you are interested in a more detailed understanding.

Consider this an introduction to the techniques that will be used in the debugger.

Required Skills

  • Linux
  • Basics of the ELF format
  • C programming

Disclaimer

This article heavily draws upon the book Learning Linux Binary Analysis by Ryan “elfmaster” O’Neill. I highly recommend this book for anyone interested in the ELF format.


Meat

ELF is the standard binary format on Linux. The name stands for Executable and Linkable Format, and it is used for executables, shared libraries, object files, and core dumps.
The format defines several structures that describe a program's contents.
The three we will look at are Elf32_Ehdr, Elf32_Phdr, and Elf32_Shdr: the ELF header, the program header, and the section header, respectively.

This is the ELF header. It identifies the file as ELF and records its type, its target architecture, and the entry point address where execution begins.

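#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT]; // magic, class, data encoding, ...
    uint16_t      e_type;             // file type, e.g. ET_EXEC
    uint16_t      e_machine;          // target architecture, e.g. EM_ARM
    uint32_t      e_version;
    Elf32_Addr    e_entry;            // entry point address
    Elf32_Off     e_phoff;            // file offset of program header table
    Elf32_Off     e_shoff;            // file offset of section header table
    uint32_t      e_flags;
    uint16_t      e_ehsize;
    uint16_t      e_phentsize;
    uint16_t      e_phnum;            // number of program headers
    uint16_t      e_shentsize;
    uint16_t      e_shnum;            // number of section headers
    uint16_t      e_shstrndx;         // index of the section name string table
} Elf32_Ehdr;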
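
Since this series targets ARM, here is a minimal sketch of how the e_ident bytes and e_machine could be used to recognise a 32-bit little-endian ARM binary. The helper name is_arm32_le_elf is hypothetical and not part of the program in this article; it also assumes the host has the same byte order as the file.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with the header of a
 * 32-bit little-endian ARM ELF file, 0 otherwise.
 * Assumes the host byte order matches the file's byte order. */
static int is_arm32_le_elf(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len < sizeof(Elf32_Ehdr))
        return 0;                       /* too short to hold a header */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                       /* magic is not 0x7f 'E' 'L' 'F' */
    if (buf[EI_CLASS] != ELFCLASS32 || buf[EI_DATA] != ELFDATA2LSB)
        return 0;                       /* not 32-bit little-endian */

    Elf32_Ehdr ehdr;
    memcpy(&ehdr, buf, sizeof(ehdr));   /* copy to avoid unaligned reads */
    return ehdr.e_machine == EM_ARM;
}
```

Called on the mapped file, this would look like is_arm32_le_elf(mem, st.st_size).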

According to “Learning Linux Binary Analysis”:
“ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory. The program header table can be accessed by referencing the offset found in the initial ELF header member called e_phoff.”

typedef struct {
    uint32_t   p_type;       //segment type
    Elf32_Off  p_offset;     //segment offset
    Elf32_Addr p_vaddr;      //segment virtual address
    Elf32_Addr p_paddr;      //segment physical address
    uint32_t   p_filesz;     //size of segment in the file
    uint32_t   p_memsz;      //size of segment in memory
    uint32_t   p_flags;      //segment flags, e.g. execute|read|write
    uint32_t   p_align;      //segment alignment in memory
  } Elf32_Phdr;

According to “Learning Linux Binary Analysis”:
“A section header table exists to reference the location and size of these sections and is primarily for linking and debugging purposes. Section headers are not necessary for program execution, and a program will execute just fine without having a section header table. This is because the section header table doesn’t describe the program memory layout. That is the responsibility of the program header table. The section headers are really just complimentary to the program headers. The readelf -l command will show which sections are mapped to which segments, which helps to visualize the relationship between sections and segments.”

typedef struct {
    uint32_t   sh_name;      // offset into shdr string table for shdr name
    uint32_t   sh_type;      // shdr type, e.g. SHT_PROGBITS
    uint32_t   sh_flags;     // shdr flags, e.g. SHF_WRITE|SHF_ALLOC
    Elf32_Addr sh_addr;      // address where the section begins in memory
    Elf32_Off  sh_offset;    // offset of the section from beginning of file
    uint32_t   sh_size;      // size of the section in bytes
    uint32_t   sh_link;      // points to another section
    uint32_t   sh_info;      // interpretation depends on section type
    uint32_t   sh_addralign; // alignment for address of section
    uint32_t   sh_entsize;   // size of each entry, for sections that hold a table
} Elf32_Shdr;

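These three structures hold the metadata we want to inspect; the program's actual code and data live in the sections and segments they describe. Now we can use some C code to access the members of these structures and print out their values. First, some setup is required: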

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <elf.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv) {
   int fd, i;                      // File descriptor and index
   uint8_t *mem;                   // Variable we will use to mmap our executable
   struct stat st;                 // Our usual stat structure
   char *StringTable, *interp;
   
   // This is where we defined our ELF, program, and section header variables
   Elf32_Ehdr *ehdr;
   Elf32_Phdr *phdr;
   Elf32_Shdr *shdr;

   if(argc < 2) {
      fprintf(stderr, "Usage: %s <executable>\n", argv[0]);
      exit(EXIT_FAILURE);
   }

   if((fd = open(argv[1], O_RDONLY)) < 0) {
      perror("open");
      exit(EXIT_FAILURE);
   }
   
   if(fstat(fd, &st) < 0) {
      perror("fstat");
      exit(EXIT_FAILURE);
   }

Now that we have our variables set up, we can map our executable into memory and begin to read and print our values.

This is where we map our executable into memory:

   mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   if(mem == MAP_FAILED) {
      perror("mmap");
      exit(EXIT_FAILURE);
   }

The initial ELF Header starts at offset 0 of our mapped memory:

   ehdr = (Elf32_Ehdr *)mem;

The shdr table and phdr table offsets are given by e_shoff and e_phoff members of the Elf32_Ehdr.

   phdr = (Elf32_Phdr *)&mem[ehdr->e_phoff];
   shdr = (Elf32_Shdr *)&mem[ehdr->e_shoff];
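
The offsets above come straight from the file, so a truncated or malformed file can make these pointers land outside the mapping. The program in this article does not guard against that. A minimal sketch of such a check, assuming a hypothetical helper named tables_in_bounds, might look like this:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the program and section header
 * tables described by ehdr fit inside a file of file_size bytes and
 * e_shstrndx names a valid section header, 0 otherwise. */
static int tables_in_bounds(const Elf32_Ehdr *ehdr, size_t file_size)
{
    assert(ehdr != NULL);

    /* 64-bit arithmetic so offset + count * size cannot wrap around */
    uint64_t ph_end = (uint64_t)ehdr->e_phoff +
                      (uint64_t)ehdr->e_phnum * sizeof(Elf32_Phdr);
    uint64_t sh_end = (uint64_t)ehdr->e_shoff +
                      (uint64_t)ehdr->e_shnum * sizeof(Elf32_Shdr);

    if (ph_end > file_size || sh_end > file_size)
        return 0;
    /* A file without section headers has e_shnum == 0 */
    if (ehdr->e_shnum != 0 && ehdr->e_shstrndx >= ehdr->e_shnum)
        return 0;
    return 1;
}
```

It would be called as tables_in_bounds(ehdr, st.st_size) before phdr or shdr is dereferenced.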

Check that the ELF magic (the first 4 bytes) matches 0x7f 'E' 'L' 'F'. ELFMAG and SELFMAG come from <elf.h>:

   if(memcmp(mem, ELFMAG, SELFMAG) != 0) {
      fprintf(stderr, "%s is not an ELF file\n", argv[1]);
      exit(EXIT_FAILURE);
   }

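This code only parses executables, which are marked with ET_EXEC. Note that position-independent executables, which many modern toolchains produce by default, are marked ET_DYN instead and will be rejected by this check.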

   if(ehdr->e_type != ET_EXEC) {
      fprintf(stderr, "%s is not an executable\n", argv[1]);
      exit(EXIT_FAILURE);
   }

Now that we have our program stored in mem, we can print its entry point address. Remember that the e_entry member holds the address of the entry point:

printf("Program Entry point: 0x%x\n", ehdr->e_entry);

Now we have to find the string table that holds the section header names. The e_shstrndx member holds the index of that string table's section header:

StringTable = (char *)&mem[shdr[ehdr->e_shstrndx].sh_offset];

Next, we print each section's name and address. The sh_name member of each section header is an offset into the string table, where the name is stored. The loop starts at 1 because entry 0 of the section header table is a reserved null entry:

printf("Section header list:\n\n");
for(i = 1; i < ehdr->e_shnum; i++)
  printf("%s: 0x%x\n", &StringTable[shdr[i].sh_name], shdr[i].sh_addr);
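
The loop above trusts every sh_name offset. For files that may be corrupted, a defensive lookup can check that the offset lies inside the string table and that the name is NUL-terminated before it is printed. This is a sketch under my own assumptions; section_name is a hypothetical helper, and the table size would come from the sh_size of the string table's section header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the NUL-terminated name at
 * name_off inside the string table, or NULL if the offset is out of
 * range or the name is not terminated inside the table. */
static const char *section_name(const char *strtab, size_t strtab_size,
                                uint32_t name_off)
{
    assert(strtab != NULL);

    if (name_off >= strtab_size)
        return NULL;
    if (memchr(strtab + name_off, '\0', strtab_size - name_off) == NULL)
        return NULL;
    return strtab + name_off;
}
```

In the loop, a NULL result could be printed as a placeholder such as "<corrupt>" instead of reading past the mapping.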

Finally, we print the type and address of each segment. The exception is PT_INTERP, for which we print the path to the dynamic linker instead.

printf("\nProgram header list\n\n");
for(i = 0; i < ehdr->e_phnum; i++) {   
  switch(phdr[i].p_type) {
     case PT_LOAD:
        /*
         * We assume the text segment starts
         * at file offset 0 and that any other
         * loadable segment is the data segment.
         * Newer linkers may emit more than two
         * PT_LOAD segments, so this is a heuristic.
         */
        if(phdr[i].p_offset == 0)
           printf("Text segment: 0x%x\n", phdr[i].p_vaddr);
        else
           printf("Data segment: 0x%x\n", phdr[i].p_vaddr);
        break;
     case PT_INTERP:
        interp = strdup((char *)&mem[phdr[i].p_offset]);
        printf("Interpreter: %s\n", interp);
        break;
     case PT_NOTE:
        printf("Note segment: 0x%x\n", phdr[i].p_vaddr);
        break;
     case PT_DYNAMIC:
        printf("Dynamic segment: 0x%x\n", phdr[i].p_vaddr);
        break;
     case PT_PHDR:
        printf("Phdr segment: 0x%x\n", phdr[i].p_vaddr);
        break;
  }

}

Here is the entire program:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <elf.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char **argv) {
   int fd, i;
   uint8_t *mem;
   struct stat st;
   char *StringTable, *interp;
   
   Elf32_Ehdr *ehdr;
   Elf32_Phdr *phdr;
   Elf32_Shdr *shdr;

   if(argc < 2) {
      fprintf(stderr, "Usage: %s <executable>\n", argv[0]);
      exit(EXIT_FAILURE);
   }

   if((fd = open(argv[1], O_RDONLY)) < 0) {
      perror("open");
      exit(EXIT_FAILURE);
   }
   
   if(fstat(fd, &st) < 0) {
      perror("fstat");
      exit(EXIT_FAILURE);
   }
   
   mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
   if(mem == MAP_FAILED) {
      perror("mmap");
      exit(EXIT_FAILURE);
   }
   
   ehdr = (Elf32_Ehdr *)mem;
   phdr = (Elf32_Phdr *)&mem[ehdr->e_phoff];
   shdr = (Elf32_Shdr *)&mem[ehdr->e_shoff];
   
   if(memcmp(mem, ELFMAG, SELFMAG) != 0) {
      fprintf(stderr, "%s is not an ELF file\n", argv[1]);
      exit(EXIT_FAILURE);
   }
   
   if(ehdr->e_type != ET_EXEC) {
      fprintf(stderr, "%s is not an executable\n", argv[1]);
      exit(EXIT_FAILURE);
   }

   printf("Program Entry point: 0x%x\n", ehdr->e_entry);
   
   StringTable = (char *)&mem[shdr[ehdr->e_shstrndx].sh_offset];

   printf("Section header list:\n\n");
   for(i = 1; i < ehdr->e_shnum; i++)
      printf("%s: 0x%x\n", &StringTable[shdr[i].sh_name], shdr[i].sh_addr);
   
   printf("\nProgram header list\n\n");
   for(i = 0; i < ehdr->e_phnum; i++) {   
      switch(phdr[i].p_type) {
         case PT_LOAD:
            if (phdr[i].p_offset == 0)
               printf("Text segment: 0x%x\n", phdr[i].p_vaddr);
            else
               printf("Data segment: 0x%x\n", phdr[i].p_vaddr);
            break;
         case PT_INTERP:
            interp = strdup((char *)&mem[phdr[i].p_offset]);
            printf("Interpreter: %s\n", interp);
            break;
         case PT_NOTE:
            printf("Note segment: 0x%x\n", phdr[i].p_vaddr);
            break;
         case PT_DYNAMIC:
            printf("Dynamic segment: 0x%x\n", phdr[i].p_vaddr);
            break;
         case PT_PHDR:
            printf("Phdr segment: 0x%x\n", phdr[i].p_vaddr);
            break;
      }
   }

   exit(0);
}

Conclusions

As you can see, being able to code an ELF parser gives us a lot of insight into the ELF format and how it is stored in memory. It doesn’t take that much C code to do it, either.

In my next article, we will be using the ptrace syscall to print the ARM registers of a program we trace. Here is a small sample:

/* The parent forks a child process to be traced */
if((pid = fork()) < 0) {
    perror("fork");
    exit(EXIT_FAILURE);
}
/* Use ptrace to begin tracing the child process */
if(pid == 0) {
    if(ptrace(PTRACE_TRACEME, pid, NULL, NULL) < 0) {
        perror("PTRACE_TRACEME");
        exit(EXIT_FAILURE);
    }
    execve(exec, args, envp);
    exit(EXIT_FAILURE);
}

Here’s some feedback:

  1. ELF isn’t a format only for executable files. There are also shared libraries and object files.

  2. Quoting every section of the book regarding ELF shows me that either you don’t understand the material in depth, or you didn’t bother trying to simplify the terms. Some drawings of what sections and segments look like would be nice for the newbies, since the two overlap and can be confusing.

  3. You referred to structs and didn’t elaborate on them, such as the string table section (though not a must).

  4. You mentioned that data is stored in sections. That’s partially true. What about segments? That brings me to my point about the difference between sections and segments.

  5. I’d highly recommend writing an entire article on what sections and segments truly are, since they are the bread and butter of debuggers and of the ELF format itself.

Consider the above points as tips in case you really want to go deep into the binary internals. If you just want to provide a high level view, you’re doing a great job.


Thanks a lot for the feedback. I will take it under consideration.

Good writing.

Also make note of @_py’s comments…

I look forward to the second instalment.


I believe you’re missing a free in the PT_INTERP case.

Is it true that the exit library function frees memory as part of its functionality?

Thanks for your article, I learned a lot!

It was my understanding that memory is “freed” on exit when the process is taken out of memory.

The whole point of free is to be able to reuse memory… After all, memory is finite – scarce on Windows.

Unused RAM is wasted RAM. :wink:
