Hello! Welcome to my first tutorial about the ELF format, the ARM processor, and how to code a debugger for it.
Ever wondered how a debugger works? Ever wondered how a debugger works on an ARM processor?
In this series of articles, I am going to go from start to finish on how to code a simple debugger for the ARM platform. There are many subtle, and some not-so-subtle, differences between x86 and ARM internals and the way they use the ptrace() syscall.
In this first article, I am going to cover how to access and print the ELF section and program headers so you can take a closer look at what goes on under the hood. However, this is not an ELF or C programming tutorial. I will briefly cover the most important parts, but you should be prepared to use the man pages if you are interested in a more detailed understanding.
Consider this an introduction to the techniques that will be used in the debugger.
Required Skills
- Linux
- Basics of the ELF format
- C programming
Disclaimer
This article heavily draws upon the book Learning Linux Binary Analysis by Ryan “elfmaster” O’Neill. I highly recommend this book for anyone interested in the ELF format.
Meat
ELF is Linux’s standard executable file format. The name stands for Executable and Linkable Format.
Within the ELF format are defined several structures which store program data.
The three structures we will be looking at are Elf32_Ehdr, Elf32_Phdr, and Elf32_Shdr: the ELF header, program header, and section header structures. Their definitions follow.
This is the ELF header. It essentially marks this file as the ELF type, its architecture, and the entry point address where execution begins.
#define EI_NIDENT 16
typedef struct {
unsigned char e_ident[EI_NIDENT];
uint16_t e_type;
uint16_t e_machine;
uint32_t e_version;
Elf32_Addr e_entry;
Elf32_Off e_phoff;
Elf32_Off e_shoff;
uint32_t e_flags;
uint16_t e_ehsize;
uint16_t e_phentsize;
uint16_t e_phnum;
uint16_t e_shentsize;
uint16_t e_shnum;
uint16_t e_shstrndx;
} Elf32_Ehdr;
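To make the e_ident and e_machine members a little more concrete, here is a minimal sketch that decodes them. The helper names (elf_class_name, elf_data_name, is_arm32) are made up for illustration; the EI_*, ELFCLASS*, ELFDATA*, and EM_ARM constants come from <elf.h>.

```c
#include <elf.h>

/* Hypothetical helper: name the word size stored in e_ident[EI_CLASS]. */
const char *elf_class_name(const unsigned char *ident) {
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return "ELF32";
    case ELFCLASS64: return "ELF64";
    default:         return "invalid";
    }
}

/* Hypothetical helper: name the byte order stored in e_ident[EI_DATA]. */
const char *elf_data_name(const unsigned char *ident) {
    switch (ident[EI_DATA]) {
    case ELFDATA2LSB: return "little-endian";
    case ELFDATA2MSB: return "big-endian";
    default:          return "invalid";
    }
}

/* Returns 1 if the header describes a 32-bit ARM binary, 0 otherwise. */
int is_arm32(const Elf32_Ehdr *ehdr) {
    return ehdr->e_ident[EI_CLASS] == ELFCLASS32 &&
           ehdr->e_machine == EM_ARM;
}
```

A debugger for ARM could call something like is_arm32() early on and refuse anything it was not written for.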
According to “Learning Linux Binary Analysis”:
“ELF program headers are what describe segments within a binary and are necessary for program loading. Segments are understood by the kernel during load time and describe the memory layout of an executable on disk and how it should translate to memory. The program header table can be accessed by referencing the offset found in the initial ELF header member called e_phoff”
typedef struct {
uint32_t p_type; //segment type
Elf32_Off p_offset; //segment offset
Elf32_Addr p_vaddr; //segment virtual address
Elf32_Addr p_paddr; //segment physical address
uint32_t p_filesz; //size of segment in the file
uint32_t p_memsz; //size of segment in memory
uint32_t p_flags; //segment flags, e.g. execute|read|write
uint32_t p_align; //segment alignment in memory
} Elf32_Phdr;
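Two of these members are easier to understand with a small example: p_flags is a bit mask of PF_R, PF_W, and PF_X, and p_memsz can be larger than p_filesz when part of the segment (typically .bss) is zero-filled at load time. The sketch below uses hypothetical helper names of my own; only the PF_* constants and the struct come from <elf.h>.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: render p_flags as an "rwx"-style string.
 * out must have room for 4 bytes (3 characters plus the NUL). */
void phdr_flags_str(uint32_t flags, char out[4]) {
    out[0] = (flags & PF_R) ? 'r' : '-';
    out[1] = (flags & PF_W) ? 'w' : '-';
    out[2] = (flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}

/* Hypothetical helper: bytes that exist in memory but not in the file.
 * The loader fills these with zeros. */
uint32_t zero_fill_bytes(const Elf32_Phdr *phdr) {
    return phdr->p_memsz > phdr->p_filesz
               ? phdr->p_memsz - phdr->p_filesz
               : 0;
}
```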
According to “Learning Linux Binary Analysis”
“A section header table exists to reference the location and size of these sections and is primarily for linking and debugging purposes. Section headers are not necessary for program execution, and a program will execute just fine without having a section header table. This is because the section header table doesn’t describe the program memory layout. That is the responsibility of the program header table. The section headers are really just complimentary to the program headers. The readelf -l command will show which sections are mapped to which segments, which helps to visualize the relationship between sections and segments.”
typedef struct {
uint32_t sh_name; // offset into shdr string table for shdr name
uint32_t sh_type; // shdr type, e.g. SHT_PROGBITS
uint32_t sh_flags; // shdr flags, e.g. SHF_WRITE|SHF_ALLOC
Elf32_Addr sh_addr; // address of where section begins
Elf32_Off sh_offset; // offset of section from beginning of file
uint32_t sh_size; // size that section takes up on disk
uint32_t sh_link; // points to another section
uint32_t sh_info; // interpretation depends on section type
uint32_t sh_addralign; // alignment for address of section
uint32_t sh_entsize; // size of each entry, for sections that hold a table of fixed-size entries
} Elf32_Shdr;
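As with the program headers, sh_flags is a bit mask. The sketch below prints it with the same letters readelf uses (W, A, X) and also shows one wrinkle with sh_size: a section of type SHT_NOBITS, such as .bss, has a size but occupies no bytes in the file. The helper names are hypothetical; the SHF_* and SHT_* constants come from <elf.h>.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: render sh_flags with readelf-style letters
 * (W = writable, A = allocated in memory, X = executable).
 * out must have room for 4 bytes. */
void shdr_flags_str(uint32_t flags, char out[4]) {
    int n = 0;
    if (flags & SHF_WRITE)     out[n++] = 'W';
    if (flags & SHF_ALLOC)     out[n++] = 'A';
    if (flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}

/* Hypothetical helper: bytes the section really occupies in the file.
 * SHT_NOBITS sections store nothing on disk. */
uint32_t section_file_bytes(const Elf32_Shdr *shdr) {
    return shdr->sh_type == SHT_NOBITS ? 0 : shdr->sh_size;
}
```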
This is where our data is stored. Now, we can use some C code to access the members of these structures and print out their values. First, some setup is required:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <elf.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv) {
int fd, i; // File descriptor and index
uint8_t *mem; // Variable we will use to mmap our executable
struct stat st; // Our usual stat structure
char *StringTable, *interp;
// This is where we declare our ELF, program, and section header pointers
Elf32_Ehdr *ehdr;
Elf32_Phdr *phdr;
Elf32_Shdr *shdr;
if(argc < 2) {
fprintf(stderr, "Usage: %s <executable>\n", argv[0]);
exit(EXIT_FAILURE);
}
if((fd = open(argv[1], O_RDONLY)) < 0) {
perror("open");
exit(EXIT_FAILURE);
}
if(fstat(fd, &st) < 0) {
perror("fstat");
exit(EXIT_FAILURE);
}
Now that we have our variables set up, we can map our executable into memory and begin to read and print our values.
This is where we map our executable into memory:
mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if(mem == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
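If you want to reuse the open/fstat/mmap sequence elsewhere (the debugger will need it again), it can be wrapped in a helper. This is only a sketch under my own naming (map_file and unmap_file are not from any library). It relies on the fact that a mapping stays valid after its file descriptor is closed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns NULL on failure; on success stores the length in *size. */
uint8_t *map_file(const char *path, size_t *size) {
    struct stat st;
    void *mem;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    mem = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping outlives the descriptor */
    if (mem == MAP_FAILED)
        return NULL;
    *size = (size_t)st.st_size;
    return mem;
}

/* Release a mapping created by map_file(). */
void unmap_file(uint8_t *mem, size_t size) {
    munmap(mem, size);
}
```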
The initial ELF Header starts at offset 0 of our mapped memory:
ehdr = (Elf32_Ehdr *)mem;
The offsets of the section header table and the program header table are given by the e_shoff and e_phoff members of the Elf32_Ehdr:
phdr = (Elf32_Phdr *)&mem[ehdr->e_phoff];
shdr = (Elf32_Shdr *)&mem[ehdr->e_shoff];
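These offsets come straight from the file, so a truncated or malicious binary could point them past the end of the mapping. A minimal sketch of a sanity check follows; headers_in_bounds and table_fits are hypothetical names, and the second argument is assumed to be the st.st_size value from fstat().

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [off, off + count * entsize) lies inside a file of
 * `size` bytes. count and entsize are 16-bit header fields, so the
 * 64-bit product cannot overflow. */
static int table_fits(uint64_t off, uint64_t count, uint64_t entsize,
                      size_t size) {
    uint64_t bytes = count * entsize;
    return off <= size && bytes <= size - off;
}

/* Hypothetical helper: check that both header tables fit in the file. */
int headers_in_bounds(const Elf32_Ehdr *ehdr, size_t size) {
    if (size < sizeof(Elf32_Ehdr))
        return 0;
    if (ehdr->e_phnum && ehdr->e_phentsize != sizeof(Elf32_Phdr))
        return 0;
    if (ehdr->e_shnum && ehdr->e_shentsize != sizeof(Elf32_Shdr))
        return 0;
    if (!table_fits(ehdr->e_phoff, ehdr->e_phnum, ehdr->e_phentsize, size))
        return 0;
    if (!table_fits(ehdr->e_shoff, ehdr->e_shnum, ehdr->e_shentsize, size))
        return 0;
    /* The string table index must name an existing section header. */
    if (ehdr->e_shnum && ehdr->e_shstrndx >= ehdr->e_shnum)
        return 0;
    return 1;
}
```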
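Check that the first four bytes match the ELF magic: 0x7f followed by 'E', 'L', 'F'. The magic is not NUL-terminated in the file, so we compare exactly four bytes with memcmp() against the ELFMAG and SELFMAG constants from <elf.h>:
if(memcmp(mem, ELFMAG, SELFMAG) != 0) {
fprintf(stderr, "%s is not an ELF file\n", argv[1]);
exit(EXIT_FAILURE);
}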
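The structures in this article are the 32-bit variants, so a 64-bit ELF file would pass a magic check and then be misread. A stricter check could also look at e_ident[EI_CLASS]. The function name below is hypothetical; ELFMAG, SELFMAG, EI_CLASS, EI_NIDENT, and ELFCLASS32 come from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the buffer starts with the ELF
 * magic and declares itself a 32-bit object, 0 otherwise. */
int looks_like_elf32(const uint8_t *mem, size_t size) {
    if (size < EI_NIDENT)
        return 0;                       /* too short to hold e_ident */
    if (memcmp(mem, ELFMAG, SELFMAG) != 0)
        return 0;                       /* magic bytes do not match */
    return mem[EI_CLASS] == ELFCLASS32; /* reject ELF64 and garbage */
}
```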
This code only handles executables, which have e_type set to ET_EXEC. Position-independent executables (PIE) are recorded as ET_DYN, so they will be rejected here:
if(ehdr->e_type != ET_EXEC) {
fprintf(stderr, "%s is not an executable\n", argv[1]);
exit(EXIT_FAILURE);
}
Now that we have our program stored in mem, we can print its entry point address. Remember that the e_entry member holds the address of the entry point:
printf("Program Entry point: 0x%x\n", ehdr->e_entry);
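One ARM-specific detail is worth knowing here: as I understand the ARM ELF ABI, bit 0 of e_entry is set when the entry point is Thumb code rather than ARM code, so the printed address can be odd. The sketch below separates the two pieces of information. Both function names are made up for this example.

```c
#include <elf.h>

/* Hypothetical helper: 1 if an ARM binary starts in Thumb state. */
int entry_is_thumb(const Elf32_Ehdr *ehdr) {
    return ehdr->e_machine == EM_ARM && (ehdr->e_entry & 1u);
}

/* Hypothetical helper: the entry address with the Thumb bit cleared.
 * Non-ARM binaries are returned unchanged. */
Elf32_Addr entry_address(const Elf32_Ehdr *ehdr) {
    if (ehdr->e_machine != EM_ARM)
        return ehdr->e_entry;
    return ehdr->e_entry & ~(Elf32_Addr)1;
}
```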
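Now we have to find the string table that holds the section header names. The e_shstrndx member holds the index of the section header that describes this string table, and that header's sh_offset tells us where the table starts in the file:
StringTable = (char *)&mem[shdr[ehdr->e_shstrndx].sh_offset];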
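Next, we print each section's name and address. The sh_name member of each section header is an offset into the string table where that section's NUL-terminated name begins. The loop starts at index 1 because entry 0 of the section header table is a reserved null entry.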
printf("Section header list:\n\n");
for(i = 1; i < ehdr->e_shnum; i++)
printf("%s: 0x%x\n", &StringTable[shdr[i].sh_name], shdr[i].sh_addr);
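The same name lookup can be turned around to find one particular section, which is handy later when a debugger wants .text or .symtab. This is a sketch only: find_section is a hypothetical name, and it assumes mem points at a complete 32-bit ELF image whose header offsets have already been validated.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the section header whose name matches
 * `name`, or NULL if there is none. Assumes a validated ELF32 image. */
const Elf32_Shdr *find_section(const uint8_t *mem, const char *name) {
    const Elf32_Ehdr *ehdr = (const Elf32_Ehdr *)mem;
    const Elf32_Shdr *shdr = (const Elf32_Shdr *)(mem + ehdr->e_shoff);
    const char *strtab =
        (const char *)(mem + shdr[ehdr->e_shstrndx].sh_offset);
    unsigned i;

    /* Index 0 is the reserved null section, so start at 1. */
    for (i = 1; i < ehdr->e_shnum; i++) {
        if (strcmp(strtab + shdr[i].sh_name, name) == 0)
            return &shdr[i];
    }
    return NULL;
}
```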
Finally, we print the type and virtual address of each segment. The exception is PT_INTERP, where we print the path to the dynamic linker instead.
printf("\nProgram header list\n\n");
for(i = 0; i < ehdr->e_phnum; i++) {
switch(phdr[i].p_type) {
case PT_LOAD:
/*
* This tutorial assumes the classic layout:
* the text segment starts at file offset 0
* and the only other loadable segment is
* the data segment. Some linkers emit more
* than two PT_LOAD entries.
*/
if(phdr[i].p_offset == 0)
printf("Text segment: 0x%x\n", phdr[i].p_vaddr);
else
printf("Data segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_INTERP:
interp = (char *)&mem[phdr[i].p_offset]; // NUL-terminated path inside the mapping
printf("Interpreter: %s\n", interp);
break;
case PT_NOTE:
printf("Note segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_DYNAMIC:
printf("Dynamic segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_PHDR:
printf("Phdr segment: 0x%x\n", phdr[i].p_vaddr);
break;
}
}
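Telling text from data by file offset only works for the classic two-segment layout. A more general approach, sketched here, is to classify each PT_LOAD entry by its p_flags. The function name and the label strings are my own; PT_GNU_STACK and PT_ARM_EXIDX are constants from <elf.h> that the switch above does not handle, and you will often see them in ARM binaries.

```c
#include <elf.h>

/* Hypothetical helper: a printable label for a program header. */
const char *segment_label(const Elf32_Phdr *phdr) {
    switch (phdr->p_type) {
    case PT_LOAD:
        if (phdr->p_flags & PF_X)
            return "Text segment";      /* executable code */
        if (phdr->p_flags & PF_W)
            return "Data segment";      /* writable data */
        return "Read-only segment";     /* constants and the like */
    case PT_INTERP:    return "Interpreter";
    case PT_NOTE:      return "Note segment";
    case PT_DYNAMIC:   return "Dynamic segment";
    case PT_PHDR:      return "Phdr segment";
    case PT_GNU_STACK: return "Stack permissions";
    case PT_ARM_EXIDX: return "ARM unwind table";
    default:           return "Other segment";
    }
}
```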
Here is the entire program:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <elf.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, char **argv) {
int fd, i;
uint8_t *mem;
struct stat st;
char *StringTable, *interp;
Elf32_Ehdr *ehdr;
Elf32_Phdr *phdr;
Elf32_Shdr *shdr;
if(argc < 2) {
fprintf(stderr, "Usage: %s <executable>\n", argv[0]);
exit(EXIT_FAILURE);
}
if((fd = open(argv[1], O_RDONLY)) < 0) {
perror("open");
exit(EXIT_FAILURE);
}
if(fstat(fd, &st) < 0) {
perror("fstat");
exit(EXIT_FAILURE);
}
mem = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if(mem == MAP_FAILED) {
perror("mmap");
exit(EXIT_FAILURE);
}
ehdr = (Elf32_Ehdr *)mem;
phdr = (Elf32_Phdr *)&mem[ehdr->e_phoff];
shdr = (Elf32_Shdr *)&mem[ehdr->e_shoff];
if(memcmp(mem, ELFMAG, SELFMAG) != 0) {
fprintf(stderr, "%s is not an ELF file\n", argv[1]);
exit(EXIT_FAILURE);
}
if(ehdr->e_type != ET_EXEC) {
fprintf(stderr, "%s is not an executable\n", argv[1]);
exit(EXIT_FAILURE);
}
printf("Program Entry point: 0x%x\n", ehdr->e_entry);
StringTable = (char *)&mem[shdr[ehdr->e_shstrndx].sh_offset];
printf("Section header list:\n\n");
for(i = 1; i < ehdr->e_shnum; i++)
printf("%s: 0x%x\n", &StringTable[shdr[i].sh_name], shdr[i].sh_addr);
printf("\nProgram header list\n\n");
for(i = 0; i < ehdr->e_phnum; i++) {
switch(phdr[i].p_type) {
case PT_LOAD:
if (phdr[i].p_offset == 0)
printf("Text segment: 0x%x\n", phdr[i].p_vaddr);
else
printf("Data segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_INTERP:
interp = (char *)&mem[phdr[i].p_offset]; // NUL-terminated path inside the mapping
printf("Interpreter: %s\n", interp);
break;
case PT_NOTE:
printf("Note segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_DYNAMIC:
printf("Dynamic segment: 0x%x\n", phdr[i].p_vaddr);
break;
case PT_PHDR:
printf("Phdr segment: 0x%x\n", phdr[i].p_vaddr);
break;
}
}
exit(0);
}
Conclusions
As you can see, being able to code an ELF parser gives us a lot of insight into the ELF format and how it is stored in memory. It doesn’t take that much C code to do it, either.
In my next article, we will be using the ptrace syscall to print the ARM registers of a program we trace. Here is a small sample:
/* The parent forks a child process to be traced */
if((pid = fork()) < 0) {
perror("fork");
exit(EXIT_FAILURE);
}
/* Use ptrace to begin tracing the child process */
if(pid == 0) {
if(ptrace(PTRACE_TRACEME, pid, NULL, NULL) < 0) {
perror("PTRACE_TRACEME");
exit(EXIT_FAILURE);
}
execve(exec, args, envp);
/* execve() only returns if it failed */
perror("execve");
exit(EXIT_FAILURE);
}