Understanding a Win32 Virus: Background Material

dtm · September 3, 2016, 4:10am

In my previous article, A Simple Demonstration on Malware Analysis, we dissected an infected program to locate and extract the file infector. As promised, we will now reverse the infector to find out what mechanisms allow it to perform its task. Instead of going through the raw disassembly we gathered, we will look directly at the source code (which is still assembly), as I’ve yet to learn to recognize a variety of code constructs and algorithms and I won’t pretend that I know everything. The commented source is also much easier on the eyes of a beginner. Again, the original source code can be found at Rohitab - [NASM] Simple Win32 Virus, and there is a corresponding tutorial on a majority of its non-viral code at Rohitab - [Quick tutorial] Finding Kernel32 Base and walking its export table. I feel that the tutorial is a bit lacking in detail and that some of its information is a bit confusing, so I’ll try to deliver the explanation in a way that I find more satisfactory.

Who Can Benefit From This?

Obviously, malware authors and analysts can benefit from the content in this article, but those interested in shellcode and exploit development may also find it interesting. It may also give some insight into how executables and DLLs interact with each other, or into how memory and processes work within Windows.

Pre-requisites for this material:

Knowledge of the WinAPI
Knowledge of the PE file format
Knowledge of Windows memory
Knowledge of Windows processes
Basic knowledge of x86 Intel assembly

If you do not have these pre-requisites, I will attempt to explain the content as clearly as possible, but if there is any confusion, do not hesitate to leave a question in the comments. I will try to answer them as well as I can (if I can).

Disclaimer: The background content may be long and probably boring (like most theory) but it is necessary in understanding the inner workings of this virus. I can’t force you to read it but it would help a lot if you don’t already know how things work in low level Windows.

1. Dynamic Link Libraries and Exported Functions
 (i)   What is a Dynamic Link Library?
 (ii)  Exported Functions Example
 (iii) Export Table
 (iv)  Kernel32.dll
 (v)   Dynamically Loading DLLs
2. The Process Environment Block
 (i)   What is the Process Environment Block?
 (ii)  Motivation
 (iii) Retrieving kernel32 Module's Base Address
 (iv)  Locating the Export Table
 (v)   Obtaining Exported Functions

Feel free to skip ahead if you are already familiar with anything.

Dynamic Link Libraries and Exported Functions

What is a Dynamic Link Library?

Dynamic Link Libraries, or DLLs for short, are described by MSDN - What is a DLL? as objects which “promote modularization of code, code reuse, efficient memory usage, and reduced disk space. Therefore, the operating system and the programs load faster, run faster, and take less disk space on the computer.” What this implies is that a DLL provides a (usually large) collection of exported functions which programs can import for use.

Imagine designing a function for a program that you are writing, but instead of keeping that function in the program, you relocate it to a separate file, a DLL, such that your program depends on the DLL to provide the functionality. As an analogy, think of moving a program off your computer’s hard drive and onto an external hard drive. Now when you want to run the program, you depend on the external drive to provide it, that is, you need to navigate to the external drive’s storage, find the program and then execute it (while the drive is still attached). Following this analogy, importing means that your computer knows where the program is stored, and exporting means that the external drive exposes the program to the outside world for use. Naturally, whenever a program uses a function exported by a DLL, the DLL must exist and be loaded into the program’s memory space, otherwise the program cannot call the function and may not even load properly.

Exported Functions Example

A DLL actually follows the same format as an executable, i.e. it uses the PE file format, and it can be developed in the same manner. For example, if I wanted to export a function to add two numbers, I could write it like so:

// MyDLL.c

#define DllExport __declspec(dllexport)

DllExport int AddTwoNumbers(int a, int b) {
    return a + b;
}

Let’s examine what this DLL’s exported function looks like in PEview.

It looks almost exactly the same as an executable file, with the exception that it has DLL values. What we’re mainly interested in here is the export table, as highlighted at the bottom. We’ll follow the value in the data column to RVA 0x00002420, or VA 0x10002420 (RVA + ImageBase).
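The RVA-to-VA conversion used above is a plain addition, which the following minimal sketch spells out. The function name `rva_to_va` is made up for illustration, and the sketch assumes a 32-bit image. The same arithmetic applies whether the module sits at its preferred ImageBase or at whatever base the loader actually picked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a relative virtual address (RVA) into a
 * virtual address (VA). image_base is wherever the module sits in memory. */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    /* A 32-bit image lives in a 4 GiB address space, so the sum must not wrap. */
    assert(rva <= UINT32_MAX - image_base);
    return image_base + rva;
}
```

With the values from this DLL, `rva_to_va(0x10000000, 0x00002420)` gives the VA 0x10002420 quoted above.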

Export Table

The IMAGE_EXPORT_DIRECTORY struct is defined like so:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;          // offset 0; size 4
    DWORD TimeDateStamp;            // offset 4; size 4
    WORD MajorVersion;              // offset 8; size 2
    WORD MinorVersion;              // offset 10; size 2
    DWORD Name;                     // offset 12; size 4
    DWORD Base;                     // offset 16; size 4
    DWORD NumberOfFunctions;        // offset 20; size 4
    DWORD NumberOfNames;            // offset 24; size 4
    DWORD AddressOfFunctions;       // offset 28; size 4
    DWORD AddressOfNames;           // offset 32; size 4
    DWORD AddressOfNameOrdinals;    // offset 36; size 4
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
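The offsets in the comments above can be checked at compile time. The sketch below is only a portable stand-in: `ExportDirectory` is a hypothetical name, and the fixed-width types replace DWORD (32 bits) and WORD (16 bits) so that it builds without the Windows headers.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical mirror of IMAGE_EXPORT_DIRECTORY using portable types. */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;
    uint32_t AddressOfNames;
    uint32_t AddressOfNameOrdinals;
} ExportDirectory;

/* The build fails if any of these offsets disagrees with the listing. */
static_assert(offsetof(ExportDirectory, Name) == 12, "Name");
static_assert(offsetof(ExportDirectory, Base) == 16, "Base");
static_assert(offsetof(ExportDirectory, NumberOfNames) == 24, "NumberOfNames");
static_assert(offsetof(ExportDirectory, AddressOfFunctions) == 28, "AddressOfFunctions");
static_assert(offsetof(ExportDirectory, AddressOfNames) == 32, "AddressOfNames");
static_assert(offsetof(ExportDirectory, AddressOfNameOrdinals) == 36, "AddressOfNameOrdinals");
static_assert(sizeof(ExportDirectory) == 40, "total size");
```

These are the same constants that hand-written assembly adds to the export directory address when it reads each member.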

Our DLL’s export table looks like this:

The highlighted members are what we are interested in, so let’s start with the first one. The Address Table, or the AddressOfFunctions member, contains the RVA of an array which holds the RVA of each exported function. Let’s follow the address and see what we have.

So we can see in the Data column that our function AddTwoNumbers lies at RVA 0x00001000, or VA 0x10001000. Let’s check with OllyDbg.

Our DLL was loaded into the address starting with base 0x6C730000 (left) and now we need to navigate to RVA 0x00001000 or VA 0x6C731000 (right) and we can see our disassembled AddTwoNumbers at the top:

; AddTwoNumbers function
push    ebp             ; save the caller's frame pointer
mov     ebp, esp        ; set up this function's stack frame
mov     eax, [ebp+8]    ; eax = a (first argument; compiled as cdecl)
add     eax, [ebp+C]    ; eax += b (second argument, at offset 0xC)
pop     ebp             ; restore the caller's frame pointer
ret                     ; return a + b (in eax)

Let’s now have a look at the AddressOfNames member which contains the address of the list of function names at 0x0000244C:

Just our one function here. Nothing special. And finally, the AddressOfNameOrdinals is simply:

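…with a value of 0x0000. Most of the time, functions are imported by name, but it is also possible to import by ordinal number. Note that the values in this table are zero-based indices into the address table; the ordinal number used when importing by ordinal is this index plus the Base member of the export directory.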
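To make the relationship between an export ordinal and the address table concrete, here is a small sketch of a lookup by ordinal. The function name and the plain array are stand-ins for illustration. The key point is that the Base member of the export directory is subtracted from the ordinal to get a zero-based index into AddressOfFunctions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the function RVA for an export ordinal,
 * or 0 if the ordinal is out of range. functions[] stands in for the
 * AddressOfFunctions array and base for the Base member. */
uint32_t rva_from_ordinal(const uint32_t *functions,
                          uint32_t number_of_functions,
                          uint32_t base, uint32_t ordinal)
{
    assert(functions != NULL);
    if (ordinal < base)
        return 0;                       /* below the first valid ordinal */
    uint32_t index = ordinal - base;    /* zero-based index into the table */
    if (index >= number_of_functions)
        return 0;                       /* past the end of the table */
    return functions[index];
}
```

Assuming our DLL has a Base of 1 (the value is not shown in this article, so treat it as an assumption), ordinal 1 maps to index 0 and therefore to AddTwoNumbers at RVA 0x00001000.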

Kernel32.dll

Now that we know a bit about DLLs from the simple example above, let’s take a quick look at a system DLL. The kernel32 module is a special DLL. In fact, it’s so special that it is loaded with every Win32 executable, even if the executable imports nothing from it. Let’s define a bare minimum executable:

; empty.asm

global _main

section .text
_main:
    ret

And let’s view it in PEview:

So it does have an import table, but let’s check the data inside.

Nothing. No data, just an empty table. Let’s now examine its modules in memory with OllyDbg.

As I’ve previously stated, kernel32.dll has been loaded into memory at base address 0x75F40000. Why is it loaded every time? I actually don’t know… But if you do know, please do share! Anyway, let’s perform the same analysis on this DLL as we did with the previous one. Let’s open PEview again and locate the export table offset.

Export table:

This time, we’re only interested in the AddressOfNames and AddressOfNameOrdinals members. Here is the list of function names:

Here is the list of ordinals:

Dynamically Loading DLLs

Besides having the Windows loader (the component which loads and sets up executables in memory) supply the DLLs on execution, a program is able to load a DLL dynamically, i.e. at runtime, with a special function called LoadLibrary. This function takes a single file name parameter which it uses to locate the module and map it into the process’s address space. It is often combined with GetProcAddress, which retrieves the address of an exported function from a loaded module. Both of these functions are provided directly by the kernel32 module, which means that a program can potentially use any function from any DLL whenever it wants to, even if the DLL was not initially mapped by the Windows loader.

Great! You’ve survived a crash course for DLLs and their exported functions! Let’s combine what we’ve learned with the next topic.

The Process Environment Block

What is the Process Environment Block?

The Process Environment Block, or PEB for short, is a “data structure that is used by the operating system internally, most of whose fields are not intended for use by anything other than the operating system” and contains information about the process in question. How do we access the PEB? In a 32-bit process, a pointer to it is stored at fs:[30h], i.e. at offset 0x30 of the Thread Environment Block that the fs segment register refers to. The PEB struct is defined by Microsoft as so:

typedef struct _PEB {
    BYTE Reserved1[2];     // offset 0; size 2
    BYTE BeingDebugged;    // offset 2; size 1
    BYTE Reserved2[1];     // offset 3; size 1
    PVOID Reserved3[2];    // offset 4; size 8
    PPEB_LDR_DATA Ldr;     // offset 12
    // unnecessary members omitted
} PEB, *PPEB;

Motivation

You may be wondering what this has to do with anything, so let me give some motivation. We’ve established that the kernel32 module is loaded into every process, which makes it a common feature across every loaded executable. With the two functions mentioned previously, LoadLibrary and GetProcAddress, a piece of code can load any function from any DLL on the system. A malicious piece of code can use this to gain maximum power with minimal dependencies: it is purely standalone, yet it can still access the tools it needs to perform its operation successfully. Through the PEB, the malware is able to programmatically locate the kernel32 module to get what it needs. Let’s find out how we can do this.

Retrieving kernel32 Module’s Base Address

We’re interested in the Ldr member of this struct as it “contains information about the loaded modules for the process.” Let’s follow it and examine the PEB_LDR_DATA struct.

typedef struct _PEB_LDR_DATA {
    BYTE Reserved1[8];                     // offset 0; size 8
    PVOID Reserved2[3];                    // offset 8; size 12
    LIST_ENTRY InMemoryOrderModuleList;    // offset 20
} PEB_LDR_DATA, *PPEB_LDR_DATA;
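The offsets of Ldr (12, or 0x0C) and InMemoryOrderModuleList (20, or 0x14) only hold for the 32-bit layout, where a pointer is 4 bytes. The sketch below checks them at compile time using hypothetical mirror structs (`Peb32`, `PebLdrData32`, `ListEntry32`) in which every pointer is replaced by a 32-bit integer, so it builds on any platform.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

typedef uint32_t Ptr32;   /* a pointer as stored in a 32-bit process */

/* Hypothetical 32-bit mirrors of LIST_ENTRY, PEB and PEB_LDR_DATA. */
typedef struct {
    Ptr32 Flink;
    Ptr32 Blink;
} ListEntry32;

typedef struct {
    uint8_t Reserved1[2];
    uint8_t BeingDebugged;
    uint8_t Reserved2[1];
    Ptr32   Reserved3[2];
    Ptr32   Ldr;
} Peb32;

typedef struct {
    uint8_t     Reserved1[8];
    Ptr32       Reserved2[3];
    ListEntry32 InMemoryOrderModuleList;
} PebLdrData32;

static_assert(offsetof(Peb32, BeingDebugged) == 2, "BeingDebugged");
static_assert(offsetof(Peb32, Ldr) == 0x0C, "Ldr");
static_assert(offsetof(PebLdrData32, InMemoryOrderModuleList) == 0x14,
              "InMemoryOrderModuleList");
```

In a 64-bit process the pointers are 8 bytes wide, so these offsets differ there.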

The InMemoryOrderModuleList is the head of a doubly linked list of the loaded modules. Its type is defined as so:

typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink;    // points to next module
    struct _LIST_ENTRY *Blink;    // points to previous module
} LIST_ENTRY, *PLIST_ENTRY, *RESTRICTED_POINTER PRLIST_ENTRY;

The kernel32 module is the third node in the list (the first being the executable itself and the second being ntdll). InMemoryOrderModuleList->Flink already points to the first node, so we follow Flink two more times to reach the third. After doing this, we need to look at one final structure to retrieve the base address of the module: LDR_DATA_TABLE_ENTRY.

typedef struct _LDR_DATA_TABLE_ENTRY {
    // unnecessary members omitted (offsets below are relative to InMemoryOrderLinks)
    LIST_ENTRY InMemoryOrderLinks;    // offset 0; size 8
    PVOID Reserved2[2];               // offset 8; size 8
    PVOID DllBase;                    // offset 16
    // unnecessary members omitted
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

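MSDN states that “each [InMemoryOrderModuleList] item in the list is a pointer to an LDR_DATA_TABLE_ENTRY structure” (above). More precisely, each Flink points at the InMemoryOrderLinks member inside that structure rather than at its start, which is why the offsets above are counted from that member. To get to the DllBase member, we therefore add 16 bytes to the address of the third node.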
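The list walk can be tried out on mock data without touching a real PEB. In this sketch, `ListEntry`, `LdrEntry`, `ENTRY_FROM_LINK` and `nth_module_base` are hypothetical names. The mock entry includes the leading Reserved1 member from the MSDN definition, and the macro steps back from the link to the start of the entry instead of hard-coding the 16-byte offset, so it also works where pointers are not 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ListEntry {
    struct ListEntry *Flink;   /* next module */
    struct ListEntry *Blink;   /* previous module */
} ListEntry;

/* Mock of LDR_DATA_TABLE_ENTRY, including the leading Reserved1 member. */
typedef struct {
    void     *Reserved1[2];
    ListEntry InMemoryOrderLinks;
    void     *Reserved2[2];
    void     *DllBase;
} LdrEntry;

/* Recover the enclosing LdrEntry from a pointer to its InMemoryOrderLinks. */
#define ENTRY_FROM_LINK(link) \
    ((LdrEntry *)((char *)(link) - offsetof(LdrEntry, InMemoryOrderLinks)))

/* Return DllBase of the n-th module (1-based) in the list, or NULL if the
 * list holds fewer than n modules. head is the list head in PEB_LDR_DATA. */
void *nth_module_base(ListEntry *head, unsigned n)
{
    assert(head != NULL);
    ListEntry *link = head;
    for (unsigned i = 0; i < n; i++) {
        link = link->Flink;
        if (link == head)      /* wrapped around to the head: list too short */
            return NULL;
    }
    return n == 0 ? NULL : ENTRY_FROM_LINK(link)->DllBase;
}
```

Calling `nth_module_base(head, 3)` on a list ordered as executable, ntdll, kernel32 returns the third entry's DllBase, which corresponds to following Flink from the list head three times in total.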

Locating the Export Table

Now that we’ve got the base address, we need to find the export table where all the information on functions is stored. To do so, we’ll need to know where the IMAGE_NT_HEADERS structure lives. The first structure of the PE file is the IMAGE_DOS_HEADER and is defined as such:

typedef struct _IMAGE_DOS_HEADER {
    WORD e_magic;     // 'MZ'
    // unnecessary members omitted
    LONG e_lfanew;    // offset 60; contains offset to IMAGE_NT_HEADERS
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

It contains the offset to the IMAGE_NT_HEADERS, AKA the PE header. The RVA of the export table is stored 0x78 (120) bytes from the start of that header, in the first entry of the optional header’s data directory.
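The two header hops can be sketched over a raw byte buffer. The helper names `read_u32` and `export_directory_rva` are made up for illustration, and the sketch assumes a 32-bit (PE32) image, since the 0x78 offset of the first data directory entry applies to that layout only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define E_LFANEW_OFFSET   60u     /* IMAGE_DOS_HEADER.e_lfanew */
#define EXPORT_RVA_OFFSET 0x78u   /* from the PE signature, PE32 only */

/* Read a little-endian 32-bit value at a byte offset inside the image. */
uint32_t read_u32(const uint8_t *image, size_t offset)
{
    assert(image != NULL);
    return (uint32_t)image[offset]
         | ((uint32_t)image[offset + 1] << 8)
         | ((uint32_t)image[offset + 2] << 16)
         | ((uint32_t)image[offset + 3] << 24);
}

/* Return the export directory RVA of a PE32 image, or 0 on a bad header. */
uint32_t export_directory_rva(const uint8_t *image, size_t size)
{
    if (image == NULL || size < 64 || image[0] != 'M' || image[1] != 'Z')
        return 0;                                  /* not a DOS header */
    uint32_t e_lfanew = read_u32(image, E_LFANEW_OFFSET);
    if (e_lfanew > size || size - e_lfanew < EXPORT_RVA_OFFSET + 4u)
        return 0;                                  /* header runs off the buffer */
    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0)
        return 0;                                  /* missing PE signature */
    return read_u32(image, (size_t)e_lfanew + EXPORT_RVA_OFFSET);
}
```

The value returned is still an RVA, so the module's base address has to be added to it before the export directory itself can be read.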

Obtaining Exported Functions

What we can do now is walk the AddressOfNames array until we get a match with the LoadLibrary and GetProcAddress strings, then read the entry at the same index in the AddressOfNameOrdinals array (the two arrays are parallel, although its entries are only 2 bytes each) to get an index into AddressOfFunctions. Multiplying that index by four gives the correct offset into AddressOfFunctions, as the addresses there are 4 bytes each. Note that all of these values are RVAs, so the module base has to be added before they are dereferenced. As a C representation, it would go something like this:

// find the index of the function name
DWORD i;
for (i = 0; i < ExportTable.NumberOfNames; i++) {
    CHAR *szFunctionName = ExportTable.AddressOfNames[i];
    if (strcmp(szFunctionName, "GetProcAddress") == 0)
        break;    // i is now the index of the matching name
}
// get ordinal using i as an index (2 bytes per entry)
// ExportTable.AddressOfNameOrdinals[0] + i*2;
WORD ordinal = ExportTable.AddressOfNameOrdinals[i];
// get address of function using the ordinal as an index (4 bytes per entry)
// ExportTable.AddressOfFunctions[0] + ordinal*4;
LPVOID FunctionAddress = ExportTable.AddressOfFunctions[ordinal];

Once we have GetProcAddress, we can use it to find LoadLibrary in kernel32 (without repeating the above process) and then have access to all of the WinAPI functions.
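For completeness, here is a self-contained sketch of the name lookup that can be compiled and run against mock data. `ExportView` and `find_export_rva` are hypothetical names. The view assumes the three arrays have already been located (their RVAs added to the module base), while the name entries themselves are still RVAs relative to `base`, as they are in a real image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the export arrays of one module. */
typedef struct {
    uint32_t        number_of_names;
    uint32_t        number_of_functions;
    const uint32_t *name_rvas;       /* AddressOfNames: RVA of each name   */
    const uint16_t *name_ordinals;   /* AddressOfNameOrdinals: 2 bytes each */
    const uint32_t *function_rvas;   /* AddressOfFunctions: 4 bytes each    */
} ExportView;

/* Return the RVA of the named export, or 0 if the name is not present. */
uint32_t find_export_rva(const uint8_t *base, const ExportView *view,
                         const char *wanted)
{
    assert(base != NULL && view != NULL && wanted != NULL);
    for (uint32_t i = 0; i < view->number_of_names; i++) {
        const char *name = (const char *)(base + view->name_rvas[i]);
        if (strcmp(name, wanted) != 0)
            continue;
        uint16_t index = view->name_ordinals[i];   /* same i in both arrays */
        if (index >= view->number_of_functions)
            return 0;                              /* malformed table */
        return view->function_rvas[index];
    }
    return 0;
}
```

The linear scan mirrors the loop described above. A real name table may hold well over a thousand entries, but a straight scan is still the simplest thing to express in hand-written assembly.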

Okay, we’ve covered a fair bit so now we are able to start the main course. I’ve separated the code analysis since it was too much to fit into a single article so join me over in the next segment.

– dtm

spylegion · September 3, 2016, 6:26pm

Great thread buddy, keep it up

root_haxor · September 3, 2016, 6:34pm

Make Video it will be easy to learn

0x00pf · September 3, 2016, 7:23pm

Great post mate… I almost missed the last part. Just realized you updated it by chance.

Congrats!

dtm · September 4, 2016, 4:52am

Thanks. If you’re still interested, I’m not finished yet. Just publishing the additions in case I accidentally lose them again.

0x00pf · September 4, 2016, 8:19am

Sure. I’m looking forward to the complete analysis of the virus.

Update: Sorry about the lapsus.

dtm · September 4, 2016, 10:14am

What video???
