A Simple Demonstration on Malware Analysis

dtm · August 29, 2016, 11:33am

What’s good, peeps? Initially, I was going to do a write-up on programming an executable file infector (AKA a virus), but I thought it would be more interesting to reverse and analyze one instead. Throughout my journey with malware, I was never quite able to grasp the mechanism that parasitic file infectors employ. One obstacle was that executables need a parsed import table to call WinAPI functions; another was the task of writing an executable that is entirely self-contained and can write its whole self into another file without destructively disrupting the host’s flow of execution. As my journey continued, the pieces of the puzzle came together: learning the PE file format and (N)ASM ((Netwide) Assembler) let me make sense of such intricacies, which will be explained further into this example.

Pre-requisites for this material:

Knowledge of the WinAPI
Knowledge of the PE file format (optional)
Knowledge of Windows memory (optional)
Basic knowledge of C/C++
Basic knowledge of x86 Intel Assembly

If you do not have such pre-requisites, I will attempt to explain the content as clearly as possible but if there is any confusion, do not hesitate to leave a question in the comments. I will try to answer them as well as I can (if I can).

What is a Virus?

The most basic (and only) requirement for a virus is that it must be able to replicate. Programs that can replicate themselves without human intervention are called worms; otherwise, we identify them as viruses. Most people who come across malware will instantly call it a virus. However, not all malware is a virus, and not every virus is necessarily malware (it may not be malicious), although malicious intent is the main purpose of such a design (at least in the modern day).

There are many designs of viruses as they have evolved throughout the age of computers. Such evolution is necessary purely because of the need to survive and flourish, just as we see with biological viruses or rather, anything biological. The constant battle between virus authors and the antivirus industry has led to the continuous engineering and reverse engineering of such software, resulting in different techniques to obscure or hide from their analyst predators while wreaking havoc on their prey as effectively as possible. Some examples of infection methods include overwriting entire victim files, appending and prepending, writing into code caves and EPO (entry-point obscuring). These are used, for example, to prevent damaging the host file, prevent detection, prevent disinfection or all of these (and possibly more). For more information, please refer to this paper by ir3t: Introduction to Various File Infection Techniques.

Malware Analysis

Quick Dynamic Analysis

Let’s finally begin analyzing the virus which I have obtained. I will not be starting off with the original virus program, instead, I will be using an already-infected file from which I will extract the viral code.

First things first, let’s run the infected file so we can gather some quick intel on what or how the virus impacts the program. If you’re doing this, remember to do so only in a protected and controlled environment.

Before running the infected file, we can already see a difference between the original and the infected file simply by comparing their sizes. This is already an issue for the author; however, the virus was not programmed for the purpose of stealth so we will let it slide. Let’s continue on to running the infected program.

Here, we see the console for the application and then we are instantly greeted with a message box from the author and…

…now we see that our other executable is also infected, again, through the observation of an altered file size. Pressing OK on the message box to continue execution of our infected file gives us the rest of the program.

File Analysis

What we can do now is perform static analysis on the infected file. Notice that the infection occurred before the main execution of the program. From this, we can conclude that the virus code must have preceded the main function somehow (obviously). How can such a thing happen? Well, if you’ve read over my PE File Infection paper, there is a simple way to do this. If you don’t remember or you haven’t read it, the method is simply to modify the PE header’s entry point, i.e. the value stored in IMAGE_NT_HEADERS.OptionalHeader.AddressOfEntryPoint. Microsoft defines the IMAGE_NT_HEADERS struct like so:

typedef struct _IMAGE_NT_HEADERS {
  DWORD                 Signature;
  IMAGE_FILE_HEADER     FileHeader;
  IMAGE_OPTIONAL_HEADER OptionalHeader;   // <---
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;

and the IMAGE_OPTIONAL_HEADER struct:

typedef struct _IMAGE_OPTIONAL_HEADER {
  // unnecessary members omitted
  DWORD                AddressOfEntryPoint;   // <--- this is the offset from the base
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;             // <--- we need this as a base address
  // unnecessary members omitted
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

AddressOfEntryPoint defines where the program begins execution, so in the infected file the modified value should point to the start of the virus code instead. Let’s open up the infected file in PEview and see what we can find.

We can see in the relevant members of the structs that the AddressOfEntryPoint is 0x00010020 and ImageBase is 0x00400000, so by adding these two values, we get the address in memory where the virus code may be located. Note that the values on the very left labeled under VA are the Virtual Addresses, i.e. the addresses in memory, not offsets in the file. With these values, we will navigate to 0x00410020 for further investigation.
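For anyone who would rather not read the values off PEview by hand, here is a minimal sketch of the same lookup in C. It assumes a 32-bit (PE32) image already loaded into a buffer and reads the fields by their fixed offsets instead of using windows.h; the function name pe32_entry_va is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: computes ImageBase + AddressOfEntryPoint for a PE32
 * file held in `image`. Returns 0 and stores the sum in *entry_va on
 * success, or -1 if the buffer does not look like a PE32 file.
 */
int pe32_entry_va(const unsigned char *image, size_t size, uint32_t *entry_va) {
    assert(image != NULL && entry_va != NULL);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;
    uint32_t nt = rd32(image + 0x3C);            /* IMAGE_DOS_HEADER.e_lfanew */
    /* Need Signature (4) + FileHeader (20) + optional header up to ImageBase (32). */
    if (nt > size || size - nt < 4 + 20 + 32)
        return -1;
    if (memcmp(image + nt, "PE\0\0", 4) != 0)    /* IMAGE_NT_HEADERS.Signature */
        return -1;
    const unsigned char *opt = image + nt + 24;  /* start of OptionalHeader */
    if (rd16(opt) != 0x10B)                      /* Magic: 0x10B means PE32 */
        return -1;
    uint32_t entry_rva  = rd32(opt + 16);        /* AddressOfEntryPoint */
    uint32_t image_base = rd32(opt + 28);        /* ImageBase (PE32 layout) */
    *entry_va = image_base + entry_rva;
    return 0;
}
```

Fed the infected file from this article, it should come back with 0x00410020, the address we navigate to next. A 64-bit (PE32+) file lays out the optional header differently, so the sketch rejects it.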

Under normal circumstances, the entry point should land somewhere within the .text or .code section, but we find that the address is in the .tls (Thread Local Storage) section instead. That could mean the virus is exploiting a multithreading procedure. However, from what we gathered in the initial run of the infected file, we had to dismiss the message box before the main function executed (a message box blocks its calling thread until it is closed). So we can rule out that possibility for now and assume that the virus simply infects the last section of its host. We will be able to confirm our assumptions when we analyze the virus code itself. Currently, we cannot see anything obvious that might hint that this is the virus code, but let’s see all of it before we jump to any conclusions.
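Checking which section an RVA falls into can also be done programmatically by walking the section table. The following is only a sketch under the same assumptions as before (file already in a buffer, fixed header offsets instead of windows.h); pe_section_of_rva is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: finds the section whose virtual range contains `rva`
 * and copies its name (NUL-terminated) into name[9].
 * Returns 0 if a section was found, -1 otherwise.
 */
int pe_section_of_rva(const unsigned char *image, size_t size,
                      uint32_t rva, char name[9]) {
    assert(image != NULL && name != NULL);
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;
    uint32_t nt = rd32(image + 0x3C);               /* e_lfanew */
    if (nt > size || size - nt < 24)
        return -1;
    if (memcmp(image + nt, "PE\0\0", 4) != 0)
        return -1;
    uint16_t count    = rd16(image + nt + 6);       /* FileHeader.NumberOfSections */
    uint16_t opt_size = rd16(image + nt + 20);      /* FileHeader.SizeOfOptionalHeader */
    size_t table = (size_t)nt + 24 + opt_size;      /* first IMAGE_SECTION_HEADER */

    for (uint16_t i = 0; i < count; i++) {
        size_t off = table + (size_t)i * 40;        /* each header is 40 bytes */
        if (off > size || size - off < 40)
            return -1;
        const unsigned char *sh = image + off;
        uint32_t vsize = rd32(sh + 8);              /* Misc.VirtualSize */
        uint32_t vaddr = rd32(sh + 12);             /* VirtualAddress */
        uint32_t rsize = rd32(sh + 16);             /* SizeOfRawData */
        uint32_t span  = vsize ? vsize : rsize;     /* fall back if VirtualSize is 0 */
        if (rva >= vaddr && rva - vaddr < span) {
            memcpy(name, sh, 8);                    /* Name is 8 bytes, maybe unterminated */
            name[8] = '\0';
            return 0;
        }
    }
    return -1;
}
```

Called with the RVA 0x00010020 on the infected file, this should report .tls, matching what PEview shows.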

Scrolling down to the bottom, we can see the strings which were found in the greeting of the message box which means that this area of binary may very well be the virus code itself. Let’s continue onto a deeper level of analysis by using a disassembler to try to translate the binary form into a more readable assembly form.

Disassembly

Opening up our infected file in OllyDbg, we are instantly greeted with information telling us that the AddressOfEntryPoint contains a value outside of the code segment. Thanks for that, but we’ve already established this! Let’s navigate to the Memory Map window (left) and open up the .tls section to examine the disassembly as a memory dump (right).

We see that the disassembler has incorrectly aligned the bytes at the address we want at 0x00410020 so it has also obscured the disassembled mnemonics on the right. So let’s fix this by right-clicking and selecting Go to address and then entering in what we require.

A quick inspection shows that this is proper code. Notice that the instructions are all simple and proper (mov, add, push, call, lea) and the numbers are nice and small, nothing incredibly large and obscure. Also, it’s popular to use the pushad instruction to save the state of the program (by pushing all the registers) to then later call popad to recover it. In between these are possibly the instructions of either malicious code or some sort of (de)compression or de/encryption.
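As a small illustration, the dump used later in this article begins with the bytes 60 E8 00 00 00 00 5D 81 ED, which decode to pushad, call to the next instruction, pop ebp and the start of sub ebp, imm32. That looks like the common idiom position-independent code uses to work out where it is running, though treat that reading as my assumption until the code is reversed properly. A hedged sketch of a check for that byte pattern (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if `buf` starts with
 *   pushad; call $+5; pop ebp; sub ebp, imm32
 * and 0 otherwise. Only the opcode bytes are matched; the 4-byte immediate
 * of the sub instruction is allowed to be anything.
 */
int starts_with_pushad_delta(const unsigned char *buf, size_t len) {
    static const unsigned char pattern[] = {
        0x60,                          /* pushad: save all general registers */
        0xE8, 0x00, 0x00, 0x00, 0x00,  /* call with zero displacement: pushes the next address */
        0x5D,                          /* pop ebp: ebp now holds that address */
        0x81, 0xED                     /* sub ebp, imm32: opcode and ModRM, immediate follows */
    };
    assert(buf != NULL);
    if (len < sizeof pattern + 4)      /* need room for the 4-byte immediate too */
        return 0;
    return memcmp(buf, pattern, sizeof pattern) == 0;
}
```

A match is only a hint that the bytes are code rather than data, not proof of anything malicious.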

The sensible thing to do here is to upload this to an antivirus scanner. We could try to dump the binary into a file, but there is a slight problem: this is just raw data with no (PE file format) structure whatsoever, so it would not be detected as anything malicious (trust me, I’ve tried it). Instead, we will have to reconstruct the program by using the hexadecimal values. First, highlight all the instructions within the memory dump, right-click, select Binary and then Binary copy. We’ll paste this into a text editor and then trim some junk bytes from the top and bottom.

We’ll then need to transform this into a string of hexadecimal data so we can place it into an array in a C program. Using the Linux command line, cat dump.bin | tr '\n' ' ' | sed 's/ /\\x/g' | sed 's/\\x$//g' > shellcode.txt (that’s right, this Windows kid knows how to use the Linux command line) and then manually add a \x at the start of the string. Note that the output goes to a separate file (shellcode.txt is just a name picked here), since redirecting back onto dump.bin would truncate it before cat gets to read it. Now we can whip up a magical C program to execute the shellcode and hence replicate the virus.
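If you’d rather stay in C for the conversion as well, something along these lines should do the same job as the pipeline. It is a sketch that assumes the pasted text is nothing but two-digit hex bytes separated by whitespace; hex_to_c_escapes is a made-up name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: turns text such as "60 E8 00\n5D" into the escaped
 * form \x60\xE8\x00\x5D, ready to paste into a C string literal.
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the input is malformed or `out` is too small.
 */
int hex_to_c_escapes(const char *in, char *out, size_t out_size) {
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        if (isspace((unsigned char)*in)) {      /* skip spaces and line breaks */
            in++;
            continue;
        }
        /* every byte must be exactly two hex digits */
        if (!isxdigit((unsigned char)in[0]) || !isxdigit((unsigned char)in[1]))
            return -1;
        if (out_size < n + 5)                   /* "\xNN" plus the final NUL */
            return -1;
        out[n++] = '\\';
        out[n++] = 'x';
        out[n++] = in[0];
        out[n++] = in[1];
        in += 2;
    }
    if (n >= out_size)                          /* no room for the NUL */
        return -1;
    out[n] = '\0';
    return (int)n;
}
```

Because every byte gets its own \x escape, there is no leading \x to add by hand afterwards.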

unsigned char VirusShellcode[] = "\x60\xE8\x00\x00\x00\x00\x5D\x81\xED ...";

int main(void) {
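    // declare function pointer
    void (*pVirusPtr)(void);
    // point to the address of VirusShellcode (the explicit cast is needed
    // to convert a data pointer into a function pointer)
    pVirusPtr = (void (*)(void))VirusShellcode;

    // execute VirusShellcode; if DEP is enforced for the process, the
    // array's memory is not executable and this call will fault
    pVirusPtr();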

    return 0;
}

Compile it and then we can now upload it to a scanner such as NoDistribute or Majyx which both show the same result anyway (AFAIK they both, if not all non-distributing scanners, use the same engine): detection rate 1/35; Kaspersky Antivirus: HEUR:Virus.Win32.Infector. Note that since this is not the original virus, it may have different results. Just to show that it works:

But hold on a second… Does this mean we can do this with any program? Can we just extract the code section of X program, place it into a buffer of a C program and execute it like we did above? Well… the answer is no. No we can’t. There’s a bit more to it than that, I’m afraid. But don’t worry: if you’re itching to find out why it works in this scenario and why we can’t emulate it for any other typical program, I will explain it to you (just not here).

Conclusion

Now that we’ve identified the threat, disinfection is probably trivial here. Removing the binary starting from 0x00410020 and fixing the .tls section header to match the changed size should prove to be sufficient, provided AddressOfEntryPoint is also pointed back at the host’s original entry point. That’s great and all, but what I’m more interested in is how this virus achieves its goal. However, since this article was intended as a malware analysis demonstration, I will end it here for now and do a separate one to detail the reverse engineering.

Hope you’ve learned something from this!

– dtm

P.S. I almost forgot, here is the source code to the virus: Rohitab - [NASM] Simple Win32 Virus. Enjoy!

afiskon · August 29, 2016, 12:00pm

Nice article. Thanks for a good reading!

0x00pf · August 29, 2016, 4:04pm

Great post mate! Awesome
