Zero2Auto – Deep Dive into Netwalker
Preface
Recently, I joined the advanced malware analysis course “Zero2Auto” by @VK and @0verflow. The first lesson was about algorithms used in malware: compression, hashing and encryption. It came with a PDF that Vitaly has since released as a blog post, which in turn builds on another post about the Netwalker sample. While thinking about how I could practice this lesson, I concluded that a simple way would be to expand upon these two posts, as they are not detailed enough for most beginner reverse engineers. My main goal is to verify the assumptions in Vitaly’s post by expanding on the mechanisms it describes, and to document new findings related to the lesson’s subject: how to recognize known encoding algorithms and automate their decoding.
Required prior knowledge:
- x86 assembly
- IDA Pro and x64dbg
- Basic PowerShell skills
Dealing with the first stage PowerShell
The first thing we’ll do is download the sample referenced in both blog posts mentioned above:
SHA-256 hash of the sample: f4656a9af30e98ed2103194f798fa00fd1686618e3e62fba6b15c9959135b7be
This is a very long, obfuscated PowerShell script. It is so long that my VM’s PowerShell ISE crashed when I tried to load it, so I opened it in Sublime Text (which is the best text editor on earth) instead.
The script might be long and scary but do not fear, for all we need to do is examine the first line of the code:
The first command, “Invoke-Expression”, simply runs the command wrapped inside “$()”, and within that statement we can see a Base64 decoding operation. So to undo the first layer of obfuscation, we can remove the Invoke-Expression command and pipe the entire decoded string to a text file using “| Out-File -FilePath .\Process.txt”, which gives us the decoded payload. This second-stage payload is the same scary mess, but this time a long byte array is decrypted within a loop that simply XORs each byte with 0x47.
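If you would rather not execute any of the script, both layers can be undone offline. Below is a minimal Python sketch. It assumes the first layer is a plain Base64 string and the second layer is a byte array XORed with the single-byte key 0x47. The function names are hypothetical, and the Base64 step returns raw bytes so that you can choose the text encoding yourself (PowerShell payloads are often UTF-8 or UTF-16LE).

```python
import base64

XOR_KEY = 0x47  # single-byte key used by the second-stage decryption loop


def decode_stage1(b64_text: str) -> bytes:
    """Undo the Base64 layer that Invoke-Expression would have executed."""
    # Remove whitespace/newlines that an editor may have introduced.
    cleaned = "".join(b64_text.split())
    return base64.b64decode(cleaned)


def xor_decode(data: bytes, key: int = XOR_KEY) -> bytes:
    """XOR every byte with a single-byte key (XOR is its own inverse)."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    # Made-up demo data, not the real sample.
    print(decode_stage1("aGVs\nbG8="))  # b'hello'
    print(xor_decode(xor_decode(b"MZ")))  # b'MZ'
```

Because XOR with the same key is symmetric, the same function both encrypts and decrypts, which is why the loop in the script is so short.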
Again, all I did was pipe the final product to a text file:
We now reach the XOR-decoded payload, which contains two long byte arrays. As stated in the blog posts, these are two DLL files: the x86 and x64 versions of the malware DLL that is loaded into the memory of explorer.exe. I’m only interested in the byte array holding the x86 DLL. In Sublime Text I can press Ctrl+Shift+L and click on the last line of the first byte array. I’ll copy this byte array to another script file and then invoke a PowerShell command to write it to a file. Note that, for some reason, Sublime Text appended a newline character to the end of each line, so the script won’t run unless you remove all newline characters from it.
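As an alternative to the PowerShell route, the following Python sketch converts the copied array into bytes and ignores the stray newlines. It assumes the array consists of comma-separated hex (0x4d) or decimal (77) values, optionally preceded by a `[Byte[]]$name =` assignment. `parse_ps_byte_array` is a hypothetical helper and not part of any tool.

```python
import re


def parse_ps_byte_array(text: str) -> bytes:
    """Turn a PowerShell byte-array literal into raw bytes.

    ASSUMPTION: items are hex (0x4d) or decimal (77), separated by commas.
    """
    # Drop an optional "[Byte[]]$var =" prefix if it was copied along.
    if "=" in text:
        text = text.split("=", 1)[1]
    # Remove all whitespace, including the CR/LF characters the editor added.
    text = re.sub(r"\s+", "", text)
    out = bytearray()
    for token in text.split(","):
        if not token:
            continue  # tolerate trailing commas
        if token.lower().startswith("0x"):
            value = int(token, 16)
        else:
            value = int(token, 10)
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not a byte: {token}")
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    sample = "[Byte[]]$x = 0x4d,0x5a,\n0x90,\r\n0"
    print(parse_ps_byte_array(sample))  # b'MZ\x90\x00'
```

For example, `Path("x86.dll").write_bytes(parse_ps_byte_array(Path("x86_array.txt").read_text()))` (both file names hypothetical, with `Path` from `pathlib`) writes the DLL to disk.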
Reversing the Netwalker x86 DLL:
Let’s throw the file into PE-bear and see if we can find anything interesting:
What is this? Don’t worry: the malware author corrupted the PE header by replacing the “MZ” signature at the start of the file with the word value 0xDEAD (remember this).
Using HxD, I’ll replace this value with a proper “MZ” signature so we can examine the file in IDA and Resource Hacker:
Much better!
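If you want to script the HxD step, here is a minimal Python sketch. It assumes 0xDEAD is stored as a little-endian word at offset 0 (bytes AD DE on disk), which is how x86 code would read the word value 0xDEAD. If your hex editor shows DE AD instead, change the format string to ">H". `restore_mz` is a hypothetical helper.

```python
import struct

STOMP_WORD = 0xDEAD  # value written over the "MZ" signature


def restore_mz(data: bytes) -> bytes:
    """Return a copy of the DLL with the stomped first word replaced by "MZ"."""
    if len(data) < 2:
        raise ValueError("file too small")
    # ASSUMPTION: little-endian WORD at offset 0, i.e. bytes AD DE.
    (first_word,) = struct.unpack_from("<H", data, 0)
    if first_word != STOMP_WORD:
        raise ValueError(f"unexpected first word: {first_word:#06x}")
    return b"MZ" + data[2:]
```

Checking the word before patching prevents the function from silently overwriting a file that was never stomped.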
Usually, at this point I would perform basic static analysis by examining the file’s strings, looking for anomalies in its header and examining its resources. Then I would perform basic dynamic analysis by running the sample and monitoring it. But we must remember our initial goal: to expand upon Vitaly’s findings and find worthwhile material we can explore ourselves. So first, let’s examine Vitaly’s first mention of CRC32:
How did Vitaly know this is indeed a CRC32 hashing algorithm? Let’s start by using the KANAL plugin for PEiD. I’ll load the malware into PEiD and launch KANAL:
As we can see, KANAL finds references to the CRC32 algorithm in many locations, but what exactly did it find there? Let’s jump to 0x1000424F:
What is this constant? Let’s google it:
Aha, alright. Even without KANAL, anyone familiar with how a CRC32 checksum is calculated would quickly recognize the characteristic polynomial-division flow at 0x1000421C.
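It is easier to recognize that flow after seeing a reference implementation. Below is a minimal table-driven CRC-32 sketch in Python. It assumes the constant KANAL flagged is the standard reflected polynomial 0xEDB88320. The screenshot isn’t reproduced here, so compare this value against what you see at 0x1000424F. This is the textbook algorithm, not a transcription of the sample’s code.

```python
import zlib

# ASSUMPTION: the constant at 0x1000424F is the standard reflected
# CRC-32 polynomial (bit-reversed form of 0x04C11DB7).
CRC32_POLY = 0xEDB88320


def make_crc32_table(poly: int = CRC32_POLY) -> list:
    """256 entries, each computed by 8 rounds of bitwise polynomial division."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, XOR in the polynomial.
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32 (initial value and final XOR of 0xFFFFFFFF)."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = CRC32_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    # Cross-check against zlib's implementation.
    print(crc32(b"123456789") == zlib.crc32(b"123456789"))
```

The nested 256 × 8 loop containing a shift and a conditional XOR with the polynomial is the pattern to look for in the disassembly.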
When analyzing the file dynamically, one can see at 0x10001A59 that the hash value 0x3e006b7a is resolved to FindResourceA.
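A quick way to test the CRC32 hypothesis is to hash a list of candidate export names and look up the value observed in the debugger. The sketch below assumes plain CRC-32 over the ASCII export name. The sample may add a seed, case folding or a final transform, so treat a miss as “hypothesis not confirmed” rather than proof. `build_hash_map` and `resolve` are hypothetical helpers.

```python
import zlib


def build_hash_map(names, hash_fn=zlib.crc32):
    """Map hash -> export name for a list of candidate API names.

    ASSUMPTION: the sample hashes the plain ASCII name with standard CRC-32.
    """
    table = {}
    for name in names:
        table[hash_fn(name.encode("ascii")) & 0xFFFFFFFF] = name
    return table


def resolve(hash_value, table):
    """Return the API name for a hash, or None if it is unknown."""
    return table.get(hash_value & 0xFFFFFFFF)


if __name__ == "__main__":
    # Tiny illustrative list; in practice, dump every export of
    # kernel32.dll, ntdll.dll, etc.
    exports = ["FindResourceA", "LoadResource", "LockResource", "SizeofResource"]
    table = build_hash_map(exports)
    h = 0x3E006B7A  # value observed at 0x10001A59
    print(hex(h), "->", resolve(h, table))
```

If the lookup prints FindResourceA, the same table can be used to label every other hash in the IDB. If it prints None, the hashing routine deserves a closer look.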
How NetWalker utilizes PE Header stomping to break analysis
Let’s examine the following assumption made in Vitaly’s blog:
At location 0x1000A0B0 one can find the API resolving function:
So I assumed Vitaly was referring to the code within sub_10003710.
And indeed he was. However, simply breaking at this location and running the sample made it crash, so I decided to try to understand why. I first tried to get past the call at 0x1000371E, to a function I renamed func_checkValidHeader, but this function crashed the sample every time with an access violation exception, so let’s take a look at it.
First, it loads the address of the current function into EAX and ANDs it with 0xFFFF000. It then iterates through a loop, subtracting 0x800 from EAX and checking whether the address in EAX holds the value 0xDEAD. Sounds familiar? It sure does: as we recall, the DLL’s PE header was stomped with 0xDEAD, so this routine is looking for the stomped header, effectively validating that no one has tampered with the sample.
Because I had modified the header, the loop never found 0xDEAD and kept walking backwards until EAX pointed to an invalid memory location, resulting in an access violation exception. In addition, at 0x1000372D the sample attempts to fix the stomped header using the value returned by func_checkValidHeader, which should point to the DLL’s base address. It then replaces 0xDEAD with “MZ”, thus fixing the header.
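To make the crash easier to reason about, here is a minimal Python simulation of the routine as described above. It is not the sample’s code. The page mask, the check-then-subtract loop order and the little-endian word comparison are assumptions. `FakeImage` and `AccessViolation` are hypothetical stand-ins for process memory and the debugger’s exception. With the original stomped header, the walk stops at the base. With an “MZ” header, it runs off the start of the image.

```python
import struct

STOMP_WORD = 0xDEAD
# ASSUMPTION: page-alignment mask; read the exact constant from the
# disassembly, because it determines where the backwards walk starts.
PAGE_MASK = 0xFFFFF000
STEP = 0x800  # amount subtracted from EAX on each iteration


class AccessViolation(Exception):
    """Stands in for the crash seen in x64dbg when EAX leaves the image."""


class FakeImage:
    """A mapped module: `data` is visible starting at virtual address `base`."""

    def __init__(self, base: int, data: bytes):
        self.base = base
        self.data = bytearray(data)

    def read_word(self, addr: int) -> int:
        off = addr - self.base
        if off < 0 or off + 2 > len(self.data):
            raise AccessViolation(f"read at {addr:#010x}")
        return struct.unpack_from("<H", self.data, off)[0]

    def write(self, addr: int, payload: bytes) -> None:
        off = addr - self.base
        if off < 0 or off + len(payload) > len(self.data):
            raise AccessViolation(f"write at {addr:#010x}")
        self.data[off:off + len(payload)] = payload


def find_stomped_base(mem: FakeImage, func_addr: int) -> int:
    """Mimic func_checkValidHeader: walk back until the word 0xDEAD is found."""
    addr = func_addr & PAGE_MASK
    while mem.read_word(addr) != STOMP_WORD:
        addr -= STEP
    return addr


def fix_header(mem: FakeImage, base: int) -> None:
    """Mimic the code at 0x1000372D: turn 0xDEAD back into "MZ"."""
    mem.write(base, b"MZ")


def restomp_header(mem: FakeImage, base: int) -> None:
    """Reverse step: write 0xDEAD back over the signature."""
    mem.write(base, struct.pack("<H", STOMP_WORD))
```

Running `find_stomped_base` on an image whose header was already fixed reproduces the access violation I hit after patching the file in HxD.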
To get around this quickly, I simply patched the binary to remove the header patcher, which solved the problem.
With that patch in place, the sample no longer crashes and we can indeed verify Vitaly’s findings. First, the sample loads the resource and locks it.
The malware then gets the resource’s size, allocates a heap buffer and copies the resource into it using memcpy.
Next, it reads the key length and the key itself, and copies the key to a stack variable:
Copied key:
Afterwards, the key, its size and a pointer to the heap buffer containing the resource are pushed as arguments to a function I renamed func_RC4Decrypt:
Vitaly assumes this within the blog:
Anyone who followed the course’s lesson carefully knows that a recognizable feature of the RC4 key-scheduling algorithm (KSA) is a loop that iterates 256 times:
If we examine the function at 0x10009210, we can find two examples of this, at 0x10009281 and 0x100092CF:
EBP is loaded with 256 in preparation for the second KSA loop:
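For comparison with the disassembly at 0x10009210, here is a textbook RC4 in Python. In `rc4_ksa`, the first loop fills S with 0..255 and the second loop performs the key-dependent swaps. These are the two 256-iteration loops described above. This is the standard algorithm, not a transcription of the sample’s function, so any deviation in the sample (for example, a modified key schedule) would still have to be confirmed in the debugger.

```python
def rc4_ksa(key: bytes) -> list:
    """Key-scheduling algorithm: two loops of 256 iterations each."""
    if not key:
        raise ValueError("RC4 key must not be empty")
    # Loop 1: identity permutation.
    S = list(range(256))
    # Loop 2: key-dependent swaps.
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    return S


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """PRGA: XOR the keystream with data (encryption == decryption)."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    print(rc4_crypt(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

The `& 0xFF` masks correspond to the byte-sized index arithmetic you will typically see as `movzx` or 8-bit register use in the disassembly.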
Decryption process as seen within the debugger:
Size (red), key (blue), resource (rest)
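The same layout (key size, then key, then encrypted data) makes offline extraction straightforward once you have dumped the resource, for example with Resource Hacker. The sketch below assumes the size field is a 4-byte little-endian DWORD, which should be checked against the debugger view. It bundles a standard RC4 so that it is self-contained. `decrypt_resource` is a hypothetical helper.

```python
import struct


def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4 (KSA + PRGA); encryption and decryption are identical."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for b in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(b ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)


def decrypt_resource(blob: bytes) -> bytes:
    """Split the resource into [key size][key][ciphertext] and decrypt it.

    ASSUMPTION: the size field is a 4-byte little-endian DWORD.
    """
    if len(blob) < 4:
        raise ValueError("resource too small")
    (key_len,) = struct.unpack_from("<I", blob, 0)
    if key_len == 0 or 4 + key_len > len(blob):
        raise ValueError(f"implausible key length: {key_len}")
    key = blob[4:4 + key_len]
    return rc4(key, blob[4 + key_len:])
```

The length sanity check helps catch a wrong assumption about the size field early, instead of producing garbage output.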
After the decryption function finishes:
Finally, at location 0x10003832 the sample restores the Netwalker header back to 0xDEAD.
I wonder whether 0xDEAD is a constant shared across all Netwalker samples.
Sources: