Continuing the discussion from [Malware Analysis] Case GBC-17_124: The dropper Part I:
The get_msg
Function
Our analysis is going pretty well. By now, we know that the get_msg
function is the one that gets the data from the network, data that is later dumped into a file. Time to take it apart:
get_msg: β
00400aad: 55 push rbp β
00400aae: 48 89 e5 mov rbp, rsp β
00400ab1: 53 push rbx β
00400ab2: 48 81 ec 58 04 00 00 sub rsp, 0x458 β
00400ab9: 89 bd ac fb ff ff mov dword ptr [rbp - 0x454], edi β; {ls_rbp-1108}
00400abf: 48 89 b5 a0 fb ff ff mov qword ptr [rbp - 0x460], rsi β; {ls_rbp-1120}
00400ac6: 64 48 8b 04 25 28 00 00 mov rax, qword ptr fs:[0x28] β
00400acf: 48 89 45 e8 mov qword ptr [rbp - 0x18], rax β; {ls_rbp-24}
00400ad3: 31 c0 xor eax, eax β
00400ad5: e8 c6 fe ff ff call <rand@plt> β; <rand@plt> 4009a0(.plt+0x140)
00400ada: 89 85 c4 fb ff ff mov dword ptr [rbp - 0x43c], eax β; {ls_rbp-1084}
We can see the typical stack arrangement common to normal C functions, as well as the stack canary being set up to detect stack corruption. The function parameters are also stored in local variables at rbp - 0x454
for the first parameter and rbp-0x460
for the second.
Note: Remember that for 64-bit Linux binaries, the first integer or pointer parameters are passed in registers, in this order: func (RDI, RSI, RDX, RCX, R8, R9)
Note1: EDI
is actually the lower 32 bits of the 64-bit register RDI
After that, we find a call to rand
that, as you know, generates a random number. This first random number gets stored at rbp - 0x43c
.
00400ae0: 8b 85 c4 fb ff ff mov eax, dword ptr [rbp - 0x43c] β; {ls_rbp-1084}
00400ae6: 89 c7 mov edi, eax β
00400ae8: e8 f3 fd ff ff call <htonl@plt> β; <htonl@plt> 4008e0(.plt+0x80)
00400aed: 89 85 b8 fb ff ff mov dword ptr [rbp - 0x448], eax β; {ls_rbp-1096}
Then, that number is converted to network byte order and stored in the local variable rbp-0x448
to be used later.
00400af3: 48 8b 05 de 15 20 00 mov rax, qword ptr [rip + 0x2015de] β; <http_req> 6020d8(.data+18) :
00400afa: 48 89 c7 mov rdi, rax β
00400afd: e8 7e fd ff ff call <strlen@plt> β; <strlen@plt> 400880(.plt+0x20)
00400b02: 48 89 c2 mov rdx, rax β
00400b05: 48 8b 0d cc 15 20 00 mov rcx, qword ptr [rip + 0x2015cc] β; <http_req> 6020d8(.data+18) :
00400b0c: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400b12: 48 89 ce mov rsi, rcx β
00400b15: 89 c7 mov edi, eax β
00400b17: e8 54 fd ff ff call <write@plt> β; <write@plt> 400870(.plt+0x10)
Next, the program calculates the length of the string that the http_req
symbol points to and sends that string through the network using write
. The first parameter of write
is the local variable rbp - 0x454
(see how that value flows into RDI
). If you recall, rbp - 0x454
contains the first parameter passed to the function. You can go quickly back to the main
function and verify that the first parameter is actually the socket associated to the current server connection.
This is a good time to start naming variables using your preferred tool. As you progress, filling in proper names for those rbp - XXX
offsets, the code becomes more and more readable. Most disassemblers provide this option, and so does STAN (check the func.var
command). I will not do that here so you can follow the paper even using objdump
:).
Also note the instruction pointer relative addressing to reference http_req
. This is common for global variables that live in the .data
segment instead of on the stack. We already have a symbol for it and our preferred disassembler is showing it. Depending on the tool, you may already get the content of the global var. For instance, radare2 will generate a symbol whose name is actually the content of the string (when the variable points to a string). Something like this:
0x00400b38 488b05911520. mov rax, qword [rip + 0x201591] ; [0x6020d0:8]=0x400fd8 str.HTTP_1.0_200_OK_
nServer:_ShadowHTTP_3.26__SECURED__nContent_type:_text_html__charset_UTF_8_nContent_Length:__1_n_nPNG LEA obj.http_hdr
; obj.http_hdr
With the current STAN version you have to manually dereference the pointer (this is going to change soon!). Something like this:
STAN] > mem.dump p 6020d8 1
+ Dumping 1 items from segment '.data'
0x6020d8: 0x401050 <_IO_stdin_used +128>
STAN] > mem.dump x 401050 100
+ Dumping 100 items from segment '.rodata'
| 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f |0123456789abcdef
----------+-------------------------------------------------+----------------
0x401050 : 47 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0a 55 |GET / HTTP/1.1.U
0x401060 : 73 65 72 2d 41 67 65 6e 74 3a 42 6c 61 63 6b 4e |ser-Agent:BlackN
0x401070 : 69 62 62 6c 65 73 20 35 2e 33 32 20 78 4f 53 20 |ibbles 5.32 xOS
0x401080 : 32 2e 31 0a 0a 00 49 6e 76 61 6c 69 64 20 6e 75 |2.1...Invalid nu
0x401090 : 6d 62 65 72 20 6f 66 20 70 61 72 61 6d 65 74 65 |mber of paramete
0x4010a0 : 72 73 0a 00 73 6f 63 6b 65 74 3a 00 63 6f 6e 6e |rs..socket:.conn
0x4010b0 : 65 63 74 3a |ect:
Anyway, the dropper has already sent an HTTP request to the remote server. Let's see what happens next:
00400b1c: 48 8d 8d b8 fb ff ff lea rcx, qword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400b23: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400b29: ba 04 00 00 00 mov edx, 4 β
00400b2e: 48 89 ce mov rsi, rcx β
00400b31: 89 c7 mov edi, eax β
00400b33: e8 38 fd ff ff call <write@plt> β; <write@plt> 400870(.plt+0x10)
Right after that, the dropper sends to the server the random value generated at the beginning of the function, in network byte order (rbp - 0x448
). Do you remember that we have already seen this in the Wireshark TCP stream earlier on?
00400b38: 48 8b 05 91 15 20 00 mov rax, qword ptr [rip + 0x201591] β; <http_hdr> 6020d0(.data+10) :
00400b3f: 48 89 c7 mov rdi, rax β
00400b42: e8 39 fd ff ff call <strlen@plt> β; <strlen@plt> 400880(.plt+0x20)
00400b47: 48 89 c2 mov rdx, rax β
00400b4a: 48 8d 8d e0 fb ff ff lea rcx, qword ptr [rbp - 0x420] β; {ls_rbp-1056}
00400b51: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400b57: 48 89 ce mov rsi, rcx β
00400b5a: 89 c7 mov edi, eax β
00400b5c: e8 9f fd ff ff call <read@plt> β; <read@plt> 400900(.plt+0xa0)
And after that, it calculates the length of another global var (http_hdr
) and reads that amount of data from the server. So, the dropper is expecting a specific header from the server. I already know that this data is going to be discarded, so I will not comment more on this.
00400b61: 48 8d 8d b8 fb ff ff lea rcx, qword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400b68: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400b6e: ba 04 00 00 00 mov edx, 4 β
00400b73: 48 89 ce mov rsi, rcx β
00400b76: 89 c7 mov edi, eax β
00400b78: e8 83 fd ff ff call <read@plt> β; <read@plt> 400900(.plt+0xa0)
Now it reads an integer (4 bytes) from the network and stores it again at rbp - 0x448
... uhm... let's see what happens next.
00400b7d: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400b83: 89 c7 mov edi, eax β
00400b85: e8 06 fe ff ff call <ntohl@plt> β; <ntohl@plt> 400990(.plt+0x130)
00400b8a: 89 85 c8 fb ff ff mov dword ptr [rbp - 0x438], eax β; {ls_rbp-1080}
The value read from the network is converted to host byte order and stored at rbp - 0x438
. At this point we can assume that rbp - 0x448
is a kind of temporary variable and therefore not important for us. Let's keep going.
00400b90: 8b 85 c4 fb ff ff mov eax, dword ptr [rbp - 0x43c] β; {ls_rbp-1084}
00400b96: 8b 95 c8 fb ff ff mov edx, dword ptr [rbp - 0x438] β; {ls_rbp-1080}
00400b9c: 31 d0 xor eax, edx β
00400b9e: 89 85 bc fb ff ff mov dword ptr [rbp - 0x444], eax β; {ls_rbp-1092}
Aha, a xor
. This is interesting. We are XORing the value we have just received from the server, rbp - 0x438
, with the random value we generated at rbp - 0x43c (either check the text above or, if you have renamed your local vars, you will see that immediately). Let's keep analysing the code, but this looks like a good candidate for our encryption key.
rbp - 0x444 -> candidate_key
00400ba4: 48 8d 8d b8 fb ff ff lea rcx, qword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400bab: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400bb1: ba 04 00 00 00 mov edx, 4 β
00400bb6: 48 89 ce mov rsi, rcx β
00400bb9: 89 c7 mov edi, eax β
00400bbb: e8 40 fd ff ff call <read@plt> β; <read@plt> 400900(.plt+0xa0)
00400bc0: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400bc6: 89 c7 mov edi, eax β
00400bc8: e8 c3 fd ff ff call <ntohl@plt> β; <ntohl@plt> 400990(.plt+0x130)
00400bcd: 89 85 cc fb ff ff mov dword ptr [rbp - 0x434], eax β; {ls_rbp-1076}
00400bd3: 8b 85 cc fb ff ff mov eax, dword ptr [rbp - 0x434] β; {ls_rbp-1076}
00400bd9: 48 98 cdqe β
00400bdb: 48 89 c7 mov rdi, rax β
00400bde: e8 5d fd ff ff call <malloc@plt> β; <malloc@plt> 400940(.plt+0xe0)
00400be3: 48 89 85 d0 fb ff ff mov qword ptr [rbp - 0x430], rax β; {ls_rbp-1072}
00400bea: 8b 85 cc fb ff ff mov eax, dword ptr [rbp - 0x434] β; {ls_rbp-1076}
00400bf0: 89 85 b8 fb ff ff mov dword ptr [rbp - 0x448], eax β; {ls_rbp-1096}
00400bf6: eb 49 jmp <l7> β; 400c41(.text+0x281)
This is a big chunk but, by now, it should be very easy to interpret. This is what it does:
- Reads an integer from the network and stores it in our temporary var (rbp - 0x448)
- Converts it to host representation and keeps the result at rbp - 0x434
- Allocates a buffer whose size is the value we have just received from the network!
Looking at the assignments at the end we can see that:
- The pointer returned by malloc is stored at rbp - 0x430
- The size of the buffer is copied to rbp - 0x448
Then we find an unconditional jump (jmp) that usually means we are about to start a loop.
l8: β
00400bf8: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400bfe: 89 c3 mov ebx, eax β
00400c00: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400c06: 48 63 d0 movsxd rdx, eax β
00400c09: 8b 85 cc fb ff ff mov eax, dword ptr [rbp - 0x434] β; {ls_rbp-1076}
00400c0f: 48 63 c8 movsxd rcx, eax β
00400c12: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400c18: 48 98 cdqe β
00400c1a: 48 29 c1 sub rcx, rax β
00400c1d: 48 8b 85 d0 fb ff ff mov rax, qword ptr [rbp - 0x430] β; {ls_rbp-1072}
00400c24: 48 01 c1 add rcx, rax β
00400c27: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400c2d: 48 89 ce mov rsi, rcx β
00400c30: 89 c7 mov edi, eax β
00400c32: e8 c9 fc ff ff call <read@plt> β; <read@plt> 400900(.plt+0xa0)
00400c37: 29 c3 sub ebx, eax β
00400c39: 89 d8 mov eax, ebx β
00400c3b: 89 85 b8 fb ff ff mov dword ptr [rbp - 0x448], eax β; {ls_rbp-1096}
l7: β
00400c41: 8b 85 b8 fb ff ff mov eax, dword ptr [rbp - 0x448] β; {ls_rbp-1096}
00400c47: 85 c0 test eax, eax β
00400c49: 75 ad jne <l8> β; 400bf8(.text+0x238)
Sure enough, this is a loop, and its counter is rbp - 0x448
, which holds a copy of the size of the data we are going to receive from the net. Again, I will leave the detailed analysis of this loop as an exercise for the reader. The loop basically reads data from the network and stores it in the allocated buffer until all the data has been received.
00400c4b: 8b 85 ac fb ff ff mov eax, dword ptr [rbp - 0x454] β; {ls_rbp-1108}
00400c51: 89 c7 mov edi, eax β
00400c53: e8 98 fc ff ff call <close@plt> β; <close@plt> 4008f0(.plt+0x90)
Then the socket is closed... :o we have actually found a bug LoL.
The rest of the program is just another loop. Again, I will just give you some hints so you can reverse it yourself. By now you should already be an expert at reversing loops :). If any of you experience real difficulties going through this code, just post your question in the comments.
- The loop is XORing the buffer got from the network with the key we calculated and stored at rbp - 0x444
- The key is an integer so the program stores a pointer to the key in rbp - 0x428 to go through the int (4 bytes) byte by byte
- All that funny shr, and, add, sub code... is the modulus 4 calculation... I think
With this info and some googling you should be able to make sense out of the code below.
00400c58: 48 8b 85 a0 fb ff ff mov rax, qword ptr [rbp - 0x460] β; {ls_rbp-1120}
00400c5f: 8b 95 cc fb ff ff mov edx, dword ptr [rbp - 0x434] β; {ls_rbp-1076}
00400c65: 89 10 mov dword ptr [rax], edx β
00400c67: 48 8d 85 bc fb ff ff lea rax, qword ptr [rbp - 0x444] β; {ls_rbp-1092}
00400c6e: 48 89 85 d8 fb ff ff mov qword ptr [rbp - 0x428], rax β; {ls_rbp-1064}
00400c75: c7 85 c0 fb ff ff 00 00 mov dword ptr [rbp - 0x440], 0 β; {ls_rbp-1088}
00400c7f: eb 56 jmp <l9> β; 400cd7(.text+0x317)
l10: β
00400c81: 8b 85 c0 fb ff ff mov eax, dword ptr [rbp - 0x440] β; {ls_rbp-1088}
00400c87: 48 63 d0 movsxd rdx, eax β
00400c8a: 48 8b 85 d0 fb ff ff mov rax, qword ptr [rbp - 0x430] β; {ls_rbp-1072}
00400c91: 48 8d 0c 02 lea rcx, qword ptr [rdx + rax] β
00400c95: 8b 85 c0 fb ff ff mov eax, dword ptr [rbp - 0x440] β; {ls_rbp-1088}
00400c9b: 48 63 d0 movsxd rdx, eax β
00400c9e: 48 8b 85 d0 fb ff ff mov rax, qword ptr [rbp - 0x430] β; {ls_rbp-1072}
00400ca5: 48 01 d0 add rax, rdx β
00400ca8: 0f b6 30 movzx esi, byte ptr [rax] β
00400cab: 8b 85 c0 fb ff ff mov eax, dword ptr [rbp - 0x440] β; {ls_rbp-1088}
00400cb1: 99 cdq β
00400cb2: c1 ea 1e shr edx, 0x1e β
00400cb5: 01 d0 add eax, edx β
00400cb7: 83 e0 03 and eax, 3 β
00400cba: 29 d0 sub eax, edx β
00400cbc: 48 63 d0 movsxd rdx, eax β
00400cbf: 48 8b 85 d8 fb ff ff mov rax, qword ptr [rbp - 0x428] β; {ls_rbp-1064}
00400cc6: 48 01 d0 add rax, rdx β
00400cc9: 0f b6 00 movzx eax, byte ptr [rax] β
00400ccc: 31 f0 xor eax, esi β
00400cce: 88 01 mov byte ptr [rcx], al β
00400cd0: 83 85 c0 fb ff ff 01 add dword ptr [rbp - 0x440], 1 β; {ls_rbp-1088}
l9: β
00400cd7: 8b 85 c0 fb ff ff mov eax, dword ptr [rbp - 0x440] β; {ls_rbp-1088}
00400cdd: 3b 85 cc fb ff ff cmp eax, dword ptr [rbp - 0x434] β; {ls_rbp-1076}
00400ce3: 7c 9c jl <l10> β; 400c81(.text+0x2c1)
Finally, the function prepares itself to return the pointer to the malloced
buffer (that now contains the decoded data received from the network), checks the canary and cleans up the stack:
00400ce5: 48 8b 85 d0 fb ff ff mov rax, qword ptr [rbp - 0x430] β; {ls_rbp-1072}
00400cec: 48 8b 5d e8 mov rbx, qword ptr [rbp - 0x18] β; {ls_rbp-24}
00400cf0: 64 48 33 1c 25 28 00 00 xor rbx, qword ptr fs:[0x28] β
00400cf9: 74 05 je <l11> β; 400d00(.text+0x340)
00400cfb: e8 90 fb ff ff call <__stack_chk_fail@plt> β; <__stack_chk_fail@plt> 400890(.plt+0x30)
l11: β
00400d00: 48 81 c4 58 04 00 00 add rsp, 0x458 β
00400d07: 5b pop rbx β
00400d08: 5d pop rbp β
00400d09: c3 ret β
Getting the Sample
So, what have we learned about the dropper after analysing it?
- The data is xor encoded
- The encoding key is an integer (4 bytes) calculated as the xor of a random number generated by the dropper and another random number received from the server.
- During the communication, we can see the local random number in the request to the server
- The server responds with 2 consecutive integers. The first integer is used to generate the key and the second integer is the size of the data that is going to be transferred
- The HTTP headers are not used at all and can be safely ignored.
With all this information, let's look again at our network dump using Wireshark. First the request to the server, which should contain the random number generated by the dropper:
00000000 47 45 54 20 2f 20 48 54 54 50 2f 31 2e 31 0a 55 GET / HT TP/1.1.U
00000010 73 65 72 2d 41 67 65 6e 74 3a 42 6c 61 63 6b 4e ser-Agen t:BlackN
00000020 69 62 62 6c 65 73 20 35 2e 33 32 20 78 4f 53 20 ibbles 5 .32 xOS
00000030 32 2e 31 0a 0a 2.1..
00000035 6b 8b 45 67 k.Eg
So, the number we are interested in is: 0x6b8b4567
Note: We can figure out the length of the HTTP header by dumping the http_req
symbol in the dropper binary, as we have seen previously. In general, HTTP headers finish with a blank line, that is, two consecutive newlines ("0a 0a" in the dump).
Now let's look at the server response:
00000000 48 54 54 50 2f 31 2e 30 20 32 30 30 20 4f 4b 0a HTTP/1.0 200 OK.
00000010 53 65 72 76 65 72 3a 20 53 68 61 64 6f 77 48 54 Server: ShadowHT
00000020 54 50 2f 33 2e 32 36 20 5b 53 45 43 55 52 45 44 TP/3.26 [SECURED
00000030 5d 0a 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 ].Conten t-type:
00000040 74 65 78 74 2f 68 74 6d 6c 3b 20 63 68 61 72 73 text/htm l; chars
00000050 65 74 3d 55 54 46 2d 38 0a 43 6f 6e 74 65 6e 74 et=UTF-8 .Content
00000060 2d 4c 65 6e 67 74 68 3a 20 2d 31 0a 0a 50 4e 47 -Length: -1..PNG
00000070 74 29 86 72 t).r
00000074 00 20 38 00 6a 86 ee 59 17 c2 a3 1f 15 c3 a2 1f . 8.j..Y ........
(...)
The number used to generate the key comes right after the PNG
string: 0x74298672
and the file size is next: 0x00203800
.
Now we can calculate the key:
0x6b8b4567 ^ 0x74298672 = 0x1fa2c315
Finally, you need to dump the data and write a small program to decode the data. I have already told you how to dump the data from wireshark into a file and how to skip a header. Now you know that you have to also skip the PNG
string and the 2 integers sent by the server. Just check the new offset and dump the thing.
$ dd if=png.dump of=sample bs=1 skip=$((0x78))
Now we just need a program to decode the file and we are done. This is my quick and dirty xor
decoder in C. You can use any language you want.
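#include <stdio.h>
#include <stdlib.h>

/* Usage: ./xor1 <file> <key as a decimal integer>. The file is decoded in place. */
int
main (int argc, char *argv[])
{
  unsigned char *p, *k;
  int len;
  int key;
  int i;
  FILE *f;

  if (argc != 3) return 1;
  key = atoi (argv[2]);

  /* Get the file size and read the whole file into memory */
  if ((f = fopen (argv[1], "rb")) == NULL) return 1;
  fseek (f, 0L, SEEK_END);
  len = ftell (f);
  fseek (f, 0L, SEEK_SET);
  printf ("File size is: %d\n", len);
  printf ("key %x\n", key);
  if ((p = malloc (len)) == NULL) return 1;
  fread (p, 1, len, f);
  fclose (f);

  /* XOR each byte with the matching key byte, in memory order (little endian on x86) */
  k = (unsigned char *) &key;
  for (i = 0; i < len; i++) p[i] ^= k[i % 4];

  /* Overwrite the input file with the decoded data */
  if ((f = fopen (argv[1], "wb")) == NULL) return 1;
  fwrite (p, 1, len, f);
  fclose (f);
  free (p);
  return 0;
}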
And, after all this, we can finally recover our sample!
$ cp sample sample.1
$ ./xor1 sample.1 530760469
File size is: 2111488
key 1fa2c315
$ file sample.1
sample.1: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, BuildID[sha1]=0x0ec065afe8fd78ff18381dc09b0be805364c3c95, not stripped
Next
Believe it or not, you have got the first flag... just run strings sample.1 | grep CTF
Awesome... you have successfully analysed a malware dropper. Now you have to start again and analyse the actual malware sample. I hope it will be a lot easier now that you can make good use of all the experience you have gained analysing the dropper. Actually, you will find a lot of familiar code in the sample, which should make your analysis faster and easier. Anyhow, I'll wait a couple of weeks for you to get the flags before releasing Part II... Maybe some of you would write Part II for me