Introducing STAN. A simple tool for RE beginners

0x00pf · June 21, 2017, 10:23pm

Those of you who follow me on Twitter may have heard about STAN. This is my pet project to learn about reverse engineering. It was born as an experiment with the capstone disassembly framework (http://www.capstone-engine.org/), and it evolved into something usable for simple projects.

If you want to use it for your own practice, take a look at the code, or even extend it for your own needs… just grab the code here:

STAN is in an early alpha phase, so you should expect crashes and some misbehaviour. In general, it should work fine for simple challenges, but… as I said, it is just a pet project. So far it only works with GNU/Linux ELF binaries for x86 and ARM (32-bit only for ARM).

A Simple Crackme for Testing

So, instead of writing a boring tutorial on the different options, I will just solve a very simple challenge.

Here it is:

#!/bin/sh

cat << EOM | base64 -d | gunzip > /tmp/specimen
H4sIAG7iSlkAA+1Ya0wUVxS+sw92FJhd7QtTa9YU41J1ZRVboIgMLOxdu7SgoNYXLu6qJMIm7Iyi
qVU7Yh3GrSaNib/80bRJY1LDD2MpMTC4ysMftlBbSU1rNNUurq34qFAfTM+dnaWwSRN/9F/3JHfO
Pd+e75zv3rlMmNlb6inTURSKmw4VIhIdporUuEjDD1nGUwDLRQa4TkG0mmtEE61okh9OQZM8Qlb1
SngqFK9r2TTJp2nwmvG6MZ5BG1ZNslXTGfeZWnbcGzRfeZPzmdDzW1wW4b8IQw/D9W41OtP0w9O5
Vd/UXhrRfXlyY97vVPmaL/D+OzTktadCTkeLsQh1EnEOOdIKDou32kkVhwLAcQCUXiz+ho93Gwwo
kqXm4f3nCT9cA7mrcahqjMbiS0HGiiKn1AI9kQLwDys7dkLtyHEdQs0yP62dbHvH3ni76LR2orWj
eUL/bHBCr7I23FpwGpm/lt3iRbMRL/1AUZQd5pHBisjLkPC+SGNICreioSvwQys19B24tgy4EWEs
Ga+lW0HCGBYLcY+LJlsCXt2dHleGGoouW6cOkKEjoKwaizfZ1W7xCruKrWar3OLllVh8gIU7NnHP
WbewDJWH9tL8o3JxzCOOYsmCmwe4V7DweAafiqUiGsPKlkRTsCBbotcdjzqPwWpgISVQGYf4bla8
0D4VesKWlINIp3g3Mlv9qbQbS6DE6AC1bskEZbjpWBidzae4pdV0lHYLPZboLccAtMM9MkKbFGwu
PQexaFwEFIfslhjc3MdlYuEvPT/FLa0iUqLpEBpJWAbhABQTui3Ry6T6MiI839GHxR+xcC5X2N1r
4l/HUpkFKvamwZZJZVaYXohNbTD9CaZRGitdOK+L7/VIxj4A3PtHyeE1f6zeacl4jGB5svmjExC7
QxWrsLRkF6khdmHhxjDOuuwJeTLNbuH6sEf89sHnHsCzuhzyg5PvhAooLC5dD8mR0jFF6fyMHJZp
cBKI3IyOryAkR6kRqYj5mGw+I0d1Dlk0jqYSkXrQWKi2Ms4lTpFbKpQDA5xp3+OF/L1oRNpz1iPe
wOIvkbRnihI1tgVIpZB/cB27nt3AbmRrNoRbTPkpfDpWurHSQ8v8sEMOz5CFXirMVjtktgoOx59Y
uKTAkXiCxbuVWLx3/xQWr2HxYmw6AjdkEI/0Y+oSVi5yeTjvPpygnW/grH5PaH6qJy/KzWopRbl6
frqq5Q8gRz5F5A+Cm4KVcPRnhxyFhQyuXbeBqAq3nhjSgdpW/dCTp+B0Qw/BkYNZvbLSI81hYOXl
zb9yZbhZweIzrhD+GIU+hSjpd4bqFLc0MxM3X+VsHvG21m2O2o1nivcVzFvKm5aH1s+jyPk6T0f7
HVfXQm+tNZyNvH7uBfMZg9kJFwbuvDNL5m6TPQ9nN2VnB/2bkSsQ8E1FxV64VHiDwZ2BRl++FS30
+Xcs5Bu9Db5APfI4aypWlHreY51oh3f71sa6Bp/2TCJG7V6BqCYL9WqaiT5KxfCZML5/oihvkwSW
sexiaBXLgVEFOCa4izlK64oYWqtDeFUwaNif8Ue1hm+H8Qh4iybg82EcIM9JyB+jYn0O6krSU4qP
6N2fGELGwylOwaS/TYEupPU+DWME7kYTAYoZS0jnZDIO61nGetCAGZtgdDHZunWMDQCWyYCEYoYm
vDswPHCkt4zzisd5LOGxwKtmbM4JPKI7E3TtBt5rz/nMT1rSkpa0pCUtaUn7P5j6Gq1MVeeukpJ8
q626lm/geGuOPdees2ARr0aODx059uwc++KsGI6QPbgtyDVy3lpkbwhwfvvWBt5ey9dt9y2o8yE7
52/ikL0x4PNyXmT3b6vZ0uit9yN7LK4NBpF9c6C+3t/A/VfrIO/e5P1ep8X/fBeIxZkJ+YaEeBaa
8E0CzKnxnRp/cQKBmhyq9QkW/x+ZvKrHfCxelpBPJfj52jyuv03jt2n83ISG9OQQvanx9XFg/HtK
zKUl5Ceuv0CrGedb4t9hND5OyE/sz2r9sxPwOH9eAp64/knaJ9hbGn/5v/Dj9jetXZN9yBIAAA==
EOM

Just run it and you will get the challenge at /tmp/specimen.
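
If you prefer not to pipe things through the shell, the same unpacking step can be sketched in Python. This is only an illustration of what `base64 -d | gunzip` does; the function name `unpack_specimen` is made up for this example.

```python
import base64
import gzip


def unpack_specimen(b64_text: str) -> bytes:
    """Decode base64 text and gunzip the result, like `base64 -d | gunzip`."""
    # b64decode ignores the line breaks of the here-document by default.
    compressed = base64.b64decode(b64_text)
    return gzip.decompress(compressed)
```

To reproduce the script, pass it the text between the two EOM markers and write the returned bytes to /tmp/specimen.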

Basic STAN Operations

So, let’s launch STAN, passing our challenge as a parameter:

$ stan /tmp/specimen
STAN is a sTAtic aNalyser. v 0.1
(c) pico

+ Opening file '/tmp/specimen'
+ Loaded '/tmp/specimen' 4808 bytes
+ ELF Machine ID: 62
+ Arch: 1 Mode:2 Type: 1
+ Processing Core...
Starting analysis
+ Processing 4 sections/segments
+ Processing section [0] '.text'
  * Analysing 342 instructions
+ Processing section [1] '.rodata'
+ Processing section [2] '.eh_frame'
+ Processing section [3] '.data'
--------------------------------------
CASE: 'corpse'
CORE: 0x696140
......................................
+ Dumming Core
  - File         : /tmp/specimen
  - Size         : 4808
  - Type         : ELF64
  - Valid        : VALID
  - Architecture : X86
  - Mode         : 64bits
[00] text_00 Addr:0x400000 Offset:0x0000 Size:0x06f0 (1776)
[01] data_01 Addr:0x601000 Offset:0x1000 Size:0x000c (12)
.................................................
[00]           .text 0x03 Addr:0x400144 Offset:0x0144 Size:0x0433 (   1075) [text_00+0x0144]
[01]         .rodata 0x06 Addr:0x400577 Offset:0x0577 Size:0x003f (     63) [text_00+0x0577]
[02]       .eh_frame 0x06 Addr:0x4005b8 Offset:0x05b8 Size:0x0138 (    312) [text_00+0x05b8]
[03]           .data 0x06 Addr:0x601000 Offset:0x1000 Size:0x000c (     12) [data_01+0x0000]
--------------------------------------
STAN] >
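
The `ELF Machine ID: 62` line and the `ELF64` type above come straight from the ELF header. Here is a minimal Python sketch of that first step. It says nothing about how STAN does it internally, and the name `describe_elf` and the dictionary keys are made up for illustration.

```python
import struct

# e_machine values from the ELF specification for the architectures STAN supports.
EM_NAMES = {3: "x86", 40: "ARM", 62: "x86-64"}


def describe_elf(data: bytes) -> dict:
    """Read word size, machine and entry point from an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    bits = {1: 32, 2: 64}[ei_class]            # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if ei_data == 1 else ">"      # EI_DATA: 1 = little endian
    # e_type and e_machine are two 16-bit fields right after e_ident (16 bytes).
    _e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    # e_entry follows e_version; it is 4 bytes wide in ELF32 and 8 in ELF64.
    entry_fmt = "I" if bits == 32 else "Q"
    (e_entry,) = struct.unpack_from(endian + entry_fmt, data, 24)
    return {
        "bits": bits,
        "machine": EM_NAMES.get(e_machine, f"unknown ({e_machine})"),
        "entry": e_entry,
    }
```

Calling `describe_elf(open("/tmp/specimen", "rb").read())` should report a 64-bit x86-64 file, matching machine ID 62 in the output above.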

STAN shows a bunch of data related to the ELF file… Let’s look directly at the code:

STAN] > dis.section .text
(the whole program is dumped)

The dis family of commands is used for disassembling code. dis.section disassembles a whole section. Usually you will use dis.function, which disassembles just one function. Anyway, in this case, dumping the whole section will allow us to quickly find the main function.

Oops. I haven’t mentioned it yet, but the challenge is a stripped binary… so there are no symbols in it. STAN can deal with symbols and show them to you… but the stripped binary is simply smaller, which makes it easier to include as text in this write-up.

So, if you quickly browse the asm code, you will see the string Password: towards the beginning of the dump. That is our main function. Alternatively, STAN creates a special symbol named __entry_point to reference the program entry point. Starting from there and following the different function calls, you will eventually arrive at the same point.

The main function

Let’s take a look at the main function while we discover more STAN commands.

STAN] > dis.function .text
+ Function '.text'@0x400144 found at section '.text'(1075,342)

                                 .text:
400144:   48 81 ec 08 04 00 00    	sub	rsp, 0x408
40014b:   ba 0b 00 00 00          	mov	edx, 0xb
400150:   be 8a 05 40 00          	mov	esi, 0x40058a		#  40058a(.rodata+13) : 'Password: '
400155:   bf 01 00 00 00          	mov	edi, 1
40015a:   31 c0                   	xor	eax, eax
40015c:   e8 b0 00 00 00          	call	<func_400211>			#  <func_400211> 400211(.text+0xcd)
400161:   48 89 e6                	mov	rsi, rsp
400164:   ba 00 04 00 00          	mov	edx, 0x400
400169:   31 ff                   	xor	edi, edi
40016b:   31 c0                   	xor	eax, eax
40016d:   e8 98 00 00 00          	call	<func_40020a>			#  <func_40020a> 40020a(.text+0xc6)
400172:   ff c8                   	dec	eax
400174:   48 89 e7                	mov	rdi, rsp
400177:   48 98                   	cdqe
400179:   c6 04 04 00             	mov	byte ptr [rsp + rax], 0
40017d:   e8 29 00 00 00          	call	<func_4001ab>			#  <func_4001ab> 4001ab(.text+0x67)
400182:   31 c0                   	xor	eax, eax
400184:   48 81 c4 08 04 00 00    	add	rsp, 0x408
40018b:   c3                      	ret
+ Stopped after founding symbol '__entry_point' (18 instructions)
STAN] >

As you can see, STAN already shows us strings if they are referenced from the program (which is not always the case), and it also creates dummy names for every function it finds. You can see three function calls in the main program.
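
The call targets in the listing can be checked by hand: an `e8` call stores a signed 32-bit displacement relative to the address of the next instruction. The Python sketch below does that arithmetic. The `func_<address>` naming copies what the output shows, but the helper names are hypothetical, and whether STAN builds its names exactly this way is an assumption.

```python
import struct


def call_target(addr: int, raw: bytes) -> int:
    """Return the destination of a 5-byte x86 `call rel32` located at addr."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    (rel,) = struct.unpack("<i", raw[1:])  # signed little-endian displacement
    return addr + len(raw) + rel           # relative to the next instruction


def dummy_name(target: int) -> str:
    """Build a placeholder name in the style seen in the listing."""
    return f"func_{target:x}"


# The first call in main: 40015c: e8 b0 00 00 00
print(dummy_name(call_target(0x40015C, bytes.fromhex("e8b0000000"))))
```

For the first call this gives 0x40015c + 5 + 0xb0 = 0x400211, which is the address STAN prints next to the call.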

If you further explore the first two, func_400211 and func_40020a, you will find out that the first one is a write and the second one is a read. To figure this out using STAN you will have to dump the whole section. The opcode analysis module is still pretty basic and it cannot figure out complex structures like the ones you will find in there.

Alternatively, use your intuition. If you have already run the program, you know it shows a message and then asks for some input… Let’s go down that path and use the renaming functions provided by STAN:

STAN] > func.rename func_400211 maybe_write
 + Found function func_400211
 + Found Symbol func_400211
STAN] > func.rename func_40020a maybe_read
 + Found function func_40020a
 + Found Symbol func_40020a
STAN] >

Note: All those debug messages will eventually disappear.

If we now dump the code again, it will look like this:

STAN] > dis.function .text
+ Function '.text'@0x400144 found at section '.text'(1075,342)

                                 .text:
400144:   48 81 ec 08 04 00 00    	sub	rsp, 0x408
40014b:   ba 0b 00 00 00          	mov	edx, 0xb
400150:   be 8a 05 40 00          	mov	esi, 0x40058a		#  40058a(.rodata+13) : 'Password: '
400155:   bf 01 00 00 00          	mov	edi, 1
40015a:   31 c0                   	xor	eax, eax
40015c:   e8 b0 00 00 00          	call	<maybe_write>			#  <maybe_write> 400211(.text+0xcd)
400161:   48 89 e6                	mov	rsi, rsp
400164:   ba 00 04 00 00          	mov	edx, 0x400
400169:   31 ff                   	xor	edi, edi
40016b:   31 c0                   	xor	eax, eax
40016d:   e8 98 00 00 00          	call	<maybe_read>			#  <maybe_read> 40020a(.text+0xc6)
400172:   ff c8                   	dec	eax
400174:   48 89 e7                	mov	rdi, rsp
400177:   48 98                   	cdqe
400179:   c6 04 04 00             	mov	byte ptr [rsp + rax], 0
40017d:   e8 29 00 00 00          	call	<func_4001ab>			#  <func_4001ab> 4001ab(.text+0x67)
400182:   31 c0                   	xor	eax, eax
400184:   48 81 c4 08 04 00 00    	add	rsp, 0x408
40018b:   c3                      	ret
+ Stopped after founding symbol '__entry_point' (18 instructions)
STAN] >

Getting there

So, we have figured out the write and the read, and there is only one more function left, the mysterious func_4001ab. So, we had better figure out what we are passing as a parameter to the function.

You are probably better than me, but I can never remember the calling convention for 64-bit functions… so I added a help.abi command to STAN to remind me of the order:

STAN] > help.abi
  Current Core is: Linux X86 64bits
  -> func (RDI, RSI, RDX, RCX) -> RAX
STAN] >

Good, so, let’s look at the code above to figure out what we may expect to receive in that mysterious function… OK, let’s rename it before continuing.

STAN] func.rename func_4001ab mystery

So, we can see that RDI is initialised with RSP (the top of the stack). And what may be at the top of the stack? Let’s check the previous function, the read:

RDI = 0 (xor edi,edi)
RSI = RSP
RDX = 0x400

So, it actually looks like a read call: the second parameter is the buffer that stores the data, and it is set to the top of the stack. Then the code replaces the last byte read (normally the newline) with a zero and, without moving RSP, we call the mystery function passing as a parameter whatever we have read from the user.
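
To make the register bookkeeping concrete, here is a small Python sketch. help.abi prints the first four argument registers; the System V AMD64 convention continues with R8 and R9 and returns the result in RAX. The helper names are invented for this example. `terminate_input` models the `dec eax` / `cdqe` / `mov byte ptr [rsp + rax], 0` sequence that follows the read.

```python
# Integer/pointer argument registers of the System V AMD64 calling convention.
SYSV_INT_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_registers(*args):
    """Pair each integer/pointer argument with the register that carries it."""
    if len(args) > len(SYSV_INT_ARG_REGS):
        raise ValueError("extra arguments are passed on the stack")
    return dict(zip(SYSV_INT_ARG_REGS, args))


def terminate_input(buf: bytes) -> bytes:
    """Model the code after read(): a zero is written over the last byte read.

    A C string ends at the first zero, so the callee sees everything except
    that last byte (normally the newline typed by the user).
    """
    n = len(buf)          # the value read() returned in eax
    if n == 0:
        return b""        # nothing was read; the real code does not check this
    return buf[: n - 1]


# read(0, rsp, 0x400) as set up in main
print(assign_registers(0, "rsp", 0x400))
```

So the read call carries 0 in RDI, the stack buffer in RSI and 0x400 in RDX, and the mystery function then receives that same buffer in RDI.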

OK. It is time for a break, so let’s note down what we have found out before we leave, so that we do not have to start from scratch when we come back:

STAN] > func.rename func_4001ab mystery
 + Found function func_4001ab
 + Found Symbol func_4001ab
STAN] > comment.add 40016d read (0 = stdin, RSP, 0x400)
+ Adding comment 'read (0 = stdin, RSP, 0x400)' at 0x40016d
STAN] > comment.add 40017d mystery (RSP) -> we just pass in the data read from the user
+ Adding comment 'mystery (RSP) -> we just pass in the data read from the user' at 0x40017d

Yes, we can add comments to specific addresses. And then the code will look like this:

STAN] > dis.function .text
+ Function '.text'@0x400144 found at section '.text'(1075,342)

                                 .text:
400144:   48 81 ec 08 04 00 00    	sub	rsp, 0x408
40014b:   ba 0b 00 00 00          	mov	edx, 0xb
400150:   be 8a 05 40 00          	mov	esi, 0x40058a		#  40058a(.rodata+13) : 'Password: '
400155:   bf 01 00 00 00          	mov	edi, 1
40015a:   31 c0                   	xor	eax, eax
40015c:   e8 b0 00 00 00          	call	<maybe_write>			#  <maybe_write> 400211(.text+0xcd)
400161:   48 89 e6                	mov	rsi, rsp
400164:   ba 00 04 00 00          	mov	edx, 0x400
400169:   31 ff                   	xor	edi, edi
40016b:   31 c0                   	xor	eax, eax
40016d:   e8 98 00 00 00          	call	<maybe_read>			#  <maybe_read> 40020a(.text+0xc6)
                                        ; read (0 = stdin, RSP, 0x400)
400172:   ff c8                   	dec	eax
400174:   48 89 e7                	mov	rdi, rsp
400177:   48 98                   	cdqe
400179:   c6 04 04 00             	mov	byte ptr [rsp + rax], 0
40017d:   e8 29 00 00 00          	call	<mystery>			#  <mystery> 4001ab(.text+0x67)
                                        ; mystery (RSP) -> we just pass in the data read from the user
400182:   31 c0                   	xor	eax, eax
400184:   48 81 c4 08 04 00 00    	add	rsp, 0x408
40018b:   c3                      	ret
+ Stopped after founding symbol '__entry_point' (18 instructions)

Time to save our work and take a break.
Just type:

STAN] case.save

Finishing the challenge

Hope you have had a nice break. I did. Now we can launch STAN again, but this time we are going to open the file from the STAN command line:

$ stan
STAN is a sTAtic aNalyser. v 0.1
(c) pico

STAN] core.load /tmp/specimen
+ Cleanning up core
+ Deleting Segments....
+ Deleting Sections....
+ Deleting Symbols....
+ Opening file '/tmp/specimen'
+ Loaded '/tmp/specimen' 4808 bytes
+ ELF Machine ID: 62
+ Arch: 1 Mode:2 Type: 1
+ Processing Core...
Starting analysis
+ Processing 4 sections/segments
+ Processing section [0] '.text'
  * Analysing 342 instructions
+ Processing section [1] '.rodata'
+ Processing section [2] '.eh_frame'
+ Processing section [3] '.data'
STAN] case.load /tmp/specimen.srep
-> SYMBOL: 'mystery' '4001ab'
 + Found function mystery
 + Found Symbol mystery
-> FUNCTION: 'mystery' '0x4001ab'
-> COMMENT: 'read (0 = stdin, RSP, 0x400)' '0x40016d'
-> COMMENT: 'mystery (RSP) -> we just pass in the data read from the user' '0x40017d'
STAN]

As you can imagine, core.load allows you to load a binary from disk (you can use TAB completion to navigate the file system). Then you use case.load to load your previously saved state. For the time being, case.save saves the state to a file with the same name as the binary under analysis, but with the extension .srep.

Now we can go and reverse the mystery function:

Unveiling the `mystery`

Let’s disassemble mystery:

STAN] > dis.function mystery
+ Function 'mystery'@0x4001ab found at section '.text'(1075,342)

                               mystery:
4001ab:   51                      	push	rcx
4001ac:   be 77 05 40 00          	mov	esi, 0x400577		#  <.rodata> 400577(.rodata+0) : '0x00sec'
4001b1:   e8 98 02 00 00          	call	<func_40044e>			#  <func_40044e> 40044e(.text+0x30a)
4001b6:   85 c0                   	test	eax, eax
4001b8:   75 11                   	jne	<l0>			# 4001cb(.text+0x87)
4001ba:   ba 05 00 00 00          	mov	edx, 5
4001bf:   be 7f 05 40 00          	mov	esi, 0x40057f		#  40057f(.rodata+8) : 'Good\n'
4001c4:   bf 01 00 00 00          	mov	edi, 1
4001c9:   eb 11                   	jmp	<l1>			# 4001dc(.text+0x98)
                                    l0:
4001cb:   ba 04 00 00 00          	mov	edx, 4
4001d0:   be 85 05 40 00          	mov	esi, 0x400585		#  400585(.rodata+e) : 'Bad\n'
4001d5:   bf 01 00 00 00          	mov	edi, 1
4001da:   31 c0                   	xor	eax, eax
                                    l1:
4001dc:   e8 30 00 00 00          	call	<maybe_write>			#  <maybe_write> 400211(.text+0xcd)
4001e1:   83 c8 ff                	or	eax, 0xffffffff
4001e4:   5a                      	pop	rdx
4001e5:   c3                      	ret

Here we find two labels: l0 and l1. We can also see a call to what we believe is a write, after setting the strings for the right and wrong password. Everything should be obvious now, but let’s rename the labels for the LuLz:
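
The control flow of the listing can be written down as a short Python sketch. This is a reading of the disassembly, not the original source: the comparison stands in for func_40044e (assumed to behave like strcmp), and `expected` is whatever string lives at 0x400577, left as a parameter here.

```python
def mystery(user_input: bytes, expected: bytes):
    """Return (message written to stdout, return value) for a given input."""
    # call <func_40044e>; test eax, eax -> zero means the two strings match
    result = 0 if user_input == expected else 1
    if result != 0:
        message = "Bad\n"    # jne taken: l0 loads the 4-byte 'Bad\n' string
    else:
        message = "Good\n"   # fall through: the 5-byte 'Good\n' string
    # l1: both paths share the same write call, then `or eax, 0xffffffff`
    # forces a return value of -1 whatever the outcome was.
    return message, -1
```

Note that the return value is -1 on both paths, so main cannot tell success from failure. The only visible difference is the message.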

STAN] > label.rename l0 BadBoy
 + Found label l0
- DEBUG: Symbol 'l0' not found
STAN] > label.rename l1 print_and_exit
 + Found label l1
- DEBUG: Symbol 'l1' not found
STAN] > dis.function mystery
+ Function 'mystery'@0x4001ab found at section '.text'(1075,342)

                               mystery:
4001ab:   51                      	push	rcx
4001ac:   be 77 05 40 00          	mov	esi, 0x400577		#  <.rodata> 400577(.rodata+0) : '0x00sec'
4001b1:   e8 98 02 00 00          	call	<func_40044e>			#  <func_40044e> 40044e(.text+0x30a)
4001b6:   85 c0                   	test	eax, eax
4001b8:   75 11                   	jne	<BadBoy>			# 4001cb(.text+0x87)
4001ba:   ba 05 00 00 00          	mov	edx, 5
4001bf:   be 7f 05 40 00          	mov	esi, 0x40057f		#  40057f(.rodata+8) : 'Good\n'
4001c4:   bf 01 00 00 00          	mov	edi, 1
4001c9:   eb 11                   	jmp	<print_and_exit>			# 4001dc(.text+0x98)
                                BadBoy:
4001cb:   ba 04 00 00 00          	mov	edx, 4
4001d0:   be 85 05 40 00          	mov	esi, 0x400585		#  400585(.rodata+e) : 'Bad\n'
4001d5:   bf 01 00 00 00          	mov	edi, 1
4001da:   31 c0                   	xor	eax, eax
                        print_and_exit:
4001dc:   e8 30 00 00 00          	call	<maybe_write>			#  <maybe_write> 400211(.text+0xcd)
4001e1:   83 c8 ff                	or	eax, 0xffffffff
4001e4:   5a                      	pop	rdx
4001e5:   c3                      	ret

OK… so can you figure out the password for this crackme?
In case you wonder, func_40044e is actually strcmp. You can dive deeper into the code to figure this out (dis.function func_40044e)… it is a simple one if you want to try.

Other commands you may find interesting

Just for completeness, these are a few commands that may also be useful if you want to play with STAN:

comment.del addr: Deletes a previous comment… sometimes we make mistakes.
mem.dump x addr count: Dumps the content at address addr as hex bytes. You can replace x with p to dump words (pointers). For example:

STAN] > mem.dump x 0x400577 10

func.def: This allows you to tell STAN that you believe there is a function at some address. As I said, the analysis module is pretty poor, so you may spot obvious functions that STAN missed.
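
For the curious, the idea behind a byte dump like mem.dump can be sketched in a few lines of Python. This is only an illustration: the function name and the output format are made up, and it assumes the address falls inside a segment whose load address and file contents are known (text_00 is loaded at 0x400000 from file offset 0 in the output shown earlier).

```python
def hex_dump(image: bytes, base_addr: int, addr: int, count: int) -> str:
    """Return `count` bytes starting at virtual address `addr` as hex text."""
    start = addr - base_addr   # translate the virtual address to an offset
    if start < 0 or start + count > len(image):
        raise ValueError("address range outside the loaded image")
    chunk = image[start : start + count]
    return " ".join(f"{b:02x}" for b in chunk)
```

With the file contents in `image`, `hex_dump(image, 0x400000, 0x400577, 10)` would cover the same range as the mem.dump example above.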

And, I haven’t said it yet… but STAN is colourful. This is how our functions look after all our hard work.

Conclusions

As a final note, I have found this little tool very useful. It is not as powerful as radare2 or Binary Ninja, but it is very easy to use, and it helps a bit while still forcing you to do some work… which is a good thing when you are starting out and still learning the basics.

There is still a lot to do and, as I said, it is kind of alpha software, so use it at your own risk.

hack fun

ricksanchez · June 22, 2017, 6:15am

You mad man @0x00pf. Nice post as always.

Soon you will post challenges and then we can just run STAN and it will spit out the solutions

pry0cc · June 22, 2017, 7:42am

My internal mental monologue:

“Hmm, I wonder where Pico is. He seems kind of quiet lately.”
“Who’s Stan?”
“OH DAMN HE’S ONLY DONE IT AGAIN”

This is super cool. I am so happy you have made something just for beginners. Most of the commands are readable, and thus easier to remember.

As somebody who is just getting into RE, I am going to give this a try. I’ll give you my feedback; I understand it is super early days. I wonder what you will add next?

0x00pf · June 22, 2017, 4:16pm

@ricksanchez… it will not be that easy. Actually the challenge I’m working on completely defeats STAN… but I’ll do some improvements to the tool before releasing the challenge (still waiting for some feedback on the challenge tho)

@pry0cc I look forward to any feedback. What’s next? Well, I have realised that STAN is way too STAtic and it needs some big changes in the disassembling module… It is coming

UPDATE: You have autocompletion of commands by pressing TAB. Just pressing TAB on a blank prompt will show you all the commands… thanks to the amazing readline library!

SmartOne · June 22, 2017, 4:20pm

This is crazy, phenomenal work! And you did it all on your own? How long did it actually take you to code it and to come up with the idea?
Best, SmartOne

0x00pf · June 22, 2017, 4:31pm

Thanks @SmartOne

That capstone framework is amazing. I got a working PoC (around 1.2 KLoC) in a couple of days. Then it took me 2 or 3 days just to refactor that code into something with a bit of structure. So, roughly, it took two complete weekends plus some hours every day during the week.

Regarding the idea, I knew about capstone and I wanted to try it. I just followed the documentation on their site and the test program started to grow very fast…

ricksanchez · June 22, 2017, 7:30pm

Just joking, man. Also, you’re telling your story of “just 1.2kLoC in a couple of days” like it is nothing. I’d need way longer for that. So I’m massively impressed.

Also looking forward to the next challenge

0x00pf · June 22, 2017, 9:35pm

OK… my bad, I just checked the lines in the source file, not the actual LoC. I just ran sloccount on my first PoC and it reported 978 LoC, so roughly just 1 KLoC. Sorry about the confusion.

ricksanchez · June 23, 2017, 5:47am

Oh, if that’s the case things obviously change… and I thought you were a genius, man, but just 978 LoC, that’s not even 1k. mpfff, weak stuff man.

Ah, come on @0x00pf, it doesn’t matter if it’s 1k or 1.2k LoC… It’s an alpha release and it’s already a tool that makes life easier for every RE beginner or anyone who wants to step into that area and, what’s even more important, makes it more approachable!

You’ll add more stuff and it’ll grow. From what I saw in your post here, I will still be massively impressed no matter what.