picoCTF Write-up ~ Forest (RE)

_py · April 19, 2017, 9:03am

Hello folks! I hope you’re doing great. This is my first ever CTF write-up yay!

During the last couple of weeks, @IoTh1nkN0t, @dtm, @kowalski and myself, took part in picoCTF. It’s an “entry-level” CTF, which practically means, it’s made for CTF rookies! Well, you needed a little bit of experience in order to tackle some of the more challenging tasks but nonetheless it was hella fun. And the best part? The tasks stay online for a year+ so go pwn 'em all now! I’m pretty sure it was our first CTF as a team and as individuals. We ended up placing in the top 120, though at some point we reached top 70 (12k+ teams in total). My teammates will follow up with their own write-ups as well AFAIK.

The main reason I’m writing this post isn’t to show off my solving skills, but to get you interested in CTFs since they are extremely fun and personally I learnt a ton. I won’t be writing about all the tasks I’ve solved. I’ll focus on the ones that were fun and challenging to me. I’ll begin with forest , which was a Reverse Engineering task and in the next days I’m hoping to release my write-up on a Forensics and Exploit Development (bypassing ASLR via format string bug) task.

Without further ado, let’s get right into it!

###Binary Analysis

Let’s begin with the RE starter-pack analysis:

> file forest
  forest: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=4bc47bea06a7f6c49f552be50a5d028989dd9f50, stripped

Sweet! A 32-bit ELF binary. Oh wait, stripped as well. I haven’t dealt with more than 2 stripped binaries in my life so it took me quite a bit of time to analyse it and understand it in Binary Ninja. I was also given the following text:

DLLDLDLLLLLDLLLLRLDLLDLD
LLLRRDLLLLRDLLLLLDLLRLRR
RDLLLDLLLDLLLLLDLLRDLLLR
RLDLLLDLLLLLDLLLRLDLLDLL
RLRRDLLLDLLRLRRRDLLRDLLL
LLDLLLRLDLLDLLRLRRDLLLLL
DLLRDLLLRRLDLLLDLLLLLDLL
RDLLRLRRDLLLDLLLDLLRLRRR
DLLLLLDLLLLRLDLLLRRLRRDD
LLLRRDLLLRRLRDLLLRLDLRRD
DLLLRLDLLLRRRDLLRLRRRDLRRLD

Pretty non-sense, right? Have no fear! We will find out soon!

By the way, some of the RE veterans would probably understand what this binary is about just from its name but I’m a sucker .

I did not bother running the binary in the beginning so let’s dig deeper with bninja.

Here are the important instructions from the snippet:

1. lea ecx, [esp + 0x4]          # address of argc
2. mov eax, dword [data_8049c0c] # stores a string pointer into eax
3. sub esp, 0xc 
4. push eax
5. call sub_80486f0

The pointer being stored into eax is a string pointer which contains the following string:

yuoteavpxqgrlsdhwfjkzi_cmbn

We’ll make sure that’s the one once we fire up GDB later on. Let’s move on with our analysis. So it looks like it calls a function at location 0x80486f0 with the content of eax being its argument.

Cute. Let’s see what it does step-by-step.

Stores the number 0 in 2 local variables. I said local variables because once you enter the function’s body, ebp + offset refers to arguments and ebp - offset refers to local variables. That’s how it works in 32-bit ELF binaries.
Quick little tip for the RE newbies. As I said, 0 is being stored in 2 local vars. Now look at the down-left corner of the above assembly snippet. ebp - 0x10 is actually a counter variable! In other words, whenever you see an initialization of a variable to 0, look for an addition of that variable with 1. There’s a pretty high chance that it’s a loop counter.

  add dword [ebp-0x10], 0x1 # ya sneaky counter

Jump to the next instruction.

Let’s enter the loop code…

1. mov   edx, dword [ebp + 0x8]
2. mov   eax, dword [ebp - 0x10]
3. add   eax, edx 
4. movzx eax, byte [eax]
5. test  al, al

English translation:

Grab dat string pointer.
Add the counter value to it. This is a pretty common way of parsing every byte of a string FYI.
Store the byte in eax.
Check if it’s null.

Neat so far. So in each loop iteration it will read one byte from the string and check if it’s a null-byte. Which makes sense right? Once we reach the end of the string, that’s what should be there.

Moving on to the case where it’s not a null-byte, which is the juice of that function.

1. mov edx, dword [ebp + 0x8]  
2. mov eax, dword [ebp - 0x10]
3. add eax, edx
4. movzx eax, byte [eax]
5. movsx eax, al
6. sup esp, 0x8
7. push eax
8. push dword [ebp - 0xc]
9. call sub_8048607

English translation:

Not sure if you noticed it, but it does the exact same thing, except this time, it pushes the byte at address string pointer + counter and that weird value at ebp - 0xc as arguments to a function as well.

Let’s recap our intel:

So far the binary gets a pointer to a string.
Function at address 0x80486f0 receives that pointer as an argument.
It iterates through each byte and calls for each byte a function at address 0x8048607.

Ok cool, not bad so far. Let’s investigate the function at 0x8048607 and do not panic.

I kinda got scared once I noticed that this function calls itself. Reversing recursion can get tricky. Nevertheless, let’s tackle this bad boy. As always, I’ll write down the assembly and the english pseudo version of it for the non-assembly readers.

1. mov eax, dword [ebp + 0xc]  # string's byte 
2. mov byte [ebp - 0x1c], al   # store the byte in ebp - 0x1c 
3. cmp dword [ebp + 0x8], 0x0  # malloc_ptr 
4. jne 0x804864b               # check if pointer is null

The part you should pay attention to is the ebp + 0x8 variable (line 3 in the above code snippet) which was ebp - 0xc in the previous function we inspected. If you haven’t noticed it already, the current function is calling malloc in each recursion. And if you didn’t know, the return value of a function is stored in eax. Our function’s disassembly shows us that afterwards, the result (pointer in our case since we are dealing with malloc), will be stored in ebp + 0x8. Thus, we can safely assume, that ebp + 0x8 is indeed a malloc’d pointer.

1. call malloc
2. mov dword [ebp + 0x8], eax

Don’t worry though, we’ll delve deeper into the actual functionality.

Interesting. So this function calls malloc and stores the pointer into ebp + 0x8. But wait, there is more! Let’s analyze further the branch that is taken if the malloc pointer is null, which is indeed null when the function is called for the first time.

1.  sub esp, 0xc
2.  push 0xc 
3.  call malloc
4.  add esp, 0x10
5.  mov dword [ebp + 0x8], eax
6.  mov eax, dword [ebp + 0x8]
7.  mov dword [eax], 0x0
8.  mov eax, dword [ebp + 0x8]
9.  mov [eax + 0x4], 0x0
10. mov eax, dword [ebp + 0x8]
11. mov edx, byte [ebp - 0x1c]
12. mov byte [eax + 0x8], dl
13. mov eax, dword [ebp + 0x8] 
14. jmp 0x80486a2

This is too much to handle at once so I’ll provide english translation and some dynamic analysis.

English translation:

Allocate memory of 12 bytes.
Store the pointer of the allocated chunk in ebp + 0x8.
Fill with 0s the first 4 bytes of that chunk.
Fill with 0s the next 4 bytes of that chunk.
Fill the next 4 bytes of that chunk with the hex value of the string’s byte.

Here’s a snippet from GDB:

Looks like I was right! 0x79 is ‘y’ in hex fyi.

I’ve dealt with linked lists in the past and it was clear to me that this binary was no different. Something tells me that those 0s will be soon filled with addresses. Let’s see if I was right ;).

As you can see from the bninja snippet, once the malloc branch is taken, the function exits. Sweet, let’s see what’s coming next! Obviously the function will be called again as I’ve said before until a null-byte is read from the string. By the way, since the string is kinda long, I won’t explain every single iteration of the recursion since it’s tedious and to be honest all that matters is to understand the logic of the binary. The rest is history.

Enough of the chatter! Let’s get back to the disassembly. We’ll continue with the same function, though this time a different branch will be taken since ebp + 0x8 will contain the address of the chunk we just malloc’d and not null.

The following snippets are the bread and butter of the entire task. I bet some of you already know where I’m getting at ;). This time I will include some comments for reading convenience.

Off to the english translation:

Store a malloc’d pointer in eax.
Store the value at offset malloc’d + 0x8 in eax. That is, the hex value of a character of the weird string I showed in the beginning.
Compare that value with the current byte of the string.
Branch depending on the result of the comparison.

###Forest Full Of Trees
Let’s see the branches and get our minds blown away:

With a quick look, it seems like both branches do the same thing. BUT, with a more careful look, that’s not the case. The left branch represents the branch to be taken if the comparison is greater or equal, while the right one represents the opposite. Wait a second! That pattern is familiar! Time for the recap.

We have a string of lower-case letters.
For each byte of the string, we check if its null. If yes, terminate.
For each byte of the string, we call malloc.
With malloc we create a 12-byte chunk. Let’s call it node.
In each node we store stuff. One of the stuff in them is the hex representation of one of the letters of the string.
For each newly created node, we check if the current string’s byte is greater or smaller than the current node’s value we’re at.

Do you see it? It should’ve hit you in the face by now! We are dealing with a Binary Search Tree! Remember the name of the task, “forest” ;).

Let’s investigate the disassembly further and see it in action.

1. movsx edx, byte [ebp - 0x1c]    
2. mov eax, dword [ebp + 0x8]      
3. mov eax, dword [eax || eax + 0x4]               
4. sub esp, 0x8
5. push edx
6. push eax
7. call alloc
   [ ... ]

Couple of things to note. I’ve changed the name of the function to alloc for ease and second of all have a look at line #3 of the disassembly. The main and only difference between the branches is that exact line. The left branch refers to a value stored at offset malloc’d + 0x0, while the right branch refers to a value stored at offset malloc’d + 0x4.

I believe it’s pretty clear what’s going on right now. For clarity’s sake, I’ll include one more GDB snippet to prove to you that we are dealing with a BST.

Awesome! The above snippet is taken during the 2nd iteration of the weird string, which in this case is the letter ‘u’, aka 0x75. Let me translate it a little bit more.

                          0x804a008
                          Root Node
                      +----------------+
                      |   left* node   |
                      |   right* node  |
                      |      0x79      |
                      +----------------+
                   _ /
                 /
                /
            0x804a018
            Left Node
       +----------------+
       |   left* node   |
       |   right* node  |
       |      0x75      |
       +----------------+

And:

struct Node {
    struct Node* left;
    struct Node* right;
    int32_t value;
}

So this is the current state of the tree and the structure of each node. There is no way in hell I’ll make a whole ascii art for the tree because I’ll die till then. I’m pretty sure you get the idea. Feel free to draw out the whole tree on a paper. It shouldn’t take more than 10 mins. That way you can follow along even easier.

Simple stuff. The binary creates a binary search tree depending on the hex values of the incoming characters. Root node being 0x79, on the right of it the values greater than 0x79 and on the left values less than 0x79 (0x75 in our case and so on).

Look at that! Our tree is growing!

Anyway, now that our tree is slowly made, let’s go back to main and move on with the disassembly.

###Flag Checking Algorithm

Running the binary without arguments gives the following message:

> ./forest
You have the wrong number of arguments for this forest.
./forest [password] [string]

Hmm, there is a [string] reference. What could that be? Remember that strange text that was given for the task?

DLLDLDLLLLLDLLLLRLDLLDLD
LLLRRDLLLLRDLLLLLDLLRLRR
RDLLLDLLLDLLLLLDLLRDLLLR
RLDLLLDLLLLLDLLLRLDLLDLL
RLRRDLLLDLLRLRRRDLLRDLLL
LLDLLLRLDLLDLLRLRRDLLLLL
DLLRDLLLRRLDLLLDLLLLLDLL
RDLLRLRRDLLLDLLLDLLRLRRR
DLLLLLDLLLLRLDLLLRRLRRDD
LLLRRDLLLRRLRDLLLRLDLRRD
DLLLRLDLLLRRRDLLRLRRRDLRRLD

Yep, that non sense is about to be useful to us. Assembly, assembly where you at?

Once again, I renamed some parts in order to make sense out of it easier. So our tree is made and looking cool. Now it’s time to have our password checked. The arguments being passed to the function are:

Password
Strange text
Tree pointer

Obviously the function will process the provided password against the tree but where does the strange text fit in? I will not include all the parts of the function because they are not needed for the reader’s comprehension.

Inside check there is a call to a different function (I know, my creativity is crap but I had no idea what else to name it because it also does checking!) called actual_check. This function receives as arguments the tree pointer, our password and the strange text as shown in the disassembly.

The magic in that snippet is at this line:

and dword [ebp - 0xc], eax

We’ll come back to it shortly. As always, another recap to stay updated!

We are about to investigate if our password is correct. Let’s pretend that we enter “megustapy”.
The check function will simply iterate over each byte of the strange text and our password.
For each byte it will call actual_check.

Off to actual_check!

Our password’s byte is being compared with the value of the current tree node (by the way, if you are not familiar with how BST traversing works, I suggest you looking it up because you won’t understand a single word). Now it’s time for the interesting part.

The above branch is taken if the password’s byte is equal to the node’s hex value. Since ‘y’ is the root node, that would mean that if the first character of our password was ‘y’, it would jump to that branch.

The above branch is taken if the password’s byte is NOT equal to the node’s value. So what does this code do you ask? It checks if the password’s byte value is greater/equal or less than the node’s value. Depending on the comparison, it will jump again to a certain branch. I think you get a feeling of where this is going. Let me summarize it.

Each byte of our password will get compared against the node’s value we are currently at.
Depending on if it’s equal/greater/less than the node’s value, we will jump to a certain branch.
There is another comparison though, which I’m about to explain.

As you can hopefully see, after we jump to a branch there is a “sub”-comparison. What this does is check the strange text’s current byte value. Here is a link to the text as well. It contains only 3 letters, ‘L’, ‘R’, ‘D’. If you are familiar with binary trees, you could easily guess that ‘L’ and ‘R’ are referring to the left and right nodes respectively. But what the hell is ‘D’?

I have to admit that this part drove me nuts, but after some googling and brain-storming it hit me. ‘D’ actually refers to the data of the current node. Cool, so now we know how the text is actually interpreted. But how is it connected to the password checking algorithm? In order to understand the whys and hows, let’s take a step back and analyse check a little more carefully this time.

Well, remember a “magic” line I mentioned before in the check’s body? Let me refresh your mind:

and dword [ebp - 0xc], eax

What’s going on there? Looks like actual_check’s return value is getting AND’d with the value at ebp - 0xc. But what’s the “magic” value at ebp - 0xc? I hid that part on purpose but it’s time reveal it ;).

mov dword [ebp-0xc], 0x1

So why am I showing you this and why is this important?

We enter that part of the function once we stumble upon a null-byte in our password and the weird text. Let’s look at the disassembly closer:

1.  mov eax, dword [ebp - 0x10] 
2.  mov eax, byte [eax]
3.  test al, al
4.  sete dl
5.  mov eax, dword [ebp - 0x14]
6.  movzx eax, byte [eax]
7.  test al, al
8.  sete al
9.  and eax, edx
10. movzx eal, al
11. and dword [ebp - 0xc], eax
12. mov eax, dword [ebp - 0xc]

1-4: Set dl to 1 if eax is 0, which indeed is 0 since we can’t enter that code unless we reached the null-byte.
5 - 8: Set al to 1 if eax is 0.
9 - 11: AND al with dl and the result AND it with that “magic” value.

Afterwards, check returns back to main with the “magic” value being its return value. Then, the value is compared with 0, if it is, then we screwed up. Otherwise, big win!

We get a massive hint out of the above snippet. The return value should be 1! Which means the AND result should be 1! So how can we make the “magic” value be 1?

It looks like the only way to get 1 is by meeting 2 conditions at the same time. In particular, the current strange_text’s byte should be ‘D’ and our password’s byte should be equal to the node’s value. If we are at ‘D’ and the bytes are not equal, the return value which is stored in the magic value will be 0. Resulting to the final AND operation being 0. And as I mentioned earlier, 0 will output “Nope” on the screen. With that assumption, the rest is easy. Fire up a tab with the strange_text, you’ll need it.

Here’s the plan:

We figured out that strange_text is basically a traversing path.
We also found out that in order for our password to be accepted, each of its bytes should be equal to the current node’s value IF AND ONLY IF we are at an offset of strange_text that contains the byte ‘D’ as well.
In other words, each time we meet a ‘D’, we need to make sure that the password’s byte is equal to the value of the node. We don’t care about ‘L’, ‘R’, the binary will “skip” those either way.

This might not make sense at the moment but once I make my beautiful tree drawing, you’ll see what I’m talking about. This is how I edited strange_text. I highly recommend having it opened.

Before we begin traversing the tree, let’s establish a pseudo-lang:

L - Left node/child
R - Right node/child
D - Read the value at the current node

###A Picasso Was Born

There it is, our beloved tree. The hex values represent the actual characters. I thought of doing it by hand but sometimes online tools can be helpful. Let’s move on with our solution. Let me remind you that once ‘D’ is met, actual_check exits and it’s called again for the next LDR etc sequence. For every new sequence, the tree is traversed from the start. Let’s walk through one example and the rest should make sense. If we follow the algorithm carefully, it will hint us the password by itself.

###The Finale
####1st letter

First letter of the sequence is ‘D’.
What does that mean? Well, according to our assumption, we are currently at the root node, meaning, 0x79. Since there is ‘D’ in our sequence, our password’s first byte should be equal to the node we are currently at. Meaning, ‘y’. So the first letter should be ‘y’.
Then, actual_check exits with magic value still being 1 and comes back again with a new sequence.

####2nd letter

This time, with LLD. What does this mean for our password?
The tree will be traversed from its root node as always.
This time though it will be searched as Left-Left-Data. Which means, 0x79 --> 0x75 --> 0x6f --> Stop and check.
‘o’, which is 0x6f in hex, will be compared against our 2nd password’s byte. If they are equal, actual_check exits again and the magic value remains 1.
So far so good. We’ve got “yo” so far. Something tells me the next character will be ‘u’. Let’s test it.

####3rd letter

We’ve got LD now.
Our tree will be traversed in the following manner; 0x79 --> 0x75 --> Stop and check.
‘u’, which is 0x75 in hex, will be compared against our 3rd password’s byte. If it is indeed ‘u’, magic value stays 1 and we are set.

As you see it’s like a game now. We’ve understood the meat of the task so traversing the tree should be a piece of cake. Following the same thought process we end up with the following password:

you_could_see_the_forest_for_the_trees_ckyljfxyfmsw

Ayy! Let’s check if it’s correct, shall we?

> ./forest you_could_see_the_forest_for_the_trees_ckyljfxyfmsw DLLDLDLLLLLDLLLLRLDLLDLDLLLRRDLLLLRDLLLLLDLLRLRRRDLLLDLLLDLLLLLDLLRDLLLRRLDLLLDLLLLLDLLLRLDLLDLLRLRRDLLLDLLRLRRRDLLRDLLLLLDLLLRLDLLDLLRLRRDLLLLLDLLRDLLLRRLDLLLDLLLLLDLLRDLLRLRRDLLLDLLLDLLRLRRRDLLLLLDLLLLRLDLLLRRLRRDDLLLRRDLLLRRLRDLLLRLDLRRDDLLLRLDLLLRRRDLLRLRRRDLRRLD
You did it! Submit the input as the flag

Voila!

###Conclusion

That was it folks. I hope it wasn’t too much to handle. It’s my first ever CTF write-up and I wasn’t sure about how analytical I should be. I tried my best to explain it from my point of view like when I was trying to figure out what the binary does.

Thank you for taking the time to read my write-up and I hope you learnt something new out of it. Feedback is always welcome and much appreciated.

Peace out,
@_py

0x00pf · April 19, 2017, 9:24am

Pretty cool write up!.. Looking forward to more!!!

anon79434934 · April 19, 2017, 11:51am

Unfortunately there are no CTF’s without hard stuff because otherwise I would’ve participated.

Good write up. +1

-Phoenix750

_py · April 19, 2017, 11:57am

Actually the CTF had a fair amount of beginner-friendly tasks as well.

It’s not possible to know every task’s solution just by reading it. We solved plenty of tasks as a team and at least 85% of them required us to google/research quite a lot. Especially crypto. Personally, the pwning one that I’ll be writing about took me 5 days of straight research and asking questions.

Don’t wait to feel ready! As @dtm says, “DON’T LET YOUR DREAMS BE MEMES!”

exploit · April 19, 2017, 12:01pm

Nice writeup °_° keep up

pry0cc · April 20, 2017, 4:56pm

Ha “beginner friendly tasks”

Reverse Engineering is hard. That’s definitely true when you’re just starting out.

@anon79434934, we need to figure out a way to bridge the gap between complete beginner and intermediate. I am struggling with this myself. When we break it we should share our findings.

_py · April 20, 2017, 5:02pm

Bruh, I didn’t say Reverse Engineering is beginner-friendly.

If you and @anon79434934 actually bothered to sign up you would see what I’m talking about. No need to have a passive attitude towards CTFs. They are meant to make you feel dumb in order to improve. Embrace it.

There was a task where you had to sign up on the CTF’s forum and you got the flag. There was another one where you had to set up SSH keys in order to connect to a shell. There was a task that was dealing with bash loops and much much more of the same style. There were tasks that literally took 1 minute to solve if you googled for the right online tool.

ricksanchez · April 20, 2017, 5:09pm

I’m new to CTFs as well and I’m no “pro” in all means, but just trying to figure out the solution and playing around with various approaches is fun. Sure it can be tiring and frustrating if it ain’t working but that’s where the fun lies.
With every solved question/CTF the next one will be easier ( or not ).

For myself I can say that I’ll try to start and actually finish more CTFs in the near future whenever I have the time because the small tricks and working steps you can take in order to solve something are worth learning for everybody.

I didnt start the current picoCTF but I will do so soon :). so I skipped the write up for now Sorry @_py . I’ll come back to it once I’m 110% frustrated or am done with the CTF

pry0cc · April 20, 2017, 8:10pm

Woah, no hate

You may be right, I need to push past that first hurdle. I can sit here and make excuses all day or I can just get on with it.

In my whole past of learning anything, infosec, programming and what not, I’ve been able to push past that first hurdle. Reverse Engineering + CTF’s seem to be the biggest initial hurdle I’ve ever come across.

I learn by doing, the problem with these is that you sort of have to know how, to do, but you can’t learn by doing if you can’t do it

dtm · April 21, 2017, 5:15am

But how much of Hacking: The Art of Exploitation have you read?

_py · April 21, 2017, 7:30am

@pry0cc I’m not hating. I’m peaceful and loving mayn.

Btw, you learn by reading. You master by doing. You can’t just stick your head in the sand. Once you know enough theory, you’ll have the resources to google for the right keywords/solutions and find your way through tasks.

ricksanchez · April 21, 2017, 9:09am

Tip for everyone in general here: READ THIS BOOK. It’s worth the money. It explains basics as well as advanced techniques and tricks to create chaos within a program, exploit it, change its behavior and more…
Also it’s delivered with a VM where you can try everything your own so it’s not just pure theory.

~cheers

VVid0w · April 21, 2017, 8:53pm

Ayy, I’ve actually had this book for awhile now in pdf form. Perhaps it’s time I got around to actually reading it.