Introduction
Hello folks, I hope you’re doing great! Today’s topic is once again heap exploitation. In fact, we’ll be going over the recent updates to glibc’s heap allocator, (pt)malloc, which I’ve researched/reversed lately and decided to present to you, since there is barely any info out there. I emphasize glibc because in reality, heap implementations are libc/OS/hardware/browser specific. ptmalloc is a pretty common heap allocator on Linux distros (not all of them) and, more importantly, in CTFs. Specifically, today’s focus revolves around the so-called tcache structure, a thread caching mechanism used to speed up malloc/free that is enabled by default on recent distros (i.e. Ubuntu 17.10 and up).
Required Knowledge
1) Pointer Gymnastics
2) ELF Loading Process
3) Linux Memory Organization
4) Familiarity with Heap Exploitation
5) Will
6) Patience
The last two are a must; the rest can be bypassed if there’s enough of those two. The 4th point, however, is too hard to bypass. As much as I’d love to explain the heap’s internals from the very beginning, the following dudes have been in the game long before I joined and I don’t want to take credit for what they’ve contributed.
They give a pretty decent high-level overview of heap management. However, what they provide can never be enough; you’ve got to dive into the assembly with your debugger and figure it out on your own.
Setup
- Ubuntu 17.10 x64 – I recommend using vagrant. It’s fast, minimal and gets the job done for this kind of research.
- gdb – No gdb, no fun! I personally use peda but you’re free to use whatever flavour pleases you the most.
- Source & Debug symbols – Even though this isn’t a must, every Reverse Engineer wakes up and goes to bed wishing that there were debug symbols available for whatever he/she might be reversing. That being said, here’s the Linux (user-space) heap debugging starter pack:
sudo apt-get install glibc-source
sudo apt-get install libc6-dbg
sudo tar xf /usr/src/glibc/glibc-2.26.tar.xz -C /usr/src/glibc/
Inside your gdb prompt type the following:
gdb-peda$ directory /usr/src/glibc/glibc-2.26/
gdb-peda$ b __libc_malloc
gdb-peda$ b _int_malloc
gdb-peda$ b _int_free
The above gdb commands will display the source code of the debuggee function while you’re stepping through it. How sweet is that?! If you want to look at the full source code while debugging, open up /usr/src/glibc/glibc-2.26/malloc/malloc.c in your favorite text editor and follow along.
Note: Before we begin, I’d like to point out that for the rest of this write-up I won’t be referring to the allocator as malloc. The reason is that in the glibc world, malloc isn’t just one function, but a package of functions responsible for handling dynamically allocated chunks. You’ll see what I mean as we go through the reversing process. Even if you have no prior exposure to heap exploitation, I’ll try to make my explanations as visual as possible. Without further ado, let the fun begin!
__libc_malloc
Assuming that you’ve looked up the above resources – but even if you didn’t – you should know that when your program calls malloc, in reality __libc_malloc gets invoked.
void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice. */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif

  if (SINGLE_THREAD_P)
    {
      victim = _int_malloc (&main_arena, bytes);
      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
              &main_arena == arena_for_chunk (mem2chunk (victim)));
      return victim;
    }

  arena_get (ar_ptr, bytes);

  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before. */
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}
libc_hidden_def (__libc_malloc)
The very first call to malloc will actually go through the following code path:

  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

What __libc_malloc does is check the content of the global function pointer known as __malloc_hook.
gdb-peda$ x/gx &__malloc_hook
0x7ffff7dcfc10 <__malloc_hook>: 0x00007ffff7a82830
gdb-peda$ x/5i 0x00007ffff7a82830
0x7ffff7a82830 <malloc_hook_ini>: mov eax,DWORD PTR [rip+0x34ca0e] # 0x7ffff7dcf244 <__libc_malloc_initialized>
0x7ffff7a82836 <malloc_hook_ini+6>: push r12
0x7ffff7a82838 <malloc_hook_ini+8>: push rbp
0x7ffff7a82839 <malloc_hook_ini+9>: push rbx
0x7ffff7a8283a <malloc_hook_ini+10>: mov rbp,rdi
As you can see, the global hook is set to the address of malloc_hook_ini.
static void *
malloc_hook_ini (size_t sz, const void *caller)
{
  __malloc_hook = NULL;
  ptmalloc_init ();
  return __libc_malloc (sz);
}
malloc_hook_ini first zeroes out the global variable so that initialization doesn’t happen again, and then triggers a sequence of function calls in order to initialize the main arena structure. You can think of this structure as a roadmap for the heap allocator, which allows it to keep track of free’d chunks and other crucial info. This calling sequence isn’t important to us, but I still recommend stepping through it with a debugger and seeing it happen in front of your eyes.
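You can actually watch the hook being consumed without gdb. Since __malloc_hook is an exported variable (declared in malloc.h, still present in glibc 2.26 although deprecated), a minimal sketch like the following should print a non-NULL hook before the first allocation and NULL right after it:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* declares the (deprecated) __malloc_hook */

int main(void)
{
    /* The hook's value is read before printf allocates anything itself,
       so this shows the pre-initialization state: malloc_hook_ini. */
    printf("__malloc_hook before: %p\n", (void *) __malloc_hook);

    void *p = malloc(0x20);   /* fires the hook -> ptmalloc_init() runs */

    /* malloc_hook_ini NULLed the hook so initialization won't run twice. */
    printf("__malloc_hook after:  %p\n", (void *) __malloc_hook);

    free(p);
    return 0;
}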
Thread Local Cache
At this point the main arena is set up and ready to serve memory back to the user. Once the initialization phase is over, tcache_init will take over.
# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);
  ...
  victim = _int_malloc (ar_ptr, bytes);
  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway. */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }
}
We’ve come to the meat of our discussion, the thread local caching structure. Let’s dissect the above snippet. There are plenty of tcache_perthread_struct references. What’s that anyway?
static __thread tcache_perthread_struct *tcache = NULL;

typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons. */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
So tcache_perthread_struct consists of two arrays:
- counts is a byte array which serves as a fast way to track the number of tcache_entry* stored at the corresponding index of the entries array.
- entries is an array of tcache_entry*, which are malloc_chunk* cast as tcache_entry*. Practically, they form a singly linked list of free’d chunks. It’s important to note that each linked list can store up to 7 free’d chunks; once that limit is exceeded, further chunks get stored in the “old fashioned” fastbin/smallbin lists. Each index corresponds to a different chunk size.
TCACHE_MAX_BINS is 64 and the entries array stores free chunks of sizes ranging from 24 all the way to 1032 bytes on x64. In malloc’s words, it can store fast and small chunks.
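To make the size-to-index mapping concrete, here’s a minimal sketch that replicates the csize2tidx() macro (we’ll meet its real definition shortly) with what I believe are the x64 constants, MINSIZE = 0x20 and MALLOC_ALIGNMENT = 0x10:

#include <stdio.h>

/* Assumed x64 constants. */
#define MALLOC_ALIGNMENT 0x10UL
#define MINSIZE          0x20UL

/* When "x" is from chunksize(). */
#define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

int main(void)
{
    /* A malloc(0x20) request becomes a 0x30-sized chunk -> entries[1],
       which is exactly the index we'll be watching in the gdb dumps. */
    unsigned long sizes[] = { 0x20, 0x30, 0x40, 0x410 };
    for (int i = 0; i < 4; i++)
        printf("chunk size %#5lx -> tc_idx %lu\n", sizes[i], csize2tidx(sizes[i]));
    return 0;
}

This prints indices 0, 1, 2 and 63 respectively; 0x410 is the largest tcache-able chunk size.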
A really interesting fact from an exploit dev perspective is that the tcache structure itself is stored on the heap!
victim = _int_malloc (ar_ptr, bytes);
tcache = (tcache_perthread_struct *) victim;
gdb-peda$ parseheap
addr prev size status fd bk
0x602000 0x0 0x250 Used None None
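That 0x250 chunk is no coincidence. A quick sketch with local replicas of the two structs (only here to do the math) shows where it comes from: 64 count bytes plus 64 8-byte pointers add up to 0x240 bytes of user data, and the 0x10-byte chunk header rounds it up to the 0x250 parseheap reported.

#include <stdio.h>

#define TCACHE_MAX_BINS 64

/* Local replicas of the glibc structs, only here to compute the size. */
typedef struct tcache_entry
{
    struct tcache_entry *next;
} tcache_entry;

typedef struct tcache_perthread_struct
{
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

int main(void)
{
    /* 64 + 64 * 8 = 0x240 bytes. */
    printf("sizeof(tcache_perthread_struct) = %#zx\n",
           sizeof(tcache_perthread_struct));
    return 0;
}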
So whenever __libc_malloc gets called for the first time, it allocates a tcache structure at the very beginning of the heap segment. This is quite eye-opening: an inexperienced C programmer would be blown away by the fact that one malloc invocation led to two allocations after all.
Tcache Internals
Theory is over; it’s time to get our hands dirty and prove our assumptions right. I’ve written a quick PoC to test our assumptions (for fast chunks, that is). The reader is encouraged to implement a similar PoC for small chunks and inspect the changes in gdb.
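Here’s a minimal sketch of what that PoC could look like (the variable names match the heap layout shown below; exact addresses will of course differ):

#include <stdlib.h>

int main(void)
{
    /* 11 fast-sized requests: 0x20 bytes each -> 0x30-sized chunks. */
    void *a = malloc(0x20), *b = malloc(0x20), *c = malloc(0x20);
    void *d = malloc(0x20), *e = malloc(0x20), *f = malloc(0x20);
    void *g = malloc(0x20), *h = malloc(0x20), *i = malloc(0x20);
    void *j = malloc(0x20), *k = malloc(0x20);

    /* The first 7 free's fill up tcache->entries[1]... */
    free(a); free(b); free(c); free(d); free(e); free(f); free(g);
    /* ...and the remaining 4 land in the 0x30 fastbin. */
    free(h); free(i); free(j); free(k);

    /* Subsequent allocations are served from the tcache head: g, f, e, ... */
    void *x = malloc(0x20);   /* returns g */
    (void) x;
    return 0;
}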
I’ll skip the initial allocations because they will all be carved out of the top chunk/wilderness, since there are no free chunks at this point. Here’s the state of the heap right before the first call to free.
gdb-peda$ parse
addr prev size
tcache --> 0x602000 0x0 0x250
a --> 0x602250 0x0 0x30
b --> 0x602280 0x0 0x30
c --> 0x6022b0 0x0 0x30
d --> 0x6022e0 0x0 0x30
e --> 0x602310 0x0 0x30
f --> 0x602340 0x0 0x30
g --> 0x602370 0x0 0x30
h --> 0x6023a0 0x0 0x30
i --> 0x6023d0 0x0 0x30
j --> 0x602400 0x0 0x30
k --> 0x602430 0x0 0x30
It’s important to note that __libc_malloc will first try to retrieve a chunk from the tcache->entries[] list instead of looking at the fastbin list (for performance reasons, of course). Since no chunk had been free’d while the allocations were taking place, __libc_malloc invoked _int_malloc to get its chunks.
/* When "x" is from chunksize(). */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

void *
__libc_malloc (size_t bytes)
{
  ...
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice. */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  ...
  victim = _int_malloc (ar_ptr, bytes);
Let’s start inspecting the tcache while we free the allocated chunks one-by-one.
/* Fill in the tcache for size 0x30. */
free(a);
free(b);
free(c);
free(d);
free(e);
free(f);
free(g);
/* Place the rest in the corresponding fastbin list. */
free(h);
free(i);
free(j);
free(k);
_int_free will try to store the recently free’d chunk at the corresponding tcache index, as long as the following checks pass:
- tcache is initialized.
- The index returned by csize2tidx(size) is below 64. csize2tidx() is just a macro, whose definition I showed above, which uses a “formula” to convert a chunk’s size into an index into the counts and entries arrays.
- The counts array is in an entry-to-entry association with entries. For instance, counts[0] will contain the number of chunks (which form a linked list, as I mentioned before) inside entries[0]. That being said, counts[idx] needs to be less than 7 (mp_.tcache_count), since no more than 7 chunks are allowed to form the free linked list.
Here’s the corresponding assembly up until the tcache_put call:
// rcx will contain a kernel address
mov rcx,QWORD PTR [rip+0x34f744] # 0x7ffff7dced78
lea rdx,[r13-0x11]
shr rdx,0x4
mov rcx,QWORD PTR fs:[rcx]
// Check if tcache is initialized
test rcx,rcx
// If it's not, take the fastbin route
je 0x7ffff7a7f663 <_int_free+147>
// Make sure the chunk's size is within the tcache boundaries
cmp rdx,QWORD PTR [rip+0x34fc64] # 0x7ffff7dcf2b0 <mp_+80>
jae 0x7ffff7a7f663 <_int_free+147>
movsx rdi,BYTE PTR [rcx+rdx*1]
// Make sure counts[idx] is less than 7
cmp rdi,QWORD PTR [rip+0x34fc66] # 0x7ffff7dcf2c0 <mp_+96>
mov rsi,rdi
jb 0x7ffff7a7f940 <_int_free+880>
gdb-peda$ x/gx 0x7ffff7dcf2b0
0x7ffff7dcf2b0 <mp_+80>: 0x0000000000000040
gdb-peda$ x/gx 0x7ffff7dcf2c0
0x7ffff7dcf2c0 <mp_+96>: 0x0000000000000007
And here’s the high level version:
static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  ...
#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
        && tc_idx < mp_.tcache_bins
        && tcache->counts[tc_idx] < mp_.tcache_count)
      {
        tcache_put (p, tc_idx);
        return;
      }
  }
#endif
...
tcache_put
tcache_put is responsible for placing the free’d chunk at the corresponding entries[] index, as well as updating the counts[idx] value.
// Make sure the chunk's size is within the tcache boundaries
cmp rdx,0x3f
// tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
lea rdi,[rbx+0x10]
ja 0x7ffff7a80334
// &counts[idx]
lea rax,[rcx+rdx*8]
add esi,0x1
// tcache->entries[tc_idx]
mov r8,QWORD PTR [rax+0x40]
// e->next = tcache->entries[tc_idx];
mov QWORD PTR [rbx+0x10],r8
// tcache->entries[tc_idx] = e
mov QWORD PTR [rax+0x40],rdi
// counts[idx]++
mov BYTE PTR [rcx+rdx*1],sil
/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks. */
static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}
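If you’d rather poke at the insertion logic outside of a debugger, here’s a toy userland re-implementation of the put operation (my own simulation for illustration, not glibc code) that makes the head insertion obvious:

#include <stdio.h>
#include <stddef.h>

#define TCACHE_MAX_BINS 64

typedef struct tcache_entry
{
    struct tcache_entry *next;
} tcache_entry;

static struct
{
    char counts[TCACHE_MAX_BINS];
    tcache_entry *entries[TCACHE_MAX_BINS];
} tc;

static void toy_tcache_put(tcache_entry *e, size_t tc_idx)
{
    e->next = tc.entries[tc_idx];   /* the new chunk points at the old head */
    tc.entries[tc_idx] = e;         /* ...and becomes the new head */
    ++(tc.counts[tc_idx]);
}

int main(void)
{
    tcache_entry a = { NULL }, b = { NULL };

    toy_tcache_put(&a, 1);   /* entries[1]: a -> NULL */
    toy_tcache_put(&b, 1);   /* entries[1]: b -> a -> NULL */

    printf("head = %p, head->next = %p, count = %d\n",
           (void *) tc.entries[1], (void *) tc.entries[1]->next,
           tc.counts[1]);
    return 0;
}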
For the visual readers, including myself, here’s a hopefully not so crappy ascii-art:
Before:
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
0x602240: 0x0000000000000000 0x0000000000000000
0x602250: 0x0000000000000000 0x0000000000000031 <-- chunk a
... ...
free(a);
tcache->counts[]
0 1 2 63
+------++------++------+ +------+
| 0 || 1 || 0 | ... | 0 |
| || || | | |
+------++------++------+ +------+
tcache->entries[]
0 1 2 63
+------++------++------+ +------+
| NULL || a || NULL | ... | NULL |
| || || | | |
+------++------++------+ +------+
|
|
NULL
After:
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000100 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000602260 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
0x602240: 0x0000000000000000 0x0000000000000000
0x602250: 0x0000000000000000 0x0000000000000031 <-- chunk a
... ...
entries[idx] and counts[idx] have been updated. Let’s visualize a couple more free’s and I’ll let you figure out the rest.
free(b);
tcache->counts[]
0 1 2 63
+------++------++------+ +------+
| 0 || 2 || 0 | ... | 0 |
| || || | | |
+------++------++------+ +------+
tcache->entries[]
0 1 2 63
+------++------++------+ +------+
| NULL || b || NULL | ... | NULL |
| || || | | |
+------++------++------+ +------+
|
|
+------+
| a |
| |
+------+
|
|
NULL
After:
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000200 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000602290 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
0x602240: 0x0000000000000000 0x0000000000000000
0x602250: 0x0000000000000000 0x0000000000000031 <-- chunk a
... ...
gdb-peda$ x/gx 0x0000000000602290 <-- (tcache_entry *)b
0x602290: 0x0000000000602260
gdb-peda$ x/gx 0x0000000000602260 <-- a == (tcache_entry *)b->next
0x602260: 0x0000000000000000 <-- NULL
It’s worth noting that insertion happens at the head of the list. Let’s skip ahead a few free’s.
...
free(g);
tcache->counts[]
0 1 2 63
+------++------++------+ +------+
| 0 || 7 || 0 | ... | 0 |
| || || | | |
+------++------++------+ +------+
tcache->entries[]
0 1 2 63
+------++------++------+ +------+
| NULL || g || NULL | ... | NULL |
| || || | | |
+------++------++------+ +------+
|
|
+------+
| f |
| |
+------+
|
|
+------+
| e |
| |
+------+
|
|
+------+
| d |
| |
+------+
|
|
+------+
| c |
| |
+------+
|
|
+------+
| b |
| |
+------+
|
|
+------+
| a |
| |
+------+
|
|
NULL
We’re at the stage where further free’s of size 0x30 would fail the tcache checks and take the fastbin route.
static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  ...
  size = chunksize (p);
  ...
  check_inuse_chunk(av, p);

#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
        && tc_idx < mp_.tcache_bins
        && tcache->counts[tc_idx] < mp_.tcache_count)
      {
        tcache_put (p, tc_idx);
        return;
      }
  }
#endif

  /*
    If eligible, place chunk on a fastbin so it can be found
    and used quickly in malloc.
  */
  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())
    ...
    atomic_store_relaxed (&av->have_fastchunks, true);
    unsigned int idx = fastbin_index(size);
    fb = &fastbin (av, idx);

    /* Atomically link P to its fastbin: P->FD = *FB; *FB = P; */
    mchunkptr old = *fb, old2;
    ...
    do
      {
        /* Check that the top of the bin is not the record we are going to
           add (i.e., double free). */
        if (__builtin_expect (old == p, 0))
          malloc_printerr ("double free or corruption (fasttop)");
        p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
           != old2);
  ...
free(h);
free(i);
free(j);
free(k);
fastbinsY[NFASTBINS]
0 1 2
+------++------++------+
| NULL || k || NULL | ...
| || || |
+------++------++------+
|
|
+------+
| j |
| |
+------+
|
|
+------+
| i |
| |
+------+
|
|
+------+
| h |
| |
+------+
|
|
NULL
gdb-peda$ printfastbin
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x602430 --> 0x602400 --> 0x6023d0 --> 0x6023a0 --> 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
Now let’s suppose we’re feeling a bit crazy and want to make a few more allocations of size 0x20. That’s where tcache_get comes into play.
// Allocate the chunks out of tcache.
// returns g
malloc(0x20);
// returns f
malloc(0x20);
// returns e
malloc(0x20);
// returns d
malloc(0x20);
// returns c
malloc(0x20);
// returns b
malloc(0x20);
// returns a
malloc(0x20);
tcache_get
As stated before, when a new allocation request occurs, __libc_malloc will first check whether there is an available chunk of that size inside tcache->entries[idx]. If there is, tcache_get will retrieve the chunk at the head of that list.
cmp rbx,0x3f
ja 0x7ffff7a840c3
// Remove chunk at the head of the list
mov rsi,QWORD PTR [rdx]
// Place its fd at the head of the list
mov QWORD PTR [rcx+0x40],rsi
// --(tcache->counts[tc_idx]);
sub BYTE PTR [rax+rbx*1],0x1
static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}
Before:
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000700 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000602380 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
// returns g
malloc(0x20);
tcache->counts[]
0 1 2 63
+------++------++------+ +------+
| 0 || 6 || 0 | ... | 0 |
| || || | | |
+------++------++------+ +------+
tcache->entries[]
0 1 2 63
+------++------++------+ +------+
| NULL || f || NULL | ... | NULL |
| || || | | |
+------++------++------+ +------+
|
|
+------+
| e |
| |
+------+
|
|
+------+
| d |
| |
+------+
|
|
+------+
| c |
| |
+------+
|
|
+------+
| b |
| |
+------+
|
|
+------+
| a |
| |
+------+
|
|
NULL
After:
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000600 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000602350 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
0x602380 was indeed removed from the list and the counter was updated! One more allocation to solidify the removal mechanism.
// returns f
malloc(0x20);
tcache->counts[]
0 1 2 63
+------++------++------+ +------+
| 0 || 5 || 0 | ... | 0 |
| || || | | |
+------++------++------+ +------+
tcache->entries[]
0 1 2 63
+------++------++------+ +------+
| NULL || e || NULL | ... | NULL |
| || || | | |
+------++------++------+ +------+
|
|
+------+
| d |
| |
+------+
|
|
+------+
| c |
| |
+------+
|
|
+------+
| b |
| |
+------+
|
|
+------+
| a |
| |
+------+
|
|
NULL
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000500 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000602320 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
As expected, 0x602350 is gone. After the 7th allocation, the tcache will be empty and __libc_malloc will resort to _int_malloc, which will check for available chunks in the fastbin array.
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
0x602240: 0x0000000000000000 0x0000000000000000
0x602250: 0x0000000000000000 0x0000000000000031 <-- chunk a
... ...
gdb-peda$ printfastbin
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x602430 --> 0x602400 --> 0x6023d0 --> 0x6023a0 --> 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
_int_malloc
A new “feature” has been added to _int_malloc. If there is an available chunk in the corresponding fastbin list (the same rule applies to small and unsorted chunks), _int_malloc will return the chunk at the head of the fastbin list, then take the remaining chunks off the fastbin list and place them in the corresponding tcache->entries[idx] entry, as long as there is room (7 chunks or fewer).
static void *
_int_malloc (mstate av, size_t bytes)
{
  ...
#define REMOVE_FB(fb, victim, pp)                       \
  do                                                    \
    {                                                   \
      victim = pp;                                      \
      if (victim == NULL)                               \
        break;                                          \
    }                                                   \
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
         != victim);

  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp;
      victim = *fb;

      if (victim != NULL)
        {
          if (SINGLE_THREAD_P)
            *fb = victim->fd;
          else
            REMOVE_FB (fb, pp, victim);
          ...
#if USE_TCACHE
          /* While we're here, if we see other chunks of the same size,
             stash them in the tcache. */
          size_t tc_idx = csize2tidx (nb);
          if (tcache && tc_idx < mp_.tcache_bins)
            {
              mchunkptr tc_victim;

              /* While bin not empty and tcache not full, copy chunks. */
              while (tcache->counts[tc_idx] < mp_.tcache_count
                     && (tc_victim = *fb) != NULL)
                {
                  if (SINGLE_THREAD_P)
                    *fb = tc_victim->fd;
                  else
                    {
                      REMOVE_FB (fb, pp, tc_victim);
                      if (__glibc_unlikely (tc_victim == NULL))
                        break;
                    }
                  tcache_put (tc_victim, tc_idx);
                }
            }
#endif
          ...
Let’s see the magic happen. We expect 0x602430 to be returned by _int_malloc and the remaining chunks to end up in tcache->entries[idx].
/*
  Retrieve chunk k from the fastbin.
  The remaining chunks (h, i, j) will be taken off
  the fastbin list and placed into tcache->entries[idx].
*/
malloc(0x20);
gdb-peda$ printfastbin
(0x20) fastbin[0]: 0x0
(0x30) fastbin[1]: 0x0
(0x40) fastbin[2]: 0x0
(0x50) fastbin[3]: 0x0
(0x60) fastbin[4]: 0x0
(0x70) fastbin[5]: 0x0
(0x80) fastbin[6]: 0x0
gdb-peda$ x/80gx 0x602000
0x602000: 0x0000000000000000 0x0000000000000251
tcache->counts[] --> 0x602010: 0x0000000000000300 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x00000000006023b0 <-- tcache->entries[]
0x602060: 0x0000000000000000 0x0000000000000000
... ...
gdb-peda$ x/gx 0x00000000006023b0 <-- head of the linked list
0x6023b0: 0x00000000006023e0
gdb-peda$ x/gx 0x00000000006023e0 <-- (tcache_entry *)0x6023b0->next
0x6023e0: 0x0000000000602410
gdb-peda$ x/gx 0x0000000000602410 <-- (tcache_entry *)0x6023e0->next
0x602410: 0x0000000000000000
All our assumptions are proven correct! The fastbin list has been emptied out and the corresponding tcache index is filled with the remaining fastbin chunks. Because fastbin removal happens at the head of the list, and insertion into tcache->entries[idx] happens at the head as well, you can see that the chunk at the tail of the fastbin ended up at the head of tcache->entries[idx].
Conclusion
That was a hopefully short, visual and insightful overview of the recent updates glibc’s malloc has received. If you were already familiar with the 16.x or even the 17.04 implementation, thread local caching is just a tiny addition in terms of knowledge. However, it does change the state of the art for heap exploitation (for the better), but that’s a story for another time. If there is one thing to take away from this post, it’s to experiment! Reverse and break the heap! Finally, I’d like to thank you for taking the time to read my write-up. If you have any questions, feel free to ask them down below or hit me up on IRC/twitter.
Take care…