Heap Safari - Thread Local Caching

Introduction

Hello folks, I hope you’re doing great! Today’s topic will be once again heap exploitation related. In fact, we’ll be going over the recent updates of glibc’s heap allocator, (pt)malloc, which I’ve researched/reversed lately and decided to present them to you since there is barely any info out there. I emphasize glibc because in reality, heap implementations are libc/OS/hardware/browser specific. ptmalloc is a pretty common heap allocator on Linux distros (not all) and more importantly in CTFs :wink: Specifically, today’s focus will be revolved around the so called tcache structure, which is a thread caching mechanism used to speed up malloc/free and is enforced by default (i.e Ubuntu 17.10 and up).


Required Knowledge

1) Pointer Gymnastics
2) ELF Loading Process
3) Linux Memory Organization
4) Familiarity with Heap Exploitation
5) Will
6) Patience

The last two are a must, the rest can be bypassed if there’s enough of the former. The 4th point however, is too hard to be bypassed. As much as I’d love to explain heap’s internals from the very beginning, the following dudes have been in the game long before I joined and I don’t want to take credits for what they’ve contributed.

They give a pretty decent high level overview of the heap management. However, what they provide can never be enough, you gotta dive into assembly with your debugger and figure it out on your own.


Setup

  • Ubuntu 17.10 x64 – I recommend using vagrant. It’s fast, minimal and gets the job done for this kind of research.

  • gdb – No gdb, no fun! I personally use peda but you’re free to use whatever flavour pleases you the most.

  • Source & Debug symbols – Even though this isn’t a must, every Reverse Engineer wakes up and goes to bed wishing that there were debug symbols available for whatever he/she might be reversing. That being said, here’s the Linux (user-space) heap debugging starter pack.

sudo apt-get install glibc-source
sudo apt-get install libc6-dbg
sudo tar xf /usr/src/glibc/glibc-2.26.tar.xz

Inside your gdb prompt type the following:

gdb-peda$ directory /usr/src/glibc/glibc-2.26/
gdb-peda$ b __libc_malloc
gdb-peda$ b _int_malloc
gdb-peda$ b _int_free

The above gdb commands will display the source code of the debugee function while you’re stepping through it. How sweet is that?! If you want to look at the full source code while debugging, open up /usr/src/glibc/glibc-2.26/malloc/malloc.c in your favorite text editor and join in.


Note: Before, we begin I’d like to note that for the rest of this write-up I won’t be referring to the allocator as malloc. The reason being the fact that in the glibc world, malloc isn’t just one function, but a package of functions responsible for handling dynamically allocated chunks. You’ll see what I mean as we go through the reversing process. Even if you have no prior exposure to heap exploitation, I’ll try to make my explanations as visual as possible. Without further ado, let the fun begin!


__libc_malloc

Assuming that you’ve looked up the above resources, but even if you didn’t, you should know that when your program calls malloc, in reality __libc_malloc gets invoked.

void *
__libc_malloc (size_t bytes)
{
  mstate ar_ptr;
  void *victim;

  void *(*hook) (size_t, const void *)
    = atomic_forced_read (__malloc_hook);
  if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
  DIAG_POP_NEEDS_COMMENT;
#endif

  if (SINGLE_THREAD_P)
    {
      victim = _int_malloc (&main_arena, bytes);
      assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
	      &main_arena == arena_for_chunk (mem2chunk (victim)));
      return victim;
    }

  arena_get (ar_ptr, bytes);

  victim = _int_malloc (ar_ptr, bytes);
  /* Retry with another arena only if we were able to find a usable arena
     before.  */
  if (!victim && ar_ptr != NULL)
    {
      LIBC_PROBE (memory_malloc_retry, 1, bytes);
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }

  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          ar_ptr == arena_for_chunk (mem2chunk (victim)));
  return victim;
}
libc_hidden_def (__libc_malloc)

The very first call to malloc will actually go through the following code path:

if (__builtin_expect (hook != NULL, 0))
    return (*hook)(bytes, RETURN_ADDRESS (0));

What __libc_malloc does is to check for the content of the global function pointer variable know as __malloc_hook.

gdb-peda$ x/gx &__malloc_hook
0x7ffff7dcfc10 <__malloc_hook>:	0x00007ffff7a82830
gdb-peda$ x/5i 0x00007ffff7a82830
   0x7ffff7a82830 <malloc_hook_ini>:	mov    eax,DWORD PTR [rip+0x34ca0e]        # 0x7ffff7dcf244 <__libc_malloc_initialized>
   0x7ffff7a82836 <malloc_hook_ini+6>:	push   r12
   0x7ffff7a82838 <malloc_hook_ini+8>:	push   rbp
   0x7ffff7a82839 <malloc_hook_ini+9>:	push   rbx
   0x7ffff7a8283a <malloc_hook_ini+10>:	mov    rbp,rdi

As you can see, the global hook is set to the address of malloc_hook_ini.

static void *
malloc_hook_ini (size_t sz, const void *caller)
{
  __malloc_hook = NULL;
  ptmalloc_init ();
  return __libc_malloc (sz);
}

malloc_hook_ini firstly zeros out the global variable so that initialization doesn’t happen again and then triggers a sequence of function calls in order to initialize the main arena structure. You can think of this structure as a roadmap for the heap allocator, which allows it to be able to keep track of free’d chunks and other crucial info. This calling sequence isn’t important to us but I still recommend stepping through them with a debugger and see it happening in front of your eyes.


Thread Local Cache

At this point the main arena is set up and ready to serve memory back to the user. Once the initilization phase is over, tcache_init will take over.

# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();

static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  ...
  victim = _int_malloc (ar_ptr, bytes);

  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }

}

We’ve come to the meat of our discussion, the thread local caching structure. Let’s dissect the above snippet. There are plenty of tcache_perthread_struct references. What’s that anyway?

static __thread tcache_perthread_struct *tcache = NULL;

typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

So tcache_perthread_struct consists of two arrays.

  • counts is a byte array which is used as a fast way to indicate the number of tcache_entry* in the corresponding index inside the entries array.

  • entries is an array of tcache_entry*, which are malloc_chunk* casted as tcache_entry*. Practically, they form a single linked list of free’d chunks. It’s important to note that each linked list can store up to 7 free’d chunks. If that number exceeds, the rest get stored in the “old fashioned” fastbin/smallbin list. Each index corresponds to a different size of chunk.

TCACHE_MAX_BINS is 64 and the entries array stores free chunks of sizes ranging from 24, all the way to 1032 bytes on x64. In malloc’s words, it can store fast and small chunks.

A really interesting fact from an exploit dev perspective is that the tcache structure is stored on the heap!

victim = _int_malloc (ar_ptr, bytes);
tcache = (tcache_perthread_struct *) victim;
gdb-peda$ parseheap
addr                prev                size                 status              fd                bk                
0x602000            0x0                 0x250                Used                None              None

So whenever __libc_malloc gets called for the first time, it will allocate a tcache structure at the very beginning of the heap segment. This is very eye-opening because in the case of an inexperienced C programmer, he/she’d be blown away by the fact that one malloc invocation led to two after all.


Tcache Internals

Theory is over, it’s time to get our hands dirty and prove our assumptions right. I’ve written a quick PoC to test our assumptions (for fast chunks that is). The reader is encouraged to implement a similar PoC but for small chunks and inspect the changes in gdb.

I’ll skip the initial allocations because they will all be taken from the top chunk/wilderness since there is no free chunk at the moment. Here’s the state of the heap right before the first call to free.

gdb-peda$ parse
           addr                prev                size                                               
tcache --> 0x602000            0x0                 0x250     
a      --> 0x602250            0x0                 0x30                  
b      --> 0x602280            0x0                 0x30        
c      --> 0x6022b0            0x0                 0x30 
d      --> 0x6022e0            0x0                 0x30
e      --> 0x602310            0x0                 0x30      
f      --> 0x602340            0x0                 0x30  
g      --> 0x602370            0x0                 0x30  
h      --> 0x6023a0            0x0                 0x30 
i      --> 0x6023d0            0x0                 0x30 
j      --> 0x602400            0x0                 0x30 
k      --> 0x602430            0x0                 0x30

It’s important to note that __libc_malloc will firstly try to retrieve a chunk from the tcache->entries[] list instead of looking at the fastbin list (for performance reasons of course). Since no chunk has been free’d when the allocations were taking place, __libc_malloc will invoke _int_malloc to get its chunk.


/* When "x" is from chunksize().  */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

void *
__libc_malloc (size_t bytes)
{
  ...
#if USE_TCACHE
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes;
  checked_request2size (bytes, tbytes);
  size_t tc_idx = csize2tidx (tbytes);

  MAYBE_INIT_TCACHE ();

  DIAG_PUSH_NEEDS_COMMENT;
  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }
    ...

  victim = _int_malloc (ar_ptr, bytes);

Let’s start inspecting the tcache while we free the allocated chunks one-by-one.

/* Fill in the tcache for size 0x30. */
free(a);
free(b);
free(c);
free(d);
free(e);
free(f);
free(g);
/* Place the rest in the corresponding fastbin list. */
free(h);
free(i);
free(j);
free(k);

_int_free will try to store the recently free’d chunk in the corresponding tcache index, as long as the following checks are true:

  • tcache is initialized.

  • The index returned by csize2tidx(size) needs to be below 64. csize2tidx() is just a macro, whose definition I showed above, which uses a “formula” in order to convert a chunk’s size to an index inside the counts and entries array respectively.

  • The counts array is in an entry-to-entry association with entries. For instance counts[0] will contain the number of chunks (which form a linked list as I mentioned before) inside entries[0]. That being said, counts[idx] needs to be less or equal to 7 since no more than 7 chunks are allowed to form the free linked list.

Here’s the corresponding assembly up until the tcache_put call:

// rcx will contain a kernel address
mov    rcx,QWORD PTR [rip+0x34f744]        # 0x7ffff7dced78
lea    rdx,[r13-0x11]
shr    rdx,0x4
mov    rcx,QWORD PTR fs:[rcx]
// Check if tcache is initialized
test   rcx,rcx
# If it's not, take the fastbin route
je     0x7ffff7a7f663 <_int_free+147>
// Make sure the chunk's size is within the tcache boundaries
cmp    rdx,QWORD PTR [rip+0x34fc64]        # 0x7ffff7dcf2b0 <mp_+80>
jae    0x7ffff7a7f663 <_int_free+147>
movsx  rdi,BYTE PTR [rcx+rdx*1]
// Make sure counts[idx] is less than 7
cmp    rdi,QWORD PTR [rip+0x34fc66]        # 0x7ffff7dcf2c0 <mp_+96>
mov    rsi,rdi
jb     0x7ffff7a7f940 <_int_free+880>
gdb-peda$ x/gx 0x7ffff7dcf2b0
0x7ffff7dcf2b0 <mp_+80>:	0x0000000000000040
gdb-peda$ x/gx 0x7ffff7dcf2c0
0x7ffff7dcf2c0 <mp_+96>:	0x0000000000000007

And here’s the high level version:

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
 ...
#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
	&& tc_idx < mp_.tcache_bins
	&& tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif
  ...

tcache_put

tcache_put is responsible for placing the free’d chunk in the corresponding entries[] index as well as updating the counts[idx] value.

// Make sure the chunk's size is within the tcache boundaries
cmp    rdx,0x3f
// tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
lea    rdi,[rbx+0x10]
ja     0x7ffff7a80334
// &counts[idx]
lea    rax,[rcx+rdx*8]
add    esi,0x1
// tcache->entries[tc_idx]
mov    r8,QWORD PTR [rax+0x40]
// e->next = tcache->entries[tc_idx];
mov    QWORD PTR [rbx+0x10],r8
// tcache->entries[tc_idx] = e
mov    QWORD PTR [rax+0x40],rdi
// counts[idx]++
mov    BYTE PTR [rcx+rdx*1],sil
/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  assert (tc_idx < TCACHE_MAX_BINS);
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

For the visual readers, including myself, here’s a hopefully not so crappy ascii-art:

Before:
                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000000	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000000000 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...
                      0x602240:	0x0000000000000000	0x0000000000000000
                      0x602250:	0x0000000000000000	0x0000000000000031 <-- chunk a
                                        ...                ...


free(a);

tcache->counts[]

   0       1       2            63
+------++------++------+     +------+ 
|   0  ||  1   ||  0   | ... |  0   |
|      ||      ||      |     |      |
+------++------++------+     +------+

tcache->entries[]

   0       1       2            63
+------++------++------+     +------+ 
| NULL ||  a   || NULL | ... | NULL |
|      ||      ||      |     |      |
+------++------++------+     +------+
           |
           |
          NULL

After:
                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251 
tcache-->counts[] --> 0x602010:	0x0000000000000100	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000602260 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...
                      0x602240:	0x0000000000000000	0x0000000000000000
                      0x602250:	0x0000000000000000	0x0000000000000031 <-- chunk a
                                        ...                ...

``entries[idx] and counts[idx] have been updated. Let’s visualize a couple more free’s and I’ll let you figure out the rest.

free(b);

tcache->counts[]

   0       1       2            63
+------++------++------+     +------+ 
|   0  ||  2   ||  0   | ... |  0   |
|      ||      ||      |     |      |
+------++------++------+     +------+

tcache->entries[]

   0       1       2            63
+------++------++------+     +------+ 
| NULL ||  b   || NULL | ... | NULL |
|      ||      ||      |     |      |
+------++------++------+     +------+
            |
            |
        +------+
        |   a  |
        |      |
        +------+
            |
            |
           NULL

After:
                   gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251 
tcache-->counts[] --> 0x602010:	0x0000000000000200	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000602290 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...
                      0x602240:	0x0000000000000000	0x0000000000000000
                      0x602250:	0x0000000000000000	0x0000000000000031 <-- chunk a
                                        ...                ...

gdb-peda$ x/gx 0x0000000000602290 <-- (tcache_entry *)b
0x602290:	0x0000000000602260
gdb-peda$ x/gx 0x0000000000602260 <-- a == (tcache_entry *)b->next
0x602260:	0x0000000000000000    <-- NULL

It’s worth noting that insertion happens at the head of the list. Skipping a few free’s.

...
free(g);


tcache->counts[]

   0       1       2            63
+------++------++------+     +------+ 
|   0  ||  7   ||  0   | ... |  0   |
|      ||      ||      |     |      |
+------++------++------+     +------+

tcache->entries[]

   0       1       2            63
+------++------++------+     +------+ 
| NULL ||  g   || NULL | ... | NULL |
|      ||      ||      |     |      |
+------++------++------+     +------+
            |
            |
        +------+
        |  f   |
        |      |
        +------+
            |
            |
        +------+
        |  e   |
        |      |
        +------+
            |
            |
        +------+
        |  d   |
        |      |
        +------+
            |
            |
        +------+
        |  c   |
        |      |
        +------+
            |
            |
        +------+
        |  b   |
        |      |
        +------+
            |
            |
        +------+
        |  a   |
        |      |
        +------+
            |
            |
           NULL

We’re at the stage where more free’s of size 0x30 would fail the tcache checks and take the fastbin route.

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  ...
  size = chunksize (p);

  ...

  check_inuse_chunk(av, p);

#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);

    if (tcache
	&& tc_idx < mp_.tcache_bins
	&& tcache->counts[tc_idx] < mp_.tcache_count)
      {
	tcache_put (p, tc_idx);
	return;
      }
  }
#endif

  /*
    If eligible, place chunk on a fastbin so it can be found
    and used quickly in malloc.
  */

  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())

    ...

    atomic_store_relaxed (&av->have_fastchunks, true);
    unsigned int idx = fastbin_index(size);
    fb = &fastbin (av, idx);

    /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
    mchunkptr old = *fb, old2;

    ...
      do
	{
	  /* Check that the top of the bin is not the record we are going to
	     add (i.e., double free).  */
	  if (__builtin_expect (old == p, 0))
	    malloc_printerr ("double free or corruption (fasttop)");
	  p->fd = old2 = old;
	}
      while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2))
	     != old2);
      ...
free(h);
free(i);
free(j);
free(k);

fastbinsY[NFASTBINS]

   0       1       2            
+------++------++------+     
| NULL ||  k   || NULL | ...
|      ||      ||      |     
+------++------++------+     
            |
            |
        +------+
        |  j   |
        |      |
        +------+
            |
            |
        +------+
        |  i   |
        |      |
        +------+
            |
            |
        +------+
        |  h   |
        |      |
        +------+
            |
            |
           NULL

gdb-peda$ printfastbin 
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x602430 --> 0x602400 --> 0x6023d0 --> 0x6023a0 --> 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0

Now let’s suppose we’re feeling a bit crazy and want to make a few more allocations of size 0x20. That’s where tcache_get comes into play.

// Allocate the chunks out of tcache. 
// returns g
malloc(0x20);
// returns f
malloc(0x20); 
// returns e
malloc(0x20); 
// returns d
malloc(0x20);
// returns c
malloc(0x20);
// returns b
malloc(0x20); 
// returns a
malloc(0x20); 

tcache_get

As stated before, when a new allocation request occurs, __libc_malloc will firstly check if there is any available chunk of that size inside tcache->entries[idx]. If there is, tcache_get will retrieve a chunk from the head of that list.

cmp    rbx,0x3f
ja     0x7ffff7a840c3
// Remove chunk at the head of the list
mov    rsi,QWORD PTR [rdx]
// Place its fd at the head of the list
mov    QWORD PTR [rcx+0x40],rsi
// --(tcache->counts[tc_idx]);
sub    BYTE PTR [rax+rbx*1],0x1
static void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  return (void *) e;
}
Before:
                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000700	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000602380 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...

// returns g
malloc(0x20);

tcache->counts[]

   0       1       2            63
+------++------++------+     +------+ 
|   0  ||  6   ||  0   | ... |  0   |
|      ||      ||      |     |      |
+------++------++------+     +------+

tcache->entries[]

   0       1       2            63
+------++------++------+     +------+ 
| NULL ||  f   || NULL | ... | NULL |
|      ||      ||      |     |      |
+------++------++------+     +------+
            |
            |
        +------+
        |  e   |
        |      |
        +------+
            |
            |
        +------+
        |  d   |
        |      |
        +------+
            |
            |
        +------+
        |  c   |
        |      |
        +------+
            |
            |
        +------+
        |  b   |
        |      |
        +------+
            |
            |
        +------+
        |  a   |
        |      |
        +------+
            |
            |
           NULL

After:
                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000600	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000602350 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...

0x602380 was indeed removed from the list and the counter was updated! One more allocation to solidify the removal mechanism.

// returns f
malloc(0x20); 


tcache->counts[]

   0       1       2            63
+------++------++------+     +------+ 
|   0  ||  5   ||  0   | ... |  0   |
|      ||      ||      |     |      |
+------++------++------+     +------+

tcache->entries[]

   0       1       2            63
+------++------++------+     +------+ 
| NULL ||  e   || NULL | ... | NULL |
|      ||      ||      |     |      |
+------++------++------+     +------+
            |
            |
        +------+
        |  d   |
        |      |
        +------+
            |
            |
        +------+
        |  c   |
        |      |
        +------+
            |
            |
        +------+
        |  b   |
        |      |
        +------+
            |
            |
        +------+
        |  a   |
        |      |
        +------+
            |
            |
           NULL

                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000500	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000602320 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...

As expected 0x602350 is gone. Now after the 7th allocation, the tcache will be empty and __libc_malloc will resort to _int_malloc, which will check for available chunks in the fastbin array.

                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000000	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x0000000000000000 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...
                      0x602240:	0x0000000000000000	0x0000000000000000
                      0x602250:	0x0000000000000000	0x0000000000000031 <-- chunk a
                                        ...                ...

gdb-peda$ printfastbin 
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x602430 --> 0x602400 --> 0x6023d0 --> 0x6023a0 --> 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0

_int_malloc

There has been a new “feature” added in _int_malloc. If there is an available chunk in the corresponding fastbin list (same rule applies for small and unsorted chunks), _int_malloc will return the chunk at the head of the fastbin list and then allocate the rest of the chunks out of the fastbin list and place them in their corresponding tcache-entries[idx] entry as long as there’s enough space for 7 or less chunks.

static void *
_int_malloc (mstate av, size_t bytes)
{
...

#define REMOVE_FB(fb, victim, pp)			\
  do							\
    {							\
      victim = pp;					\
      if (victim == NULL)				\
	break;						\
    }							\
  while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
	 != victim);					\

  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp;
      victim = *fb;

      if (victim != NULL)
	{
	  if (SINGLE_THREAD_P)
	    *fb = victim->fd;
	  else
	    REMOVE_FB (fb, pp, victim);
          ...
#if USE_TCACHE
	      /* While we're here, if we see other chunks of the same size,
		 stash them in the tcache.  */
	      size_t tc_idx = csize2tidx (nb);
	      if (tcache && tc_idx < mp_.tcache_bins)
		{
		  mchunkptr tc_victim;

		  /* While bin not empty and tcache not full, copy chunks.  */
		  while (tcache->counts[tc_idx] < mp_.tcache_count
			 && (tc_victim = *fb) != NULL)
		    {
		      if (SINGLE_THREAD_P)
			*fb = tc_victim->fd;
		      else
			{
			  REMOVE_FB (fb, pp, tc_victim);
			  if (__glibc_unlikely (tc_victim == NULL))
			    break;
			}
		      tcache_put (tc_victim, tc_idx);
		    }
		}
#endif
...

Let’s see the magic happening. We expect 0x602430 to be returned by _int_malloc and the remaining chunks to end up in tcache->entries[idx].

/*
	 Retrieve chunk from fastbin.
	 The rest of the chunks (h, i, j, k) will be allocated
	 out of their fastbin list and will be placed back into tcache->entries[idx].
*/
malloc(0x20);

gdb-peda$ printfastbin 
(0x20)     fastbin[0]: 0x0
(0x30)     fastbin[1]: 0x0
(0x40)     fastbin[2]: 0x0
(0x50)     fastbin[3]: 0x0
(0x60)     fastbin[4]: 0x0
(0x70)     fastbin[5]: 0x0
(0x80)     fastbin[6]: 0x0

                  gdb-peda$ x/80gx 0x602000
                      0x602000:	0x0000000000000000	0x0000000000000251
tcache-->counts[] --> 0x602010:	0x0000000000000300	0x0000000000000000
                      0x602020:	0x0000000000000000	0x0000000000000000
                      0x602030:	0x0000000000000000	0x0000000000000000
                      0x602040:	0x0000000000000000	0x0000000000000000
                      0x602050:	0x0000000000000000	0x00000000006023b0 <-- tcache>entries[]
                      0x602060:	0x0000000000000000	0x0000000000000000
                                        ...                ...

gdb-peda$ x/gx 0x00000000006023b0 <-- head of the linked list
0x6023b0:	0x00000000006023e0
gdb-peda$ x/gx 0x00000000006023e0 <-- (tcache_entry *)0x6023b0->next
0x6023e0:	0x0000000000602410
gdb-peda$ x/gx 0x0000000000602410 <-- (tcache_entry *)0x6023e0->next
0x602410:	0x0000000000000000

All our assumptions are proven correct! The fastbin list has been emptied out and the corresponding tcache index is filled with the remaining fastbin chunks. Because of the fact that fastbin removal happens at the head of the list, you can notice that the chunk at the tail became the head of tcache->entries[idx] since addition to the latter happens at the head as well.


Conclusion

That was a hopefully short, visual and insightful overview of the recent updates glibc malloc has pushed. If you were already familiar with the 16.x or even the 17.04 implementation, thread local caching is just a tiny addition in terms of knowledge. However, it does change the state of the art for heap exploitation (for the better) but that’s a story for another time :wink: If there is anything I’d take away from this post is to experiment! Reverse and break the heap! Finally, I’d like to thank you for taking the time to read my write-up and if you have any questions, feel free to ask them down below or hit me up on IRC/twitter.

Take care…

11 Likes

This topic was automatically closed after 30 days. New replies are no longer allowed.